I still don't see how this is different from Iceberg. You don't need a catalog to use it, and atomic replace of metadata.json plus deletion vectors seems to be exactly the same thing.
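For what it's worth, the catalog-free commit part comes down to a conditional PUT of the next metadata version. A minimal sketch, assuming a recent boto3 with S3 conditional writes and a made-up `_metadata/vNNNNN.json` naming scheme (the post's actual layout may differ):

```python
import json

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def try_commit(bucket: str, version: int, snapshot: dict) -> bool:
    """Attempt to publish the metadata for `version` atomically.

    IfNoneMatch="*" makes the PUT fail with 412 PreconditionFailed if the
    key already exists, so at most one writer can claim a given version.
    """
    key = f"_metadata/v{version:05d}.json"
    try:
        s3.put_object(
            Bucket=bucket,
            Key=key,
            Body=json.dumps(snapshot).encode(),
            IfNoneMatch="*",  # put-if-absent via S3 conditional writes
        )
        return True
    except ClientError as exc:
        if exc.response["Error"]["Code"] == "PreconditionFailed":
            return False  # another writer already committed this version
        raise
```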
S3 is an HTTP API; does that mean this database would be very slow, especially if it relies on immutability and creates copies of large files?
Yeah, it's mentioned in a few places - compared to OLTP or similar workloads, this will definitely be slow.
The sequence diagram seems to have a mistake: the second writer somehow knows to create v124 despite only having observed v122.
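For reference, the loop a writer would normally run (reusing the hypothetical `try_commit` and `s3` client sketched above): having observed v122 it should first attempt v123, and only land on v124 after losing that race and re-listing.

```python
def latest_version(bucket: str) -> int:
    """Highest committed metadata version, found by listing _metadata/.
    Ignores pagination for brevity."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix="_metadata/v")
    return max(
        (
            int(obj["Key"][len("_metadata/v"):-len(".json")])
            for obj in resp.get("Contents", [])
        ),
        default=0,
    )

def commit(bucket: str, snapshot: dict) -> int:
    """Optimistic commit loop: observe v122 -> try v123; only after that
    conditional PUT fails does the writer re-list and try v124."""
    while True:
        target = latest_version(bucket) + 1
        if try_commit(bucket, target, snapshot):
            return target
```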
Fun fact -- try to search for "124" there.
For some reason they thought a hard-positioned top-to-bottom SVG was somehow better than adding "white-space: pre" once in the CSS ¯\_(ツ)_/¯
Thanks! Looks like I messed up some CSS a bit in my last frontend refresh.
Wow fast, now it's much better!
I know Iceberg has this same issue, but you state that deletion done this way (recording tombstones) is sufficient for GDPR compliance. Is it really? The 'deleted' data is still trivially readable.
It's OK provided there's a garbage collection procedure. But the write-up seems to regard this as optional.
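To make that concrete: a tombstone only hides rows at read time; the bytes are gone only once a compaction rewrite drops the file and a pass like the sketch below physically deletes it. The bucket layout and snapshot format here are assumptions, not the post's actual design.

```python
import boto3

s3 = boto3.client("s3")

def garbage_collect(bucket: str, live_files: set[str], prefix: str = "data/") -> int:
    """Physically delete data files the current snapshot no longer references.

    Until this runs, 'deleted' personal data is still sitting in S3 and is
    trivially readable by anything that lists the bucket.
    """
    deleted = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        orphans = [
            {"Key": obj["Key"]}
            for obj in page.get("Contents", [])
            if obj["Key"] not in live_files
        ]
        # delete_objects accepts at most 1000 keys per call
        for i in range(0, len(orphans), 1000):
            batch = orphans[i : i + 1000]
            s3.delete_objects(Bucket=bucket, Delete={"Objects": batch})
            deleted += len(batch)
    return deleted
```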
> Deletes accumulate in tombstone files over time. Eventually we would want to coalesce 100 small tombstone files into one and/or rewrite data files if a row group has >50% rows deleted, resulting in further compaction.
The bigger problem for me is that tombstones that remove rows can make reads quite inefficient, because they reduce the usefulness of min-max and bloom filter indexes. They can also hurt vectorized execution if you have to apply delete predicates within row groups. Finally, there are degenerate cases where the tombstones would be bigger than the compressed columns themselves.
Any assertion that this would be performant needs to be backed up by code. ClickHouse took many years to implement so-called lightweight deletes. It's a hard problem to solve in a performant way.
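For what it's worth, the ">50% rows deleted" rule from the quote could look roughly like the sketch below; rewriting also restores the value of the min-max stats, since afterwards they once again describe only live rows. The deletion-set representation (absolute row positions per file) is an assumption, not the post's actual format.

```python
import pyarrow as pa
import pyarrow.parquet as pq

REWRITE_THRESHOLD = 0.5  # rewrite a file once >50% of its rows are tombstoned

def maybe_rewrite(path: str, deleted_rows: set[int], out_path: str) -> bool:
    """Rewrite a Parquet file without its tombstoned rows if past the threshold."""
    table = pq.read_table(path)
    if table.num_rows == 0 or len(deleted_rows) / table.num_rows <= REWRITE_THRESHOLD:
        return False  # keep the file and its tombstones as-is
    keep = pa.array(
        [i not in deleted_rows for i in range(table.num_rows)], type=pa.bool_()
    )
    pq.write_table(table.filter(keep), out_path)
    # The old file and its tombstones can then be dropped from the next
    # snapshot and hard-deleted by garbage collection.
    return True
```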
Given that it's Parquet, deletes are nice, but what about inserts?
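Presumably the same way other immutable-Parquet formats handle it: an insert writes a brand-new file and the next metadata version references it; existing files are never modified. A sketch with guessed names and snapshot shape:

```python
import uuid

import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import fs

def append_rows(bucket: str, rows: pa.Table, snapshot: dict) -> dict:
    """Insert = write a new immutable Parquet file and add it to the file list."""
    key = f"data/{uuid.uuid4()}.parquet"
    pq.write_table(rows, f"{bucket}/{key}", filesystem=fs.S3FileSystem())
    new_snapshot = dict(snapshot)
    new_snapshot["files"] = list(snapshot.get("files", [])) + [key]
    return new_snapshot  # to be published with a conditional metadata PUT
```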