SolidStart - Hacker News

notamy 9 months ago ago

I know the article says "analytics workloads," but I wonder if this pattern could be used for ex. bursty-write workloads.

For context, I have a use-case where I have a Meilisearch server that has all the data in it updated once daily, and then every single request outside of that one bulk update is a read. The only exception is that once in a great while, something breaks and the entire index has to be rebuilt. Since it's a hobby project, my workload is too expensive to fit into Meilisearch Cloud or similar solutions. I keep finding myself wishing that there was a serverless search engine that would free me from having to self-host one, keep a dedicated server running and updated, etc. If I could "just" run a search engine with Cloudflare Workers + R2 (+ probably ancillary CF services) I'd be very happy.

I don't know enough to actually answer my own question unfortunately; I can't intuit the answer just from reading the docs and database implementations aren't my area of expertise.

[-]

eatonphil 9 months ago ago

> I know the article says "analytics workloads," but I wonder if this pattern could be used for ex. bursty-write workloads.

The key point is that Delta Lake and Iceberg concurrency control is so coarse-grained. As I mentioned in the article, they did this because it seemed they wanted to avoid an external highly-available metadata store (e.g. FoundationDB which is what Snowflake uses).

Just about every Database as a Service (OLTP or OLAP) these days supports storage tiering to cloud blob storage. Delta Lake and Iceberg just took cloud blob storage to the extreme by trying to do everything in it (or as much as possible, Iceberg seems to require a Compare-and-Swap operation which I'm not sure is possible to express with atomic PutIfAbsent alone; seems to require an external HA metadata store for their one current metadata pointer). And in doing this they gave up fine-grained concurrency control.

maxmcd 9 months ago ago

I have also been thinking about this:

https://slatedb.io/ is new and KV on object storage. Maybe helpful.

There is lots of similar work over ipfs, I have not had the chance to experiment yet, but I'd like to see what a workflow is like with (eg): summa: https://github.com/izihawa/summa

I tried to implement trigram search on object storage using Zoekt: https://github.com/sourcegraph/zoekt, but I found that common queries load 30-40% of the entire index, so setting it up without some kind of caching strategy felt a little ridiculous.

bigiain 9 months ago ago

Back in the day (and I'm talking about when memcached was still written in Perl here, so late 90s or so) "bursty-write workloads" were a really good fit for us - using SQL queries as keys and storing the responses as values. For every read you first do a (super fast) memcached query using the SQL you're about to run as the key. If it exists, you just send the value back without running the query, if it doesn't exist you run the query and store the response as the value before returning it.

It depends a lot on how big your dataset is, and the distribution of reads across your data. But if you have a "fat head long tail" type distribution it can work really well.

(Of course cache invalidation is always a hard problem, which we worked around by trying to ensure all our code that did database writes knew how to delete relevant memcached entries, force expiring the whole cache at our lowest traffic times, and having a big red panic button that would delete the entire cache if needed.)

[-]

chipdart 9 months ago ago

> For every read you first do a (super fast) memcached query using the SQL you're about to run as the key. If it exists, you just send the value back without running the query, if it doesn't exist you run the query and store the response as the value before returning it.

Wouldn't it have been far simpler to just move the db behind a service that provides a cacheable request-response interface and just adopt the simplest of caching strategies? In the case of HTTP, you can even get local cache for free. No need to throw specialized memory cache services around.

[-]

bigiain 9 months ago ago

Like I said, I was doing that with memcached about 25-30 years back. Nobody had yet built "a service that provides a cacheable request-response interface" to move the db behind.

Just last week I helped one of our devs through "caching" API responses for a "read heavy" use case by saving the responses to S3 and having the web app there instead of hitting the API endpoint and database.

There are a bunch of ways to skin this cat in 2024.

(One brilliant idea I've got bookmarked but haven't (yet) had an opportunity toi use, is a modified SQLite that uses S3 storage and range requests and is compiles to WASM - so a web app can run SQL queries locally and all the usual SQLite stuff "just works" from the warm/javascript's point of view. It _does_ need to be a very read-heavy use case though, because updating even a single row needs to upload the entire SQLite db file into S3.)

[-]

nsenifty 9 months ago ago

> so a web app can run SQL queries locally and all the usual SQLite stuff "just works" from the warm/javascript's point of view

Duckdb-wasm can do this.

[-]

bigiain 9 months ago ago

Does it do the S3 storage and http range requests to seek to the needed byte offsets thing?

I think one day I'm gonna have a thing where I need a few GB or tens of GB of database to be available in a web or mobile app, but where that db rarely changes, and I want to do this without having to support/secure a backend platform or database.

[-]

orf 9 months ago ago

Parquet.

billywhizz 9 months ago ago

i've been playing around with this idea on and off for a while. this was the original demo/implementation afaik: https://phiresky.github.io/blog/2021/hosting-sqlite-database....

this has all sorts of nice properties if it suits your needs, especially if you have lots of small read-only databases that just receive updates occasionally.

there's a little demo i did here: https://billywhizz.io/squeel-demo/

danielheath 9 months ago ago

My approach to this class of issue has been to make the reindex be "Creates a new search server, build the index, then switch the application to the new one and turn off the old one".

At the cost of doubling storage/IO requirements for search, this works around virtually every kind of outage-due-to-index-corruption (don't switch to the new one if it fails healthchecks).

[-]

9 months ago ago

[deleted]

convolvatron 9 months ago ago

I wish we generally used solutions like this instead of always reaching for the 200lb database hammer.

however, I think the article would benefit from distinguishing this approach from a 'real' database. that is MVCC gets you a lot more concurrency because it spreads out the exclusionary state in both time and space.

the optimistic concurrency used in the article works well at low contention and then becomes arbitrarily latent as more and more proposers try to jump into same RTT gap with both their initial attempt and their retries.

[-]

eatonphil 9 months ago ago

> however, I think the article would benefit from distinguishing this approach from a 'real' database. that is MVCC gets you a lot more concurrency because it spreads out the exclusionary state in both time and space.

I thought I already was going to be seen as being unfairly critical of the course-ness of the concurrency control.

[-]

convolvatron 9 months ago ago

in any case thanks for writing it up. I was just thinking about doing this in a broadcast domain to provide the same kind of low-throughput bootstrapping consistency that people would normally use zookeeper or etcd for. its certainly very useful in the cases where it fits.

benreesman 9 months ago ago

For folks interested in this topic I highly recommend this talk by Cliff Click about building highly scalable hash tables via Compare And Swap:

https://youtu.be/HJ-719EGIts

Andys 9 months ago ago

Worth noting S3 recently added conditional writes: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3...

jakozaur 9 months ago ago

Iceberg was acquired for $1Bln to create a similar standard called Iceberg.

It's worth highlighting the pros/cons of this approach.

Pros:

+ straightforward, no running metadata server needed, scales to zero compute

+ zero-copy clones, time travel/versioning and sharable. Git style, but for your data

+ can work with many vendors without the need to copy data

Cons:

- single optimistic thread lock. This does not work if you have many clients doing many small writes concurrently. You need to batch write, ideally in an append-mainly way.

- quite low-level. You still need someone doing compaction, access control, etc.

- you miss some database features (e.g. skip index based on bloomfilter), some other such as materialized views based on data will be hard to implement if you directly write to blob storage

The data lake is amazing for cold data; for fast-changing recent data (e.g. last 1 day), it is better to use different data storage.

yencabulator 9 months ago ago

The filesystem supposedly-atomic file creation is buggy. The putIfAbsent implementation will leaves partial commit files visible to others (forever on errors), making other users see corrupt transactions. For filesystems, the correct answer is to write to a tempfile, sync that, then use link(2) to claim a transaction id.

[-]

hagen1778 9 months ago ago

We use the same approach in time series database I'm working on. While file creation and fsync aren't atomic, rename [1] syscall is. So we create a temporary file, write the data, call fsync and if all is good - rename it atomically to be visible for other users. I had a talk about this [2] a few month ago.

[1] https://man7.org/linux/man-pages/man2/rename.2.html

[2] https://www.youtube.com/watch?v=1gkfmzTdPPI

[-]

yencabulator 9 months ago ago

You can't atomically allocate a unique identifier (e.g. next step of a counter) with rename, it'll overwrite. That's what the link(2) is for.

eatonphil 9 months ago ago

That's a good point but I think the system will still be correct in this scenario just unavailable. If a transaction ever fails to fully write a transaction metadata file all future transactions will crash while reading it. You can recover by deleting the faulty and final transaction file. Link is a better choice though you're right. I didn't remember it will fail if the new file already exists too. I'll rewrite my code. Thanks!

[-]

yencabulator 9 months ago ago

Only by virtue of the code seeming to store everything in JSON. If you used a format where binary data could be stored as-is, you could end up with a zeroed sector in the middle of a large value. If you used a format that allowed zero bytes, you could end up with a structure that could be confused for something legal (e.g. list of TLV with many type=0 len=0 objects ignored).

[-]

eatonphil 9 months ago ago

> Only by virtue of the code seeming to store everything in JSON

Yep.

kthejoker2 9 months ago ago

Just pointing out Delta Lake and Iceberg are based on columnar Parquet format and are designed for OLAP / data warehousing workloads.

So managing write concurrency is definitely a secondary priority since it's not the norm to have multiple writers.

9 months ago ago

[deleted]

rohan_ 9 months ago ago

Can anyone explain the rise in DBs backed by object storage?

[-]

ericb 9 months ago ago

If you're working at a scale where you accumulate a ton of data quickly and don't know what questions you want to ask, it gives you nigh-infinite scalability without nigh-infinite maintenance, so you can ask questions later.

The other use-case is akin to giving a really big house to a hoarder.

chipdart 9 months ago ago

> Can anyone explain the rise in DBs backed by object storage?

I dare say it's a "when the only tool you have is a hammer" problem.

But it might also be the fact that object storage services are reliable, relatively cheap by cloud standards, and do not have a upper bound on resource availability.

devnull3 9 months ago ago

I think the main attraction of object storage like S3 is that it already solves the hard problems (distributed-ness, scalability, durability, etc) with a simple(r) API.

Also, apis like S3 are defacto standard with many vendors supporting a descent subset of it.

cedws 9 months ago ago

One other reason that hasn’t been mentioned is that it eliminates inter-AZ network fees. WarpStream has some numbers on their site that show huge savings by using S3 as the backbone.

tessierashpool9 9 months ago ago

databricks hate this trick

Build a serverless ACID database with this one neat trick (atomic PutIfAbsent)