Memory That Collaborates

(datahike.io)

41 points | by whilo 3 days ago

6 comments

  • nine_k 17 hours ago

    Immutability indeed gives a lot of features to a data store, because nothing ever can go invalid. Doing anything in parallel is trivial. Caching anything is totally safe. Distributed queries can be certain that the other side did not change during a distributed transaction.

    The cost of it, AFAICT, is either unbounded growth, or garbage collection, if deletion is even supported.

    There are use cases for a DB like that, but, unfortunately, it cannot replace most OLTP use cases.

    • layer8 2 hours ago

      It’s also inapplicable for anything with sensitive data that has legally limited retention periods or the “right to erasure”.

      • nine_k an hour ago

        Why, no, deletion is possible in immutable structures, as long as nothing else references the deleted nodes. This is literally how lists / trees / any complex structures work in Haskell (and apparently in Clojure): a mutation gives you a new immutable structure, but the old immutable structure (or parts thereof) can be forgotten and disposed of.
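        To make the structural-sharing point concrete, here is a minimal sketch (in Python, standing in for the Haskell/Clojure persistent structures described above; all names are illustrative): "deleting" the head of an immutable list just means returning the shared tail, and the old head becomes collectible once nothing references it.

        ```python
        from dataclasses import dataclass
        from typing import Optional

        @dataclass(frozen=True)
        class Node:
            value: int
            rest: Optional["Node"]

        def cons(value, rest):
            # Building a list never mutates existing nodes.
            return Node(value, rest)

        def remove_head(lst):
            # "Deletion": no copying, just return the tail. The old head
            # is garbage as soon as no one else references it.
            return lst.rest

        old = cons(1, cons(2, cons(3, None)))
        new = remove_head(old)
        assert new is old.rest   # the (2, 3) suffix is shared, not copied
        del old                  # old head is now unreachable and collectible
        ```

        The same mechanism is why deletion is compatible with immutability: the structure is never mutated, but versions that are no longer referenced can be reclaimed by the garbage collector.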

        • layer8 14 minutes ago

          It’s very common, however, that other data will reference the data to be deleted, or be derived from it in a way that becomes invalid once it’s deleted. Parallel processing becomes less straightforward because you have to make sure that all parallel processes see a consistent state of the deletion. Depending on the nature of the data to be deleted, you may actually be forced into mutation, for example if there are keys that are themselves sensitive data.

  • edinetdb 14 hours ago

    The immutability argument resonates strongly for regulatory filings. I work with XBRL data from Japanese government disclosures, where historical documents are frozen the moment they're accepted by the regulator—the 2019 annual report for a company never changes.

    The model that's worked: append-only ingestion with a submitted_at field, treating corrections as new document submissions (which is actually how the regulator models them too—amended filings get new document IDs). Downstream consumers query as-of a timestamp, which makes the cross-team pattern trivial: everyone reads from the same immutable ledger and gets a consistent snapshot.
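    A hedged sketch of that pattern (field and function names like `submitted_at` and `as_of` are illustrative, not the regulator's actual schema): the ledger only grows, and an as-of query is just a filter on submission time, so every consumer reading at the same timestamp sees the same snapshot.

    ```python
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Filing:
        doc_id: str
        company: str
        period: str
        submitted_at: int  # e.g. epoch seconds at acceptance

    ledger = []  # append-only: records are never updated or removed

    def ingest(filing):
        ledger.append(filing)

    def as_of(ts):
        # A consistent snapshot: everything accepted at or before `ts`.
        # Two consumers calling as_of(ts) always see the same records.
        return [f for f in ledger if f.submitted_at <= ts]

    ingest(Filing("S100A", "ACME", "FY2019", 100))
    ingest(Filing("S100B", "ACME", "FY2020", 200))
    assert [f.doc_id for f in as_of(150)] == ["S100A"]
    ```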

    The hard case is when a company resubmits corrected data for prior periods. We model this as new records with amendment references rather than in-place updates, preserving the immutability guarantee but requiring consumers to be aware of the supersession chain.
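    The supersession chain can be sketched like this (a hypothetical illustration, not the actual production model): each amendment is a new record carrying a reference to the filing it supersedes, and consumers resolve the current version by walking the chain forward.

    ```python
    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class Record:
        doc_id: str
        amends: Optional[str]  # doc_id of the superseded filing, if any

    def latest(records, doc_id):
        # Follow the supersession chain forward to the newest amendment.
        # Nothing is updated in place; old versions remain queryable.
        current = doc_id
        while True:
            nxt = next((r.doc_id for r in records if r.amends == current), None)
            if nxt is None:
                return current
            current = nxt

    recs = [Record("A1", None), Record("A2", "A1"), Record("A3", "A2")]
    assert latest(recs, "A1") == "A3"  # A1 was amended twice; A3 is current
    ```

    The cost the comment mentions is visible here: every consumer has to know to call something like `latest` instead of reading a document ID at face value.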

    Curious how this scales to domains with higher correction rates than regulatory filings—for us, amendments are maybe 2-3% of total submissions, which is manageable. What's the correction rate where the immutable model starts to break down practically?

  • readthenotes1 16 hours ago

    "This is an idea Rich Hickey introduced with Datomic in 2012: "

    I am pretty sure the difference between online transaction processing and online analysis processing goes back a bit further than 2012.