19 comments

  • irq-1 a day ago

    Seems like this only helps large (heavy) websites with consistent content. The real world examples are all large, like YouTube, Amazon, etc...

    Small JSON responses that compress to <1k would fit in a single packet, so I don't see the advantage of going from "65 bytes with normal Zstandard compression, vs 28 bytes when using the past response as a dictionary - 57% smaller."
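Both halves of this point (the bytes shrink, the packet count does not) can be checked with a small experiment. The sketch below uses Python's standard-library `zlib` with a preset dictionary (`zdict`) as a stand-in for Zstandard dictionaries; the JSON payloads and the 1460-byte segment size are illustrative assumptions, not figures from the article.

```python
import json
import math
import zlib

# Hypothetical API responses; the previous response serves as the dictionary.
previous = json.dumps({"user": "alice", "status": "online", "unread": 3, "theme": "dark"}).encode()
current = json.dumps({"user": "alice", "status": "away", "unread": 4, "theme": "dark"}).encode()

MSS = 1460  # assumed TCP payload per packet, for illustration only


def deflate(data, zdict=None):
    """Compress with DEFLATE, optionally primed with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate(blob, zdict=None):
    """Decompress; the same dictionary must be supplied if one was used."""
    decomp = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


plain = deflate(current)
primed = deflate(current, previous)
assert inflate(primed, previous) == current  # round-trips with the same dictionary

for label, blob in (("plain", plain), ("with dictionary", primed)):
    print(f"{label}: {len(blob)} bytes, {math.ceil(len(blob) / MSS)} packet(s)")
```

Under these assumptions the primed output is smaller, yet both versions still fit in one packet, which is the objection being raised here.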

    • gnotstic a day ago

      Yes, I was under the same impression, but I think this LUT/dictionary solution is counterintuitive to both of our current understandings of the web.

      The "aha" moment for me was that, without this dict, the user is always going to request a full download of the data. For instance, let's say the NYT published an article and you read it. Then an editor's note is added to the article. When you go back to read the article, the data transfer is minuscule. Now that is an edge case, but imagine a website that allows comments: Twitter, Reddit, small text-based pages that at first seem inconsequential until you think about how we use the web, with millions of users returning to pages over and over again.

      For me, my mental model of this structure is a LUT (key/value pairs) wrapped in version control (a hash).

      Now I think your comment is correct once we factor in how many requests the webpage receives and how frequently it changes. My blog would receive no benefit from implementing this tech, and using napkin math, my blog would need 1000 days to break even. Microsoft's blog, however... less than a day, in theory.

      • everforward a day ago

        If the version control hash changes you have to re-download the dictionary, which is similar to redownloading the whole page.

        Reddit/NYT would have to publish their changes without changing the dictionary, meaning some portions would be largely absent from the dictionary and have worse compression than gzip. Probably fine for NYT, something like Reddit might actually have worse ratios than gzip in that case.

        • superb_dev a day ago

          Or you could use the previous version to generate the dictionary for the current version?

          I would assume chunks that didn’t benefit from the dictionary would receive the standard compression, so you can’t get worse than gzip.

          • everforward 16 hours ago

            Maybe? That gets sort of awkward for frequently updated things like Reddit where there might be 10 dictionary versions between what you have and the current version. You’d need something that decides whether to get an incremental update or a new dictionary, and the hoster has to store those old dictionaries. Feels like more trouble than it’s worth.

            You could compress things with gzip if the dictionary doesn’t work well, but to my understanding gzip works by compressing repetition. There’s less repetition in smaller chunks, so compression ratios are worse. E.g., compressing each comment individually has a worse net ratio than compressing all the comments at once.

            It would also be annoying to merge a bunch of individually compressed blocks back together, but certainly an option
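The claim that per-comment compression loses to compressing everything at once is easy to test. This is a minimal sketch using `zlib` (the same DEFLATE algorithm gzip uses); the comment bodies are made up for illustration.

```python
import zlib

# Hypothetical comment bodies that share vocabulary, as comments in one thread tend to.
comments = [
    f"user{i}: I think dictionary compression helps when pages change slowly, reply {i}".encode()
    for i in range(50)
]

# Each comment compressed as its own stream: no comment can reference another.
individually = sum(len(zlib.compress(c)) for c in comments)

# All comments in one stream: later comments can back-reference earlier ones.
together = len(zlib.compress(b"\n".join(comments)))

print(f"individually: {individually} bytes, together: {together} bytes")
```

Each standalone stream also pays its own header and checksum overhead, which adds to the gap on small inputs.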

            • superb_dev 12 hours ago

              I’m pretty sure the dictionary just gets put on the front of the compression algorithm’s “context” so that it can be referenced just like any other part of the document. You wouldn’t need individual blocks with different compression schemes, it would all get compressed together.

    • cyanydeez a day ago

      The toy examples aren't where the savings are. Do the calculations with a JSON list of 100 objects and you'll find that the compression improves much more significantly.

      So yeah, a one-time object return isn't impressive. Once those objects are in an array, the compression is much more remarkable.
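One way to run that calculation is sketched below, again with `zlib` preset dictionaries standing in for Zstandard. The product list, the field names, and the rule that every tenth price changes between responses are all assumptions made up for the example.

```python
import json
import zlib


def deflate(data, zdict=None):
    """Compress with DEFLATE, optionally primed with a preset dictionary."""
    comp = zlib.compressobj(level=9) if zdict is None else zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def make_response(n, bump):
    """Hypothetical product list; every tenth price shifts by `bump`."""
    items = [
        {"id": i, "name": f"item-{i}", "price": 100 + i + (bump if i % 10 == 0 else 0), "in_stock": True}
        for i in range(n)
    ]
    return json.dumps(items).encode()


saved = {}
for n in (1, 100):
    previous = make_response(n, bump=0)  # the response the client already has
    current = make_response(n, bump=5)   # the response being sent now
    plain = len(deflate(current))
    primed = len(deflate(current, zdict=previous))
    saved[n] = plain - primed
    print(f"{n:>3} objects: raw={len(current)} plain={plain} primed={primed} saved={saved[n]}")
```

The absolute number of bytes saved by the dictionary grows with the size of the list, even though ordinary compression already handles the repeated keys within a single response.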

      While reading, I started wondering if we'll see an LLM constructor that'll take an API and some actual browser use and create a model that maximizes these types of message-centric compression.

  • jason_s 17 hours ago

    Is there any similar ecosystem hook for a zip-like archive? It would be great to have something like .zip file containers for zstd/brotli which can contain a small number of dictionaries and then the decompression utility automatically uses them. For example, suppose you have a lot of .js / .css / .html files. Or Python files. Or whatever. It would be more efficient than individual .zstd files.

  • ghssds a day ago
  • bob1029 a day ago

    I would be thinking a lot more about JSON APIs than HTML content when considering the potential upside of this.

  • hulitu 14 hours ago

    > Dictionary Compression is finally here, and it's ridiculously good

    cough winrar cough

    • pseudohadamard 2 hours ago

      It goes back a lot further than that, to at least the early 1970s when it was known as "ad hoc compression". Then once the Ziv-Lempel family appeared it was a standard aspect of the LZ algorithms. So the title should really be "We hacked pre-primed dictionaries into xyz and in the situations where this sort of thing performs really well, it performs really well".

  • AtlasBarfed 15 hours ago

    Poison the dictionary, inject the code.

  • dmitrygr a day ago

    So this is just LZ with a pre-populated window? Any backreferencing compression can be used this way: just prepopulate the backreference history on both client and server up front and off you go. Why is this new?
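The "pre-populated window" idea can be made concrete with a toy LZ77-style coder. This is a deliberately naive sketch (greedy, quadratic search, tuple tokens instead of a bit stream) and not how Zstandard or Brotli are implemented; the function names and sample payloads are hypothetical.

```python
def lz_encode(data: bytes, history: bytes = b"", min_match: int = 4):
    """Greedy toy LZ77: emits ("lit", byte) or ("ref", distance, length) tokens.

    `history` is the pre-populated window; matches may reach back into it.
    """
    buf = history + data
    pos = len(history)
    tokens = []
    while pos < len(buf):
        best_len, best_dist = 0, 0
        for start in range(pos):  # every earlier position, history included
            length = 0
            while pos + length < len(buf) and buf[start + length] == buf[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - start
        if best_len >= min_match:
            tokens.append(("ref", best_dist, best_len))
            pos += best_len
        else:
            tokens.append(("lit", buf[pos]))
            pos += 1
    return tokens


def lz_decode(tokens, history: bytes = b"") -> bytes:
    """Rebuild the data; the decoder must start from the same history."""
    out = bytearray(history)
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):  # byte-by-byte so overlapping copies work
                out.append(out[-dist])
    return bytes(out[len(history):])


old = b'{"user": "alice", "status": "online"}'
new = b'{"user": "alice", "status": "away"}'

cold = lz_encode(new)                # empty window
warm = lz_encode(new, history=old)   # window pre-populated with the old response
assert lz_decode(cold) == new
assert lz_decode(warm, history=old) == new
print(len(cold), "tokens without history,", len(warm), "tokens with history")
```

With the old response in the window, most of the new response collapses into a single back-reference, which is the whole trick.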

    • jason_s 17 hours ago

      Author should have put RFC 9842 in the headline. (https://www.rfc-editor.org/rfc/rfc9842)
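What RFC 9842 adds on top of the compression itself is the HTTP negotiation. As a rough sketch, assuming the RFC's `Available-Dictionary` request header (a SHA-256 hash of the cached dictionary, written as a structured-field byte sequence) and its `dcb`/`dcz` content encodings, a client could build its request headers like this. The helper name and the cached bytes are hypothetical.

```python
import base64
import hashlib


def available_dictionary_value(dictionary: bytes) -> str:
    """SHA-256 of the dictionary, written as a structured-field byte sequence."""
    digest = hashlib.sha256(dictionary).digest()
    return ":" + base64.b64encode(digest).decode("ascii") + ":"


# Hypothetical cached resource that the server earlier marked with Use-As-Dictionary.
cached_dictionary = b"console.log('app v1');"

request_headers = {
    "Accept-Encoding": "gzip, br, zstd, dcb, dcz",
    "Available-Dictionary": available_dictionary_value(cached_dictionary),
}
for name, value in request_headers.items():
    print(f"{name}: {value}")
```

The server compares that hash against the dictionaries it knows; if none match, it can fall back to an ordinary encoding.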

    • setr a day ago

      Per the article, it’s new to browsers, not to compression generally, due to the lack of standardization. The future is already here, just not evenly distributed.

  • a_void_sky 2 days ago

    [flagged]