Can gzip be a language model?

(nathan.rs)

7 points | by nathan-barry 14 hours ago

3 comments

  • eventualcomp an hour ago

    Reminds me of this YouTube video: https://m.youtube.com/watch?v=jkdWzvMOPuo

    I liked the comments explaining why this worked.

  • nathan-barry 14 hours ago

    LLMs are very good at lossless compression via arithmetic coding. But I didn't know that it was possible to go in the reverse direction (doing language modeling via a compressor). The quality isn't great, but I'm surprised it worked! Other compression algorithms (like PPMd) use variable-order n-gram models under the hood and should do much better (although they're less interesting, since they already contain basic language models internally).
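
For anyone curious what "language modeling via a compressor" can look like in practice, here is a minimal sketch. It is not the linked article's code, just one common way to do it under stated assumptions: score each candidate continuation by how many extra bytes DEFLATE (the algorithm gzip uses) needs to encode it after the context, then turn those costs into probabilities with a softmax. The names `compressed_len` and `next_token_probs`, the candidate list, and the `temperature` parameter are all hypothetical, introduced only for illustration.

```python
import math
import zlib


def compressed_len(data):
    """Size in bytes of the zlib/DEFLATE-compressed input."""
    return len(zlib.compress(data, 9))


def next_token_probs(context, candidates, temperature=1.0):
    """Turn compression cost into a probability over candidate continuations.

    Assumption: a continuation that adds fewer compressed bytes is treated
    as more likely. Costs are whole bytes, so the signal is coarse and ties
    between candidates are common.
    """
    base = compressed_len(context.encode("utf-8"))
    costs = [
        compressed_len((context + c).encode("utf-8")) - base
        for c in candidates
    ]
    # Softmax over negative costs; subtracting the minimum keeps exp() stable.
    lowest = min(costs)
    weights = [math.exp(-(cost - lowest) / temperature) for cost in costs]
    total = sum(weights)
    return {c: w / total for c, w in zip(candidates, weights)}


if __name__ == "__main__":
    ctx = "the cat sat on the mat. the cat sat on the "
    for token, p in next_token_probs(ctx, ["mat", "xyz", "dog"]).items():
        print(f"{token!r}: {p:.3f}")
```

Sampling from this distribution one token at a time gives a crude generator. Because the costs are measured in whole bytes, many candidates tie, which is probably part of why the output quality is limited.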

  • chinallm_ai 13 hours ago

    [flagged]