Binary Formats Gallery

(formats.kaitai.io)

101 points | by vitalnodo a day ago

24 comments

  • redsparrow 16 hours ago

    I had a great experience using Kaitai in a previous job. We were decoding proprietary binary messages from Teltonika OBD GPS trackers. The online editor, https://ide.kaitai.io/, is really nice for developing and testing your definition. You can store multiple binary files in local storage, and you get a nice, detailed look at the data and how your definition is parsing it.

  • dtagames a day ago

    Interesting. I didn't know anyone had come up with a declarative language for binary files.

  • foobarbecue a day ago

    Kaitai was awesome for reverse-engineering the Soloshot session format https://github.com/foobarbecue/soloshot-session-to-gpx-conve...

  • hombre_fatal 15 hours ago

    Kaitai is cool but it seems like kind of a waste since you can't roundtrip the data back into binary.

  • pmarreck a day ago

    Is this able to represent any binary format? How do things like relative offsets work and such? (basically any non-rigid parts of the format)

    • rpearl a day ago
    • frizlab a day ago

      It can represent a UTF-8 string, so it can probably represent anything.

      • jcranmer a day ago

        As binary formats go, UTF-8 is extremely tame. Some of the complexities that binary formats love to throw at you:

        * Things may be non-byte-aligned bitstreams.

        * Arrays of structures that go "read until id is 5, but if id is 5, nothing else of the structure is emitted."

        * Fields that may be optional if some parent of the current record has some weird value.

        * Files may be composed of records at arbitrary, random offsets that essentially require seeking to make any sense of it.

        * The metadata of your structure may depend on some early parameter (for example, is this field big-endian or little-endian?)

        and so on.

        File formats like ELF (supporting ELF32, ELF64, and both little-endian and big-endian, all in a single format definition) or Java class files (long and double entries in the constant pool take up two slots, not one) are a better guideline for how powerful the format is in handling weirder idiosyncrasies.
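
        To make two of the items above concrete, here is a minimal hand-written sketch in plain Python (not Kaitai) of a made-up toy format. The layout is an assumption invented for illustration: one leading byte selects the byte order (1 = little-endian, 2 = big-endian, loosely modeled on how ELF flags its data encoding), followed by records of a u16 id and a u32 value, where a record with id 5 ends the list and carries no value. The names `parse_records` and `TERMINATOR_ID` are hypothetical.

```python
import struct

TERMINATOR_ID = 5  # hypothetical sentinel: "read until id is 5"


def parse_records(data: bytes) -> list[tuple[int, int]]:
    """Parse the toy format: 1 endianness byte, then (u16 id, u32 value) records."""
    if not data:
        raise ValueError("empty input")

    # The byte order of every later field depends on this early parameter.
    flag = data[0]
    if flag == 1:
        prefix = "<"  # little-endian
    elif flag == 2:
        prefix = ">"  # big-endian
    else:
        raise ValueError(f"unknown endianness flag: {flag}")

    offset = 1
    records = []
    while True:
        # struct.unpack_from raises struct.error if the data is truncated.
        (rec_id,) = struct.unpack_from(prefix + "H", data, offset)
        offset += 2
        if rec_id == TERMINATOR_ID:
            # The terminator has an id but none of the remaining fields.
            break
        (value,) = struct.unpack_from(prefix + "I", data, offset)
        offset += 4
        records.append((rec_id, value))
    return records
```

        Even this tiny case needs a loop whose exit condition sits in the middle of a record, which is the kind of thing a declarative format language has to be able to express.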

      • mananaysiempre a day ago

        UTF-8 is a regular language (as a subset of all octet strings), so that doesn’t feel like much of a benchmark? Something like TIFF or PE/COFF would seem to be a more reasonable standard. (PDF is probably too much to ask, seeing as understanding the structure requires a full Deflate decoder among other things.)
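
        As a rough illustration of the "regular language" point, well-formed UTF-8 can be recognized by a single regular expression over bytes, with no counters or lookahead. The sketch below uses Python's `re` module on a bytes pattern; the byte ranges follow the well-formed sequences allowed by RFC 3629, and the helper name `is_utf8` is made up for this example.

```python
import re

# One alternative per well-formed byte sequence shape (per RFC 3629).
_UTF8 = re.compile(
    rb"""
    (?:
        [\x00-\x7F]                          # 1 byte: ASCII
      | [\xC2-\xDF][\x80-\xBF]               # 2 bytes, overlong C0/C1 excluded
      | \xE0[\xA0-\xBF][\x80-\xBF]           # 3 bytes, overlong forms excluded
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # 3 bytes, general case
      | \xED[\x80-\x9F][\x80-\xBF]           # 3 bytes, surrogates excluded
      | \xF0[\x90-\xBF][\x80-\xBF]{2}        # 4 bytes, overlong forms excluded
      | [\xF1-\xF3][\x80-\xBF]{3}            # 4 bytes, general case
      | \xF4[\x80-\x8F][\x80-\xBF]{2}        # 4 bytes, capped at U+10FFFF
    )*
    """,
    re.VERBOSE,
)


def is_utf8(data: bytes) -> bool:
    """Return True if the whole byte string is well-formed UTF-8."""
    return _UTF8.fullmatch(data) is not None
```

        Formats like TIFF or PE/COFF, by contrast, store offsets that point elsewhere in the file, which a pattern of this kind cannot follow.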