Full Reverse Engineering of the TI-84 Plus Operating System

(siraben.github.io)

146 points | by siraben a day ago

26 comments

  • asveikau a day ago

    > TI-BASIC programs are stored as tokens, not text: every command, function, and variable is a token of 1 or 2 bytes. The OS detokenizes (token→display string) to show a program and tokenizes (keypress/text→token) on entry; the parser walks tokens to execute.

    From my memory of using a TI-83 in the late 90s, I would not be surprised if the keypad UI injects tokens directly based on your keypress, rather than "tokenizing the text". I seem to recall, for example, that you could not position the cursor in the middle of a BASIC token, and that if you managed to type out a token's name letter by letter, it would not work; you needed to find the right menu item to inject the correct token.
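
A minimal Python sketch of the model described above, assuming a hypothetical editor buffer (`TokenBuffer`) that stores whole tokens: a key or menu entry inserts one token, the cursor moves between tokens rather than between characters, and the screen shows the detokenized text. The byte values follow commonly published TI-83/84 token tables as far as I recall them; treat them as assumptions, not as the OS's actual data structures.

```python
from dataclasses import dataclass, field

# Illustrative one-byte tokens: token value -> display string.
# Values are assumed from community token tables; verify before relying on them.
TOKENS = {
    0xDE: "Disp ",
    0x2A: '"',
    0x41: "A",
    0x42: "B",
    0x3F: "\n",  # end-of-line token
}


@dataclass
class TokenBuffer:
    """Hypothetical editor buffer that stores whole tokens, not characters."""
    tokens: list[int] = field(default_factory=list)
    cursor: int = 0  # index between tokens; it can never point inside one

    def press(self, token: int) -> None:
        # A key or menu entry inserts one complete token at the cursor.
        self.tokens.insert(self.cursor, token)
        self.cursor += 1

    def left(self) -> None:
        # Moving left skips a whole token, however many characters it displays.
        self.cursor = max(0, self.cursor - 1)

    def text(self) -> str:
        # Detokenize: map each token to its display string and concatenate.
        return "".join(TOKENS[t] for t in self.tokens)

    def size(self) -> int:
        # Every token in this illustrative table occupies one byte.
        return len(self.tokens)
```

For example, pressing Disp, then A, then moving left once and pressing B yields `Disp BA`: the cursor could only ever sit between tokens, never inside `Disp `.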

    • duskwuff a day ago

      I can confirm that. On the TI-83, many of the TI-BASIC tokens contained lowercase characters which couldn't be typed at all - you could only type uppercase letters on the keyboard. (There were a few lowercase letters available as tokens for special purposes, but it wasn't a full set.)

      Interestingly, you could print tokens in strings - e.g. you could Disp "Disp ".
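
The `Disp "Disp "` trick works because a string literal is itself a sequence of tokens, so the quoted `Disp ` is stored as the same one-byte Disp token. Below is a hypothetical greedy longest-match tokenizer of the sort a PC-side tool might use (the calculator's own program editor does not tokenize text this way, as discussed above); the byte values are assumed, not taken from the OS.

```python
# Illustrative one-byte tokens: display string -> assumed token value.
TOKENS = {
    "Disp ": 0xDE,
    '"': 0x2A,
    "D": 0x44,
    "A": 0x41,
}


def tokenize(text: str) -> bytes:
    """Greedy longest-match tokenizer (hypothetical, PC-side).

    Nothing here treats the inside of a string literal specially,
    so "Disp " between quotes still becomes the single Disp token.
    """
    names = sorted(TOKENS, key=len, reverse=True)  # try longest names first
    out = bytearray()
    i = 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                out.append(TOKENS[name])
                i += len(name)
                break
        else:
            raise ValueError(f"no token matches at position {i}: {text[i:]!r}")
    return bytes(out)
```

With this table, `tokenize('Disp "Disp "')` produces four bytes, two of which are the Disp token, while `"DA"` falls back to the single-letter tokens because no longer name matches.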

      • suburban_strike a day ago

        The 83+ let you type the full set of lowercase chars as well, but they used 2x as many bytes per character for storage.
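
To put a number on that: assuming, as the comment says, that uppercase letters (plus digits and space) are one-byte tokens and each lowercase letter is a two-byte token, a hypothetical `string_size` helper can count the storage a string would need. Other characters are deliberately left unmodelled.

```python
def string_size(text: str) -> int:
    """Estimate storage bytes for a string typed one character at a time.

    Assumption: uppercase letters, digits and space are one-byte tokens;
    every lowercase letter is a two-byte token. Anything else is rejected.
    """
    size = 0
    for ch in text:
        if ch.isascii() and (ch.isupper() or ch.isdigit() or ch == " "):
            size += 1
        elif ch.isascii() and ch.islower():
            size += 2
        else:
            raise ValueError(f"character not modelled: {ch!r}")
    return size
```

Under these assumptions `HELLO` costs 5 bytes while `Hello` costs 9.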

      • 7jjjjjjj a day ago

        There's actually a hidden lowercase feature, you can use an assembly program to enable it.

    • jamesfinlayson a day ago

      Ah, makes sense. I remember a younger me trying to open .8xp files back in the day and seeing gibberish, and eventually finding the TI IDE, which... felt like it had been written a long time ago (the file select dialog capped the display of file names at 8.3, I think, and used ~1, ~2, etc. as "the rest of the file name").

    • siraben a day ago

      Yes, to type a TI-BASIC program you have to go through the calculator menus which directly insert the tokenized input into the buffer.

      The weird thing about TI-BASIC is how seemingly innocent changes in the input can cause huge performance regressions e.g. https://siraben.github.io/ti84p-re/sub-tibasic-for-paren.htm...

        For(I,1,N
        If 0
        1
        End
      
      is much slower than

        For(I,1,N)
        If 0
        1
        End

      • asveikau a day ago

        The open paren being part of the tokens was always weird. I could imagine that doing strange things for the parser; when it sees a close paren it needs to know that several of the preceding tokens may have an open paren even without having a '(' token.
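
A sketch of the bookkeeping this implies, using hypothetical token names: since there is no separate `(` token after `For`, paren depth has to be tracked per token, counting any token whose display text ends in `(` as an opener. TI-BASIC also accepts a line that ends with parens still open (they are closed implicitly at end of line), which is why `For(I,1,N` is valid at all. This is an illustration of the problem, not the OS's parser.

```python
def unclosed_parens(line: list[str]) -> int:
    """Return how many parentheses a tokenized line leaves open.

    The open paren is folded into tokens such as "For(" or "sin(",
    so any token whose text ends in "(" counts as an opener.
    """
    depth = 0
    for tok in line:
        if tok.endswith("("):
            depth += 1
        elif tok == ")":
            if depth == 0:
                raise SyntaxError("')' with no matching open paren")
            depth -= 1
    return depth
```

For the loop headers above, `["For(", "I", ",", "1", ",", "N"]` leaves one paren open, whereas the version ending in `")"` leaves none.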

  • analogpixel a day ago

    I couldn't tell: is a person doing this, or was this an LLM dissecting it?

    • siraben a day ago

      This was made collaboratively by me directing coding agents at the binary, using Ghidra MCP extensively, disassembly and also dynamic analysis with an emulator. I don't have a writeup of the process but it was definitely not fully automatable (I wish though). I might prepare a blog post with transcripts and session history and things I learned along the way.

      Broad takeaways:

      - Ghidra MCP is not a silver bullet. Lots of opportunities for mis-decoding especially on older instruction sets (e.g. conflating code + data), which requires user input to flag data layout/structs.

      - Agents still need a lot of user direction otherwise the RE production is just kind of a random walk. With Z80 it's decent at reading code but I expect that it has much worse performance than reading x86 or ARM for instance. The TI-84+ has a bunch of hardware quirks as well.

      - GPT 5.5 is better than Opus 4.8 at RE. Opus 4.8 loves plausible-sounding RE'd logic without any checking. The gold standard is actually dynamically executing the binary and comparing the logic against the prose.

      - Maintaining consistency in style and prose is a PITA across the wiki. Hard to reconcile prose <-> code. Can be somewhat mitigated by agent loops.
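
To illustrate the code + data conflation from the first takeaway: 83+/84+ code commonly calls ROM routines with the `bcall` convention, an `rst 28h` (opcode EFh) followed by a two-byte address. A naive linear sweep decodes those two address bytes as instructions. The sketch below uses a hypothetical six-opcode Z80 subset and an arbitrary address; it says nothing about how Ghidra works internally.

```python
# Tiny Z80 subset: opcode -> (mnemonic template, number of operand bytes).
OPCODES = {
    0x00: ("nop", 0),
    0x0A: ("ld a,(bc)", 0),
    0x3E: ("ld a,{:02X}h", 1),
    0x45: ("ld b,l", 0),
    0xC9: ("ret", 0),
    0xEF: ("rst 28h", 0),
}


def sweep(code: bytes, bcall_aware: bool) -> list[str]:
    """Linear-sweep disassembly of a byte string using the subset above."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        template, n = OPCODES[op]
        operands = code[i + 1 : i + 1 + n]
        out.append(template.format(*operands) if n else template)
        i += 1 + n
        if bcall_aware and op == 0xEF:
            # Assumed convention: rst 28h is followed by a little-endian address.
            if i + 1 >= len(code):
                raise ValueError("truncated bcall address")
            addr = code[i] | (code[i + 1] << 8)
            out.append(f".dw {addr:04X}h")
            i += 2
    return out
```

On `3E 05 EF 0A 45 C9`, the naive sweep reports `ld a,(bc)` and `ld b,l` where the bcall-aware sweep reports a single `.dw 450Ah` data word; both decodings look plausible in isolation, which is why the data layout has to be flagged by hand.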

      I was also in discussions with people in the TI calculator programming space, who provided guidance as well. We previously did not have a catalogue of every subsystem in TI-OS, let alone of most subroutines in the OS.

      • RgrTheShrubbr a day ago

        I just recently heard about Ghidra and started using it with Claude, and I am absolutely blown away by how little resistance it has to decompiling old Win95/98 binaries. It's turning into a bit of a hobby of mine to take old software, decompile it, and find hidden treasures like images or messages.

        • Chu4eeno 10 hours ago

          There's this unfortunate common misconception (that LLMs luckily don't tend to share) that reverse engineering is illegal or immoral, when it's both a great source of learning, a necessity for things like interop/preservation, and even has explicit carve-outs in the copyright laws of many/sane countries.

          I know my government has a good number of reverse engineers on the payroll (mostly in the security services).

      • hedgehog a day ago

        Do you have plans to generate a buildable version of the sources, and do you know the original implementation language? C?

        • siraben a day ago

          It's highly likely that the original implementation language was assembly. The code is very idiomatic.

          Regarding source build, I think reverse engineering it to the point where you can reconstruct the source is possibly legally problematic, so I don't plan to do this, but maybe for certain subsystems like MathPrint (equation display) which was especially fun to RE. I have a PR up for it and it will be live at

          https://siraben.github.io/ti84p-re/mathprint

          • ndiddy a day ago

            Typically, people who are concerned about legal issues regarding disassemblies distribute a script file that contains all the code/data annotations, comments, variable names, and labels; the user then feeds this file and a copy of the original binary into the disassembler to reproduce the disassembly. Here's a random example for a 6502 codebase: https://github.com/TakuikaNinja/FDS-disksys . IDA Pro has this functionality built in: you can export a .idc script file that will reproduce the .idb file if you load the original binary into a fresh instance of IDA Pro and then run the script. Maybe Ghidra has something similar; if not, I bet you can get your AI to write export/import scripts for Ghidra.
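
A minimal sketch of that workflow, assuming a hypothetical JSON annotation format (this is not IDA's .idc format or any Ghidra format): the annotation file carries only labels plus a SHA-256 of the binary, never the binary's bytes, and the labels are re-applied only when the user supplies an identical binary.

```python
import hashlib
import json


def make_annotations(binary: bytes, labels: dict[int, str]) -> str:
    """Package labels as JSON keyed to the binary's hash, without its bytes."""
    return json.dumps({
        "sha256": hashlib.sha256(binary).hexdigest(),
        "labels": {f"{addr:04X}": name for addr, name in labels.items()},
    })


def apply_annotations(binary: bytes, annotations_json: str) -> dict[int, str]:
    """Re-apply labels only if the user supplies the exact same binary."""
    ann = json.loads(annotations_json)
    if hashlib.sha256(binary).hexdigest() != ann["sha256"]:
        raise ValueError("binary does not match the annotated image")
    labels = {int(addr, 16): name for addr, name in ann["labels"].items()}
    for addr in labels:
        if addr >= len(binary):
            raise ValueError(f"label address {addr:04X} is outside the binary")
    return labels
```

The hash check is the key design choice: the shared file is useless without the user's own copy of the original binary, which is the point of the approach described above.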

          • jamesfinlayson a day ago

            > It's highly likely that the original implementation language was assembly.

            Agreed. I did a bit of development on a TI-84+ years ago. I was not a skilled programmer back then, so I only used TI-BASIC, but the fact that you could only write apps in assembly makes me think the operating system was the same. From memory, ticalc.org had a gcc fork, though I don't recall which calculators it targeted.

      • analogpixel a day ago

        how much have you spent so far on this (for tokens)?

        • siraben a day ago

          The plans are heavily subsidized by the AI companies so I didn't end up needing to do API usage or buy another subscription. I have ChatGPT Pro and Claude Code Max.

    • xkcd-sucks a day ago

      > Confidence is flagged: .....

      > The big picture

      > The structural reverse-engineering is comprehensive (every subsystem mapped, both cross-page mechanisms resolved ...

      > Confidence summary / open items

      Probably an LLM wrote the docs.

      > (the GhidraMCP plugin reconnects for interactive work)

      Probably LLM+Ghidra for the actual RevEng. Ultimately, though, does it matter if the end product works?

      • markus_zhang a day ago

        I think it’s fine as long as it works. Personally I prefer doing everything manually because that’s where the fun is, but everyone has their own fun.

  • tadfisher a day ago

    I love that this project produced so much info, and also I'm disappointed with the prose. You probably didn't mean to explain the typographic nuances of em vs. en-dashes to the reader: https://siraben.github.io/ti84p-re/conventions.html#typograp...

    • siraben a day ago

      Thanks for the feedback, fixing.

  • thwgrw a day ago

    I am sure you did a lot of hard work here. But with all the LLM smell in the text, my mind zoned out after a few lines. I'd rather read a flawed but human-written text than a perfect one written or co-written by an LLM.