Claude is doing the decompilation here, right? Has this been compared against using a traditional decompiler with Claude in the loop to improve the decompilation and ensure matched results? I would think that Claude’s training data would include a lot more pseudo-C <-> C knowledge than pairs of GCC 2.7 MIPS assembly and C, and even if the traditional decompiler was kind of bad at N64, it would be more efficient to fix bad decompiler C than assembly.
It's wild to me that they wouldn't try this first. Feeding the asm directly into the model seems like intentionally ignoring a huge amount of work that has gone into traditional decompilation. What LLMs excel at (names, context, searching in high-dimensional space, making shit up) is very different from, e.g., coming up with an actual AST with infix expressions that represents asm code.
"Claude struggles with large functions and more or less gives up immediately on those exceeding 1,000 instructions." Well, yeah, that's the thing: an N64 game is C targeting an architecture where compiler optimizations are typically lacking, the idiomatic style is lots of small, tightly scoped functions, and the system architecture itself is a lot simpler than, say, a modern amd64 PC... These days I often just feel like, why is this person telling me how easy my job is now when they seemingly don't know much about it? I just find it arrogant and insulting... Perpetually demo season.
IMO this is one of the best use cases for AI today. Each function is like a separate mini problem with an explicit, easy-to-verify solution, and the goal is (essentially) to output text that resembles what humans write -- specifically, C code, which the models have obviously seen a lot of. And no one is harmed by this use of AI; no one's job is being taken. It's just automating an enormous amount of grunt work that was previously impossible to automate.
I'm part of the effort to decompile Super Smash Bros. Melee, and a fellow contributor recently wrote about how we're doing agent-based decompilation: https://stephenjayakar.com/posts/magic-decomp/
I'm really excited about this, especially for games for which the source code was lost like Red Alert 2.
Does this technique limit the LLM to correctness-preserving transforms?
Like all things related to LLMs, semantic correctness is left as an exercise for the reader.
> And no one is harmed by this use of AI; no one's job is being taken
what about: see cool app, decompile it, launch competing app.
(repeat)