Capstone Disassembler Framework

(github.com)

103 points | by xvilka 5 days ago

28 comments

  • woodruffw 5 days ago

    Capstone supports an impressive breadth of architectures. However, if all you need is x86/AMD64 decoding and disassembly, there are much higher quality (in terms of accurate decoding) libraries out there.

    I wrote a differential fuzzer for x86 decoders a few years ago, and XED and Zydis generally performed far better (in terms of accuracy) than Capstone[1]. And on the Rust side, yaxpeax and iced-x86 perform very admirably.

    [1]: https://blog.trailofbits.com/2019/10/31/destroying-x86_64-in...
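The differential-fuzzing idea described above can be sketched roughly as follows. This is a minimal illustration, not the fuzzer from the linked post: `decode_reference` and `decode_buggy` are hypothetical toy length decoders for a tiny x86 subset (NOP, RET, MOV reg, imm, and a single 0x66 prefix), standing in for real libraries such as XED, Zydis, or Capstone. The harness feeds identical bytes to every decoder and records inputs on which they disagree.

```python
import random


def decode_reference(buf: bytes):
    """Toy length decoder for a tiny x86 subset; returns a length or None."""
    i = 0
    opsize16 = False
    # This toy model accepts at most one operand-size override prefix (0x66).
    if i < len(buf) and buf[i] == 0x66:
        opsize16 = True
        i += 1
    if i >= len(buf):
        return None
    opcode = buf[i]
    i += 1
    if opcode in (0x90, 0xC3):      # NOP, RET: no operands
        return i
    if 0xB8 <= opcode <= 0xBF:      # MOV reg, imm16 or imm32
        i += 2 if opsize16 else 4
        return i if i <= len(buf) else None
    return None


def decode_buggy(buf: bytes):
    """Same subset with a planted bug: 0x66 does not shrink the immediate."""
    i = 1 if buf[:1] == b"\x66" else 0
    if i >= len(buf):
        return None
    opcode = buf[i]
    i += 1
    if opcode in (0x90, 0xC3):
        return i
    if 0xB8 <= opcode <= 0xBF:
        i += 4                      # bug: always assumes a 32-bit immediate
        return i if i <= len(buf) else None
    return None


# Hypothetical decoder registry; real use would wrap actual libraries here.
DECODERS = {"reference": decode_reference, "buggy": decode_buggy}


def differential(decoders, inputs):
    """Return (input, results) pairs for inputs where decoders disagree."""
    mismatches = []
    for buf in inputs:
        results = {name: decode(buf) for name, decode in decoders.items()}
        if len(set(results.values())) > 1:
            mismatches.append((buf, results))
    return mismatches


def random_inputs(count, seed=0, max_len=8):
    """Generate short byte strings biased toward bytes the toys understand."""
    rng = random.Random(seed)
    alphabet = [0x66, 0x90, 0xC3, 0xB8, 0x00, 0xFF]
    return [
        bytes(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
        for _ in range(count)
    ]


if __name__ == "__main__":
    for buf, results in differential(DECODERS, random_inputs(200)):
        print(buf.hex(), results)
```

A real harness would compare more than lengths (mnemonics, operands, validity) and would need a policy for deciding which decoder is right, since even a reference decoder can be wrong.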

    • monads 4 days ago

      In my previous job, I worked on a project that required disassembling large amounts of x86/AMD64 instructions (several billion instructions per run was very common). I also found that Zydis is much faster than Capstone.

    • meisel 4 days ago

      How is there any discrepancy in accuracy? Isn’t it just a matter of following the spec?

      • woodruffw 4 days ago

        The spec is very large, not particularly well written, and is not “total” (in the sense that AMD64 and IA32e and other x86-64 flavors are all subtly different). There are a lot of ways to get it wrong; even XED (the reference decoder from Intel) has bugs.

        If I remember correctly, the Intel SDM alone is over 3000 pages long.

      • saagarjha 4 days ago

        lol, no. For one, Capstone has a lot of bugs (it uses some old version of LLVM as its base), but the whole question of how to decode things is complicated because there are a lot of pitfalls and inconsistencies that different disassemblers handle differently. And what the hardware does is a different question entirely: it may not match the spec, or even other processors with the same ISA.

    • canucker2016 5 days ago

      Did you mean x86/x64 decoding?

      Looking at the libs, none of them seem to mention ARM64 inst. decoding.

      • woodruffw 5 days ago

        Yep, I meant AMD64, fixed.

  • jstrieb 5 days ago

    Capstone is very useful!

    Someone (not me) has also cross-compiled Capstone to WebAssembly so it can be used in client-side browser applications.

    https://alexaltea.github.io/capstone.js/

    I've used this in a couple of projects to support disassembly in static web apps with no back end.

  • __alexander 4 days ago

    If you find Capstone interesting, check out the Unicorn Engine.

    https://github.com/unicorn-engine/unicorn

    Also, if anyone is interested in an example of using capstone for basic disassembly and analysis, here is a link to my capstool project.

    https://github.com/alexander-hanel/capstool

    • emmanueloga_ 4 days ago

      Right, three related multi-platform and multi-architecture frameworks from the same people:

      * Capstone: disassembler.

      * Keystone: assembler.

      * Unicorn: CPU emulator.

    • the_biot 4 days ago

      Unicorn is fantastic. I used it to emulate an SoC's boot environment to get around a very weird HAL, and it worked perfectly. Awesome tool!

  • smolsky 5 days ago

    It's difficult to find a succinct overview. Here is a slide deck buried among links: http://www.capstone-engine.org/BHUSA2014-capstone.pdf

    • tptacek 5 days ago

      Capstone is sort of an "industry standard" open source multi-architectural disassembler library, especially for security tooling.

      This is a useful page to get a sense of what it's about (i.e., what you're getting out of it vs. something more like objdump):

      https://www.capstone-engine.org/beyond_llvm.html

  • post-factum 5 days ago

    It is also used in one of the Linux kernel debuggers: https://codeberg.org/pf-kernel/crush

  • nicolodev 5 days ago

    Another good replacement for capstone/keystone based on LLVM is nyxstone https://github.com/emproof-com/nyxstone

    • xvilka 4 days ago

      It's just a wrapper around LLVM. So any project would also be forced to ship the corresponding LLVM version if it's not present on the system, e.g. on Windows or in embedded applications. A bit too much for a simple disassembler. So it's not a direct replacement for Capstone.

    • saagarjha 4 days ago

      That's basically what Capstone is? Except not vendoring its own LLVM.

      • xvilka 4 days ago

        Capstone doesn't vendor LLVM either. It just contains some pieces of the LLVM-ish infrastructure that were converted from C++ to pure C and are pretty lean, without any external dependency.

    • ashvardanian 5 days ago

      It looks pretty promising! How would you compare the strengths/weaknesses?

      • stuxnot 4 days ago

        Full disclosure, I'm one of the nyxstone developers - so I might be biased.

        In comparison to capstone, nyxstone lacks the features of instruction decomposition and providing read/written registers. In addition, nyxstone directly interfaces with LLVM and thus is expected to be a lot slower than capstone, which uses instruction tables generated by a modified LLVM.

        I want to note here that Nyxstone is intended more as a replacement for Keystone than Capstone. We added the disassembler mainly because we could. Compared to Keystone, nyxstone allows precise definition of target triple and ISA extensions, allows definition of external labels, supports structured output with instruction details (address, bytes, assembly), rejects partial and invalid inputs and rejects instructions not supported by the specific core (for example UMAAL is supported by Cortex-M4, but not by Cortex-M3), and is more up to date. Nyxstone does not require patches in the LLVM source tree, and thus is (I'd argue) more maintainable and easier to keep up to date.
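For readers unfamiliar with the "read/written registers" feature mentioned in this comment, here is a small sketch using Capstone's Python bindings. It assumes the `capstone` package (version 4.0 or later, where `regs_access()` is available) is installed; the helper name `accessed_registers` and the base address are made up for illustration.

```python
def accessed_registers(code: bytes, address: int = 0x1000):
    """Return (mnemonic, op_str, regs_read, regs_written) per instruction."""
    # Imported here so the module still loads where Capstone is not installed.
    from capstone import Cs, CS_ARCH_X86, CS_MODE_64

    md = Cs(CS_ARCH_X86, CS_MODE_64)
    md.detail = True  # detail mode is required for register access info
    rows = []
    for insn in md.disasm(code, address):
        # regs_access() returns two lists of register IDs: read and written.
        read_ids, written_ids = insn.regs_access()
        rows.append((
            insn.mnemonic,
            insn.op_str,
            sorted(insn.reg_name(r) for r in read_ids),
            sorted(insn.reg_name(r) for r in written_ids),
        ))
    return rows


if __name__ == "__main__":
    # 48 01 d8 encodes `add rax, rbx` in 64-bit mode.
    for row in accessed_registers(bytes.fromhex("4801d8")):
        print(row)
```

Detail mode costs some speed, so tools that only need mnemonics and operand strings usually leave it off.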

  • Cieric 5 days ago

    Haven't had a chance to use capstone yet, but a project I really like happens to use it.

    https://github.com/xoreaxeaxeax/sandsifter

  • deoxykev 5 days ago

    ImHex is a really great frontend for Capstone. https://github.com/WerWolv/ImHex

  • stonethrowaway 5 days ago

    I think it’s incredible this is implemented in C. Well done!