Capstone supports an impressive breadth of architectures. However, if all you need is x86/AMD64 decoding and disassembly, there are much higher quality (in terms of accurate decoding) libraries out there.
I wrote a differential fuzzer for x86 decoders a few years ago, and XED and Zydis generally performed far better (in terms of accuracy) than Capstone[1]. And on the Rust side, yaxpeax and iced-x86 perform very admirably.
[1]: https://blog.trailofbits.com/2019/10/31/destroying-x86_64-in...
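For anyone who hasn't seen the technique: a differential fuzzer feeds the same random byte strings to several decoders and flags the inputs where they disagree. Below is a minimal sketch in Python, not the fuzzer from the linked post. The two toy decoders and every name in it are hypothetical stand-ins for real bindings (XED, Zydis, Capstone, ...), and it only compares validity and decoded length, since text formatting differs legitimately between decoders.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# A decoder takes raw bytes and returns (length, text), or None if it
# rejects the input. Real decoders would be wrapped to fit this shape;
# the two below are toy stand-ins invented for this sketch.
Decoder = Callable[[bytes], Optional[Tuple[int, str]]]


def toy_decoder_a(data: bytes) -> Optional[Tuple[int, str]]:
    """Hypothetical decoder that knows NOP (0x90) and RET (0xC3)."""
    if data[:1] == b"\x90":
        return (1, "nop")
    if data[:1] == b"\xc3":
        return (1, "ret")
    return None


def toy_decoder_b(data: bytes) -> Optional[Tuple[int, str]]:
    """Hypothetical decoder with a deliberate gap: it rejects RET."""
    if data[:1] == b"\x90":
        return (1, "nop")
    return None


def disagree(sample: bytes, decoders: Dict[str, Decoder]) -> bool:
    """True if the decoders differ on validity or decoded length."""
    outcomes = set()
    for decode in decoders.values():
        result = decode(sample)
        outcomes.add(None if result is None else result[0])
    return len(outcomes) > 1


def fuzz(decoders: Dict[str, Decoder], rounds: int, seed: int = 0) -> List[bytes]:
    """Generate random inputs and collect those the decoders disagree on."""
    rng = random.Random(seed)
    found = []
    for _ in range(rounds):
        # x86 instructions are at most 15 bytes long.
        length = rng.randint(1, 15)
        sample = bytes(rng.randrange(256) for _ in range(length))
        if disagree(sample, decoders):
            found.append(sample)
    return found


if __name__ == "__main__":
    decoders = {"a": toy_decoder_a, "b": toy_decoder_b}
    for sample in fuzz(decoders, rounds=2000):
        print(sample.hex())
```

With these toy decoders, every reported input starts with c3, because that is the only byte they treat differently. With real decoders, the interesting work is triaging which side is wrong.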
At my previous job, I worked on a project that required disassembling large volumes of x86/AMD64 instructions (several billion instructions per run was very common). I also found Zydis to be much faster than Capstone.
How is there any discrepancy in accuracy? Isn’t it just a matter of following the spec?
The spec is very large, not particularly well written, and is not “total” (in the sense that AMD64 and IA32e and other x86-64 flavors are all subtly different). There are a lot of ways to get it wrong; even XED (the reference decoder from Intel) has bugs.
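One concrete case of the spec not being "total", as far as I understand the vendor manuals: in 64-bit mode, Intel CPUs ignore a 0x66 operand-size prefix on a near CALL (opcode 0xE8) and still read a 4-byte displacement, while AMD CPUs honor the prefix and read a 2-byte displacement. Here is a toy sketch of just that one rule. The function name and the `vendor` flag are made up for illustration, and this is nowhere near a real decoder.

```python
def near_call_length(code: bytes, vendor: str) -> int:
    """Length in bytes of a `[66] E8 <disp>` near CALL in 64-bit mode.

    `vendor` is "intel" or "amd"; it is a hypothetical switch for this
    sketch, not a parameter of any real library.
    """
    if vendor not in ("intel", "amd"):
        raise ValueError("vendor must be 'intel' or 'amd'")

    pos = 0
    has_opsize_prefix = False
    if code[pos:pos + 1] == b"\x66":
        has_opsize_prefix = True
        pos += 1

    if code[pos:pos + 1] != b"\xe8":
        raise ValueError("not a near CALL with a relative displacement")
    pos += 1

    # Assumed rule: the Intel model ignores the prefix (rel32), the AMD
    # model honors it (rel16).
    disp_size = 2 if (has_opsize_prefix and vendor == "amd") else 4

    if len(code) < pos + disp_size:
        raise ValueError("truncated instruction")
    return pos + disp_size


if __name__ == "__main__":
    sample = bytes.fromhex("66e800000000")
    for vendor in ("intel", "amd"):
        print(vendor, near_call_length(sample, vendor))
```

For the bytes 66 e8 00 00 00 00 this gives 6 under the Intel rule and 4 under the AMD rule, so everything after the instruction decodes differently and a disassembler has to pick one.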
If I remember correctly, the Intel SDM alone is over 3000 pages long.
lol, no. For one, Capstone has a lot of bugs (it uses some old version of LLVM as its base), but the whole question of how to decode things is complicated, because there are a lot of pitfalls and inconsistencies that different disassemblers handle differently. And what the hardware does is a different question entirely: it may not match the spec, or even other processors with the same ISA.
It was just updated to a nearly current LLVM, so that argument is void: https://github.com/capstone-engine/capstone/blob/next/docs/c...
I'll believe it when I see it. If I can go a few years without wasting time during a CTF because of an incorrect decode I'll change my tune.
This has been my experience as well. I’ve had to rip Capstone out of more research projects than I care to admit.
Did you mean x86/x64 decoding?
Looking at the libs, none of them seem to mention ARM64 inst. decoding.
Yep, I meant AMD64, fixed.
Capstone is very useful!
Someone (not me) has also cross-compiled Capstone to WebAssembly so it can be used in client-side browser applications.
https://alexaltea.github.io/capstone.js/
I've used this in a couple of projects to support disassembly in static web apps with no back end.
If you find Capstone interesting, check out the Unicorn Engine.
https://github.com/unicorn-engine/unicorn
Also, if anyone is interested in an example of using capstone for basic disassembly and analysis, here is a link to my capstool project.
https://github.com/alexander-hanel/capstool
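For readers who want to see what basic usage looks like before clicking through, here is a minimal sketch with Capstone's Python bindings. It assumes `pip install capstone`. The helper name `disassemble` and the sample bytes are mine, not taken from capstool, and the import guard is only there so the file still loads on a machine without the bindings.

```python
from typing import List, Tuple

try:
    from capstone import Cs, CS_ARCH_X86, CS_MODE_64
except ImportError:  # Capstone bindings are not installed
    Cs = None


def disassemble(code: bytes, address: int = 0x1000) -> List[Tuple[int, str, str]]:
    """Return (address, mnemonic, operands) for each decoded instruction."""
    if Cs is None:
        raise RuntimeError("capstone is not installed (pip install capstone)")
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    # disasm() stops at the first byte sequence it cannot decode.
    return [(insn.address, insn.mnemonic, insn.op_str)
            for insn in md.disasm(code, address)]


if __name__ == "__main__":
    # push rbp; mov rbp, rsp; ret
    sample = b"\x55\x48\x89\xe5\xc3"
    for addr, mnemonic, op_str in disassemble(sample):
        print(f"0x{addr:x}:\t{mnemonic}\t{op_str}")
```

Note that `disasm()` silently stops at the first undecodable byte, so comparing the summed instruction sizes against the input length is a cheap sanity check.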
Right, three related multi-platform and multi-architecture frameworks from the same people:
* Capstone: disassembly.
* Keystone: assembler.
* Unicorn: CPU emulator.
Unicorn is fantastic. I used it to emulate an SoC's boot environment to get around a very weird HAL, and it worked perfectly. Awesome tool!
It's difficult to find a succinct overview. Here is a slide deck buried among links: http://www.capstone-engine.org/BHUSA2014-capstone.pdf
Capstone is sort of an "industry standard" open source multi-architectural disassembler library, especially for security tooling.
This is a useful page to get a sense of what it's about (i.e., what you're getting out of it vs. something more like objdump):
https://www.capstone-engine.org/beyond_llvm.html
It is also used in one of the Linux kernel debuggers: https://codeberg.org/pf-kernel/crush
Haha, I noticed you had this commit https://codeberg.org/pf-kernel/crush/commit/24c19bfacc7fff64...
The upcoming v6 release (the current 'next' branch) of Capstone updates SystemZ (S390) support significantly, so it should work even better now.
Another good replacement for capstone/keystone based on LLVM is nyxstone https://github.com/emproof-com/nyxstone
It's just a wrapper around LLVM, so any project would also be forced to ship the corresponding LLVM version if it's not present on the system, e.g. for Windows or embedded applications. That's a bit too much for a simple disassembler, so it's not a direct replacement for Capstone.
That's basically what Capstone is? Except not vendoring its own LLVM.
Capstone doesn't vendor LLVM either. It just contains some pieces of LLVM-ish infrastructure that were converted from C++ to pure C and are pretty lean, without any external dependencies.
It looks pretty promising! How would you compare the strengths/weaknesses?
Full disclosure, I'm one of the nyxstone developers - so I might be biased.
Compared to Capstone, Nyxstone lacks instruction decomposition and does not report read/written registers. In addition, Nyxstone interfaces directly with LLVM and is thus expected to be a lot slower than Capstone, which uses instruction tables generated by a modified LLVM.
I want to note here that Nyxstone is intended more as a replacement for Keystone than for Capstone. We added the disassembler mainly because we could. Compared to Keystone, Nyxstone allows precise definition of the target triple and ISA extensions, allows definition of external labels, supports structured output with instruction details (address, bytes, assembly), rejects partial and invalid inputs, rejects instructions not supported by the specific core (for example, UMAAL is supported by Cortex-M4 but not by Cortex-M3), and is more up to date. Nyxstone does not require patches in the LLVM source tree, and thus is (I'd argue) more maintainable and easier to keep up to date.
Haven't had a chance to use capstone yet, but a project I really like happens to use it.
https://github.com/xoreaxeaxeax/sandsifter
ImHex is a really great frontend for Capstone. https://github.com/WerWolv/ImHex
I think it’s incredible this is implemented in C. Well done!
It uses a semi-automatic mechanism[1][2] for generating C code from the LLVM sources (TableGen files).
[1] https://github.com/capstone-engine/capstone/blob/next/suite/...
[2] https://github.com/capstone-engine/capstone/blob/next/docs/A...