Clean-room, portable C++17 implementation of the PlanB IPv6 LPM algorithm.
Includes:
- AVX-512 SIMD path + scalar fallback
- Wait-free lookups with rebuild-and-swap dynamic FIB
- Benchmarks on synthetic data and real RIPE RIS BGP (~254K prefixes)
Interesting result: on real BGP + uniform random lookups, a plain Patricia trie can sometimes match or beat the SIMD tree due to cache locality and early exits.
Would love feedback, especially comparisons with PopTrie / CP-Trie.
254K prefixes with skewed distribution means early exits dominate, and no SIMD throughput advantage survives a branch that terminates at depth 3. The interesting edge case is deaggregation events where prefix counts spike transiently and the rebuild-and-swap FIB has to absorb a table that's temporarily 2x normal size
The obvious question, I guess: How much faster are you than whatever is in the Linux kernel's FIB? (Although I assume they need RCU overhead and such. I have no idea what it all looks like internally.)
These days, when I hear a project owner/manager describe the project as a "clean room reimplementation", I expect that they got an LLM [0] to extrude it. This expectation will not always be correct, but it'll be correct more likely than not.
[0] ...whose "training" data almost certainly contains at least one implementation of whatever it is that it's being instructed to extrude...
As far as LLM-produced correctness goes, it all comes down to the controls that have been put in place (how valid the tests are, does it have a microbenchmark suite, does it have memory leak detection, etc.)
Clean-room, portable C++17 implementation of the PlanB IPv6 LPM algorithm.
Includes: - AVX-512 SIMD path + scalar fallback - Wait-free lookups with rebuild-and-swap dynamic FIB - Benchmarks on synthetic data and real RIPE RIS BGP (~254K prefixes)
Interesting result: on real BGP + uniform random lookups, a plain Patricia trie can sometimes match or beat the SIMD tree due to cache locality and early exits.
Would love feedback, especially comparisons with PopTrie / CP-Trie.
254K prefixes with skewed distribution means early exits dominate, and no SIMD throughput advantage survives a branch that terminates at depth 3. The interesting edge case is deaggregation events where prefix counts spike transiently and the rebuild-and-swap FIB has to absorb a table that's temporarily 2x normal size
The obvious question, I guess: How much faster are you than whatever is in the Linux kernel's FIB? (Although I assume they need RCU overhead and such. I have no idea what it all looks like internally.)
I likewise wonder from time to time whether I should replace WireGuard's allowedips.c trie with something better: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
I use Wireguard rarely enough that the AllowedIPs concept gets me every time. It gets easier when I replace it mentally with “Route=” :-)
It's like a routing table on the way out and an ACL on the way in. Maybe an easier way to think of it.
Sure, but how does this differ from a routing table with RPF (which is default in Linux already)?
It's associated per-peer, so it assures a cryptographic mapping between src ip and public key.
I wonder if this would port nicely over to rustybgp.
IPv6 longest-prefix-match (LPM).
Why detect avx512 in build system instead of using #ifdef ?
It actually does detect it using ifdef [0] but uses cmake stuff to avoid passing "-mavx512" kind of flags to the compiler [1].
[0] https://github.com/esutcu/planb-lpm/blob/748d19d5fbd945cefa3...
[1] https://github.com/esutcu/planb-lpm/blob/748d19d5fbd945cefa3...
I wonder how this would look like in risc-v vector instructions
The lines
would be replaced with: and you set FANOUT to __riscv_vsetvlmax_e32m1() at runtime.Alternatively, if you don't want a dynamic FANOUT you keep the FANOUT=8 (or another constant) and do a stripmining loop
This will take FANOUT/VLEN iterations and the branches will be essentially almost predicted.If you know FANOUT is always 8 and you'll never want to changes it, you could alternatively use select the optimal LMUL:
Sad it is c++.
Why? It is 500 lines of pretty basic code. You can port it if you don't like C++ to any language, assuming you understand what it is.
It does look a bit AI generated though
> It does look a bit AI generated though
These days, when I hear a project owner/manager describe the project as a "clean room reimplementation", I expect that they got an LLM [0] to extrude it. This expectation will not always be correct, but it'll be correct more likely than not.
[0] ...whose "training" data almost certainly contains at least one implementation of whatever it is that it's being instructed to extrude...
As far as LLM-produced correctness goes, it all comes down to the controls that have been put in place (how valid the tests are, does it have a microbenchmark suite, does it have memory leak detection, etc.)
side note, but I hate that we've reach the point where we don't know what's written by a human and what's written by an LLM.
That goes for a lot of comments here accusing each other of being a bot.
I feel like we've known internet trust at its highest and it can only go one way now.