I usually use ChatGPT for such microbenchmarks (of course I design the benchmark myself and use the LLM only as a dumb code generator, so I don't need to remember how to measure time with nanosecond precision; I still have to add workarounds to prevent the compiler from over-optimizing the code). It's amazing that when you get curious (for example: what is the fastest way to find an int in a small sorted array — linear search, binary search, or a branchless full scan?) you can get the answer in a couple of minutes instead of spending 20-30 minutes writing the code by hand.
By the way, as far as I remember, the fastest was the branchless linear scan, up to around 32-64 elements.
> Don’t pass around data of size 4046-4080 bytes or 8161-8176 bytes, by value (at least not on an AMD Ryzen 3900X).
What a fascinating CPU bug. I am quite curious as to how that came to pass.