Fast Sorting, Branchless by Design

(00f.net)

28 points | by jedisct1 4 days ago ago

9 comments

  • rep_lodsb 14 hours ago ago

        const diff = if (order == .asc) b_int - a_int else a_int - b_int;
        const sign_bit: u1 = @truncate(@as(UWInt, @bitCast(diff)) >> @intCast(bits));
        var mask_word = @as(usize, 0) -% @as(usize, sign_bit);
    
    This code in the fallback path (when no constant-time @min/@max is available) will only work if the subtraction doesn't overflow. Or is this not a problem for some reason?
    • jedisct1 14 hours ago ago

      a_int and b_int are signed values.

      • rep_lodsb 13 hours ago ago

        It makes no difference whether they're signed or unsigned. Unless the subtraction is checking for overflow, or using a wider integer type than the numbers being compared, the high bit will not in every case indicate which number is smaller.

        e.g.

            0x8000_0000 < 0x0000_0001 for signed numbers
            0x8000_0000 - 0x0000_0001 = 0x7fff_ffff, high bit clear
        • jedisct1 13 hours ago ago

          They are using a wider type.

          • rep_lodsb 13 hours ago ago

            Yes, looking at the source code on GitHub now cleared that up!

            Didn't see it mentioned in the article though, maybe I missed it. If not, I think that this detail would be a good thing to include, both since it's a common mistake that others with less experience might make, and to get ahead of nitpicky comments like mine :)

  • jstrieb 3 days ago ago

    Wow, this is jam-packed with interesting information. Thanks for writing it! (Also thanks for all of your other great open source work!)

    Are there plans to upstream this into the Zig std library? Seems like it could be useful for more than just the cryptography package, since the benchmarks at the end have it often being faster than std pdqsort. I just checked the issue trackers on Codeberg and GitHub, and didn't see anything mentioning djbsort or NTRU Prime, which leads me to believe there aren't (official) plans to upstream this (yet).

    • tialaramex 17 hours ago ago

      > often being faster than std pdqsort.

      pdqsort is a generic comparison sort. Want to sort employee names, customer email addresses, JSON blobs, or Zebras? No problem, pdqsort just needs an ordering, in both Zig and C++ you write this as a single boolean "less" predicate.

      DJB's speed-up relies on vectorization, which works great for integers or things you can squint at and see an integer - but obviously can't sort your employee names, customer email addresses, JSON blobs or Zebras. You could write these branchless network designs anyway but I'm pretty sure they'd be markedly slower, at least for some common inputs.

    • ozgrakkurt 17 hours ago ago

      The blocksort in stdlib can be faster than pqdsort too in my experience

  • user____name 14 hours ago ago

    Radix sort also seems to fit the bill?