24 comments

  • nly 2 days ago

    My go-to these days (and afaik the state of the art) is boost::unordered_flat_set paired with rapidhash for hashing (since the GNU std::hash functions based on murmurhash are ridiculously slow).

    The cacheline performance is pretty hard to beat (SIMD optimised linear scan before hopping), which is where all the wins come from in the real world.

    But basically any of the faster hash maps from absl, boost or folly are going to wreck the standard library in terms of perf.
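
    For anyone wondering how a replacement hash gets wired in, here is a minimal sketch. It assumes a hand-rolled `CustomStringHash` (a hypothetical stand-in, not rapidhash itself) and uses std::unordered_set so that it builds without Boost. The hasher is simply the second template argument, and Boost's flat containers take a hash functor in the same position.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical stand-in for a third-party hash such as rapidhash:
// 64-bit FNV-1a over the bytes plus a splitmix64-style finalizer.
// It only illustrates the plumbing; it is not tuned for speed.
struct CustomStringHash {
    std::size_t operator()(std::string_view s) const noexcept {
        std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;                  // FNV-1a prime
        }
        // Final avalanche step to spread the entropy across all bits.
        h ^= h >> 30; h *= 0xbf58476d1ce4e5b9ULL;
        h ^= h >> 27; h *= 0x94d049bb133111ebULL;
        h ^= h >> 31;
        return static_cast<std::size_t>(h);
    }
};

// The hasher is the second template argument. With Boost available, the same
// functor should drop into boost::unordered_flat_set<std::string, CustomStringHash>.
using StringSet = std::unordered_set<std::string, CustomStringHash>;
```

    Whether a swap like this pays off depends on key type and workload, so it is worth benchmarking both hashers on real data.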

    • lefty2 13 minutes ago

      I tried both unordered_flat_map and hopscotch map with the pathfinding algorithm that my game uses. Both were slower than regular unordered_map: unordered_flat_map was about 33% slower and hopscotch was 390% slower.

    • spacechild1 2 days ago

      > with rapidhash for hashing (since the GNU std::hash functions based on murmurhash are ridiculously slow)

      Doesn't boost::unordered_flat_map use boost::hash by default? How does it compare to rapidhash and std::hash?

      • nly a day ago

        It's not great.

        Rapidhash is just insanely fast and provides good distribution, with built-in support for mixing.

  • jll29 2 days ago

    google::dense_hash_map is faster than this new implementation according to their benchmark's diagram (google::dense_hash_map has the lowest runtime of all tested methods).

  • einpoklum 2 days ago

    The principle of "hopscotch hashing" is described, for example, here: https://en.wikipedia.org/wiki/Hopscotch_hashing

    ----

    A point often missed by people who need to/want to do hashing:

    In practice, with your real workloads, you can often make do with actually "giving up" on the hashing of some fraction of the elements, whose buckets, neighborhoods and such are already occupied - and instead put those aside for separate out-of-band handling. Hash table implementations such as this one (or std::unordered_map and all the rest) absolutely _must_ succeed in inserting your values - and so must always allow for more collisions, resizing etc.
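
    A minimal sketch of that "give up and set aside" idea, assuming a hypothetical `SpillSet` with a fixed capacity and a hard probe limit (both names and numbers are made up for illustration). Keys that do not fit in their probe window go to a plain side list instead of triggering a resize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical fixed-size set of 64-bit keys that never resizes.
// Keys whose probe window is full are spilled to a side list.
class SpillSet {
public:
    explicit SpillSet(std::size_t capacity, std::size_t max_probes = 4)
        : slots_(capacity), max_probes_(max_probes) {
        // Power-of-two capacity lets us wrap indices with a mask.
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }

    void insert(std::uint64_t key) {
        const std::size_t home = home_of(key);
        for (std::size_t i = 0; i < max_probes_; ++i) {
            auto& slot = slots_[(home + i) & (slots_.size() - 1)];
            if (slot == key) return;            // already stored
            if (!slot) { slot = key; return; }  // free slot inside the window
        }
        // Window exhausted: hand the key to the out-of-band path instead of growing.
        if (std::find(overflow_.begin(), overflow_.end(), key) == overflow_.end())
            overflow_.push_back(key);
    }

    bool contains(std::uint64_t key) const {
        const std::size_t home = home_of(key);
        for (std::size_t i = 0; i < max_probes_; ++i) {
            const auto& slot = slots_[(home + i) & (slots_.size() - 1)];
            if (slot == key) return true;
            if (!slot) return false;  // no deletions, so an empty slot ends the search
        }
        return std::find(overflow_.begin(), overflow_.end(), key) != overflow_.end();
    }

    // Number of keys the table gave up on.
    std::size_t spilled() const { return overflow_.size(); }

private:
    std::size_t home_of(std::uint64_t key) const {
        return std::hash<std::uint64_t>{}(key) & (slots_.size() - 1);
    }

    std::vector<std::optional<std::uint64_t>> slots_;
    std::vector<std::uint64_t> overflow_;  // the separate, out-of-band handling
    std::size_t max_probes_;
};
```

    The linear scan of the side list is only reasonable while the spilled fraction stays small; what to do with those elements is workload-specific.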

  • stevefan1999 2 days ago

    Ah, hopscotch hash, I tried using it on my CSGO cheat literally 10 years ago, for the object reflection (retrospection) system based on compiler type ID and unique hashing scheme with function signature. I merely used it for hopefully getting a performance boost on the "dependency injection" side of things, until I realized it is actually a service locator pattern and performance won't improve due to this architecture anyway.

    It was 3 years later, when I was in college, that I learned advanced data structures and came across Cuckoo hashing, then Robin Hood hashing, and the combination of both Cuckoo and Robin Hood hashing => Hopscotch hashing.

    • AlexeyBelov a day ago

      > my CSGO cheat

      Why would you openly admit this?

      • stevefan1999 3 hours ago

        I learned C++ because of it, what's so shameful about it?

  • mgaunard 2 days ago

    How does it compare to boost unordered flat map?

    Looks like the benchmarks were last updated in 2019.

    • compiler-guy 2 days ago

      https://tessil.github.io/2016/08/29/benchmark-hopscotch-map....

      Has some older benchmarks, including those two.

      • jeffbee 2 days ago

        A more recent benchmark is https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

        However, it lacks the newer Boost stuff which is very fast.

        The Hopscotch map was interesting at the time but due to unfortunate timing was immediately outshone by absl::unordered_flat_map A.K.A. "Swiss tables", and there's been even more water under the bridge since then.

        • RossBencina 2 days ago

          Abseil Swiss Tables carefully avoid intermediate allocations/copy constructor calls.[1] I'd be wary about inferring underlying algorithm performance from benchmarks that don't explicitly control for these optimisations. (Or maybe everyone is using them and I'm out of touch.)

          [1] https://abseil.io/about/design/swisstables

          • jeffbee 2 days ago

            Algorithmically hopscotch has a better strict worst case whereas swiss tables have a degenerate O(N) lookup. But there are a lot of maps like that. robin_hood::flat_hash_map is very fast but I can create insert sequences under which it will call std::abort, which I feel is ridiculous. But if your hash map isn't exposed to hostile inputs then you might not be concerned.
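
            To make the worst-case point concrete, here is a simplified hopscotch sketch. `HopscotchSet`, the tiny neighborhood size H = 4 and the identity hash are all assumptions chosen for a deterministic demo, not the linked library's code. A lookup reads at most H slots, because every key must live within H slots of its home bucket; the price is that an insert can fail and force a resize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified hopscotch set of 64-bit keys (no deletion, no resizing).
class HopscotchSet {
public:
    static constexpr std::size_t H = 4;  // neighborhood size, kept tiny for the demo

    explicit HopscotchSet(std::size_t capacity)
        : slots_(capacity), hops_(capacity, 0), mask_(capacity - 1) {
        assert(capacity >= H && (capacity & mask_) == 0);  // power of two
    }

    // Worst case: H slot reads, whatever the rest of the table looks like.
    bool contains(std::uint64_t key) const {
        const std::size_t home = home_of(key);
        for (std::size_t i = 0; i < H; ++i)
            if (((hops_[home] >> i) & 1u) && slots_[(home + i) & mask_] == key)
                return true;
        return false;
    }

    // Returns false where a production table would grow and rehash.
    bool insert(std::uint64_t key) {
        if (contains(key)) return true;
        const std::size_t home = home_of(key);
        std::size_t dist = 0;  // distance from home to the nearest empty slot
        while (dist < slots_.size() && slots_[(home + dist) & mask_]) ++dist;
        if (dist == slots_.size()) return false;  // table is full
        while (dist >= H)                         // hop the hole back towards home
            if (!move_hole_closer(home, dist)) return false;
        slots_[(home + dist) & mask_] = key;
        hops_[home] |= 1u << dist;
        return true;
    }

private:
    // Identity hash: an assumption that keeps the demo deterministic.
    std::size_t home_of(std::uint64_t key) const { return key & mask_; }

    // Relocates one element into the hole, provided the hole lies inside that
    // element's own neighborhood, so the hole ends up closer to `home`.
    bool move_hole_closer(std::size_t home, std::size_t& dist) {
        const std::size_t hole = (home + dist) & mask_;
        for (std::size_t back = H - 1; back >= 1; --back) {
            const std::size_t bucket = (hole - back) & mask_;
            for (std::size_t i = 0; i < back; ++i) {
                if ((hops_[bucket] >> i) & 1u) {
                    const std::size_t from = (bucket + i) & mask_;
                    slots_[hole] = slots_[from];
                    slots_[from].reset();
                    hops_[bucket] &= ~(1u << i);  // the element left offset i...
                    hops_[bucket] |= 1u << back;  // ...and now sits at offset `back`
                    dist -= back - i;
                    return true;
                }
            }
        }
        return false;
    }

    std::vector<std::optional<std::uint64_t>> slots_;
    std::vector<std::uint32_t> hops_;  // bit i set: slot home+i holds a key hashed to home
    std::size_t mask_;
};
```

            With a 16-slot table, inserting 0, 16, 32, 1, 2 and then 48 forces two hops before 48 lands in bucket 0's neighborhood, while a further insert of 64 returns false because that neighborhood is full of bucket-0 keys.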

        • utopcell 2 days ago

          You probably mean absl::flat_hash_map<>.

          • jeffbee 2 days ago

            Yeah. I typed the comments on my phone without bothering with the docs. I probably got all the other classes wrong, too.

        • quadrature 2 days ago

          Is there something better than Swiss tables?

          • reinitctxoffset 2 days ago

            On modern super-wide znver5 or SBSA parts, with full-clock scalar 256 or 512 ALUs / SIMD lanes, deep pipelines, high BTB pressure etc., it's just really difficult to make a priori statements about performance for a given workload.

            absl::flat_hash_map (or folly::F14) are great defaults if you can eat the invalidation semantics.

            But if it's really hot you measure by workload and have infrastructure to flag the right ones in.

            This seems promising. I'll start benching it alongside the other likely lads.

          • szmarczak 2 days ago

            No. Fundamentally it's not possible to be faster.

            • infamouscow 2 days ago

              This is not true. It is fast as a general purpose hash table, but claiming it's the fastest across all datasets and workloads is silly.

              • szmarczak 2 days ago

                > claiming it's the fastest across all datasets

                I never claimed so. Please stop stating I said something when I didn't.

                > as a general purpose hash table

                That's what I claimed. The question IS about hash tables. If you want a hash table of any content, it's impossible to get faster. Unless you check all possible keys at once - only this will get you faster.

      • mgaunard 2 days ago

        boost unordered flat map didn't exist in 2016 (nor 2019).

  • teo_zero 2 days ago

    The concept is very similar to robin hood. In fact most of the performance charts show that the curves of hopscotch and robin hood are very close. I think I'd prefer robin hood as it's well known.
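
    For comparison, a simplified Robin Hood sketch, assuming a hypothetical `RobinHoodSet` without deletion or resizing. Where hopscotch keeps every key within a fixed window of its home bucket, Robin Hood lets probe sequences grow but evens them out: an incoming key evicts any resident that sits closer to its own home.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Simplified Robin Hood set of 64-bit keys (no deletion, no resizing).
class RobinHoodSet {
public:
    explicit RobinHoodSet(std::size_t capacity)
        : slots_(capacity), mask_(capacity - 1) {
        assert(capacity != 0 && (capacity & mask_) == 0);  // power of two
    }

    // Returns false only when the table is full.
    bool insert(std::uint64_t key) {
        if (contains(key)) return true;
        if (size_ == slots_.size()) return false;
        std::size_t pos = home_of(key);
        for (std::size_t dist = 0;; ++dist, pos = (pos + 1) & mask_) {
            auto& slot = slots_[pos];
            if (!slot) { slot = key; ++size_; return true; }
            const std::size_t resident_dist = (pos - home_of(*slot)) & mask_;
            if (resident_dist < dist) {
                // The resident is "richer" (closer to home): take its slot
                // and keep probing on behalf of the evicted key.
                std::swap(key, *slot);
                dist = resident_dist;
            }
        }
    }

    bool contains(std::uint64_t key) const {
        std::size_t pos = home_of(key);
        for (std::size_t dist = 0; dist < slots_.size(); ++dist, pos = (pos + 1) & mask_) {
            const auto& slot = slots_[pos];
            if (!slot) return false;
            if (*slot == key) return true;
            // A resident closer to its home than we are to ours means the
            // key would have evicted it, so the key cannot be further along.
            if (((pos - home_of(*slot)) & mask_) < dist) return false;
        }
        return false;
    }

    std::size_t size() const { return size_; }

private:
    std::size_t home_of(std::uint64_t key) const {
        return std::hash<std::uint64_t>{}(key) & mask_;
    }

    std::vector<std::optional<std::uint64_t>> slots_;
    std::size_t mask_;
    std::size_t size_ = 0;
};
```

    Both schemes bound or flatten probe lengths by relocating existing elements on insert, which fits with the two curves tracking each other in the charts.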