Understanding std::shared_mutex from C++17

(cppstories.com)

43 points | by ibobev 5 days ago

27 comments

  • stevefan1999 a day ago

    The equivalent in Rust is RwLock: https://doc.rust-lang.org/std/sync/struct.RwLock.html

    The more general idea for this is readers-writer lock: https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock

  • loeg a day ago

    https://www.cppstories.com/2026/shared_mutex/#newer-concurre...

    > Since C++17, the concurrency library has expanded significantly. We now have:

    > ...

    > safe memory reclamation mechanisms (RCU, hazard pointers in C++26)

    > These tools focus mostly on thread lifetime, coordination, cancellation or lock-free programming.

    > std::shared_mutex fills a different role.

    > It is still a mutual-exclusion primitive, explicitly designed to protect a shared state with many readers and few writers. It does not compete with atomics, condition variables, or RCU.

    What does it mean to say shared_mutex does not "compete" with atomics/CVs/RCU? There are many situations where RCU or other safe reclamation mechanism is a good replacement for state + shared_mutex. (And some situations where atomics are a decent replacement.)

    • surajrmal a day ago

      If you want the write to be a synchronization point after which all threads will observe only the new value, it's only possible with shared mutex. Of course you can use a barrier to accomplish that instead but using something like hazard pointers or rcu doesn't synchronize by itself.

      • loeg a day ago

        This is true, but it is a subset of designs using shared_mutex.

      • foldr a day ago

        Not an expert, but can’t you get synchronization like this just by using release/acquire memory order with C11 atomic stores and loads?

        • jpc0 15 hours ago

          From my knowledge, RCU/epoch/hazard pointers are useful in data structures and algorithms where raw atomics cannot be used but you still need lock-free or, in some cases, wait-free semantics.

          If you can use an atomic then these are overkill and you should just use an atomic. But many times making a type atomic does not make it lock free: if there's no hardware support, the implementation will fall back to a lock.

        • jeffbee a day ago

          Yes. But if you are tempted to do this in most cases you should just use a mutex anyway.

      • secondcoming a day ago

        I think it's possible with an atomic<shared_ptr> too (C++20)?

        A shared_mutex comes in useful when you can't really have multiple copies of the shared data, perhaps due to memory usage, so readers simply wait while the writer is updating it.

  • i_am_a_peasant a day ago

    You know, it is often the case that APIs like this, both in C++ and Rust, don't offer you enough knobs once your use case deviates from the trivial.

    It happens with locking APIs, it happens with socket APIs, anything platform dependent.

    Does the C++ standard give you an idiomatic way to set PTHREAD_RWLOCK_PREFER_READER_NP or PTHREAD_RWLOCK_PREFER_WRITER_NP explicitly when initializing a rwlock? Nope. Then you either roll your own or in Rust you reach for a crate where someone did the work of making a smarter primitive for you.

    • VorpalWay a day ago

      Yeah, you can't enable priority inheritance for mutexes in std of either C++ or Rust. Which is a show stopper for hard realtime (my dayjob).

      And then you have mutexes internally inside some dependency still (e.g. grpc or what have you). What I would really like is the ability to change defaults for all mutexes created in the program, and have everyone use the same std mutexes.

      By the way: rwlocks are often a bad idea, since you still get cache contention between readers on the counter for the number of active readers. Unless you hold the lock for a really long time (several milliseconds at least) it usually doesn't improve performance compared to mutexes. Consider alternatives like seqlocks, RCU, hazard pointers etc. instead, depending on the specifics of your situation (there is no silver bullet when it comes to performance in concurrent primitives).

      • loeg a day ago

        > What I would really like is the ability to change defaults for all mutexes created in the program, and have everyone use the same std mutexes.

        Yes. I imagine subbing out a debug mutex implementation that tracks lock ordering and warns about order inversion (similar to things like witness(4)): https://man.freebsd.org/cgi/man.cgi?witness(4)

        > rwlocks are often a bad idea

        Yes.

        > since you still get cache contention between readers on the counter for number of active readers

        There are rwlock impls that put the reader counts on distinct cache lines per core, or something like that (e.g., folly::SharedMutex), mitigating this particular problem. But it isn't the only problem with rwlocks.

        > Consider alternatives like seqlocks, RCU, hazard pointers etc instead, depending on the specifics of your situation (there is no silver bullet when it comes to performance in concurrent primitives).

        Yes. :)

        • menaerus a day ago

          What are some of the other problems with rwlocks? I genuinely ask because I used them with quite good success for a pretty complicated use-case at large scale and very high concurrency.

          • ethin a day ago

            I have as well. I find RW locks much easier to use than, say, a recursive mutex. Mainly because it took me a long time to understand how a recursive mutex actually works in the first place. When you want to use only the stdlib, you aren't left with many choices. At least in the STL.

          • loeg a day ago

            Mostly that if you actually have both readers and writers, they obstruct each other; this is often undesirable. And you have to pick some bias in advance. You can get priority inversion because readers are anonymous.

            • menaerus 16 hours ago

              Sure, the workload had both readers and writers, and it was a pretty "bursty" one with a high volume of data which had to scale across all the cores. So, not a particularly light workload. It was basically a high-concurrency cache which was write-mostly in the first phase (ingestion), and then read-mostly in the second phase (crunching). It had to support multiple sessions simultaneously, so in the end it was about supporting heavily mixed read-write workloads, e.g. the second phase from session nr. 1 could overlap with the first phase from session nr. 2.

              To avoid lock contention I managed to get away with sharding across an array of shared mutexes and load-balancing the sessions by their UUIDs. And this worked pretty well: after almost 10 years it's still rock solid, and the workloads are basically ever changing.

              I considered RCU for this use-case too, but I figured it wouldn't be as good a fit: because the workload is heavily mixed, I thought it would put a lot of strain on the memory subsystem by having to handle multiple copies of the data (which was not small).

              One thing I don't understand is the priority inversion and how that may happen. I'll think about it, thanks.

      • jcalvinowens a day ago

        > What I would really like is the ability to change defaults for all mutexes created in the program, and have everyone use the same std mutexes.

        Assuming you're building the whole userspace at once with something like Yocto... you can just patch pthread to change the default to PTHREAD_PRIO_INHERIT and silently ignore attempts to set it to PTHREAD_PRIO_NONE. It's a little evil though.

        > By the way: rwlocks are often a bad idea

        +1

        • VorpalWay a day ago

          That is a great terrible idea (I really have to think a bit more on that). It won't help for Rust, since the mutexes there use futex directly, so you would have to patch the standard library itself (and for futexes it is more complex than just enabling a flag). It seems plausible that other libraries and language runtimes do similar things.

          • surajrmal a day ago

            The implementation of the Rust std mutex when targeting Fuchsia does implement priority inheritance by default, but the Zircon kernel scheduler and futex implementation are written with priority inheritance in mind as the default approach rather than something ad hoc tacked on. Unfortunately on Linux there seems to be a large performance tradeoff which may not be worthwhile for the common case. It does seem like it would be nice to set an env variable to change the behavior, though, rather than requiring a recompile of libstd. A lot of programs use alternatives to the std library as well, like parking_lot, which is indeed a pain.

            Sometimes I feel like trying to use Linux for realtime is an effort in futility. The ecosystem is optimized for throughput over fairness, predictability, and latency.

            • jcalvinowens a day ago

              > Unfortunately on Linux it seems like there is a large performance tradeoff

              Implementing transitive priority inheritance is just inherently algorithmically more expensive: there's no avoiding that.

              > Sometimes I feel like trying to use Linux for realtime is an effort in futility.

              If you're not actually using an RT kernel, yeah, it's futile. But if you are, the guarantees are pretty strong... on x86 PCs, the hardware gets in the way much more than the software in my experience. There's a lot of active work upstream.

              • VorpalWay a day ago

                The Linux priority inheritance futexes are also fair, which adds unnecessary overhead if you only care about PI, not fairness.

        • i_am_a_peasant a day ago

          i think both you guys have the same job as me lol

      • nly a day ago

        There are rw lock implementations where waiters (whether readers or writers) don't contend on a shared cache line (they only touch it once to enqueue themselves, not to spin/wait).

        These are usually called "scalable locks" and the algorithms for them have been out there for decades. They are optimal from a cache-coherence point of view.

        The issue with them is that it's impossible to support the same API as you're used to with std::shared_mutex, as every thread needs its own cache line.

        • gpderetta 13 hours ago

          > it's impossible to support the same API as you're used to with std::shared_mutex

          If I understand the problem correctly, the standard library could specialize the scoped-lock objects for the shared mutex, inline-allocate the waiter object in the lock object (so it will live on the stack), and use an internal dedicated API.

          It might be harder (but not impossible) to interoperate threads using direct lock calls with threads using the scoped API.

          In any case it is a moot point, as I don't think any std library does this, nor ever will, as it would probably be ABI-breaking.

    • MaulingMonkey a day ago

      One thing I appreciate about Rust's stdlib is that it exposes enough platform details to allow writing the missing knobs without reimplementing the entire wrapper (e.g. File, TcpStream, etc. allows access to raw file descriptors, OpenOptionsExt allows me to use FILE_FLAG_DELETE_ON_CLOSE on windows, etc.)

    • pjmlp a day ago

      Because usually that is OS-specific and not portable enough to be part of a standard library that is supposed to work everywhere.

      • a day ago
        [deleted]
    • surajrmal a day ago

      NP means the API is not portable. There are Linux-specific extensions for many things, but not everything has one. There is also nothing wrong with needing to use an alternative to the standard library if you have more niche requirements.