Was that a later edit? I didn't see it when I read the comment. Also what's the point of saying that something is not possible and then "actually it's possible, but I don't like it anyway" in parentheses?

What you're saying is wrong: if the read-side critical section is long, the cache invalidation on the reader count is easily amortized. Sharded reader counts are useful when the read section is short and the cache miss becomes the dominant cost.
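To make the sharding point concrete, here is a rough sketch of what I mean, not anyone's production lock. The names (`ShardedRWLock`, `Shard`, `kShards`, `kCacheLine`) and the 64-byte line size are my own assumptions for illustration. Each reader bumps a counter on its own cache line, so short read sections don't bounce a single shared line between cores; the writer pays by scanning every shard.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical sketch: reader counts sharded across cache lines.
constexpr std::size_t kShards = 8;      // assumed shard count
constexpr std::size_t kCacheLine = 64;  // assumed cache line size

// One counter per cache line, so shards never share a line.
struct alignas(kCacheLine) Shard {
    std::atomic<int> readers{0};
};

class ShardedRWLock {
public:
    // A reader touches only its own shard (plus a read of the writer
    // flag), so readers on different shards do not invalidate each
    // other's counter lines.
    void lock_shared(std::size_t shard) {
        Shard& s = shards_[shard % kShards];
        for (;;) {
            s.readers.fetch_add(1);       // announce this reader (seq_cst)
            if (!writer_.load()) return;  // no writer active: we are in
            s.readers.fetch_sub(1);       // writer active: back off
            while (writer_.load()) std::this_thread::yield();
        }
    }

    void unlock_shared(std::size_t shard) {
        shards_[shard % kShards].readers.fetch_sub(1);
    }

    // The writer first claims the flag, then waits for every shard
    // to drain. This is where the cost of sharding lands.
    void lock() {
        while (writer_.exchange(true)) std::this_thread::yield();
        for (Shard& s : shards_)
            while (s.readers.load() != 0) std::this_thread::yield();
    }

    void unlock() { writer_.store(false); }

private:
    std::array<Shard, kShards> shards_;
    std::atomic<bool> writer_{false};
};
```

Every atomic operation here uses the default sequentially consistent ordering, which is what makes the "increment, then check the flag" handshake safe; I haven't tried to relax it. With a long read section the single-counter version costs one cache miss per acquire, which is noise; this layout only earns its keep when the read section is a handful of instructions.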

Also, I don't get this "RW locks are a code smell" dogma. Not having RW mutexes forces you to design the shared state so that readers can assume the portion of the state they acquire is immutable, which usually means heavily pointer-based data structures with terrible cache locality for readers. That is, in order to solve a non-problem, you sacrifice the thing that really matters: read performance.

I've heard this from Googlers, who didn't have a default RW mutex for a while, then figured out that they could add support for shared sections for free, and suddenly RW mutexes were great.