> This has the whiff of someone discovering the basics of high-performance locking.
Came here to say this.
> So you need a startup calibration that measures it, otherwise it will have a random outcome.
I guess one question is whether AMD or Intel plan to make any other CPUs with "fast" PAUSE. It looks like Intel has consistently used the "slow" version since Skylake, including on their little cores, while AMD went slow much more recently, starting with Zen 2.
A reasonable approach, given that "slow" PAUSE is now the expectation but its timing still varies a lot from generation to generation, would be to base your spin on a time period rather than a spin count, e.g., by checking RDTSC after every PAUSE. That should generally preserve the spin interval, at least as measured in wall-clock time.
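To make that concrete, here is a minimal sketch of a time-bounded spin, assuming GCC or Clang on x86-64. The helper name `spin_until_set` and the `budget_ticks` parameter are made up for illustration. The budget is in TSC ticks, not core cycles or nanoseconds, so turning a wall-clock interval into ticks needs the TSC frequency from somewhere else, and the whole idea leans on an invariant TSC.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc() and _mm_pause() on GCC/Clang, x86 only */

/* Hypothetical helper: spin until *flag becomes nonzero, or until roughly
 * budget_ticks TSC ticks have elapsed since entry.
 * Returns true if the flag was observed set, false if the budget ran out. */
static bool spin_until_set(atomic_int *flag, uint64_t budget_ticks)
{
    assert(flag != NULL);
    const uint64_t start = __rdtsc();
    for (;;) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return true;
        _mm_pause();  /* may cost a handful of cycles or 100+, depending on the CPU */
        /* Bound the spin by elapsed ticks instead of by iteration count. */
        if (__rdtsc() - start >= budget_ticks)
            return false;  /* caller would typically fall back to blocking */
    }
}
```

With this shape the number of PAUSE iterations adapts by itself to however long PAUSE takes on a given part, so no startup calibration of the PAUSE latency is needed.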
This would have been a bad solution with "fast" PAUSE, though, since the cost of the RDTSC would have dwarfed the PAUSE, so you might lose the "minimize load on the hardware thread" part of the PAUSE effect. Then again, I would question how good "fast" PAUSE was at that anyway. Perhaps RDTSC alone would be a fine substitute: it executes even slightly fewer uops/cycle than PAUSE on Haswell!
It would be nice if slow PAUSE returned a cycle counter or RDTSC-like time counter to enable this kind of spinning. Or maybe UMWAIT just obsoletes all of these spinning approaches? I haven't gotten to play with it yet.