Note that MinGW uses libwinpthread, which is known to have slow TLS behavior anyway (I've observed a 100% overhead compared to running the same program under WSL with a Linux-native GCC). Cf. https://github.com/msys2/MINGW-packages/discussions/13259
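For anyone who wants to compare toolchains themselves, here is a rough sketch of how one might time `thread_local` access and check that each thread really gets its own copy. All names (`tls_counter`, `bump_tls`, `bump_in_threads`, `ns_per_increment`) are made up for illustration and do not come from the linked discussion. The compiler may hoist the TLS address lookup out of the loop, so treat the timing as a rough indication at best, not as a proper benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical counter; volatile so the loads and stores are not folded away.
thread_local volatile std::uint64_t tls_counter = 0;

// Increment the calling thread's copy `iters` times and return its value.
std::uint64_t bump_tls(std::uint64_t iters) {
    for (std::uint64_t i = 0; i < iters; ++i) {
        tls_counter = tls_counter + 1;
    }
    return tls_counter;
}

// Run bump_tls on `n_threads` fresh threads. With correct thread-local
// storage, every entry of the result equals `iters`, because each new
// thread starts from its own zero-initialized copy.
std::vector<std::uint64_t> bump_in_threads(unsigned n_threads, std::uint64_t iters) {
    std::vector<std::uint64_t> results(n_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&results, t, iters] { results[t] = bump_tls(iters); });
    }
    for (auto& w : workers) {
        w.join();
    }
    return results;
}

// Average wall-clock nanoseconds per increment on the calling thread.
double ns_per_increment(std::uint64_t iters) {
    assert(iters > 0);
    const auto start = std::chrono::steady_clock::now();
    bump_tls(iters);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iters);
}
```

Calling `ns_per_increment` with a large iteration count from `main`, once in a MinGW build and once in a Linux-native GCC build, should give a first impression of the gap; `bump_in_threads` is only there as a sanity check that the thread locals are not shared.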
Then last year clang-cl also added a way to disable this (if you need to); it probably hit some internal issue that had to be resolved. Maybe "thread_local" has become more widely used (unlike the OS-specific "TlsAlloc").
Do you happen to have a link to the original MSVC bug report (i.e. the wrong thread locals, not the performance regression)?