
I haven't fully dug through the article, just skimmed it, but it looks like good info all around.

I'm mostly familiar with Windows, and on Windows until recently (a few years ago), dynamically loading a .dll (e.g. with "LoadLibrary", the counterpart of "dlopen") caused issues with the .dll's own thread_locals. Microsoft fixed this, but folks have observed slower code as a result:

https://developercommunity.visualstudio.com/t/5x-performance...

To quote only the observed case there:

> In VS2017 15.9.26 this executes in ~270ms and with VS2019 16.7.1 it takes ~1450ms.

Here are the notes too - https://learn.microsoft.com/en-us/cpp/overview/cpp-conforman...



> dynamically loading a .dll (e.g. with "LoadLibrary", the counterpart of "dlopen") caused issues with the .dll's own thread_locals.

Do you know what these issues were? I'm curious because I'm working on Pd (https://en.wikipedia.org/wiki/Pure_Data), which uses lots of thread local variables internally when built as a multi-instance library. Libpd itself may be loaded dynamically when embedded in an audio plugin. I'm not aware of any problems so far...


It used to be that .dlls loaded by the .exe at startup (i.e. implicitly linked) would get their thread-local variables (TLS) set up correctly, but .dlls loaded later (via /DELAYLOAD or through LoadLibrary) would not. The workaround was to manage these yourself through TlsAlloc/TlsFree, with a hook in DllMain to clean up.
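A rough sketch of that kind of workaround, for illustration only: per-thread state lives in an explicit OS TLS slot and is created lazily on first access, so it does not matter whether the thread existed before the library was loaded. `PerThreadState`, `state()` and the `library_attach` / `thread_detach` / `library_detach` helpers are made-up names, and the pthread branch is only there so the snippet also builds outside Windows.

```cpp
#include <cassert>
#include <thread>

#ifdef _WIN32
#include <windows.h>
using slot_t = DWORD;
static bool  slot_create(slot_t& s)      { s = TlsAlloc(); return s != TLS_OUT_OF_INDEXES; }
static void  slot_destroy(slot_t s)      { TlsFree(s); }
static void* slot_get(slot_t s)          { return TlsGetValue(s); }
static void  slot_set(slot_t s, void* p) { TlsSetValue(s, p); }
#else
#include <pthread.h>  // non-Windows stand-in so the sketch is buildable elsewhere
using slot_t = pthread_key_t;
static bool  slot_create(slot_t& s)      { return pthread_key_create(&s, nullptr) == 0; }
static void  slot_destroy(slot_t s)      { pthread_key_delete(s); }
static void* slot_get(slot_t s)          { return pthread_getspecific(s); }
static void  slot_set(slot_t s, void* p) { pthread_setspecific(s, p); }
#endif

struct PerThreadState { int counter = 0; };  // hypothetical per-thread data

static slot_t g_slot;

// Call once when the library is loaded (DLL_PROCESS_ATTACH on Windows).
bool library_attach() { return slot_create(g_slot); }

// Returns the calling thread's state, allocating it on first use.
PerThreadState& state() {
    auto* p = static_cast<PerThreadState*>(slot_get(g_slot));
    if (!p) {
        p = new PerThreadState;
        slot_set(g_slot, p);
    }
    return *p;
}

// Call when a thread is about to exit (DLL_THREAD_DETACH on Windows).
void thread_detach() {
    delete static_cast<PerThreadState*>(slot_get(g_slot));
    slot_set(g_slot, nullptr);
}

// Call once when the library is unloaded (DLL_PROCESS_DETACH on Windows).
void library_detach() {
    thread_detach();  // free the unloading thread's own state
    slot_destroy(g_slot);
}

#ifdef _WIN32
// The DllMain hook that wires the helpers to the loader notifications.
BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID) {
    switch (reason) {
    case DLL_PROCESS_ATTACH: return library_attach() ? TRUE : FALSE;
    case DLL_THREAD_DETACH:  thread_detach();  break;
    case DLL_PROCESS_DETACH: library_detach(); break;
    }
    return TRUE;
}
#endif
```

The cost of doing it this way is an explicit slot lookup on every access and manual cleanup, which is presumably why people prefer plain thread_local once the compiler handles it.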

But then Microsoft added /Zc:tlsGuards - https://learn.microsoft.com/en-us/cpp/build/reference/zc-tls... - which is now on by default and fixes the issue, but at a significant performance cost (see the "bug" I linked above).

I guess you can't easily have it both ways...

On the clang/clang-cl side, there is an equivalent option to support this: https://clang.llvm.org/docs/ClangCommandLineReference.html#c...

So check your compiler version and options :)
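If it helps, here is a small sketch for reporting which toolchain a binary was actually built with, using the predefined macros `_MSC_VER`, `__clang__`, `__MINGW32__` and `__GNUC__`. The function name `toolchain()` is made up. I don't know of a predefined macro that reflects /Zc:tlsGuards itself, so the flag still has to be checked in the build settings.

```cpp
#include <cassert>
#include <string>

// Returns a short description of the compiler this translation unit was built with.
std::string toolchain() {
#if defined(__clang__) && defined(_MSC_VER)
    // clang in MSVC-compatible mode (e.g. clang-cl) defines both macros.
    return "clang (MSVC-compatible), _MSC_VER=" + std::to_string(_MSC_VER);
#elif defined(__clang__)
    return "clang " + std::to_string(__clang_major__);
#elif defined(_MSC_VER)
    return "MSVC, _MSC_VER=" + std::to_string(_MSC_VER);
#elif defined(__MINGW32__)
    // Defined by MinGW for both 32-bit and 64-bit targets.
    return "MinGW GCC " + std::to_string(__GNUC__);
#elif defined(__GNUC__)
    return "GCC " + std::to_string(__GNUC__);
#else
    return "unknown";
#endif
}
```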

Also the notes posted here about CRT mixing might apply to you (not sure though) - https://learn.microsoft.com/en-us/cpp/porting/binary-compat-...

I work in the gamedev world, where plugins, FFI, delay-loaded DLLs etc. are a constant pain that you have to watch out for and work around.


So this was only an MSVC bug? Most people compile Pd with MinGW, which would explain why we never ran into this issue.

Do you happen to have a link to the original MSVC bug report (i.e. the wrong thread locals, not the performance regression)?


Note that MinGW uses libwinpthread, which is known to have slow TLS behavior anyway (I've observed a 100% overhead compared to running the same program under WSL using a Linux-native GCC). See https://github.com/msys2/MINGW-packages/discussions/13259


I haven't looked into it, but going through the release notes for tlsGuards turned up this, though it's not directly a bug report:

https://learn.microsoft.com/en-us/cpp/overview/cpp-conforman...

and also the implementation in "clang" (for "clang-cl" being conformant with MSVC) - https://reviews.llvm.org/D115456#3217595

Then last year clang-cl also added ways to disable this (if needed); it probably hit some internal issue that had to be resolved. Maybe "thread_local" has become more widely used (unlike the OS-specific "TlsAlloc").


Thanks! Fortunately, this issue does not affect us because our thread locals are all zero initialized integers or pointers.
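To illustrate the distinction (a sketch only, assuming I read the linked notes right that the guards are about thread locals with dynamic initializers): a zero-initialized int or pointer has nothing to run per thread, while a thread local initialized from a function call needs that call to happen on each thread before first use. All names here are made up.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> g_dynamic_inits{0};

// Runtime initializer: hands out a fresh id each time it runs.
int make_id() { return ++g_dynamic_inits; }

thread_local int  tl_counter = 0;          // constant-initialized: no code runs per thread
thread_local int* tl_cursor  = nullptr;    // constant-initialized: no code runs per thread
thread_local int  tl_id      = make_id();  // dynamic: must run on each thread before first use

struct Seen {
    int  counter;
    bool cursor_null;
    int  id;
};

// Reads all three thread locals from a freshly started thread.
Seen probe_on_new_thread() {
    Seen s{};
    std::thread t([&] {
        s.counter     = tl_counter;
        s.cursor_null = (tl_cursor == nullptr);
        s.id          = tl_id;
    });
    t.join();
    return s;
}
```

Each new thread sees `tl_counter == 0` and `tl_cursor == nullptr` regardless of what other threads wrote, and gets its own `tl_id` from a separate `make_id()` call.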



