Remove dependency on libwinpthread-1.dll and GCC runtime library for mingw-w64 ports#11586
Remove dependency on libwinpthread-1.dll and GCC runtime library for mingw-w64 ports#11586xavierleroy merged 6 commits intoocaml:trunkfrom
Conversation
|
If we go this way (it's a big "if"), all the Windows hackery should be hidden behind functions defined in runtime/win32.c. We cannot pollute With apologies in advance for a silly question: why can't we use |
|
The problem looks well-researched and the result looks nicer than it sounds. As a writer of C library without a Windows development environment, I prefer indeed if there is no additional hoops to go through to have it working on the various Windows ports. The |
|
Here is a proof of concept : https://github.com/xavierleroy/ocaml/tree/refactor-domain-state . In a nutshell, Three implementations of the getter and the setter are provided:
I wonder whether the third implementation (static Let me know what you think and feel free to use this code in the current PR. |
|
This looks nice but since you change the implementation of Caml_state for MacOS, it is worth trying to replace |
|
Just an update from my side: I'd been benchmarking this approach over the weekend, but I hadn't marked the function as pure. What I had found was that across the board on Windows it's slower (virtually every test in the testsuite runs more slowly, and both the sequential and parallel build are noticeably slower). There's one outlier in the sequential build which is why I hadn't yet posted any more detail, but I'll redo the comparisons with the function marked |
Then you'll like my next proposal even more... (Teaser!) |
|
Proof of concept, take 2: https://github.com/xavierleroy/ocaml/tree/refactor-domain-state-2 Now simpler and faster! It starts with the assumption that The resulting code is quite simple and should be faster than the existing code on macOS (no Let me know what you think. If your thoughts are positive I'll submit it as a PR. |
|
My thoughts are very positive - the Windows-specific bits are gone! 🙂 I'll rebase what I've got here (switching to static linking) on to your TLS implementation and see what the performance numbers are looking like. |
This reverts commit f1c290d.
34b1218 to
ea0255a
Compare
Enables the `-l:` linking requirement for mingw-w64.
|
Rebased over #11598 (also running through precheck#784) |
xavierleroy
left a comment
There was a problem hiding this comment.
Looks good to us (@Octachron & @xavierleroy), but there's one thing we'd like to understand better, see below.
| #if defined(_WIN32) && !defined(NATIVE_CODE) | ||
| /* Ensure that pthread.h marks symbols __declspec(dllimport) so that they can be | ||
| picked up from the runtime (which will have linked winpthreads statically) */ | ||
| #define DLL_EXPORT | ||
| #endif |
There was a problem hiding this comment.
@Octachron and I don't quite understand what this is doing. Which pthread.h file is being affected by this #define?
There was a problem hiding this comment.
pthread.h from the winpthreads library in mingw-w64 (/usr/x86_64-w64-mingw32/sys-root/mingw/include/ptjhread.h). If DLL_EXPORT is defined and IN_WINPTHREAD is not defined, then it decorates all the API functions with __declspec(dllimport) (as we would if they were declared CAMLextern in our headers).
That doesn't appear to be expressly documented - if for some reason the header changed that, the possible impact would be the delightful "Cannot relocate" errors.
It has to be done here because caml/domain.h #include's caml/platform.h which #include's <pthread.h> (i.e. by the time st_pthreads.h does #include <pthread.h> it has already happened).
There was a problem hiding this comment.
Thanks for the explanations. We were looking at the wrong pthread.h file.
This trick is kind of fragile indeed, but no need to think of alternatives until it breaks.
|
CI precheck looks happy, so let me merge this in trunk and @Octachron will choose when to pick it in 5.0. |
Remove dependency on libwinpthread-1.dll and GCC runtime library for mingw-w64 ports (cherry picked from commit cd36f15)
|
This PR combined with #11598 is going through precheck#785 on a test branch for pushing to 5.0 |
|
Since precheck is non-red, it seems fine to cherry-pick those changes to 5.0 . |
This PR alters the mingw-w64 build to link the winpthreads library statically, removing the need for additional runtime DLLs (or, another way, restoring the OCaml 4.x lack of need for additional runtime DLLs).
Doing this proved slightly more complicated than I'd expected, as it turns out that GCC's implementation of implicit TLS (
__thread) pulls in both its runtime library and also dynamically links the winpthreads library when building DLLs. This is something we've avoided in the past in the i686 port by using-static-libgccbut not only is that slightly worrying to have to do for the 64-bit port as well, but it also doesn't work for the pthreads linking.This PR consists of three very closely linked parts:
PTHREAD_machinery fromotherlibs/systhreads/Makefileis removed as pthreads is now always linked as part of the runtime and it would result in double-linking when switching to static winpthreads.Caml_statenow uses a similar technique topthread_getspecificon macOS, rather than implicit TLS via__thread. As a very minor aside, theSET_Caml_statemacro is shifted inside aCAML_INTERNALSblock, since it's only used indomain.cwhere we can actually load the required Windows headers to useTlsSetValue(more on this below).-l:library linking specification flexdll#106 (this PR teaches flexlink to understand-l:syntax).For background, there is no better explanation of Windows TLS than Ken Johnson's Nynaeve Adventures in Windows debugging blog (one should also, of course, read Raymond Chen on the same subject).
The way I propose implementing
Caml_statewarrants (considerable) further explanation. The problem withCaml_stateis that it's in a public header. Our headers and codebase do not interoperate well withwindows.h(cf. the trickery required aroundATOM) and it seems best to keep it that way, as it's likely that user code suffers the same problem. This already means that the macro implementation ofCaml_statewants to avoid using Windows API functions and types, because we have to declare compatible versions in the header, and it's really awkward. Coupled with that is the fact thatTlsGetValuesomewhat inexplicably insists on clearingGetLastError(see the related previous fix in #10220), I've elected here to expand the macro definitions fromwinternl.hforNtCurrentTeband then havecaml_make_domain_state_keyboth reserve a TLS slot and also compute the offset in the TEB where that TLS slot will be.I did implement a version which instead included
windows.h, but this involves some really ugly pre-processor mucking with symbol names in various other files (both in the debugger and in the interpreter). I also looked at the alternative with manual declarations forTlsGetValue,SetLastErrorandGetLastErrorand it's similarly hideous and, frankly, looks more brittle than just computing the TEB offset for the TLS slot. It's also fundamentally upsetting to have to make 4 system calls and 2 branches just to dereference a single pointer! I also experimented just doing it the macOS way and using winpthread'spthread_getspecificbut this also suffers from the fact that we're in a public header and directly exposing pthreads in this way makes linking potentially awkward for user C code. The stub DLL for systhreads has to jump through a minor hoop to do this, but that seems acceptable for that one library. For Windows, winpthreads provides a nice abstraction for simpler code in most of the runtime, but I don't think it's a good idea for the fact we do this to be exposed in the API (the sync module, etc. hides the implementation of the primitives).Back in Windows 9x days, manually reading the TEB would have been utter madness - the 16 bit implementations of TLS have actually changed between versions of 16-bit Windows. However, since the very beginning of Windows NT, both the C compiler and loader have colluded on the "internal" format of the TEB in order to implement implicit TLS (with quite a few infamous limitations along the way, which is why GCC uses some emulation). A side-effect of this collusion is that the TLS layout in the TEB pretty much cannot change (but you of course read all 8 parts of Nynaeve and therefore have already read that!). So what I'm proposing here is actually what the C compiler should be doing (and, when MSVC gets resurrected, is possibly what the MSVC port will actually do, although I confess I'll be surprised if it actually works across DLL loading and flexdll). I have chosen to restrict the implementation to the first 64 TLS slots only (later in the TEB is a pointer to the other 1024 slots added with Windows Vista). This should be no problem at all for ocamlopt-compiled programs (I think there are 3 other TLS slots in use before this one gets allocated); some particularly strange "OCaml called from C" implementation should hopefully be able to call
caml_make_domain_state_keybefore allocating gazillions of TLS slots.Given the DLL hell which can arise on mingw-w64 (see all the fun in opam for manifesting the C++ mingw-w64 runtime DLLs), I hope that one slightly hairy macro definition and a bit of dirty pointer arithmetic is an acceptable cost. As it happens, the combined facts the the TLS slot is set using
TlsSetValueand then retrieved directly from the TEB coupled with quite how fundamentalCaml_stateis to the runtime means that this all gets quite severely tested just building the compiler, let alone running the testsuite!This PR is marked as Draft only to prevent merging before the FlexDLL PR is merged. This branch has been through precheck#778.