Conversation
sadiqj
left a comment
There was a problem hiding this comment.
The motivation for this seems clear and the implementation looks fine. Only a minor comment about the naming in st_stubs.c.
otherlibs/systhreads/st_stubs.c
Outdated
| caml_thread_restore_runtime_state(); | ||
| } | ||
|
|
||
| static int thread_has_lock(void) |
There was a problem hiding this comment.
Is there a reason this isn't consistent with the other hooks' default implementations and prefixed with caml_?
There was a problem hiding this comment.
I did not realise there was such a convention, to me this was just a static function, so without a prefix being needed.
|
Thanks for the review. It turns out that for ocaml-boxroot, we cannot afford the overhead of using something else than a boolean in the TLS (which lets the lock status be obtained with a single instruction on x86-64), or at least something not much more expensive. So I would like to propose an implementation along these lines, without changing the rationale and public API you have already reviewed (it is fine for us to access a boolean TLS variable protected with Now I am in the rabbit hole of TLS implementations on MacOS. I am looking for the best way to implement it there (and I have no access to a Mac, even Godbolt misses it). I read that C++11 |
a29634c to
c7d7d25
Compare
|
Here is the dead-simple implementation that ticks all the boxes for me (except MacOS implementation for now, which CI will probably complain about). |
|
MacOS CI passes, so I un-draft this PR. It would be nice if someone familiar with the TLS restrictions with MacOS could please have a look, though (to explain why this seems to work but for |
|
I find this PR perplexing. Neither POSIX threads nor the Win32 API make it easy to check "do I own this mutex already?". Maybe it's a weakness of these APIs, but maybe it's a sign that this query is rarely needed, or even that it is an "anti-pattern" that should not be encouraged. Concerning the per-domain master lock, I agree there could be a need in C stub functions to acquire it even if we hold it already. This could be achieved by using a recursive lock instead of an "error-checking" lock. Recursive locks are common enough (e.g. in Java) to not be an anti-pattern. But the need here (in the ocaml-boxroot project) seems to be different and to call for completely different code paths depending on whether the master lock is owned by the calling thread or not. That's a completely new situation to me. Could we have more explanations? |
|
|
To explain the boxroot situation differently, you could imagine that all deallocations are initially "remote", and that "local" deallocations are merely introduced as an optimisation for the common case. This avoids seeing it as a "completely different" code path. |
|
A possible alternative is to strengthen the runtime hooks to let us implement (an approximation of) the feature on our side (this would involve making more hooks thread-safe and making systhreads better respect other people's hooks, but it seems doable). |
I think this is a valuable idea. Long-term, I think we will push to have boxroot (or something simliar) inside the runtime, but we are also learning that the ability to experiment with runtime extensions through hooks is quite nice -- it lets people deploy a runtime extension in a much shorter timeframe, and provide interesting experimental data for runtime evolutions. |
|
Thanks @gadmm for all the explanations. I need more time to chew on all this, esp. since I'm neck deep in other things. Other opinions are most welcome. |
To be clear, the solution I am thinking about might be ok-ish for boxroot, but will not constitute a solution for the general feature under discussion. On the plus side, it would let boxroot have the best possible emitted code for TLS. |
|
Looking into jemalloc's implementation, they use an |
|
+1 for a frank discussion of TLS in macOS as of today, and removal of macOS-specific cruft that may have been needed in the past but may no longer be needed. |
|
Here is the summary of the issue: ocaml-multicore/ocaml-multicore#57 (comment). The corresponding LLVM bug report is here: llvm/llvm-project#26195 which has been closed without a fix. FWIW, on my machine, the same example in the LLVM bug report does not fail if % clang --version
Apple clang version 13.1.6 (clang-1316.0.21.2.3)
Target: x86_64-apple-darwin21.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
% /usr/bin/xcodebuild -version
Xcode 13.3.1
Build version 13E500a |
|
The LLVM bug report sounds more like "maybe fixed in a new version". It seems that the So benchmarks are in order before one can simplify the implementation using Interestingly, they have reserved a couple of static key values for system libraries which have an optimised inline implementation of pthread_getspecific. So it is possible to get the efficient 1-instruction implementation using the segment register, but only if you work at Apple. For the current PR I think I will propose a version using |
c7d7d25 to
46d9f88
Compare
|
Here is a version that solves TLS issues on MacOS. All we need now is an agreement on the design and motivation. |
This function can be called from any C thread and returns true if the thread belongs to a domain and holds the lock of its domain. It necessary for implementing C functions which are generic on whether the domain lock is held or not. In case `caml_thread_has_lock` returns false, the programmer can for instance: - report an error - try to acquire the domain lock - do something else (e.g. delay an operation until the lock is acquired) One application is for the OCaml-Rust FFI: in some situations, it is not possible to enforce with static typing that some function runs only when the runtime lock is held (notably destructors). This function lets us implement: - a dynamic check of whether the lock is held (which plays well with the lifetime-based GC safety), - a way to acquire the domain lock only if it is not already held. In OCaml 4, it is possible to implement this feature on the programmer-side (using internals) by customizing the enter/leave runtime hooks. Alas, this strategy does not work in OCaml 5. One would have to somehow install hooks after systhread is loaded, but before the second domain is spawned (after which hooks can no longer be modified), and this is very hard to do. So we need a native solution for OCaml 5.
46d9f88 to
2f8e424
Compare
|
An alternate implementation is proposed at #11272. |
|
Since this alternate implementation is merged, I am closing this PR. Thank you for the discussion. Also, the result of the discussion about TLS on MacOS is that |
From the commit log:
The function
caml_thread_has_lockcan be called from any C thread and returns true if the thread belongs to a domain and holds the lock of its domain.It necessary for implementing C functions which are generic on whether the domain lock is held or not. In case
caml_thread_has_lockreturns false, the programmer can for instance:One application is for the OCaml-Rust FFI: in some situations, it is not possible to enforce with static typing that some function runs only when the runtime lock is held (notably destructors). This function lets us implement:
In OCaml 4, it is possible to implement this feature on the programmer-side (using internals) by customizing the enter/leave runtime hooks. Alas, this strategy does not work in OCaml 5. One would have to somehow install hooks after systhread is loaded, but before the second domain is spawned (after which hooks can no longer be modified), and this is very hard to do. So we need a native solution for OCaml 5.
Use-cases
We would use it as described above in the OCaml-Rust FFI. We suspect that this function has not appeared before in the C FFI because of a difference in code scale, and Rust poses unique challenges (the type system does not always lets us track the lock ownership, notably in destructors).
We would like to use it in the implementation of ocaml-boxroot to implement lock-free deallocation. If the lock is held, deallocation of a boxroot from the same domain is immediate and cheap, otherwise it is a "remote" deallocation involving a lock-free auxiliary freelist.
An old discussion already inquired about the existence of this function: https://discuss.ocaml.org/t/determining-if-the-runtime-system-is-currently-acquired-from-c/6934.
Notes
For OCaml 4, we can use the solution sketched above, but I can also backport the feature upon popular request.
No pressure, but having it in OCaml 5.0 would avoid a gap in the availability of the feature, assuming we implement it for OCaml 4 using internals.
This implementation adds a field to the domain state. It is a simple implementation. I experimented with an implementation that recycles internals and avoids a new field (see gadmm@4cff734b5), but the strategy did not work without systhreads. I just thought of another implementation that stores the info in a thread-local variable, not sure which one is best (edit: it's more complicated due to OSX TLS issues and the separate compilation of systhreads).
(cc @gasche, @jhjourdan)