Conversation
|
I opened a discussion thread at #13356 for the details of the local storage API. Please discuss these there, I can easily reflect whatever changes we agree on in the current PR. |
8af5438 to
3519056
Compare
The intent is to have this field saved/restored when thread switching, so that it really acts as thread-local state as opposed to / in addition to the current domain-local state.
This very small refactor clarifies the responsibilities of sub-parts of the DLS implementation. In particular, the handling of parent-initialized keys can be cleanly separated, moved to the end of the module.
Suggested-by: Daniel Bünzli <daniel.buenzli@erratique.ch>
3519056 to
8c28595
Compare
|
Today I pushed two commits that were arguably missing from the PR (there was a "things missing from this PR" section in the PR description, which is now removed):
As far as I am concerned, the PR could be merged in its current state. |
f64b879 to
cea07bc
Compare
cea07bc to
7533fd4
Compare
|
We discussed this PR at the maintainer meeting today. There isn't a strong consensus in favor of offering both TLS and DLS in the long run, as is being proposed here (people remain doubtful that DLS has enough real-world use to justify central stdlib support), but there was a consensus on the idea of adding TLS and leaving DLS alone for now, especially if we can share code between them as proposed in this PR. |
|
Hi @gasche, this is amazing. Looking at it has been on my TODO list for a while. I'd love to help move it forward if there's any way I can help. From a user's perspective, it would be great to have a lower barrier to entry than working directly with this OCaml compiler branch. For example, say I start experimenting with c-cube/thread-local-storage to adapt my code to the TLS API. Would it then be possible to switch to your branch more or less transparently once my code is running well? |
|
The blocking factor (to me) appears to be to find a maintainer willing to review this change. The maintainers that are typically interested in multicore aspects of the runtime are @kayceesrk and @OlivierNicole. I suppose that @stedolan, @damiendoligez and @NickBarnes would also be fine reviewers but they are typically unavailable. |
Why not |
|
Using a submodule of |
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - native compiler support has to be reverted back, needs more investigation, vm_compute also needs work - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - native compiler support has to be reverted back, needs more investigation, vm_compute also needs work - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
We use the [thread-local-storage](https://github.com/c-cube/thread-local-storage) to store Rocq's imperative state on a per-thread basis. This enables multi-threading / multi-domain programming, and opens up many possibilities. Notes: - this is at RFC / WIP stage and has been done quickly, as to start testing and gather feedback - very careful review and testing is still needed, in particular to ensure `CRef` is not in scope where temporary `refs` have been created - `mutable` fields haven't been handled yet - parts that may not be multi-thread capable, such at coqtop or stm have not been addressed on their use of refs - moreover there could be places that rely on actually threads sharing a memory block :crossing_fingers: - native compiler support has to be reverted back, needs more investigation, vm_compute also needs work - There are out-of-summary refs that may interact poorly with this, c.f. rocq-prover#20328 - This PR actually touches many parts that could see cleanup and would make code quality better; we could even start to introduce the `CRef` discipline without activating thread local storage - Note Xavier comment about TLS on: ocaml/ocaml#11770 (comment) > Also, I'd like us to keep in mind that thread-local state, like > domain-local state, is a quick fix; avoiding global state entirely > is a better way to go about parallelism and concurrency. - Also note Gabriel Scherer work to upstream TLS into Ocaml: ocaml/ocaml#13355 Thanks to Guillaume Munch-Maccagnoni for help and advice, and to c-cube, Gabriel, and other folks making this happen in OCaml.
This PR is the third PR to propose thread-local storage, after
#12719 introduced TLS as a separate module from DLS, with its own implementation. I asked whether we should have TLS separate from DLS, there was a debate, with a consensus moving in the direction of replacing DLS with TLS wholesale.
#12724 replaced DLS with TLS, with mostly the existing implementation unchanged, but extra logic to save/restore TLS state on thread context switch. There was a lot of API discussion and wide-reaching name changes.
In the meantime the DLS implementation has changed quite a bit, due to the move to make it thread-safe (but not thread-local) in #12889. We also got more data on third-party usage of DLS ( see for example my inspection of Domain.at_exit use-cases in #12724 (comment) ).
I have decided to revisit the decision of replacing DLS by TLS. Previously I argued that having just one mechanism (1) is conceptually simpler for users, and (2) means less implementation works for OCaml maintainers; so we should optimize for the majority use-case (threads) and sacrifice the other (domains). In the present PR I try an alternative to alleviate both concerns, which is to expose both mechanisms but with the same interface and the same implementation. I believe that this can provide conceptual consistency, at a manageable cost in implementation complexity.
The present PR is in a functional state, its history is clean, it is ready for review and could be integrated quickly if we like the idea.
What is in this PR
Runtime support for TLS storage, alongside the existing support for DLS storage (most of the code is reused/adapted from the previous PRs).
A new internal stdlib module,
camlinternalStorage, that exports both TLS and DLS modules with, basically, the same interface that Domain.DLS has today. Internally this is defined using a functor over the state-access primitives (which differ for DLS and TLS), which is instantiated twice. I massaged the code to ensure that the functor application does not hurt the performance of DLS.get, even on non-flambda compilers.A new user-exposed stdlib module
Thread_local_storagethat exports TLS, reused from the previous PRs. DLS remains available through Domain.DLS.Some tweaks to the "storage" interface that come from review discussions in the previous PRs, in particular the type
'a trather than'a keyand the use ofmakeinstead ofnew_key. The older names remain available in DLS for compatibility, but are not exposed in TLS.Simple unit tests for TLS, with and without the threads runtime linked. (But note that TLS is not used in the standard library for now, so we get substantially less test coverage that we had in make dls thread local #12724.)
What is not in this PR
DLS.getrelies on a compiler primitive supported by the compiler backend. I have not implemented this forTLS.get, which is just a call to a C function. This is good enough for a first iteration of the PR; eventually we want a primitive for performance, but it makes the change more invasive and can come in a later PR.I believe that a satisfying solution for TLS should expose a form of
Thread.at_exitfunction, so that we can release thread-local resources. No such function is currently implemented, so the picture is incomplete. ImplementingThread.at_exitin a way that works well with and without thethreadslibrary linked is not easy, and better left for another PR. I think that my separate PR move the systhreads C code from otherlibs/systhreads to runtime/ #13348 would help moving in this direction (sketch: defineCamlinternalThreadthat implementsat_exitdifferently whether or not the threads machinery has been initialized, and exposeat_exitfromThread_local_storagedirectly.)make dls thread local #12724 (necessarily) includes a move from DLS to TLS in the OCaml standard library, in particular for Format buffers and the global Random state. No such change is included in the present PR, in part to keep it smaller and in part due to the lack of
Thread.at_exit(see point (2) above).I remain unsatisfied with the new API. I think that "foo-local storage key" should maybe be called a "foo-local variable" or a "foo-local reference", some conceptual change is missing to make the API more consistent with OCaml users -- but maybe we want to expose or at least document both views to remain familiar for domain experts.
I propose to leave this outside the scope of the current PR, and instead open a separate discussion Reinventing the DLS API #13356 to hash out API proposals. Otherwise this has a good chance to monopolize discussions and prevent discussing the broader picture (one local storage module, two?).
If we converge on a satisfying API proposal quickly, I can update the PR to reflect it (I already tried to incorporate the consensus from previous PRs.) Otherwise an interface change can perfectly be done in a later PR. The functorized design makes it easy to evolve DLS and TLS in a consistent way.
What to look at, review and discuss
I believe that the main question is whether we like the two-modules solution, and whether we want to go for it in time for OCaml 5.3 (feature freeze date rumored to be mid-August, so that would require swift consensus-building).
If you want to look at the two user-exposed APIs, they are:
Domain.DLS: https://github.com/gasche/ocaml/blob/thread-local-storage/stdlib/domain.mli#L106-L174Thread_local_storage: https://github.com/gasche/ocaml/blob/thread-local-storage/stdlib/thread_local_storage.mli(but: I'll open a separate issue to discuss the details of the API design.)
If you want to look at implementation:
static inlinefunctions:https://github.com/gasche/ocaml/blob/thread-local-storage/runtime/domain.c#L2130-L2200
If you want to review the PR in full, I would suggest reviewing commit-by-commit.