Skip to content

move the systhreads C code from otherlibs/systhreads to runtime/#13348

Open
gasche wants to merge 2 commits intoocaml:trunkfrom
gasche:thread-in-runtime
Open

move the systhreads C code from otherlibs/systhreads to runtime/#13348
gasche wants to merge 2 commits intoocaml:trunkfrom
gasche:thread-in-runtime

Conversation

@gasche
Copy link
Copy Markdown
Member

@gasche gasche commented Aug 1, 2024

One problem we have in the discussion of the domain-local-storage vs. thread-local-storage discussion is that we cannot write code that relies on the Thread module in the runtime or stdlib, because that module depends on Unix and therefore needs to stay in otherlibs.

If we look at the Thread interface (otherlibs/systhreads/thread.mli), we see that it has two halves, one that is about thread creation and thread operation, whose implementation only depends on pthreads, not Unix, and another about exposing Unix stuff for threads (wait, select, signals, stuff), which does depend on Unix. This suggests that we should be able to split the Thread implementation in two, a camlinternalThread module in the stdlib which does not depend on Unix, and the rest in otherlibs/systhreads.

The present PR implements a first step in this direction: it moves the runtime code for threads from otherlibs/systhreads to runtime/, to become part of the main OCaml runtime. This is just fine as the OCaml 5 runtime already depends on pthreads anyway.

This is "very easy", which means that it took me 90 minutes of battling with the build system.

How

I had to deal with the following issues.

  1. The systhreads code plays weird tricks, there is a st_stubs.c file that includes either st_posix.h or st_unix.h, they both include st_pthreads.h (while defining some functions that st_pthreads rely on), and none of those .h files are actually public headers (there is one, caml/threads.h), my guess is that the authors just felt that including .c files would be too naughty. And then the Makefile code to compute dependencies over this is above my paygrade.

    I mirrored this exact same structure inside runtime/. I had to tell Makefile.common that there are header files in runtime/*.h now, not just in runtime/caml/*.h. (I considered turning st_posix.h and st_win32.h into .c files instead, but I suspect that this would have broken the dependency computation in horrible ways.)

  2. The names st_foo.[ch] made sense inside a systhreads/ directory (hint: 'st' is for 'systhreads'), but they become obscure inside runtime/, so I had to perform a ritual sacrifice to the Gods of Bikeshedding and their oracle suggested the following rename: otherlibs/systhreads/st_stubs.c is now runtime/threads.c, the files otherlibs/systhreads/st_{posix,win32,pthreads}.h are now runtime/threads_{posix,win32,pthreads}.h, and otherlibs/systhreads/caml/threads.h is just runtime/caml/threads.h. The functions in those files, of course, keep their st_ prefix when they have one.

  3. The code in otherlibs/systhreads/thread.ml has external declarations for some caml_thread_* functions, some of which were registered as CAMLprim and some were not. You would think that it's just a matter of adding CAMLprim on those functions, but they are defined in runtime/threads_{posix,win32}.h and the primitive-listing machinery only inspects C files. The solution is to add forward CAMLprim declarations for these primitives in threads.c.

  4. I naively thought that otherlibs/systhreads would become massively simpler after the change, due to now being a pure-OCaml library. But otherlibs/systhreads builds C archives called libthreads{,nat}.a, and who knows what third-party tools and build scripts will break if this file suddenly wasn't installed anymore, right? Mathematicians would naturally think of building an archive from the empty set of object files, but this is broken on most toolchains, so instead I created a dummy C file (st_dummy.c) to build instead of st_stubs.c, which now populates the libthreads* archives.

    I suppose that some of the logic (maybe all) in otherlibs/systhreads/Makefile could be simplified now that there is morally no C code anymore, but this is not my expertise and I decided to leave it outside the scope of the present PR.

The future

Two things can be considered if this PR is merged:

  • We can move a subset of the Thread module to camlinternalThread, which could be useful, for example, to expose thread-local storage in the standard library. I am planning to work on this next.
  • We can simplify the C runtime code for threads, now that it sits within the runtime proper, to stop acting like a third-party citizen that needs to register hooks to do anything, adding some fragility. This could eventually give us a noticeably simpler result, fix some corner cases of the code that are probably subtly broken by now, etc. But these will also be subtle changes that will need to be written and reviewed carefully. I am not planning to work on this for now.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Aug 1, 2024

A further comment regarding "simplifying the runtime code".

Currently the C runtime code for threads contains a stateful initialization function that modifies the rest of the runtime state to now "run with threads enabled". This initialization function, caml_thread_initialize, is called by a let () = ... sentence in otherlibs/systhreads/thread.ml (the Thread module), so it happens when the otherlibs/ Thread module is linked within a user program, at the point where the OCaml module gets evaluated.

This situation does not change with this PR. The runtime code for threads is now located with the rest of the runtime code, but it does not get initialized by default when linking the runtime, only when the otherlibs/ module is loaded.

In the future we could simplify the whole thing if we decided to initialize the thread machinery unconditionally, as a first step towards a tighter integration with the rest of the runtime code. But such a change should also be tested carefully with respect to performance and correctness, given that it is less heavily measured and tested than the runtime without threads initialized.

@gasche gasche force-pushed the thread-in-runtime branch 3 times, most recently from 638f1fb to ca717a3 Compare August 1, 2024 13:53
gasche added 2 commits August 1, 2024 16:17
I had to deal with the following issues.

1. The systhreads code plays weird tricks, there is a `st_stubs.c` file that
   includes either `st_posix.h` or `st_unix.h`, they both include
   `st_pthreads.h` (while defining some functions that `st_pthreads` rely on),
   and none of those .h files are actually public headers (there is one,
   caml/threads.h), my guess is that the authors just felt that including .c
   files would be too naughty. And then the Makefile code to compute
   dependencies over this is above my paygrade.

   I mirrored this exact same structure inside `runtime/`. I had to tell
   Makefile.common that there are header files in `runtime/*.h` now, not just in
   `runtime/caml/*.h`. (I considered turning st_posix.h and st_win32.h into .c
   files instead, but I suspect that this would have broken the dependency
   computation in horrible ways.)

2. The names `st_foo.[ch]` made sense inside a `systhreads/` directory (hint:
   'st' is for 'systhreads'), but they become obscure inside `runtime/`, so I
   had to perform a ritual sacrifice to the Gods of Bikeshedding and their
   oracle suggested the following rename: `otherlibs/systhreads/st_stubs.c` is
   now `runtime/threads.c`, the files
   `otherlibs/systhreads/st_{posix,win32,pthreads}.h` are now
   `runtime/threads_{posix,win32,pthreads}.h`, and
   `otherlibs/systhreads/caml/threads.h` is just `runtime/caml/threads.h`. The
   functions in those files, of course, keep their `st_` prefix when they have
   one.

3. The code in otherlibs/systhreads/thread.ml has `external` declarations for
   some `caml_thread_*` functions, some of which were registered as CAMLprim and
   some were not. You would think that it's just a matter of adding CAMLprim on
   those functions, but they are defined in `runtime/threads_{posix,win32}.h`
   and the primitive-listing machinery only inspects C files. The solution is to
   add forward `CAMLprim` declarations for these primitives in threads.c.

4. I naively thought that otherlibs/systhreads would become massively simpler
   after the change, due to now being a pure-OCaml library. But
   otherlibs/systhreads builds C archives called libthreads{,nat}.a, and who
   knows what third-party tools and build scripts will break if this file
   suddenly wasn't installed anymore, right? Mathematicians would naturally
   think of building an archive from the empty set of object files, but this is
   broken on most toolchains, so instead I created a dummy C file (st_dummy.c)
   to build instead of st_stubs.c, which now populates the libthreads* archives.

   I suppose that some of the logic (maybe all) in otherlibs/systhreads/Makefile
   could be simplified now that there is *morally* no C code anymore, but this
   is not my expertise and I decided to leave it outside the scope of the
   present commit.
Comment on lines +1 to +6
/* 'otherlibs/threads' used to contain C support code for the OCaml
Thread module, which has since been migrated into the OCaml runtime itself.

To avoid breaking build scripts all over the place, we want to keep
building a C library archive, even if we don't have C code to
compile anymore. Windows and OSX don't really support creating
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* 'otherlibs/threads' used to contain C support code for the OCaml
Thread module, which has since been migrated into the OCaml runtime itself.
To avoid breaking build scripts all over the place, we want to keep
building a C library archive, even if we don't have C code to
compile anymore. Windows and OSX don't really support creating
/* 'otherlibs/systhreads' used to contain C support code for the OCaml
Thread module, which has since been migrated into the OCaml runtime itself.
To avoid breaking build scripts all over the place, we want to keep
building a C library archive, even if we don't have C code to
compile anymore. Windows and macOS don't really support creating

@MisterDA
Copy link
Copy Markdown
Contributor

MisterDA commented Aug 2, 2024

Thanks for all the detailed explanations. This is looking good, I have two questions:

  1. Should the signal and wait code (basically, threads_posix.h) stay in systhreads? They're not required by the runtime, and you wouldn't have to carry the definition of the primitives.

  2. I had to tell Makefile.common that there are header files in runtime/*.h now, not just in runtime/caml/*.h

    There is already runtime/sync_posix.h for which there is no mention in the Makefiles. How do these new headers differ?

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Aug 2, 2024

Should the signal and wait code (basically, threads_posix.h) stay in systhreads? They're not required by the runtime, and you wouldn't have to carry the definition of the primitives.

This is a good idea, I will give it a try on my next iteration.

There is already runtime/sync_posix.h for which there is no mention in the Makefiles. How do these new headers differ?

I don't understand the dependency-computation logic in the Makefile. One possibility is that the handling of sync_posix.h is currently slightly wrong, but it works by chance. Another possibility is that the current dependencies of all .o files on all RUNTIME_HEADERS is unnecessary (I can see that caml/foo.h does end up listed in .dep/runtime/bar.b.d), and in that case my efforts to also include .h files from runtime/ there are also unnecessary. Or maybe something else! For now I think that the safest choice is to imitate what is already in the Makefile, and ensure that these new headers are listed in RUNTIME_HEADERS.

(I checked that adding things in RUNTIME_HEADERS does not make them installed; the installation targets have their own caml/*.h glob.)

@xavierleroy
Copy link
Copy Markdown
Contributor

Should the signal and wait code (basically, threads_posix.h) stay in systhreads?

This is an excellent suggestion, even more so given that the implementation of the signal-related Thread function can be moved to Unix, see #13429 .

Further simplifications are possible on top of #13429 so that the C code for the Thread library is a single file, silencing @gasche 's moaning about file names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants