Simplifications and fixes to multicore systhreads implementation#11271
Simplifications and fixes to multicore systhreads implementation#11271gasche merged 4 commits intoocaml:trunkfrom
Conversation
|
IIUC the first commit fixes a bug, the remaining ones are (welcome!) re-workings in systhreads. If that's right, please could the first commit be separated as I think that's then definitely a 5.0 change, where the rest might have to be 5.1? |
|
On whether this is necessary for 5.0. The "bug" part fixed by the first commit is also fixed by the parts of the PR series on reworking async callbacks for multicore that are already necessary for 5.0 (by not calling into OCaml inside caml_thread_yield). So I would not say that the first one is necessary for 5.0 by itself (unless calling caml_thread_yield in the backup thread causes more issues than I know). So this PR is not in itself necessary for 5.0. On the other hand, if people like the approach at #11272 which depends on this PR, then it should be in for 5.0 (API change), in which case this PR is also necessary for 5.0. |
|
Personally I think of such "documentation / cleanups / refactoring" PRs (I wrote a couple myself) as a continuation of the review effort for the multicore merge, which has happened in an unusual way with several "big picture" discussions but much less "looking at the details" review work than typical PRs. So: in scope for "whenever before or after the merge", including now -- and probably a better long-term use of our collective time than giving up on documentations and rushing to finish new feature X or Y in time for 5.0. But of course it's a matter of finding reviewers able to do a review. Here I wonder if @Engil would be available? (Or maybe @ctk21?) We need someone (else than @gadmm) who knows how threads work in Multicore. Personally I don't; is there some clear documentation in the implementation already? |
I do not claim to be an expert; but it is fairly similar to the pre-multicore systhreads implementation, with some subtlety in the interaction with the backup thread. |
otherlibs/systhreads/st_stubs.c
Outdated
| void caml_thread_interrupt_hook(void) | ||
| { | ||
| /* Do not attempt to yield from the backup thread */ | ||
| if (caml_bt_is_in_blocking_section()) return; |
There was a problem hiding this comment.
CI failure was hard to reproduce, but when I managed to do it, the backtrace pointed fingers at this test not being enough to prevent caml_thread_yield from running from the backup thread. Do you have ideas (@Engil? @ctk21?) for a better way of excluding the backup thread here?
(To reproduce the bug reliably, it turns out that you either need an Intel CPU, or to put load on your AMD CPU and be patient. I'd be curious at an explanation of the differences between processors.)
There was a problem hiding this comment.
@kayceesrk The CI failure at multicore_systhreads.ml appears to come from here, let me know if you have an idea.
There was a problem hiding this comment.
Very nice catch!
Indeed, this is a problematic situation, and after investigating for a bit I figured why this solution does not work.
The reason is that, there can be a timing where the mutator thread can request re-entry into the runtime system while the backup thread is processing interrupts.
backup_thread_message is then set to BT_ENTERING_OCAML, and the backup thread will yield control once it is done processing the interrupts.
However, in this scenario, it means that caml_bt_is_in_blocking_section is not a reliable way to check the invariant of whether we are running from the backup thread or not.
Likely we need a separate way to check for this invariant (that we do not have right now). Maybe a thread local value to check?
There was a problem hiding this comment.
Addendum: caml_bt_is_in_blocking_section was not meant as a way to check whether we are the backup thread or not.
The purpose is rather to check around the masterlock what is the proper course of action. (ie. is there a thread waiting to run? should we signal the backup thread if not?)
There was a problem hiding this comment.
Thanks for investigating. Naive question (I did not look at the code), is it not possible to test backup_thread_message for the two values? Otherwise a new thread-local variable is fine too.
There was a problem hiding this comment.
@gadmm I don't think this would work: because either the backup thread is processing (IN_BLOCKING_SECTION), or is about to yield, or already waiting on the domain lock (ENTERING_OCAML). (and then there's also BT_TERMINATING and BT_INIT for lifecycle purposes.)
This would probably always return true for any thread calling it.
I think we do need to expose a way to tell if a thread is a backup thread, though there may be a nicer solution that eludes me.
|
I'd be more inclined to review if the motivations were given for some of the changes. For example "Use a thread-local variable this_thread instead of explicit TLS; prefer using this_thread over Current_thread." doesn't tell me why a thread-local variable is preferable: is that for performance? safety (w.r.t. signals for example)? portability?. |
|
The change from Merging In this file, the accesses to |
|
(Thanks for asking) |
|
Thanks for the review @kayceesrk! I think you have inadvertently written your comments in the wrong thread (#11272). I am a bit busy for the coming days but I will address your comments in the present PR. |
|
Sorry for the long delay answering this, @gadmm, I was off for a while. |
This behaviour was likely unintended given that caml_thread_yield runs signal handlers, which can execute arbitrary OCaml code.
1. Use a thread-local variable this_thread instead of explicit pthread TLS. This lets us avoid circonvolutions. 2. Prefer using this_thread over Active_thread (trivial). 3. Never access Caml_state while the domain lock is not held. The main motivation is 3. (in preparation of a subsequent PR). But the fact that this is done by simplifying the code is nice too.
abbysmal
left a comment
There was a problem hiding this comment.
I just did a first pass of review.
I feel like the bug fix is nice, and I understand the overall rationale behind some simplifications.
However at times I feel like the code is less explicit than it used to be.
For instance, the this_thread and Current_thread situation is a bit hard to understand, I can't see a consistency in when to use one over the other (?).
Also some of these simplifications I am not super confident about and will need a deeper look into the corner cases removed by this PR.
I commented in a few places I was not sure I understood well.
| Current_thread = new_thread; | ||
| Tick_thread_running = 0; | ||
|
|
||
| st_masterlock_init(&Thread_main_lock); |
There was a problem hiding this comment.
Why was the initialization of the lock moved toward the end?
Is it a style decision?
There was a problem hiding this comment.
It is necessary because you need this_thread to be initialized to use the macro. I have made it better in the new version.
|
|
||
| /* If no other OCaml thread remains, ask the tick thread to stop | ||
| so that it does not prevent the whole process from exiting (#9971) */ | ||
| if (All_threads == NULL) caml_thread_cleanup(Val_unit); |
There was a problem hiding this comment.
Was this really dead code?
There was a problem hiding this comment.
IIUC, this follows from the comment /* The main domain thread does not go through [caml_thread_stop]. There is always one more thread in the chain at this point in time. */ and the assert above. Let me know if I missed something.
But this is surprising indeed, because this means that the tick thread is never stopped. I have removed this commit and I will reintroduce it in a PR that also fixes stopping the tick thread during domain stop.
bd753d4 to
cd4aa47
Compare
|
I have found a few more fixes to do so I have split the PR in two (i.e. I removed some commits and will reintroduce them in another PR). I have renamed Current_thread as proposed by @kayceesrk. I have also checked that no other function is unexpectedly callable from the backup thread. I have found a nice solution to caml_thread_yield, please have a look. Please accept a small tooling documentation commit, it is unrelated except for the fact that debugging the issue with caml_thread_yield had me to try and find one time too much the information which I wrote in the commit. |
[Current_thread -> Active_thread]
One commit was split in two so that should make it easier. I have removed other commits to deal with later. |
|
This PR had a first round of review by @Engil and @kayceesrk, the comments have been addressed (I think). The PR has been simplified in the hope that we could converge quickly and move to the second PR I mentioned (which I think fixes the turning-off of the tick threads) as well as #11272 which depends on this PR. Is that ok with you? |
| /* If no other OCaml thread remains, ask the tick thread to stop | ||
| so that it does not prevent the whole process from exiting (#9971) */ | ||
| if (All_threads == NULL) caml_thread_cleanup(Val_unit); | ||
| if (Active_thread == NULL) caml_thread_cleanup(Val_unit); |
There was a problem hiding this comment.
Given the earlier comment about the main thread not going through caml_thread_stop and the assertion that there is at least one other thread, is it possible at all that Active_thread becomes NULL here?
kayceesrk
left a comment
There was a problem hiding this comment.
The PR fixes a bug and includes a couple of improvements (avoiding the use of explicit pthread TLS in favour of thread-local variable). While I would have liked each of these to be separate PRs, each one of these improvements is small enough and split into individual commits that they can go in as they are.
I just have one open question #11271 (review).
|
Thanks, let's discuss your question in the next PR. CI is green, no change entry is needed. |
|
Thanks, could you please cherry-pick on 5.0? |
|
I would normally think twice about merging a subtle PR to the release branch without going through our glorious release manager @Octachron, but this is 5.0-the-alpha-release and reviewers agreed that this fixes a bug, so let's go for it. |
Simplifications and fixes to multicore systhreads implementation (cherry picked from commit 475dc68)
|
That may have been unwise. Jenkins CI is rather unhappy with these commits (https://ci.inria.fr/ocaml/job/main/3090/): There's also a boatload of Windows failures, but possibly unrelated. Time to debug... |
|
Yep, both 5.0 and trunk appear to fail CI. (The testsuite passes with 5.0 on my machine, i checked before pushing). I will revert the cherry-pick commit from 5.0 for now to un-break that CI. I don't know if we should revert from trunk as well. |
This reverts commit be33545. This change was found to break the CI: https://ci.inria.fr/ocaml/job/main/3091/
|
@gadmm this PR appears to break bytecode tests (with a segfault) under some architectures including Alpine-x86_64 (that one is easy to test from Docker I suppose). |
|
I just pushed 373c941 to I'd suggest we revert this on |
|
I agree with reverting the change on 5.0 and trunk. I’ll study the failure early next week if @gadmm does not get to it earlier. |
|
For what it's worth, here is what GDB says about the Alpine segfault in fib_thread: Same context for multicore_lifecycle. |
This reverts commit 475dc68, reversing changes made to 6f9a0a0. The PR appears to break the CI, see https://ci.inria.fr/ocaml/job/main/3090/
|
Thanks for the feedback, the PR is now reverted from both 5.0 and trunk. Once the bug are found, we need to open another PR (github won't less us "reopen" a reverted PR), and to run (I wonder why the Github CI did not observe the issue. Once we understand what was at fault, maybe we will be able to strengthen the CI.) |
|
I am looking into it. I could bisect the failure on Alpine to the third commit. I opened #11390 to run the INRIA CI on it the other commits and check if the other failures come from the same commit. Could someone please run |
Here are various fixes and simplifications to the multicore systhreads implementation (cc @Engil).
From the commit logs:
Do not run
caml_thread_yieldfrom the backup thread.This behaviour was likely unintended given that
caml_thread_yieldrunssignal handlers, which can execute arbitrary OCaml code.
Simplifications to multicore systhreads
Use a thread-local variable
this_threadinstead of explicit TLS;prefer using
this_threadoverCurrent_thread.Remove redundant
All_threadscircular list and remove a FIXME.Never access
Caml_statewhile the domain lock is not held.The main motivation is 3. (in preparation of #11272). But the
fact that this is done by simplifying the code is nice.
Remove some dead code
Here, we are never the last thread of the domain.
Fix some incorrect comment and restore some comments from pre-multicore systhreads implementation.
This is best reviewed commit-per-commit. No change entry needed.