
Simplifications and fixes to multicore systhreads implementation #11271

Merged
gasche merged 4 commits into ocaml:trunk from gadmm:systhread_simpl_and_fixes
Jul 1, 2022

Conversation

@gadmm
Contributor

@gadmm gadmm commented May 22, 2022

Here are various fixes and simplifications to the multicore systhreads implementation (cc @Engil).

From the commit logs:


Do not run caml_thread_yield from the backup thread.

This behaviour was likely unintended given that caml_thread_yield runs
signal handlers, which can execute arbitrary OCaml code.


Simplifications to multicore systhreads

  1. Use a thread-local variable this_thread instead of explicit TLS;
    prefer using this_thread over Current_thread.

  2. Remove redundant All_threads circular list and remove a FIXME.

  3. Never access Caml_state while the domain lock is not held.

The main motivation is 3. (in preparation for #11272). But the
fact that this is done by simplifying the code is nice.


Remove some dead code

Here, we are never the last thread of the domain.


Fix some incorrect comments and restore some comments from the pre-multicore systhreads implementation.


This is best reviewed commit by commit. No change entry needed.

@gadmm gadmm changed the title Systhread simpl and fixes Simplifications and fixes to multicore systhreads implementation May 22, 2022
@dra27
Member

dra27 commented May 23, 2022

IIUC the first commit fixes a bug, the remaining ones are (welcome!) re-workings in systhreads. If that's right, please could the first commit be separated as I think that's then definitely a 5.0 change, where the rest might have to be 5.1?

@gadmm
Contributor Author

gadmm commented May 23, 2022

On whether this is necessary for 5.0: the "bug" part fixed by the first commit is also fixed by the parts of the PR series reworking async callbacks for multicore that are already necessary for 5.0 (by not calling into OCaml inside caml_thread_yield). So the first commit is not necessary for 5.0 by itself (unless calling caml_thread_yield in the backup thread causes more issues than I know), and neither is this PR. On the other hand, if people like the approach in #11272, which depends on this PR, then that should be in for 5.0 (API change), in which case this PR is also necessary for 5.0.

@gasche
Member

gasche commented May 23, 2022

Personally I think of such "documentation / cleanups / refactoring" PRs (I wrote a couple myself) as a continuation of the review effort for the multicore merge, which has happened in an unusual way with several "big picture" discussions but much less "looking at the details" review work than typical PRs. So: in scope for "whenever before or after the merge", including now -- and probably a better long-term use of our collective time than giving up on documentation and rushing to finish new feature X or Y in time for 5.0.

But of course it's a matter of finding reviewers able to do a review. Here I wonder if @Engil would be available? (Or maybe @ctk21?) We need someone (else than @gadmm) who knows how threads work in Multicore. Personally I don't; is there some clear documentation in the implementation already?

@gadmm
Contributor Author

gadmm commented May 23, 2022

Personally I don't; is there some clear documentation in the implementation already?

I do not claim to be an expert; but it is fairly similar to the pre-multicore systhreads implementation, with some subtlety in the interaction with the backup thread.

void caml_thread_interrupt_hook(void)
{
/* Do not attempt to yield from the backup thread */
if (caml_bt_is_in_blocking_section()) return;
Contributor Author

@gadmm gadmm May 23, 2022


The CI failure was hard to reproduce, but when I managed to do it, the backtrace pointed fingers at this test not being enough to prevent caml_thread_yield from running from the backup thread. Do you have ideas (@Engil? @ctk21?) for a better way of excluding the backup thread here?
(To reproduce the bug reliably, it turns out that you either need an Intel CPU, or to put load on your AMD CPU and be patient. I'd be curious to see an explanation of the differences between processors.)

Contributor Author


@kayceesrk The CI failure at multicore_systhreads.ml appears to come from here, let me know if you have an idea.

Contributor


Very nice catch!

Indeed, this is a problematic situation, and after investigating for a bit I figured out why this solution does not work.
The reason is that there can be a timing window where the mutator thread requests re-entry into the runtime system while the backup thread is processing interrupts.

backup_thread_message is then set to BT_ENTERING_OCAML, and the backup thread will yield control once it is done processing the interrupts.

However, in this scenario, caml_bt_is_in_blocking_section is not a reliable way to check the invariant that we are not running from the backup thread.

Likely we need a separate way to check this invariant (which we do not have right now). Maybe a thread-local value to check?

Contributor


Addendum: caml_bt_is_in_blocking_section was not meant as a way to check whether we are the backup thread or not.
Its purpose is rather to determine the proper course of action around the masterlock (i.e., is there a thread waiting to run? should we signal the backup thread if not?).

Contributor Author


Thanks for investigating. Naive question (I did not look at the code), is it not possible to test backup_thread_message for the two values? Otherwise a new thread-local variable is fine too.

Contributor


@gadmm I don't think this would work: the backup thread is either processing interrupts (IN_BLOCKING_SECTION), about to yield, or already waiting on the domain lock (ENTERING_OCAML). (And then there are also BT_TERMINATING and BT_INIT for lifecycle purposes.)
This would probably always return true for any thread calling it.

I think we do need to expose a way to tell if a thread is a backup thread, though there may be a nicer solution that eludes me.

@xavierleroy
Contributor

I'd be more inclined to review if the motivations were given for some of the changes. For example "Use a thread-local variable this_thread instead of explicit TLS; prefer using this_thread over Current_thread." doesn't tell me why a thread-local variable is preferable: is that for performance? safety (w.r.t. signals, for example)? portability?

@gadmm
Contributor Author

gadmm commented May 23, 2022

this_thread is more efficient than the previous Current_thread, which goes through a table indexed by Caml_state->id. (But I won't claim the performance difference is measurable.) I guess the convoluted approach was due to the fact that accessing the explicit TLS was syntactically heavy. I imagine that using explicit TLS was inspired by the previous implementation, and perhaps by previous problems with thread_local on macOS, but here using a static thread_local is just fine for multicore. This also removes some lines of code, and removes a couple of corner cases in caml_register/unregister_c_thread. It adds one inside caml_thread_scan_root, but I'd argue that this clarifies the code (it makes explicit that we can be in the backup thread). This change is also motivated by not accessing Caml_state without the domain lock. The change is pretty superficial otherwise.

The change from Caml_state->id-based indexing to this_thread->domain_id-based indexing forces us to move some code around within the same function (when this_thread is not yet initialized or has been reset). The motivation here is again to not access Caml_state without holding the domain lock.

Merging All_threads with Current_thread also makes the code more natural in two places (where lines of code are removed).

In this file, the accesses to Caml_state->id without holding the domain lock were safe, so part of the motivation will stem from how people feel about #11272.

@gadmm
Contributor Author

gadmm commented May 23, 2022

(Thanks for asking)

@kayceesrk kayceesrk assigned kayceesrk and unassigned kayceesrk May 24, 2022
@gadmm
Contributor Author

gadmm commented May 24, 2022

Thanks for the review @kayceesrk! I think you have inadvertently written your comments in the wrong thread (#11272). I am a bit busy for the coming days but I will address your comments in the present PR.

@abbysmal abbysmal self-assigned this Jun 9, 2022
@abbysmal
Contributor

abbysmal commented Jun 9, 2022

Sorry for the long delay answering this, @gadmm, I was off for a while.
I am back now and will review this in the coming days.

gadmm added 4 commits June 20, 2022 15:24
This behaviour was likely unintended given that caml_thread_yield runs
signal handlers, which can execute arbitrary OCaml code.
1. Use a thread-local variable this_thread instead of explicit pthread
   TLS. This lets us avoid contortions.

2. Prefer using this_thread over Active_thread (trivial).

3. Never access Caml_state while the domain lock is not held.

The main motivation is 3. (in preparation for a subsequent PR). But the
fact that this is done by simplifying the code is nice too.
Contributor

@abbysmal abbysmal left a comment


I just did a first pass of review.

I feel like the bug fix is nice, and I understand the overall rationale behind some simplifications.
However at times I feel like the code is less explicit than it used to be.

For instance, the this_thread and Current_thread situation is a bit hard to understand; I can't see a consistent rule for when to use one over the other (?).

Also, I am not super confident about some of these simplifications, and will need a deeper look at the corner cases removed by this PR.

I commented in a few places I was not sure I understood well.

Current_thread = new_thread;
Tick_thread_running = 0;

st_masterlock_init(&Thread_main_lock);
Contributor


Why was the initialization of the lock moved toward the end?
Is it a style decision?

Contributor Author


It is necessary because you need this_thread to be initialized to use the macro. I have made it better in the new version.


/* If no other OCaml thread remains, ask the tick thread to stop
so that it does not prevent the whole process from exiting (#9971) */
if (All_threads == NULL) caml_thread_cleanup(Val_unit);
Contributor


Was this really dead code?

Contributor Author


IIUC, this follows from the comment /* The main domain thread does not go through [caml_thread_stop]. There is always one more thread in the chain at this point in time. */ and the assert above. Let me know if I missed something.
But this is surprising indeed, because this means that the tick thread is never stopped. I have removed this commit and I will reintroduce it in a PR that also fixes stopping the tick thread during domain stop.

@gadmm gadmm force-pushed the systhread_simpl_and_fixes branch from bd753d4 to cd4aa47 Compare June 21, 2022 13:23
@gadmm
Contributor Author

gadmm commented Jun 21, 2022

I have found a few more fixes to do so I have split the PR in two (i.e. I removed some commits and will reintroduce them in another PR).

I have renamed Current_thread as proposed by @kayceesrk. I have also checked that no other function is unexpectedly callable from the backup thread.

I have found a nice solution to caml_thread_yield, please have a look.

Please accept a small tooling-documentation commit; it is unrelated, except that while debugging the caml_thread_yield issue I had to hunt down one time too many the information I have now written in the commit.

@gadmm
Contributor Author

gadmm commented Jun 21, 2022

For instance, the this_thread and Current_thread situation is a bit hard to understand, I can't see a consistency in when to use one over the other (?).

[Current_thread -> Active_thread]
I tried to use this_thread whenever possible (because it does not go through an indirection), and Active_thread only when we really mean it, never as a synonym for this_thread when the two happen to match. This should make the distinction between the two clearer. I have a further commit for a later PR where I clarify the invariant concerning Active_thread, notably by using the same set_active function everywhere (this forces us to fix another bug, so I leave it for later).

Also some of these simplifications I am not super confident about and will need a deeper look into the corner cases removed by this PR.

One commit was split in two, which should make it easier. I have removed other commits to deal with later.

@gadmm
Contributor Author

gadmm commented Jun 30, 2022

This PR had a first round of review by @Engil and @kayceesrk, and the comments have been addressed (I think). The PR has been simplified in the hope that we could converge quickly and move to the second PR I mentioned (which I think fixes shutting down the tick threads) as well as #11272, which depends on this PR. Is that ok with you?

/* If no other OCaml thread remains, ask the tick thread to stop
so that it does not prevent the whole process from exiting (#9971) */
if (All_threads == NULL) caml_thread_cleanup(Val_unit);
if (Active_thread == NULL) caml_thread_cleanup(Val_unit);
Contributor


Given the earlier comment about the main thread not going through caml_thread_stop and the assertion that there is at least one other thread, is it possible at all that Active_thread becomes NULL here?

Contributor

@kayceesrk kayceesrk left a comment


The PR fixes a bug and includes a couple of improvements (avoiding the use of explicit pthread TLS in favour of a thread-local variable). While I would have liked each of these to be a separate PR, each improvement is small enough, and split into its own commit, that they can go in as they are.

I just have one open question #11271 (review).

@kayceesrk kayceesrk added this to the 5.0 milestone Jul 1, 2022
@gadmm
Contributor Author

gadmm commented Jul 1, 2022

Thanks, let's discuss your question in the next PR.

CI is green, no change entry is needed.

@gasche gasche merged commit 475dc68 into ocaml:trunk Jul 1, 2022
@gadmm
Contributor Author

gadmm commented Jul 1, 2022

Thanks, could you please cherry-pick on 5.0?

@gasche
Member

gasche commented Jul 2, 2022

I would normally think twice about merging a subtle PR to the release branch without going through our glorious release manager @Octachron, but this is 5.0-the-alpha-release and reviewers agreed that this fixes a bug, so let's go for it.

gasche added a commit that referenced this pull request Jul 2, 2022
Simplifications and fixes to multicore systhreads implementation

(cherry picked from commit 475dc68)
@xavierleroy
Contributor

That may have been unwise. Jenkins CI is rather unhappy with these commits (https://ci.inria.fr/ocaml/job/main/3090/):

Alpine Linux x86_64
    tests/parallel/'fib_threads.ml' with 1.1 (bytecode) Process 17083 got signal 11(Segmentation fault), no core dumped
    tests/lib-systhreads/'multicore_lifecycle.ml' with 1.1 (bytecode) Process 13556 got signal 11(Segmentation fault), no core dumped

PPC64

    tests/lib-systhreads/'multicore_lifecycle.ml' with 1.1 (bytecode) Timeout expired, killing all child processes
    tests/parallel/'multicore_systhreads.ml' with 1.1 (bytecode) Timeout expired, killing all child processes
    tests/parallel/'fib_threads.ml' with 1.1 (bytecode) Timeout expired, killing all child processes

PPC32
    tests/lib-systhreads/'multicore_lifecycle.ml' with 1.1 (bytecode) Timeout expired, killing all child processes
    tests/parallel/'multicore_systhreads.ml' with 1.1 (bytecode) Timeout expired, killing all child processes
    tests/parallel/'fib_threads.ml' with 1.1 (bytecode) Timeout expired, killing all child processes

There's also a boatload of Windows failures, but possibly unrelated.

Time to debug...

@gasche
Member

gasche commented Jul 2, 2022

Yep, both 5.0 and trunk appear to fail CI. (The testsuite passes with 5.0 on my machine; I checked before pushing.)

I will revert the cherry-pick commit from 5.0 for now to un-break that CI. I don't know if we should revert from trunk as well.

gasche added a commit that referenced this pull request Jul 2, 2022
@gasche
Member

gasche commented Jul 2, 2022

@gadmm this PR appears to break bytecode tests (with a segfault) under some architectures including Alpine-x86_64 (that one is easy to test from Docker I suppose).

@dra27
Member

dra27 commented Jul 2, 2022

I just pushed 373c941 to trunk which is a missing part of #11304, but that doesn't eliminate the Windows problems caused by this PR on Jenkins either.

I'd suggest we revert this on trunk as well (I'm also taking another look at restoring more of the Windows CI testing, as that would have been a useful canary here)

@kayceesrk
Contributor

kayceesrk commented Jul 2, 2022

I agree with reverting the change on 5.0 and trunk. I’ll study the failure early next week if @gadmm does not get to it earlier.

@xavierleroy
Contributor

For what it's worth, here is what GDB says about the Alpine segfault in fib_thread:

Thread 14 "ocamlrun" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 30081]
0x00007fffe7e5e1db in caml_thread_restore_runtime_state () at st_stubs.c:183
183	  Caml_state->current_stack = Active_thread->current_stack;
(gdb) bt
#0  0x00007fffe7e5e1db in caml_thread_restore_runtime_state ()
    at st_stubs.c:183
#1  0x00007fffe7e5e6db in caml_thread_stop () at st_stubs.c:450
#2  caml_thread_start (v=<optimized out>) at st_stubs.c:483
#3  0x00007ffff7fba221 in ?? () from /lib/ld-musl-x86_64.so.1
#4  0x0000000000000000 in ?? ()

Same context for multicore_lifecycle.

gasche added a commit that referenced this pull request Jul 2, 2022
This reverts commit 475dc68, reversing
changes made to 6f9a0a0.

The PR appears to break the CI, see
  https://ci.inria.fr/ocaml/job/main/3090/
@gasche
Member

gasche commented Jul 2, 2022

Thanks for the feedback, the PR is now reverted from both 5.0 and trunk.

Once the bugs are found, we need to open another PR (GitHub won't let us "reopen" a reverted PR), and run precheck on it to test the INRIA CI before merging.

(I wonder why the GitHub CI did not observe the issue. Once we understand what was at fault, maybe we will be able to strengthen the CI.)

@gadmm
Contributor Author

gadmm commented Jul 2, 2022

I am looking into it. I could bisect the failure on Alpine to the third commit. I opened #11390 to run the INRIA CI on the other commits and check whether the other failures come from the same commit. Could someone please run precheck on it?

@gadmm
Contributor Author

gadmm commented Jul 3, 2022

Thanks everyone for your assistance. I have opened #11393 to fix the bug (see more details there). So now we have 3 PRs for 5.0 (in order: #11390 -> #11385 -> #11393) and 1 PR for 5.1 only (#11386).
