domain.c: document the STW synchronization code by gasche · Pull Request #11072 · ocaml/ocaml

gasche · 2022-02-28T23:49:05Z

Today I met with @Engil and @Armael and @jhjourdan and we read together the STW synchronization code in domain.c. This PR documents our understanding of the current code.

gasche · 2022-02-28T23:49:17Z

(cc @ctk21 @kayceesrk)

ctk21

Useful documentation.

I think while reviewing your notes I might have spotted a bug in our current implementation; very useful to have more eyes on this.

runtime/domain.c

ctk21 · 2022-03-01T09:45:52Z

runtime/domain.c

+     it waits for the [all_domains_cond] that is signalled at the end
+     of the STW section (see [create_domain]).
+
+   - Domain termination code will not run in parallel with a STW


Not sure I would write it this way; code in domain_terminate will run. In fact
a terminating domain must take part in STW sections until it is in a state where it
can safely leave the STW set.

My focus would be:

A terminating domain uses [all_domains_lock] in [domain_terminate] to ensure it does not drop itself from the STW set while a STW section is in progress.

What I had in mind with "domain termination code" is the part that actually cleanups the domain state and releases resources. Your proposal is of course correct, but it's also important to note that there cannot be a race between the STW code and the domain cleanup (because the cleanup happens with all_domains_lock hold, so new STW sections cannot start). I will try to rephase.

Generally my goal was to understand what it means for some mutable variables to be "protected by STW sections". STW sections act as a mutual-exclusion mechanism, what sort of synchronization guarantees does it provide? (I had to think about this in the context of #10971.)

True, domain cleanup is done once the domain is out of the STW set.

At the top of domain.c, I had a go at listing the STW invariants:

ocaml/runtime/domain.c

Lines 48 to 62 in 04ddddd

/* From a runtime perspective, domains must handle stop-the-world (STW)

sections, during which:

- they are within a section no mutator code is running

- all domains will execute the section in parallel

- barriers are provided to know all domains have reached the

same stage within a section

Stop-the-world sections are used to handle duties such as:

- minor GC

- major GC to trigger major state machine phase changes

Two invariants for STW sections:

- domains only execute mutator code if in the stop-the-world set

- domains in the stop-the-world set guarantee to service the sections

*/

Good point! I need an additional invariant, which is that domain initialization and cleanup never run concurrently with STW sections (in addition to mutator code). This makes sure that STW sections can change domain parameters without racing against domain_create or terminate_domain.

Question:

they are within a section no mutator code is running

What does this one mean? (Maybe "a section where no mutator code is running"?)

Agree, could be clearer, maybe:

- all domains know that there is no mutator code running in the section - domains take care to not trigger callbacks (e.g. a finaliser or signal handler) from within the section

I pushed a new commit that rephrases this toplevel explanation, would you care to give your opinion? This is just a proposal for clarity (and just a small bit of extra information), it's not about correctness, so I'm perfectly fine with leaving things as-is (or taking your proposal exactly), just tell me.

runtime/domain.c

ctk21

I took another look.

I think we need to be careful with how we describe domain termination. It can be thought of as two phases:

a phase where the domain is a part of the STW set and so participating; in this phase it is handing off structures to other domains.
a phase where the domain has left the STW set and is then tearing down its resources (e.g. shared heap structures, major GC structures, backup threads, etc).

The STW mechanism provides no guarantees over the second phase.

runtime/domain.c

gasche · 2022-03-02T22:58:44Z

@ctk21 I think you are right, the current code runs domain-state-cleanup code outside the all_domains_lock. This is not symmetric with create_domain, which does keep all_domains_lock during all initialisation code.

I wondered about how this can work without races between domains terminating and domains being created. The answer is that there is another lock domain_self->domain_lock (that seems used most of the time to orchestrate control transfer with the backup thread) that is taken on domain creation, held by the domain during its lieftime, and released after the cleanup code has run.

Could we instead keep all_domains_lock until termination is done? Basically instead of

  while (!finished) {
   [...]
    caml_empty_minor_heaps_once();
    caml_plat_lock(&all_domains_lock);

    if (!caml_incoming_interrupts_queued() &&
        domain_state->marking_done &&
        domain_state->sweeping_done) {

      finished = 1;
      s->terminating = 0;
      s->running = 0;

      remove_from_stw_domains(domain_self);
      [...]
    }
    caml_plat_unlock(&all_domains_lock);
  }

  /* rest of the domain cleanup code, without holding all_domains_lock */
  [...]

we could use

  do {
   [...]
    caml_empty_minor_heaps_once();
    caml_plat_lock(&all_domains_lock);

    if (!caml_incoming_interrupts_queued() &&
        domain_state->marking_done &&
        domain_state->sweeping_done) {

      finished = 1;
      s->terminating = 0;
      s->running = 0;

      remove_from_stw_domains(domain_self);
      [...]
    } else {
      caml_plat_unlock(&all_domains_lock);
    }
  } while (!finished);

  /* rest of the domain cleanup code, still holding all_domains_lock */
  [...]
  caml_plat_unlock(&all_domains_lock);

Would this work? Would this provide the STW exclusion guarantees I have optimistically written down in this PR?

(This could not work if any of the cleanup code needed to run an STW section, but currently I don't think it does.)

The reason I care about exclusion from domain initialisation and cleanup code is #10971: if we want to be able to change minor heap reservations in-flight, we want to be able to touch the dom_internal data of all domains within one domain of a STW section, and feel confident that this will not race with (initialisation or) cleanup.

Another option would be to ensure that the cleanup code works on the domain_state alone, and leaves the dom_internal variable alone. (I think this is the case for now.) Then we could not provide general exclusion guarantees, but at least the fact that dom_internal structures (but not their inner domain_state field) can be safely mutated from STW without races with initialisation and cleaup. (But keeping all_domains_lock sounds much simpler, unless I am missing something obvious again.)

ctk21 · 2022-03-03T14:45:11Z

It might work but there are a lot of moving pieces.

From a design perspective, there is another option. Maybe create_domain should only use all_domain_lock at the point it inserts the domain into the stop-the-world set? Not immediately clear to me if we want finer grained or coarser grained locking in the long run. It depends on what we are trying to achieve and why.

Does it make sense to present the other code you have in mind and then look at this topic within that concrete proposal?

gasche · 2022-03-03T15:02:44Z

I think the best thing is to focus in this PR on documenting the current state as it is, and then send other PRs if we want to change the behavior. So I'll go back to doing that.

gasche · 2022-03-03T15:36:48Z

@ctk21 I merged your suggestions, and did a couple small changes. Would you by chance be happy with the current state?

For people following from a distance, here is what the main new explanation block now looks like:

STW sections use [all_domains_lock] and the variable [stw_leader]
(0 when no STW section is running, the dom_internal* pointer of the
STW leader when a STW section is running) to guarantee that no
domain is running something else:

If two STW sections are attempted in parallel, only one will
manage to take the lock, and the domain starting the other will
join that winning STW section, without running its own STW code
at all. (This is the [try] in the function name: if it returns
0, the STW section did not run at all, so you should call this
function in a loop.)

Domain initialization code from [create_domain] will not run in
parallel with a STW section, as [create_domain] starts by
looping until (1) it has the [all_domains_lock] and (2) there is
no current STW section (using the [stw_leader] variable).

Domain cleanup code runs after the terminating domain may run in
parallel to a STW section, but only after that domain has safely
removed itself from the STW participant set: the
[domain_terminate] function is careful to only leave the STW set
when (1) it has the [all_domains_lock] and (2) it hasn't received
any request to participate in a STW section.

Each domain leaves the section as soon as it is finished running
the section code. In particular, a mutator may resume while some
other domain are still in the section. Section code that needs to
happen before any mutator must be followed by a barrier, forcing
all STW participants to synchronize.

Taken together, these properties guarantee that STW sections act as
a proper exclusion mechanism: for example, some mutable state
global to all domains can be "protected by STW" if it is only
mutated within STW section, with a barrier before the next
read. Such state can be safely updated by domain initialization,
but additional synchronization would be required to update it
during domain cleanup.

gasche · 2022-03-03T23:47:07Z

@ctk21: for the record, I tried to change terminate_domain in a branch so that the cleanup code runs under all_domains_lock: gasche/ocaml@document-stw-synchronization...domain-termination-synchronization. The code is not simpler (it is different), but I find the provided guarantees simpler (and more symmetric between domain creation and termination), and I'm assuming that the performance difference does not matter as we decided that domain creation and termination were unfrequent oprations. I'm thinking of submitting this change as a PR -- but it would be easier if we could converge on this one first.

kayceesrk · 2022-03-04T06:23:51Z

runtime/domain.c

-    - barriers are provided to know all domains have reached the
-      same stage within a section
+/* The runtime can run stop-the-world (STW) sections, during which all
+   active domains run the same code in parallel (with a barrier


Active domains are the same as STW participants?

In my mind "active domain" is an informal term to mean "domains currently running", in particular all domains that may run OCaml code ("mutator code"). This is the comment at the top of the file, it's giving a high-level description only, and at this point readers don't really know what "STW participants" are.

kayceesrk · 2022-03-04T06:26:26Z

runtime/domain.c

+    - domains registered as STW participants must be careful to
+      service STW interrupt requests
+    - STW sections must not trigger callbacks (eg. finalisers or
+      signal handlers).


Would it be possible to explicitly mention an example of a domain that is "running" but is not part of the STW participants? A newly created domain that is yet to enter OCaml I guess. Since this domain is not part of the STW participants, what is this domain doing when the STW section is ongoing? Is it parked on a particular condition variable?

These more detailed aspects are discussed in the documentation comment of caml_try_run_on_all_domains_with_spin_work. Rather than try to summarize them in the overview comment, I made the forward reference to that detailed documentation more explicit.

See the comments on [caml_try_run_on_all_domains_with_spin_work]
below for more details on the synchronization mechanisms involved.

kayceesrk · 2022-03-04T06:29:26Z

runtime/domain.c

+
+   Each domain leaves the section as soon as it is finished running
+   the section code. In particular, a mutator may resume while some
+   other domain are still in the section. Section code that needs to


"Section code" -> "The section of code"

I meant "the code of the STW section", which I think is different. I went over the whole documentation to talk more explicitly about a "STW callback" (the function that all participants are running), and mention "section callback" or "STW callback" instead of the more vague "section code". THanks!

kayceesrk · 2022-03-04T06:35:57Z

I did a light review, and the documentation looks good to me.

I'm assuming that the performance difference does not matter as we decided that domain creation and termination were unfrequent oprations.

We have Sandmark nightly and performance-tuned machines for benchmarking parallel programs. Let's use them. Mostly, we may just observe that there is no noticeable performance degradation, and we can move on. :-) CC @shakthimaan.

FYI currently the nightly runs are broken since the version change from 5.00 to 5.0 broke some dependencies. These should be resolved in a week. After the service is back up, we can point your branch at sandmark nightly and get the results out.

gasche · 2022-03-04T09:16:40Z

Thanks @kayceesrk for the review, I'll try to go over the comments soon.

We have Sandmark nightly and performance-tuned machines for benchmarking parallel programs. Let's use them. Mostly, we may just observe that there is no noticeable performance degradation, and we can move on. :-) CC @shakthimaan.

Just to clarify, the change whose performance impact we are discussing are not included in the present PR. They may turn into a PR later (for now it's only at the stage "think hard about whether I dare sending this to @ctk21's review"), and even then I think that a review for correctness would naturally come before performance tests, because this code is easier to get wrong than to get "too slow". So: any benchmarking help is warmly welcome, and I'll take the liberty to ping @shakthimaan again, but right now it's too soon.

ctk21 · 2022-03-04T09:51:14Z

I've looked at the changes, I think they are good.

@kayceesrk's comments to tighten up what we mean by being in the STW participant set (i.e. guaranteed to service the STW sections) vs being just a pthread entering and exiting, look reasonable.

ctk21 · 2022-03-04T10:01:39Z

off topic for this PR (:

Let me argue the case to leave the door slightly open on the assumption: "we don't care about the performance of domain spawn & join". Maybe there are reasonable programs and libraries where good spawn/join performance is useful? Once we bake in firmly that domain spawn/join is expensive, it may be difficult to come back.

That said, moving this lock around might not make a meaningful performance difference (it could be desirable in the context of another change to come). I'm a bit skeptical we have programs or use cases that help measure the performance of spawn/join. If no sensible use cases arrive, then I guess it is a mute point.

gasche · 2022-03-04T11:06:19Z

I have pushed another commit to address @kayceesrk's comments. Thanks again for the review.

kayceesrk · 2022-03-09T03:40:24Z

runtime/domain.c

+   To provide these guarantees:
+    - Domains must register as STW participants before running any
+      mutator code.
+    - Domains registered as STW participants must be careful to


Nitpick: This comment is correct, but is not needed "To provide these guarantees". The guarantee is "no mutator code runs while STW section is in progress". For this guarantee, STW participants need not service STW interrupt requests. If the STW participants don't service interrupt requests, the program won't enter STW section, and the guarantee is trivially satisfied, isn't it?

If this understanding is correct, we may move the second item out of this list, and add it as a separate paragraph following the list. We may say "For timely handling of STW requests, <2nd item>".

Indeed. I moved the second item to a separate paragraph, and was drawn into mentioning "poll points" and "backup thread", and I ended up moving this remark to the second comment just below which is gives an overview of the backup-thread mechanism.

runtime/domain.c

kayceesrk

LGTM! Thanks for improving the documentation in the tricky runtime bits @gasche.

Co-authored-by: Tom Kelly <ctk21@users.noreply.github.com>

Suggested-by: KC Sivaramakrishnan <kc@kcsrk.info>

gasche · 2022-03-16T09:38:43Z

Thanks @kayceesrk (and @ctk21, @jhjourdan) for your reviews. I took the last batch of comments from KC into account and rebased the PR, this should be good to go.

gasche mentioned this pull request Feb 28, 2022

Code comprehension: why don't STW sections keep all_domains_lock the whole time? #11073

Closed

ctk21 mentioned this pull request Mar 1, 2022

Fix broadcast. gasche/ocaml#11

Closed

ctk21 reviewed Mar 1, 2022

View reviewed changes

jhjourdan reviewed Mar 1, 2022

View reviewed changes

runtime/domain.c Outdated Show resolved Hide resolved

gasche force-pushed the document-stw-synchronization branch 3 times, most recently from 7708275 to 4551a1c Compare March 2, 2022 05:32

ctk21 reviewed Mar 2, 2022

View reviewed changes

runtime/domain.c Outdated Show resolved Hide resolved

runtime/domain.c Outdated Show resolved Hide resolved

runtime/domain.c Outdated Show resolved Hide resolved

runtime/domain.c Outdated Show resolved Hide resolved

runtime/domain.c Outdated Show resolved Hide resolved

kayceesrk reviewed Mar 4, 2022

View reviewed changes

gasche force-pushed the document-stw-synchronization branch from c476ad7 to 35a3897 Compare March 4, 2022 13:05

gasche mentioned this pull request Mar 4, 2022

Reserve only caml_minor_heap_max_wsz * Max_domains for the minor heap #11082

Merged

kayceesrk reviewed Mar 9, 2022

View reviewed changes

runtime/domain.c Outdated Show resolved Hide resolved

kayceesrk approved these changes Mar 9, 2022

View reviewed changes

gasche and others added 3 commits March 16, 2022 10:35

domain.c: document the STW synchronization code

27c615d

Correct the discussion of domain termination

ead70b9

Co-authored-by: Tom Kelly <ctk21@users.noreply.github.com>

domain.c: address review comments from KC

f1267ab

Suggested-by: KC Sivaramakrishnan <kc@kcsrk.info>

gasche force-pushed the document-stw-synchronization branch from 35a3897 to f1267ab Compare March 16, 2022 09:36

gasche added the no-change-entry-needed label Mar 16, 2022

gasche added the merge-me label Mar 16, 2022

gasche merged commit 3439f91 into ocaml:trunk Mar 16, 2022

	/* From a runtime perspective, domains must handle stop-the-world (STW)
	sections, during which:
	- they are within a section no mutator code is running
	- all domains will execute the section in parallel
	- barriers are provided to know all domains have reached the
	same stage within a section

	Stop-the-world sections are used to handle duties such as:
	- minor GC
	- major GC to trigger major state machine phase changes

	Two invariants for STW sections:
	- domains only execute mutator code if in the stop-the-world set
	- domains in the stop-the-world set guarantee to service the sections
	*/

Conversation

gasche commented Feb 28, 2022

Uh oh!

gasche commented Feb 28, 2022

Uh oh!

ctk21 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ctk21 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gasche commented Mar 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ctk21 commented Mar 3, 2022

Uh oh!

gasche commented Mar 3, 2022

Uh oh!

gasche commented Mar 3, 2022

Uh oh!

gasche commented Mar 3, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kayceesrk commented Mar 4, 2022

Uh oh!

gasche commented Mar 4, 2022

Uh oh!

ctk21 commented Mar 4, 2022

Uh oh!

ctk21 commented Mar 4, 2022

Uh oh!

gasche commented Mar 4, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kayceesrk left a comment

Choose a reason for hiding this comment

Uh oh!

gasche commented Mar 16, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

gasche commented Mar 2, 2022 •

edited

Loading