Skip to content

domain.c: document the STW synchronization code#11072

Merged
gasche merged 3 commits intoocaml:trunkfrom
gasche:document-stw-synchronization
Mar 16, 2022
Merged

domain.c: document the STW synchronization code#11072
gasche merged 3 commits intoocaml:trunkfrom
gasche:document-stw-synchronization

Conversation

@gasche
Copy link
Copy Markdown
Member

@gasche gasche commented Feb 28, 2022

Today I met with @Engil and @Armael and @jhjourdan and we read together the STW synchronization code in domain.c. This PR documents our understanding of the current code.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Feb 28, 2022

(cc @ctk21 @kayceesrk)

Copy link
Copy Markdown
Contributor

@ctk21 ctk21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful documentation.

I think while reviewing your notes I might have spotted a bug in our current implementation; very useful to have more eyes on this.

runtime/domain.c Outdated
it waits for the [all_domains_cond] that is signalled at the end
of the STW section (see [create_domain]).

- Domain termination code will not run in parallel with a STW
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I would write it this way; code in domain_terminate will run. In fact
a terminating domain must take part in STW sections until it is in a state where it
can safely leave the STW set.

My focus would be:

A terminating domain uses [all_domains_lock] in [domain_terminate] to ensure 
it does not drop itself from the STW set while a STW section is in progress. 

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I had in mind with "domain termination code" is the part that actually cleanups the domain state and releases resources. Your proposal is of course correct, but it's also important to note that there cannot be a race between the STW code and the domain cleanup (because the cleanup happens with all_domains_lock hold, so new STW sections cannot start). I will try to rephase.

Generally my goal was to understand what it means for some mutable variables to be "protected by STW sections". STW sections act as a mutual-exclusion mechanism, what sort of synchronization guarantees does it provide? (I had to think about this in the context of #10971.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, domain cleanup is done once the domain is out of the STW set.

At the top of domain.c, I had a go at listing the STW invariants:

ocaml/runtime/domain.c

Lines 48 to 62 in 04ddddd

/* From a runtime perspective, domains must handle stop-the-world (STW)
sections, during which:
- they are within a section no mutator code is running
- all domains will execute the section in parallel
- barriers are provided to know all domains have reached the
same stage within a section
Stop-the-world sections are used to handle duties such as:
- minor GC
- major GC to trigger major state machine phase changes
Two invariants for STW sections:
- domains only execute mutator code if in the stop-the-world set
- domains in the stop-the-world set guarantee to service the sections
*/

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I need an additional invariant, which is that domain initialization and cleanup never run concurrently with STW sections (in addition to mutator code). This makes sure that STW sections can change domain parameters without racing against domain_create or terminate_domain.

Question:

they are within a section no mutator code is running

What does this one mean? (Maybe "a section where no mutator code is running"?)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, could be clearer, maybe:

- all domains know that there is no mutator code running in the section
- domains take care to not trigger callbacks (e.g. a finaliser or signal 
handler) from within the section

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a new commit that rephrases this toplevel explanation, would you care to give your opinion? This is just a proposal for clarity (and just a small bit of extra information), it's not about correctness, so I'm perfectly fine with leaving things as-is (or taking your proposal exactly), just tell me.

@gasche gasche force-pushed the document-stw-synchronization branch 3 times, most recently from 7708275 to 4551a1c Compare March 2, 2022 05:32
Copy link
Copy Markdown
Contributor

@ctk21 ctk21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another look.

I think we need to be careful with how we describe domain termination. It can be thought of as two phases:

  • a phase where the domain is a part of the STW set and so participating; in this phase it is handing off structures to other domains.
  • a phase where the domain has left the STW set and is then tearing down its resources (e.g. shared heap structures, major GC structures, backup threads, etc).

The STW mechanism provides no guarantees over the second phase.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Mar 2, 2022

@ctk21 I think you are right, the current code runs domain-state-cleanup code outside the all_domains_lock. This is not symmetric with create_domain, which does keep all_domains_lock during all initialisation code.

I wondered about how this can work without races between domains terminating and domains being created. The answer is that there is another lock domain_self->domain_lock (that seems used most of the time to orchestrate control transfer with the backup thread) that is taken on domain creation, held by the domain during its lieftime, and released after the cleanup code has run.

Could we instead keep all_domains_lock until termination is done? Basically instead of

  while (!finished) {
   [...]
    caml_empty_minor_heaps_once();
    caml_plat_lock(&all_domains_lock);

    if (!caml_incoming_interrupts_queued() &&
        domain_state->marking_done &&
        domain_state->sweeping_done) {

      finished = 1;
      s->terminating = 0;
      s->running = 0;

      remove_from_stw_domains(domain_self);
      [...]
    }
    caml_plat_unlock(&all_domains_lock);
  }

  /* rest of the domain cleanup code, without holding all_domains_lock */
  [...]

we could use

  do {
   [...]
    caml_empty_minor_heaps_once();
    caml_plat_lock(&all_domains_lock);

    if (!caml_incoming_interrupts_queued() &&
        domain_state->marking_done &&
        domain_state->sweeping_done) {

      finished = 1;
      s->terminating = 0;
      s->running = 0;

      remove_from_stw_domains(domain_self);
      [...]
    } else {
      caml_plat_unlock(&all_domains_lock);
    }
  } while (!finished);

  /* rest of the domain cleanup code, still holding all_domains_lock */
  [...]
  caml_plat_unlock(&all_domains_lock);

Would this work? Would this provide the STW exclusion guarantees I have optimistically written down in this PR?

(This could not work if any of the cleanup code needed to run an STW section, but currently I don't think it does.)

The reason I care about exclusion from domain initialisation and cleanup code is #10971: if we want to be able to change minor heap reservations in-flight, we want to be able to touch the dom_internal data of all domains within one domain of a STW section, and feel confident that this will not race with (initialisation or) cleanup.

Another option would be to ensure that the cleanup code works on the domain_state alone, and leaves the dom_internal variable alone. (I think this is the case for now.) Then we could not provide general exclusion guarantees, but at least the fact that dom_internal structures (but not their inner domain_state field) can be safely mutated from STW without races with initialisation and cleaup. (But keeping all_domains_lock sounds much simpler, unless I am missing something obvious again.)

@ctk21
Copy link
Copy Markdown
Contributor

ctk21 commented Mar 3, 2022

It might work but there are a lot of moving pieces.

From a design perspective, there is another option. Maybe create_domain should only use all_domain_lock at the point it inserts the domain into the stop-the-world set? Not immediately clear to me if we want finer grained or coarser grained locking in the long run. It depends on what we are trying to achieve and why.

Does it make sense to present the other code you have in mind and then look at this topic within that concrete proposal?

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Mar 3, 2022

I think the best thing is to focus in this PR on documenting the current state as it is, and then send other PRs if we want to change the behavior. So I'll go back to doing that.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Mar 3, 2022

@ctk21 I merged your suggestions, and did a couple small changes. Would you by chance be happy with the current state?

For people following from a distance, here is what the main new explanation block now looks like:

STW sections use [all_domains_lock] and the variable [stw_leader]
(0 when no STW section is running, the dom_internal* pointer of the
STW leader when a STW section is running) to guarantee that no
domain is running something else:

  • If two STW sections are attempted in parallel, only one will
    manage to take the lock, and the domain starting the other will
    join that winning STW section, without running its own STW code
    at all. (This is the [try] in the function name: if it returns
    0, the STW section did not run at all, so you should call this
    function in a loop.)

  • Domain initialization code from [create_domain] will not run in
    parallel with a STW section, as [create_domain] starts by
    looping until (1) it has the [all_domains_lock] and (2) there is
    no current STW section (using the [stw_leader] variable).

  • Domain cleanup code runs after the terminating domain may run in
    parallel to a STW section, but only after that domain has safely
    removed itself from the STW participant set: the
    [domain_terminate] function is careful to only leave the STW set
    when (1) it has the [all_domains_lock] and (2) it hasn't received
    any request to participate in a STW section.

Each domain leaves the section as soon as it is finished running
the section code. In particular, a mutator may resume while some
other domain are still in the section. Section code that needs to
happen before any mutator must be followed by a barrier, forcing
all STW participants to synchronize.

Taken together, these properties guarantee that STW sections act as
a proper exclusion mechanism: for example, some mutable state
global to all domains can be "protected by STW" if it is only
mutated within STW section, with a barrier before the next
read. Such state can be safely updated by domain initialization,
but additional synchronization would be required to update it
during domain cleanup.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Mar 3, 2022

@ctk21: for the record, I tried to change terminate_domain in a branch so that the cleanup code runs under all_domains_lock: gasche/ocaml@document-stw-synchronization...domain-termination-synchronization. The code is not simpler (it is different), but I find the provided guarantees simpler (and more symmetric between domain creation and termination), and I'm assuming that the performance difference does not matter as we decided that domain creation and termination were unfrequent oprations. I'm thinking of submitting this change as a PR -- but it would be easier if we could converge on this one first.

runtime/domain.c Outdated
- barriers are provided to know all domains have reached the
same stage within a section
/* The runtime can run stop-the-world (STW) sections, during which all
active domains run the same code in parallel (with a barrier
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Active domains are the same as STW participants?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind "active domain" is an informal term to mean "domains currently running", in particular all domains that may run OCaml code ("mutator code"). This is the comment at the top of the file, it's giving a high-level description only, and at this point readers don't really know what "STW participants" are.

runtime/domain.c Outdated
- domains registered as STW participants must be careful to
service STW interrupt requests
- STW sections must not trigger callbacks (eg. finalisers or
signal handlers).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to explicitly mention an example of a domain that is "running" but is not part of the STW participants? A newly created domain that is yet to enter OCaml I guess. Since this domain is not part of the STW participants, what is this domain doing when the STW section is ongoing? Is it parked on a particular condition variable?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These more detailed aspects are discussed in the documentation comment of caml_try_run_on_all_domains_with_spin_work. Rather than try to summarize them in the overview comment, I made the forward reference to that detailed documentation more explicit.

See the comments on [caml_try_run_on_all_domains_with_spin_work]
below for more details on the synchronization mechanisms involved.

runtime/domain.c Outdated

Each domain leaves the section as soon as it is finished running
the section code. In particular, a mutator may resume while some
other domain are still in the section. Section code that needs to
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Section code" -> "The section of code"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant "the code of the STW section", which I think is different. I went over the whole documentation to talk more explicitly about a "STW callback" (the function that all participants are running), and mention "section callback" or "STW callback" instead of the more vague "section code". THanks!

@kayceesrk
Copy link
Copy Markdown
Contributor

I did a light review, and the documentation looks good to me.

I'm assuming that the performance difference does not matter as we decided that domain creation and termination were unfrequent oprations.

We have Sandmark nightly and performance-tuned machines for benchmarking parallel programs. Let's use them. Mostly, we may just observe that there is no noticeable performance degradation, and we can move on. :-) CC @shakthimaan.

FYI currently the nightly runs are broken since the version change from 5.00 to 5.0 broke some dependencies. These should be resolved in a week. After the service is back up, we can point your branch at sandmark nightly and get the results out.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Mar 4, 2022

Thanks @kayceesrk for the review, I'll try to go over the comments soon.

We have Sandmark nightly and performance-tuned machines for benchmarking parallel programs. Let's use them. Mostly, we may just observe that there is no noticeable performance degradation, and we can move on. :-) CC @shakthimaan.

Just to clarify, the change whose performance impact we are discussing are not included in the present PR. They may turn into a PR later (for now it's only at the stage "think hard about whether I dare sending this to @ctk21's review"), and even then I think that a review for correctness would naturally come before performance tests, because this code is easier to get wrong than to get "too slow". So: any benchmarking help is warmly welcome, and I'll take the liberty to ping @shakthimaan again, but right now it's too soon.

@ctk21
Copy link
Copy Markdown
Contributor

ctk21 commented Mar 4, 2022

I've looked at the changes, I think they are good.

@kayceesrk's comments to tighten up what we mean by being in the STW participant set (i.e. guaranteed to service the STW sections) vs being just a pthread entering and exiting, look reasonable.

@ctk21
Copy link
Copy Markdown
Contributor

ctk21 commented Mar 4, 2022

off topic for this PR (:

Let me argue the case to leave the door slightly open on the assumption: "we don't care about the performance of domain spawn & join". Maybe there are reasonable programs and libraries where good spawn/join performance is useful? Once we bake in firmly that domain spawn/join is expensive, it may be difficult to come back.

That said, moving this lock around might not make a meaningful performance difference (it could be desirable in the context of another change to come). I'm a bit skeptical we have programs or use cases that help measure the performance of spawn/join. If no sensible use cases arrive, then I guess it is a mute point.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Mar 4, 2022

I have pushed another commit to address @kayceesrk's comments. Thanks again for the review.

runtime/domain.c Outdated
To provide these guarantees:
- Domains must register as STW participants before running any
mutator code.
- Domains registered as STW participants must be careful to
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: This comment is correct, but is not needed "To provide these guarantees". The guarantee is "no mutator code runs while STW section is in progress". For this guarantee, STW participants need not service STW interrupt requests. If the STW participants don't service interrupt requests, the program won't enter STW section, and the guarantee is trivially satisfied, isn't it?

If this understanding is correct, we may move the second item out of this list, and add it as a separate paragraph following the list. We may say "For timely handling of STW requests, <2nd item>".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I moved the second item to a separate paragraph, and was drawn into mentioning "poll points" and "backup thread", and I ended up moving this remark to the second comment just below which is gives an overview of the backup-thread mechanism.

Copy link
Copy Markdown
Contributor

@kayceesrk kayceesrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for improving the documentation in the tricky runtime bits @gasche.

gasche and others added 3 commits March 16, 2022 10:35
Co-authored-by: Tom Kelly <ctk21@users.noreply.github.com>
Suggested-by: KC Sivaramakrishnan <kc@kcsrk.info>
@gasche gasche force-pushed the document-stw-synchronization branch from 35a3897 to f1267ab Compare March 16, 2022 09:36
@gasche
Copy link
Copy Markdown
Member Author

gasche commented Mar 16, 2022

Thanks @kayceesrk (and @ctk21, @jhjourdan) for your reviews. I took the last batch of comments from KC into account and rebased the PR, this should be good to go.

@gasche gasche merged commit 3439f91 into ocaml:trunk Mar 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants