domain.c: document the STW synchronization code#11072
Conversation
|
(cc @ctk21 @kayceesrk) |
ctk21
left a comment
There was a problem hiding this comment.
Useful documentation.
I think while reviewing your notes I might have spotted a bug in our current implementation; very useful to have more eyes on this.
runtime/domain.c
Outdated
| it waits for the [all_domains_cond] that is signalled at the end | ||
| of the STW section (see [create_domain]). | ||
|
|
||
| - Domain termination code will not run in parallel with a STW |
There was a problem hiding this comment.
Not sure I would write it this way; code in domain_terminate will run. In fact
a terminating domain must take part in STW sections until it is in a state where it
can safely leave the STW set.
My focus would be:
A terminating domain uses [all_domains_lock] in [domain_terminate] to ensure
it does not drop itself from the STW set while a STW section is in progress.
There was a problem hiding this comment.
What I had in mind with "domain termination code" is the part that actually cleanups the domain state and releases resources. Your proposal is of course correct, but it's also important to note that there cannot be a race between the STW code and the domain cleanup (because the cleanup happens with all_domains_lock hold, so new STW sections cannot start). I will try to rephase.
Generally my goal was to understand what it means for some mutable variables to be "protected by STW sections". STW sections act as a mutual-exclusion mechanism, what sort of synchronization guarantees does it provide? (I had to think about this in the context of #10971.)
There was a problem hiding this comment.
True, domain cleanup is done once the domain is out of the STW set.
At the top of domain.c, I had a go at listing the STW invariants:
Lines 48 to 62 in 04ddddd
There was a problem hiding this comment.
Good point! I need an additional invariant, which is that domain initialization and cleanup never run concurrently with STW sections (in addition to mutator code). This makes sure that STW sections can change domain parameters without racing against domain_create or terminate_domain.
Question:
they are within a section no mutator code is running
What does this one mean? (Maybe "a section where no mutator code is running"?)
There was a problem hiding this comment.
Agree, could be clearer, maybe:
- all domains know that there is no mutator code running in the section
- domains take care to not trigger callbacks (e.g. a finaliser or signal
handler) from within the section
There was a problem hiding this comment.
I pushed a new commit that rephrases this toplevel explanation, would you care to give your opinion? This is just a proposal for clarity (and just a small bit of extra information), it's not about correctness, so I'm perfectly fine with leaving things as-is (or taking your proposal exactly), just tell me.
7708275 to
4551a1c
Compare
ctk21
left a comment
There was a problem hiding this comment.
I took another look.
I think we need to be careful with how we describe domain termination. It can be thought of as two phases:
- a phase where the domain is a part of the STW set and so participating; in this phase it is handing off structures to other domains.
- a phase where the domain has left the STW set and is then tearing down its resources (e.g. shared heap structures, major GC structures, backup threads, etc).
The STW mechanism provides no guarantees over the second phase.
|
@ctk21 I think you are right, the current code runs domain-state-cleanup code outside the I wondered about how this can work without races between domains terminating and domains being created. The answer is that there is another lock Could we instead keep while (!finished) {
[...]
caml_empty_minor_heaps_once();
caml_plat_lock(&all_domains_lock);
if (!caml_incoming_interrupts_queued() &&
domain_state->marking_done &&
domain_state->sweeping_done) {
finished = 1;
s->terminating = 0;
s->running = 0;
remove_from_stw_domains(domain_self);
[...]
}
caml_plat_unlock(&all_domains_lock);
}
/* rest of the domain cleanup code, without holding all_domains_lock */
[...]we could use do {
[...]
caml_empty_minor_heaps_once();
caml_plat_lock(&all_domains_lock);
if (!caml_incoming_interrupts_queued() &&
domain_state->marking_done &&
domain_state->sweeping_done) {
finished = 1;
s->terminating = 0;
s->running = 0;
remove_from_stw_domains(domain_self);
[...]
} else {
caml_plat_unlock(&all_domains_lock);
}
} while (!finished);
/* rest of the domain cleanup code, still holding all_domains_lock */
[...]
caml_plat_unlock(&all_domains_lock);Would this work? Would this provide the STW exclusion guarantees I have optimistically written down in this PR? (This could not work if any of the cleanup code needed to run an STW section, but currently I don't think it does.) The reason I care about exclusion from domain initialisation and cleanup code is #10971: if we want to be able to change minor heap reservations in-flight, we want to be able to touch the Another option would be to ensure that the cleanup code works on the |
|
It might work but there are a lot of moving pieces. From a design perspective, there is another option. Maybe Does it make sense to present the other code you have in mind and then look at this topic within that concrete proposal? |
|
I think the best thing is to focus in this PR on documenting the current state as it is, and then send other PRs if we want to change the behavior. So I'll go back to doing that. |
|
@ctk21 I merged your suggestions, and did a couple small changes. Would you by chance be happy with the current state? For people following from a distance, here is what the main new explanation block now looks like:
|
|
@ctk21: for the record, I tried to change |
runtime/domain.c
Outdated
| - barriers are provided to know all domains have reached the | ||
| same stage within a section | ||
| /* The runtime can run stop-the-world (STW) sections, during which all | ||
| active domains run the same code in parallel (with a barrier |
There was a problem hiding this comment.
Active domains are the same as STW participants?
There was a problem hiding this comment.
In my mind "active domain" is an informal term to mean "domains currently running", in particular all domains that may run OCaml code ("mutator code"). This is the comment at the top of the file, it's giving a high-level description only, and at this point readers don't really know what "STW participants" are.
runtime/domain.c
Outdated
| - domains registered as STW participants must be careful to | ||
| service STW interrupt requests | ||
| - STW sections must not trigger callbacks (eg. finalisers or | ||
| signal handlers). |
There was a problem hiding this comment.
Would it be possible to explicitly mention an example of a domain that is "running" but is not part of the STW participants? A newly created domain that is yet to enter OCaml I guess. Since this domain is not part of the STW participants, what is this domain doing when the STW section is ongoing? Is it parked on a particular condition variable?
There was a problem hiding this comment.
These more detailed aspects are discussed in the documentation comment of caml_try_run_on_all_domains_with_spin_work. Rather than try to summarize them in the overview comment, I made the forward reference to that detailed documentation more explicit.
See the comments on [caml_try_run_on_all_domains_with_spin_work]
below for more details on the synchronization mechanisms involved.
runtime/domain.c
Outdated
|
|
||
| Each domain leaves the section as soon as it is finished running | ||
| the section code. In particular, a mutator may resume while some | ||
| other domain are still in the section. Section code that needs to |
There was a problem hiding this comment.
"Section code" -> "The section of code"
There was a problem hiding this comment.
I meant "the code of the STW section", which I think is different. I went over the whole documentation to talk more explicitly about a "STW callback" (the function that all participants are running), and mention "section callback" or "STW callback" instead of the more vague "section code". THanks!
|
I did a light review, and the documentation looks good to me.
We have Sandmark nightly and performance-tuned machines for benchmarking parallel programs. Let's use them. Mostly, we may just observe that there is no noticeable performance degradation, and we can move on. :-) CC @shakthimaan. FYI currently the nightly runs are broken since the version change from 5.00 to 5.0 broke some dependencies. These should be resolved in a week. After the service is back up, we can point your branch at sandmark nightly and get the results out. |
|
Thanks @kayceesrk for the review, I'll try to go over the comments soon.
Just to clarify, the change whose performance impact we are discussing are not included in the present PR. They may turn into a PR later (for now it's only at the stage "think hard about whether I dare sending this to @ctk21's review"), and even then I think that a review for correctness would naturally come before performance tests, because this code is easier to get wrong than to get "too slow". So: any benchmarking help is warmly welcome, and I'll take the liberty to ping @shakthimaan again, but right now it's too soon. |
|
I've looked at the changes, I think they are good. @kayceesrk's comments to tighten up what we mean by being in the STW participant set (i.e. guaranteed to service the STW sections) vs being just a pthread entering and exiting, look reasonable. |
|
off topic for this PR (: Let me argue the case to leave the door slightly open on the assumption: "we don't care about the performance of domain spawn & join". Maybe there are reasonable programs and libraries where good spawn/join performance is useful? Once we bake in firmly that domain spawn/join is expensive, it may be difficult to come back. That said, moving this lock around might not make a meaningful performance difference (it could be desirable in the context of another change to come). I'm a bit skeptical we have programs or use cases that help measure the performance of spawn/join. If no sensible use cases arrive, then I guess it is a mute point. |
|
I have pushed another commit to address @kayceesrk's comments. Thanks again for the review. |
c476ad7 to
35a3897
Compare
runtime/domain.c
Outdated
| To provide these guarantees: | ||
| - Domains must register as STW participants before running any | ||
| mutator code. | ||
| - Domains registered as STW participants must be careful to |
There was a problem hiding this comment.
Nitpick: This comment is correct, but is not needed "To provide these guarantees". The guarantee is "no mutator code runs while STW section is in progress". For this guarantee, STW participants need not service STW interrupt requests. If the STW participants don't service interrupt requests, the program won't enter STW section, and the guarantee is trivially satisfied, isn't it?
If this understanding is correct, we may move the second item out of this list, and add it as a separate paragraph following the list. We may say "For timely handling of STW requests, <2nd item>".
There was a problem hiding this comment.
Indeed. I moved the second item to a separate paragraph, and was drawn into mentioning "poll points" and "backup thread", and I ended up moving this remark to the second comment just below which is gives an overview of the backup-thread mechanism.
Co-authored-by: Tom Kelly <ctk21@users.noreply.github.com>
Suggested-by: KC Sivaramakrishnan <kc@kcsrk.info>
35a3897 to
f1267ab
Compare
|
Thanks @kayceesrk (and @ctk21, @jhjourdan) for your reviews. I took the last batch of comments from KC into account and rebased the PR, this should be good to go. |
Today I met with @Engil and @Armael and @jhjourdan and we read together the STW synchronization code in domain.c. This PR documents our understanding of the current code.