Manual chapters on parallelism and memory model #11280
Conversation
OlivierNicole
left a comment
I just found a few typos, otherwise I have no further comment about the contents, it looks good to me.
@kayceesrk Could you update the rendered version? Thanks!
I've pushed the rendered page. It should go live in a few minutes. Thanks for taking a look.
Thanks for the reviews so far. For reviewers who have write permission to
Chapter 9 looks good to me!
Thanks @fabbing for having a look. |
xavierleroy
left a comment
Nice work, thanks! Some minor remarks below.
> modules. These modules are the same as those used to synchronise threads
> created by the threads library~\ref{c:threads}. For clarity, in the rest of
> this chapter, we shall call the threads created by the threads library as
> systhreads. The following program implements a blocking version of a concurrent
I'm not a fan of the "systhreads" terminology, but don't have a better suggestion yet.
Not a fan either. But I wanted to call this something other than "threads" to avoid confusion. Let me know if you have a better name.
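For readers following along, the kind of program the quoted passage introduces might look like the following sketch of a blocking concurrent stack built from Mutex and Condition. The names and structure here are illustrative assumptions, not the manual's actual example:

```ocaml
(* A blocking concurrent stack: [pop] waits on a condition variable
   until an element is available. Illustrative sketch only. *)
type 'a stack = {
  m : Mutex.t;
  nonempty : Condition.t;
  mutable items : 'a list;
}

let create () =
  { m = Mutex.create (); nonempty = Condition.create (); items = [] }

let push s x =
  Mutex.lock s.m;
  s.items <- x :: s.items;
  Condition.signal s.nonempty;   (* wake up one blocked [pop] *)
  Mutex.unlock s.m

let pop s =
  Mutex.lock s.m;
  while s.items = [] do
    Condition.wait s.nonempty s.m  (* releases [m] while waiting *)
  done;
  let x = List.hd s.items in
  s.items <- List.tl s.items;
  Mutex.unlock s.m;
  x
```

Since both `push` and `pop` perform every access to the mutable `items` field under the mutex, the structure is safe to share between systhreads or domains.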
The memory model chapter is also complete now. I've included the operational view of the memory model in that chapter. Given that the OCaml memory model has a straightforward small-step operational semantics, it is useful to include it in the manual for our expert users rather than only have it in the PLDI 2018 paper (section 3).
talex5
left a comment
Note: I only looked at the memorymodel file here.
Perhaps it's obvious to most readers, but it might be useful to clarify that an OCaml location is not the same as a physical memory location. e.g. the location returned by ref is a new location, even though it may reuse a physical address that was previously GC'd.
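A small illustration of this point (my own sketch, not text from the chapter): each call to `ref` yields a fresh OCaml location, regardless of which physical addresses the runtime reuses over time:

```ocaml
let () =
  let a = ref 0 in
  let b = ref 0 in
  (* [a] and [b] are distinct OCaml locations even though their contents
     are equal, and even if the GC later reuses a previously collected
     physical address for one of them. *)
  assert (not (a == b));  (* physical inequality: different locations *)
  assert (a = b)          (* structural equality: same contents *)
```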
> operation on the mutex in the execution trace, then "x" precedes "y" in
> happens-before order.
> \end{itemize}
Is this list complete? It looks like an extension of the ocaml.cat hb relation.
The list is missing happens-before ordering due to semaphores and condition variables. I can't think of anything else that is missing.
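To illustrate the mutex clause of the quoted list (an illustrative sketch, not text from the chapter): if one domain unlocks a mutex and another later locks it, the writes before the unlock happen-before the reads after the lock, so the plain reference below is accessed race-free:

```ocaml
let m = Mutex.create ()
let data = ref 0   (* non-atomic; every access is made under [m] *)

let writer () =
  Mutex.lock m;
  data := 42;        (* happens-before any later lock of [m] *)
  Mutex.unlock m

let reader () =
  Mutex.lock m;
  let v = !data in
  Mutex.unlock m;
  v

let () =
  let d = Domain.spawn writer in
  Domain.join d;     (* join additionally orders the writer before us *)
  assert (reader () = 42)
```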
> \subsection{s:datarace}{Data race}
>
> In a given trace, two actions are said to be \emph{conflicting} if they access
> the same non-atomic location, and at least one is a write.
Conflicting accesses originate from different concurrent agents (threads?)
Suggested change:
- the same non-atomic location, and at least one is a write.
+ the same non-atomic location, are performed by two different threads, and at least one is a write.
Not necessarily. Two accesses from the same thread may also be conflicting. But such a pair will not have a data race since the program order between the pair of accesses implies happens-before order.
The definition of conflicting access closely follows the definition (9) from the PLDI paper on the OCaml memory model: https://kcsrk.info/papers/pldi18-memory.pdf. I've tried not to be creative here.
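A concrete instance of a racy pair of conflicting accesses, as a hedged sketch (assuming OCaml 5's Domain module):

```ocaml
let r = ref 0   (* non-atomic location *)

let () =
  (* Both domains write [r] with no happens-before edge between them:
     the two [incr] calls are conflicting, and the trace has a data race. *)
  let d1 = Domain.spawn (fun () -> incr r) in
  let d2 = Domain.spawn (fun () -> incr r) in
  Domain.join d1;
  Domain.join d2;
  (* The race is bounded in space and time: the final value is 1 (if
     both increments read 0) or 2, never arbitrary garbage. *)
  assert (!r = 1 || !r = 2)
```

Replacing the reference with an `int Atomic.t` and using `Atomic.fetch_and_add` would make the final count deterministically 2.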
gadmm
left a comment
The title of the section "LDRF > Message Passing", the one using non-atomics, makes one think that it presents a way to publish data to multiple readers which is cheaper for the reader than using OCaml's atomics. After I asked about this at #10995, I think that there are still open questions about it.
- Is there any guarantee that the load `!r` sees a recent value, in the absence of any further synchronization between the two domains? Your reasoning goes "if `d2` sees `Some` then ..." but this leaves the door open for the function never seeing the updated value. Concretely, after inlining, I wonder if some optimization (e.g. RL in the PLDI paper) could have `d2` staying forever with the initial value `None` as a result of the load. (I have been thinking that a `Sys.opaque_identity` was missing in theory somewhere around the read in my example at #10995, without expectations that the memory model would justify this data-passing idiom.) One solution could be not to present it as a message-passing idiom but only as a memory-safety property. But I was wondering whether you have stronger guarantees in theory. (In practice, I know that all polling locations are expected to possibly hide a write to `r`, so such optimizations are unlikely to happen.)
- Previous point aside, it was interesting to see how LDRF could be used to reason about it. But as we have agreed at #10995, this usage follows from the OCaml memory model but not strictly from the PLDI paper, whose notion of initializing write is too weak even to prove publication safety with the stop-the-world GC. Stephen's explanations have been mostly convincing so far, but some work appears needed to even give it a proper definition. To be clear, I am fine with anticipating results concerning the memory model implemented in practice in OCaml. However I have a nit and a comment:
- The manual is not currently clear that the notion of initializing write is stronger than the one from the PLDI paper. The latter defines the initial write in such a way that it is determined too early to possibly create a well-formed OCaml value, whereas the OCaml memory model needs to leverage dependency ordering in a novel way (which is the key to making this message-passing idiom possible). I believe that the text should make a clearer distinction between the intended practical memory model for OCaml and what has been presented in the PLDI paper; indeed the reader might incorrectly infer that an argument for memory safety follows from the referenced paper.
- Whether you intend to present it as a data-passing idiom or not, users might end up using it anyway in some use cases, on the basis that it works in practice. From this point of view, it comes close to having a clearer and more useful guarantee where the reader does not merely see the initial (or later) write, but the values at least in their state when they were first published (think of data which is created by mutation and later published as immutable). (You have this guarantee, I think, with another `opaque_identity` around the write.) This might also end up being relied upon in practice. And an updated formal memory model will likely do 99% of the work of justifying such an idiom, which could be an interesting development for leveraging dependency ordering in other languages as well. (I initially wanted to suggest an improvement to the manual based on this, but by thinking more about it I realized that it requires some `opaque_identity` somewhere, so I just leave this for you as a comment.)
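To make the idiom under discussion concrete, here is a hedged sketch of the non-atomic message-passing shape (the names `r`, `msg`, and `d2` follow the discussion; the code is my reconstruction, not the manual's example). The model keeps the spin loop memory-safe, but gives no liveness guarantee that `!r` ever observes an unsynchronised update; a happens-before edge (here provided by the spawn itself) is what makes the update visible:

```ocaml
let r = ref None    (* non-atomic flag: unreliable as a channel *)
let msg = ref 0

(* [d2] spins on the plain load [!r]; in theory it may never observe an
   update made with no synchronisation, exactly as discussed above. *)
let d2 () =
  let rec spin () =
    match !r with
    | Some () -> !msg
    | None -> Domain.cpu_relax (); spin ()
  in
  spin ()

let () =
  (* Writing before the spawn gives a happens-before edge, so this
     particular run is guaranteed to terminate and to read 42. *)
  msg := 42;
  r := Some ();
  let d = Domain.spawn d2 in
  assert (Domain.join d = 42)
```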
@gadmm Thanks for the insightful comments as always.
The memory model doesn't guarantee that
Sounds like a good plan. I can present this as a "buggy" message passing example, where a non-atomic reference
It is very likely that the person reading the memory model manual page has not read the PLDI paper. In that case, what would be the wording we should include such that it doesn't confuse the reader? I'd like to avoid pointing to some detail in the paper, which the reader likely hasn't read, and unintentionally sounding like that detail is important. As with any published paper, the paper is a snapshot of an idea in time, and it is likely that the idea itself evolves whereas the paper doesn't. I used the Java and C++ memory model documentation as guides for writing this page. One nice aspect of those documents is that they aim to be self-contained. Could you suggest the text that should go in here?
I've redone the "LDRF > memory model" section to focus on memory safety. The items still to be addressed are the additional clarifications on (a) dependency ordering and (b) the differences between the PLDI paper and this manual chapter. Waiting for your text suggestions for both.
To clarify, I was speaking about the more general pattern where the user might be reading
I have read the new text and I like it.
I think this is a good plan, and indeed going into low-level details such as dependency ordering at this particular place goes against the purpose of presenting a higher-level model. A better place could be at the start, when you cite the PLDI paper. You could clarify that the paper is augmented with details about publication-safety in the more recent GC, ideally citing alongside whatever outcome of #10995; this is not yet available but my hope is that it will result in some kind of even more expert-level documentation as an
Let me try to work in a sentence at the beginning about the difference between the memory model in the paper and this document.
FTR, I have reviewed this offline during my flight between California and Paris; I need a bit more time to actually send my comments. Sorry for the delay.
No worries @jhjourdan. Happy to wait for your comments.
jhjourdan
left a comment
Finally, here is my review. Apart from several minor concerns, I have a major concern about the HB relationship which is defined in the document: I don't think it is a good thing to include a HB relation between an initialization and any accesses to the same memory location.
> simplicity, we restrict the intra-thread actions to just the accesses to atomic
> and non-atomic locations, ignoring domain spawn and join operations, and the
> operations on mutexes.
Why so? Does it really complicate the model so much?
Yes, I would think so. We'll have to model the fact that a domain may block on a mutex or domain join. While this shouldn't be too hard, the additional details may make it harder to appreciate the semantics of the memory access, which is the focus in this section.
At the very least, any extension to the basic presentation will need to be reviewed very closely. The operational semantics presented here closely follows the PLDI 2018 operational semantics.
I don't have a particularly strong opinion on this and I don't want to block anything, but I do think that the PLDI 2018 presentation is lacking some important aspects. I think it would be a good thing to include a more in-depth description here because:
- As experience shows (e.g., allocations), the memory model for the full language is never a "trivial extension", and subtle problems can arise unexpectedly (e.g., how one should formulate the model to make sure that a read is always safe even though it does not happen-after the allocation of the cell...). I understand that this text is targeted at OCaml developers who are not necessarily interested in all these details, but if it's not described in the OCaml manual, then where else should it be described?
- I don't think adding explanations for spawn/join/allocations/mutexes/... is invasive in the sense that it would require changing the text for the other operations. The general setting (frontiers...) need not be changed.
> As experience shows (e.g., allocations), the memory model for the full language is never a "trivial extension"
Indeed. This matched our experience of implementing the model. I have a sufficient understanding of how allocation and initialisation work in the memory model now.
Perhaps the right approach would be to first write a peer-reviewed paper that covers allocations and initialisations, compilation to LKMM and reasoning about concurrent accesses to shared locations from OCaml (hardware model) and C (LKMM). And then we can aim to summarize that in the manual. I am hesitant to include new tricky material on the memory model directly in the manual. What do you think?
Could you point me to memory model papers that explicitly deal with allocations?
> I don't think adding explanations for spawn/join/allocations/mutexes/... is invasive in the sense that it would require changing the text for the other operations. The general setting (frontiers...) need not be changed.
In a follow-up PR, I'll try to extend the operational semantics with spawn, join, mutexes, condition and semaphore.
> Perhaps the right approach would be to first write a peer-reviewed paper that covers allocations and initialisations, compilation to LKMM and reasoning about concurrent accesses to shared locations from OCaml (hardware model) and C (LKMM). And then we can aim to summarize that in the manual. I am hesitant to include new tricky material on the memory model directly in the manual. What do you think?
Seems good to me. The memory model of OCaml and its interaction with C are two different matters.
> Could you point me to memory model papers that explicitly deal with allocations?
Our work on Cosmo deals with allocations.
> In a follow-up PR, I'll try to extend the operational semantics with spawn, join, mutexes, condition and semaphore.
👍
@kayceesrk (and perhaps this could interest @lthls also) I was wondering whether you had considered specifying the way asynchronous callbacks (finalizers, signal handlers, memprof callbacks) communicate with their main thread, by saying that they behave as if they were on different threads as far as reads/writes to mutable locations are concerned. More precisely, I was wondering whether there could be a path towards the compiler (e.g. flambda) not having to consider that any mutable field could be read or written at every safe point, in order to make code optimisations apply more often in the long run. Concretely, if you have two loads
Due to backwards-compatibility, this is best specified early on after introducing the memory model. Also, the patterns that I know of in existing code would not break due to such a change (concretely nobody does
It's not really an issue for Flambda, as we're not interested in mutable structures anyway. However, you can look at lines 290 to 294 in 1d65f5b. If the memory model states that asynchronous code behaves as if it were executing in another thread, then you should be able to remove the call to kill_loads.
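A sketch of the situation being discussed (the function and names are hypothetical, chosen for illustration): between two loads of the same mutable field there is an allocation, hence a safe point where an asynchronous callback could run and mutate the reference, which is why the compiler currently cannot merge the second load with the first:

```ocaml
let f (r : int ref) =
  let a = !r in
  let _pair = (a, a) in  (* allocation: a potential safe point where a
                            finalizer or signal handler could mutate [r] *)
  let b = !r in          (* hence this load must be performed again *)
  a + b

let () = assert (f (ref 3) = 6)
```

If asynchronous callbacks were specified as running on a conceptually separate thread, the two unsynchronised loads could be merged into one.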
> \item The "Bytes" module~\stdmoduleref{Bytes} permits mixed-mode accesses
> where reads and writes may be of different sizes. Unsynchronized mixed-mode
> accesses lead to tearing.
Maybe we should document the fact that some Bytes functions step out of the memory model. E.g.:
\item Unsynchronized accesses to the same sub-range of "bytes" using
functions from the \stdmoduleref{Bytes} module, such as "Bytes.blit" or
"Bytes.fill", are an exception to the memory model. Concurrent writes or
concurrent reads and writes to the same bytes invalidate any guarantees
about the program. This warning extends to all standard functions that
write to "bytes" directly, such as "Unix.recv".
First, the problem is about accesses to bytes of the same byte array, not only to the same bytes. We could even say that there is a problem only when accessing bytes within the same 64-bit (or 32-bit) block of a byte array.
Second, we should take care to bound the undefinedness to the byte array in question. It is the general spirit of the memory model that a race condition is bounded in space and time; i.e., we could say that all bets are off when it comes to values read from that byte array until the next synchronized write.
Regarding your second point, it seems that we have a choice between accepting a possibility of C-level undefined behavior with this function, not bounded in space and time, or making it much slower by performing a byte-per-byte copy.
Or we could use relaxed atomic accesses in C, which do not suffer from UB but also do not incur significant overheads, since they are compiled to plain accesses.
I think that breaking type safety because of this is not an option.
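For contrast, a hedged sketch of the case that stays within the model: writes to disjoint ranges of the same byte array from different domains, with the joins providing the happens-before edges. Per the discussion above, ranges sharing the same 64-bit (or 32-bit) word could still tear; the two halves below fall in separate words:

```ocaml
let b = Bytes.make 16 '\000'

let () =
  (* Each domain fills a disjoint 8-byte half of the array. *)
  let d1 = Domain.spawn (fun () -> Bytes.fill b 0 8 'a') in
  let d2 = Domain.spawn (fun () -> Bytes.fill b 8 8 'b') in
  Domain.join d1;
  Domain.join d2;
  (* The joins order both fills before this read. *)
  assert (Bytes.to_string b = "aaaaaaaabbbbbbbb")
```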
@jhjourdan I've addressed your major comment about the initialising writes being part of the happens-before relation. I have also addressed the minor comments. Would you be able to review this again, please? At some point, we will need to redo the memory model paper, fixing this one bug, but also extending the model to include the
P.S.: As I was addressing the comments, I marked as "resolved" those comments that I think need not be reviewed again. If you think this workflow makes reviewing hard, let me know and I shall unresolve all the resolved comments.
> \subsection{s:datarace}{Data race}
>
> In a given trace, two actions are said to be \emph{conflicting} if they access
> the same non-atomic location, and at least one is a write.
There should not be a conflict between an initializing write and any further access (read or write) to the same memory location (these are not races). These should be removed from the definition of conflicting accesses.
Good catch. I've changed the definition to:
In a given trace, two actions are said to be \emph{conflicting} if they access
the same non-atomic location, at least one is a write and neither is an
initialising write to that location.
@kayceesrk I had a brief look at the current state and the changes you made. This seems good to me, except the few minor comments I made. |
Thanks everyone for the comments. I've squashed the commit history and added a Changes entry. This PR is ready to merge. As future work, we need the following:
I have pushed a commit that transforms the remaining TODO comments into LaTeX comments (with @kayceesrk's approval). I will squash and merge the PR once the CI returns green.
Add manual chapters on parallelism and memory model (cherry picked from commit 75b8d68)
For the two new chapters "Parallel programming" and "Memory model". Follow-up to #11280.
This PR adds a manual chapter on parallelism features in OCaml.
Many sections are yet to be written, but I thought it might be a good idea to open the PR to get some early feedback. The rendered HTML pages are at: