
Manual chapters on parallelism and memory model#11280

Merged
Octachron merged 3 commits into ocaml:trunk from kayceesrk:domains_manual on Oct 4, 2022

Conversation

@kayceesrk (Contributor) commented May 27, 2022

This PR adds a manual chapter on parallelism features in OCaml. Many sections are yet to be written, but I thought it might be a good idea to open the PR to get some early feedback.

The rendered HTML pages are at:

  • (Section) Memory model: the easy bits. The aim is to cover only DRF-SC.
  • (Chapter) Memory model: the hard bits. Cover non-SC behaviours and features that do not respect the memory model.
  • (Section) Atomics
  • (Section) Blocking synchronization
  • Update the chapter on the threads library and its interaction with domains.

@kayceesrk force-pushed the domains_manual branch 3 times, most recently from 0c0601e to 37c46eb on May 29, 2022
@kayceesrk (Contributor, Author):

I have now included everything that I wanted to say in the parallelism chapter. CC @gasche @fpottier.

@OlivierNicole (Contributor) left a comment:

I just found a few typos, otherwise I have no further comment about the contents, it looks good to me.

@fpottier (Contributor):

@kayceesrk Could you update the rendered version? Thanks!

@kayceesrk (Contributor, Author) commented May 31, 2022

@fpottier

I've pushed the rendered page. It should go live in a few minutes. Thanks for taking a look.

@kayceesrk (Contributor, Author):

Thanks for the reviews so far.

For reviewers who have write permission to ocaml/ocaml, please feel free to directly push minor edits to kayceesrk:domains_manual branch.

@fabbing (Contributor) commented Jun 1, 2022

Chapter 9 looks good to me!

@kayceesrk (Contributor, Author):

Thanks @fabbing for having a look.

@xavierleroy (Contributor) left a comment:

Nice work, thanks! Some minor remarks below.

modules. These modules are the same as those used to synchronise threads
created by the threads library~\ref{c:threads}. For clarity, in the rest of
this chapter, we shall refer to the threads created by the threads library as
systhreads. The following program implements a blocking version of a concurrent
Contributor:
I'm not a fan of the "systhreads" terminology, but don't have a better suggestion yet.

@kayceesrk (Contributor, Author):

Not a fan either. But I wanted to call this something other than "threads" to avoid confusion. Let me know if you have a better name.

@kayceesrk force-pushed the domains_manual branch 2 times, most recently from aa63257 to 675d886 on June 2, 2022
@kayceesrk (Contributor, Author) commented Jun 2, 2022

The memory model chapter is also complete now.

I've included the operational view of the memory model in the memory model chapter. Given that the OCaml memory model has a straightforward small-step operational semantics, it is useful to include it in the manual for our expert users rather than only have it in the PLDI 2018 paper (section 3).

@kayceesrk changed the title from "Manual chapter on parallelism" to "Manual chapters on parallelism and memory model" on Jun 2, 2022
@talex5 (Contributor) left a comment:

Note: I only looked at the memorymodel file here.

Perhaps it's obvious to most readers, but it might be useful to clarify that an OCaml location is not the same as a physical memory location. For example, the location returned by ref is a new location, even though it may reuse a physical address that was previously reclaimed by the GC.
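
To make the point concrete, here is a small sketch (not from the PR): each call to `ref` yields a fresh OCaml location, regardless of physical address reuse, and physical equality distinguishes locations while structural equality compares contents.

```ocaml
(* Minimal sketch: each call to [ref] yields a fresh OCaml location,
   even if the GC later reuses the same physical address for it.
   [!=] is physical inequality; [=] is structural equality. *)
let () =
  let a = ref 0 in
  let b = ref 0 in
  assert (a != b);  (* two distinct locations... *)
  assert (!a = !b)  (* ...with equal contents *)
```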

operation on the mutex in the execution trace, then "x" precedes "y" in
happens-before order.
\end{itemize}

Contributor:

Is this list complete? It looks like an extension of the ocaml.cat hb relation.

@kayceesrk (Contributor, Author):

The list is missing happens-before ordering due to semaphores and condition variables. I can't think of anything else that is missing.
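
As a sketch of the mutex-induced happens-before edge quoted above (assuming OCaml ≥ 5.0 and its `Domain` and `Mutex` modules; not part of the PR text):

```ocaml
(* Sketch: an unlock of [m] in one domain happens-before a later lock of
   [m] in another, so the read of [x] below is well synchronised.
   [Domain.join] adds a further happens-before edge, which is what makes
   the final assertion deterministic here. *)
let () =
  let m = Mutex.create () in
  let x = ref 0 in
  let d1 = Domain.spawn (fun () ->
    Mutex.lock m; x := 42; Mutex.unlock m)
  in
  Domain.join d1;
  Mutex.lock m;
  assert (!x = 42);
  Mutex.unlock m
```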

\subsection{s:datarace}{Data race}

In a given trace, two actions are said to be \emph{conflicting} if they access
the same non-atomic location, and at least one is a write.
Contributor:

Conflicting accesses originate from different concurrent agents (threads?)

Suggested change:
- the same non-atomic location, and at least one is a write.
+ the same non-atomic location, are performed by two different threads, and at least one is a write.

@kayceesrk (Contributor, Author):

Not necessarily. Two accesses from the same thread may also be conflicting. But such a pair will not have a data race since the program order between the pair of accesses implies happens-before order.

The definition of conflicting access closely follows the definition (9) from the PLDI paper on the OCaml memory model: https://kcsrk.info/papers/pldi18-memory.pdf. I've tried not to be creative here.
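
A small sketch of the point being made (not from the PR): two same-thread accesses can be conflicting without constituting a data race, because program order implies happens-before.

```ocaml
(* Sketch: a write and a read of the same non-atomic location from the
   same thread form a conflicting pair, but program order implies
   happens-before, so there is no data race. *)
let () =
  let r = ref 0 in
  r := 1;         (* write *)
  let v = !r in   (* read of the same non-atomic location *)
  assert (v = 1)
```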

@gadmm (Contributor) left a comment:

The title of the section "LDRF > Message Passing", the one using non-atomics, makes one think that it presents a way to publish data to multiple readers which is cheaper for the reader than using OCaml's atomics. After I asked about this at #10995, I think that there are still open questions about it.

  1. Is there any guarantee that the load !r sees a recent value, in the absence of any further synchronization between the two domains? Your reasoning goes “if d2 sees Some then ...” but this leaves the door open for the function never seeing the updated value. Concretely, after inlining, I wonder if some optimization (e.g. RL in the PLDI paper) could have d2 staying forever with the initial value None as a result of the load. (I have been thinking that a Sys.opaque_identity was missing in theory somewhere around the read in my example at #10995, without expectations that the memory model would justify this data-passing idiom.) One solution could be not to present it as a message-passing idiom but only as a memory-safety property. But I was wondering whether you have stronger guarantees in theory. (In practice, I know that all polling locations are expected to possibly hide a write to r, so such optimizations are unlikely to happen.)
  2. Previous point aside, it was interesting to see how LDRF could be used to reason about it. But as we agreed at #10995, this usage follows from the OCaml memory model but not strictly from the PLDI paper, whose notion of initializing write is too weak even to prove publication safety with the stop-the-world GC. Stephen's explanations have been mostly convincing so far, but some work appears needed to even give it a proper definition. To be clear, I am fine with anticipating results concerning the memory model implemented in practice in OCaml. However, I have a nit and a comment:
    • The manual is currently not clear that the notion of initializing write is stronger than the one from the PLDI paper. The latter defines the initial write in such a way that it is determined too early to possibly create a well-formed OCaml value, whereas the OCaml memory model needs to leverage dependency ordering in a novel way (which is the key to making this message-passing idiom possible). I believe that the text should make a clearer distinction between the intended practical memory model for OCaml and what has been presented in the PLDI paper; indeed, the reader might incorrectly infer that an argument for memory safety follows from the referenced paper.
    • Whether you intend to present it as a data-passing idiom or not, users might end up using it anyway in some use-cases, on the basis that it works in practice. From this point of view, it comes close to having a clearer and more useful guarantee where the reader does not merely see the initial (or later) write, but the values at least in their state when they were first published (think of data which is created by mutation and later published as immutable). (You have this guarantee, I think, with another opaque_identity around the write.) This might also end up being relied upon in practice. And an updated formal memory model will likely do 99% of the work of justifying such an idiom, which could be an interesting development for leveraging dependency ordering in other languages as well. (I initially wanted to suggest an improvement to the manual based on this, but by thinking more about it I realized that it requires some opaque_identity somewhere, so I just leave this for you as a comment.)

@kayceesrk (Contributor, Author) commented Jun 28, 2022

@gadmm Thanks for the insightful comments as always.

> Concretely, after inlining, I wonder if some optimization (e.g. RL in the PLDI paper) could have d2 staying forever with the initial value None as a result of the load.

The memory model doesn't guarantee that Some .. will be seen. It would be surprising if the compiler eliminated the read in d2. The redundant load optimisation, for example, only optimises when it sees two loads. There is only one load of r in this program.

> One solution could be not to present it as a message-passing idiom but only as a memory-safety property.

Sounds like a good plan. I can present this as a "buggy" message passing example, where a non-atomic reference r is used whereas the correct implementation should have had r as an atomic reference. We then argue that memory safety is still preserved.
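
For contrast, here is a minimal sketch (assuming OCaml ≥ 5.0 and its `Atomic` and `Domain` modules; not from the PR) of the version where r is atomic, which makes the message passing correct:

```ocaml
(* Sketch, assuming OCaml >= 5.0: with [r] atomic, the write in the main
   domain is guaranteed to become visible to the spinning reader, so the
   loop in [d2] terminates and returns the published value. *)
let () =
  let r = Atomic.make None in
  let d2 = Domain.spawn (fun () ->
    let rec loop () =
      match Atomic.get r with
      | Some v -> v
      | None -> Domain.cpu_relax (); loop ()
    in
    loop ())
  in
  Atomic.set r (Some 42);
  assert (Domain.join d2 = 42)
```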

> I believe that the text should make a clearer distinction between the intended practical memory model for OCaml and what has been presented in the PLDI paper; indeed, the reader might incorrectly infer that an argument for memory safety follows from the referenced paper.

It is very likely that the person reading the memory model manual page has not read the PLDI paper. In that case, what wording should we include so that it doesn't confuse the reader? I'd like to avoid pointing to some detail in the paper that the reader likely hasn't read, and unintentionally sounding like that detail is important. As with any published paper, the paper is a snapshot of an idea in time, and it is likely that the idea evolves while the paper doesn't. I used the Java and C++ memory model documentation as guides for writing this page. One nice aspect of those documents is that they aim to be self-contained.

Could you suggest the text that should go in here?

@kayceesrk (Contributor, Author):

@gadmm

I've redone the "LDRF > memory model" section to be on memory safety. The ones to be addressed are the additional clarifications on (a) dependency ordering and (b) the diff between the PLDI paper and this manual chapter. Waiting for your text suggestions for both.

@gadmm (Contributor) commented Jun 28, 2022

> @gadmm Thanks for the insightful comments as always.
>
>> Concretely, after inlining, I wonder if some optimization (e.g. RL in the PLDI paper) could have d2 staying forever with the initial value None as a result of the load.
>
> The memory model doesn't guarantee that Some .. will be seen. It would be surprising if the compiler eliminated the read in d2. The redundant load optimisation, for example, only optimises when it sees two loads. There is only one load of r in this program.

To clarify, I was speaking about the more general pattern where the user might be reading r several times. In fact, I am convinced by the argument that the program must also poll often enough, which forces r to be reloaded often enough (at least until the compiler becomes smart enough to know that r is not accessed from any asynchronous callback). So I liked the presentation of the data-passing idiom. That said, this does not follow from the memory model.

>> One solution could be not to present it as a message-passing idiom but only as a memory-safety property.
>
> Sounds like a good plan. I can present this as a "buggy" message passing example, where a non-atomic reference r is used whereas the correct implementation should have had r as an atomic reference. We then argue that memory safety is still preserved.

I have read the new text and I like it.

>> I believe that the text should make a clearer distinction between the intended practical memory model for OCaml and what has been presented in the PLDI paper; indeed, the reader might incorrectly infer that an argument for memory safety follows from the referenced paper.
>
> It is very likely that the person reading the memory model manual page has not read the PLDI paper. In that case, what wording should we include so that it doesn't confuse the reader? I'd like to avoid pointing to some detail in the paper that the reader likely hasn't read, and unintentionally sounding like that detail is important. As with any published paper, the paper is a snapshot of an idea in time, and it is likely that the idea evolves while the paper doesn't. I used the Java and C++ memory model documentation as guides for writing this page. One nice aspect of those documents is that they aim to be self-contained.
>
> Could you suggest the text that should go in here?

I think this is a good plan; indeed, going into low-level details such as dependency ordering at this particular place goes against the purpose of presenting a higher-level model. A better place could be at the start, when you cite the PLDI paper. You could clarify that the paper is augmented with details about publication safety in the more recent GC, ideally citing whatever comes out of #10995; that is not yet available, but my hope is that it will result in some kind of even more expert-level documentation as an .md file in the tree. There was no progress recently on #10995, but it should be easy to copy-paste Stephen's various explanations into a file. Otherwise, a quick explanation at the start of the manual would do.

@kayceesrk (Contributor, Author):

Let me try to work in a sentence at the beginning about the difference between the memory model in the paper and this document.

@jhjourdan (Contributor):

FTR, I have reviewed this offline during my flight between California and Paris; I need a bit more time to actually send my comments. Sorry for the delay.

@kayceesrk (Contributor, Author):

No worries @jhjourdan. Happy to wait for your comments.

@jhjourdan (Contributor) left a comment:

Finally, here is my review. Apart from several minor concerns, I have a major concern about the HB relation defined in the document: I don't think it is a good idea to include an HB relation between an initialization and any access to the same memory location.

Comment on lines +538 to +540
simplicity, we restrict the intra-thread actions to just the accesses to atomic
and non-atomic locations, ignoring domain spawn and join operations, and the
operations on mutexes.
Contributor:

Why so? Does it really complicate the model so much?

@kayceesrk (Contributor, Author) commented Sep 27, 2022:

Yes, I would think so. We'll have to model the fact that a domain may block on a mutex or domain join. While this shouldn't be too hard, the additional details may make it harder to appreciate the semantics of memory accesses, which are the focus of this section.

At the very least, any extension to the basic presentation will need to be reviewed very closely. The operational semantics presented here closely follows the PLDI 2018 operational semantics.

Contributor:

I don't have a particularly strong opinion on this and I don't want to block anything, but I do think that the PLDI 2018 presentation is lacking some important aspects. I think it would be a good thing to include a more in-depth description here because:

  • As experience shows (e.g., allocations), the memory model for the full language is never a "trivial extension", and subtle problems can arise unexpectedly (e.g., how one should formulate the model to make sure that a read is always safe even though it does not happen-after the allocation of the cell...). I understand that this text is targeted at OCaml developers who are not necessarily interested in all these details, but if it's not described in the OCaml manual, then where else should it be described?
  • I don't think adding explanations for spawn/join/allocations/mutexes/... is invasive, in the sense of requiring changes to the text for the other operations. The general setting (frontiers, ...) need not be changed.

@kayceesrk (Contributor, Author) commented Sep 28, 2022:

> As experience shows (e.g., allocations), the memory model for the full language is never a "trivial extension"

Indeed. This matched our experience of implementing the model. I have a sufficient understanding of how allocation and initialisation work in the memory model now.

Perhaps the right approach would be to first write a peer-reviewed paper that covers allocations and initialisations, compilation to LKMM and reasoning about concurrent accesses to shared locations from OCaml (hardware model) and C (LKMM). And then we can aim to summarize that in the manual. I am hesitant to include new tricky material on the memory model directly in the manual. What do you think?

Could you point me to memory model papers that explicitly deal with allocations?

> I don't think adding explanations for spawn/join/allocations/mutexes/... is invasive in the sense that it would require changing the text for the other operations. The general setting (frontiers...) need not be changed.

In a follow-up PR, I'll try to extend the operational semantics with spawn, join, mutexes, condition and semaphore.

Contributor:

> Perhaps the right approach would be to first write a peer-reviewed paper that covers allocations and initialisations, compilation to LKMM and reasoning about concurrent accesses to shared locations from OCaml (hardware model) and C (LKMM). And then we can aim to summarize that in the manual. I am hesitant to include new tricky material on the memory model directly in the manual. What do you think?

Seems good to me. The memory model of OCaml and its interaction with C are two different matters.

> Could you point me to memory model papers that explicitly deal with allocations?

Our work on Cosmo deals with allocations.

> In a follow-up PR, I'll try to extend the operational semantics with spawn, join, mutexes, condition and semaphore.

👍

@gadmm (Contributor) commented Jul 6, 2022

@kayceesrk (and perhaps this could interest @lthls also) I was wondering whether you had considered specifying the way asynchronous callbacks (finalizers, signal handlers, memprof callbacks) communicate with their main thread, by saying that they behave as if they were on different threads as far as reads/writes to mutable locations are concerned.

More precisely I was wondering whether there could be a path towards the compiler (e.g. flambda) not having to consider that any mutable field could be read or written to at every safe point, in order to make code optimisations apply more often, in the long run. Concretely if you have two loads !x and !x across an allocation, then you should be able to do CSE. IIRC Vincent's explanations, Flambda will never do such CSE because it assumes that arbitrary code can run at any point due to async callbacks.

Due to backwards compatibility, this is best specified early on, after introducing the memory model. Also, the patterns that I know of in existing code would not break with such a change (concretely, nobody writes while !x do poll_async () done).

@lthls (Contributor) commented Jul 7, 2022

> More precisely I was wondering whether there could be a path towards the compiler (e.g. flambda) not having to consider that any mutable field could be read or written to at every safe point, in order to make code optimisations apply more often, in the long run. Concretely if you have two loads !x and !x across an allocation, then you should be able to do CSE. IIRC Vincent's explanations, Flambda will never do such CSE because it assumes that arbitrary code can run at any point due to async callbacks.

It's not really an issue for Flambda, as we're not interested in mutable structures anyway. However, you can look at the following lines in CSEgen.ml:

ocaml/asmcomp/CSEgen.ml, lines 290 to 294 at 1d65f5b:

(* Moreover, allocations and polls can trigger the asynchronous execution
   of arbitrary Caml code (finalizer, signal handler, context
   switch), which can contain non-initializing stores.
   Hence, all equations over mutable loads must be removed. *)
let n1 = kill_addr_regs (self#kill_loads n) in

If the memory model states that asynchronous code behaves as if it was executing in another thread, then you should be able to remove the call to kill_loads.
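
To make the discussion concrete, here is a hypothetical OCaml fragment (not from the PR) exhibiting the pattern being described:

```ocaml
(* Hypothetical sketch: two loads of a mutable field on either side of an
   allocation. Today the second [!x] cannot be merged with the first,
   because the allocation of [y] is a safe point at which asynchronous
   OCaml code (a finaliser, a signal handler, ...) might write to [x].
   Specifying that such callbacks behave as if on separate threads would
   license the CSE. *)
let f (x : int ref) =
  let a = !x in
  let y = (a, a) in  (* heap allocation: a safe point *)
  let b = !x in      (* a second load the compiler currently keeps *)
  (y, b)

let () = assert (f (ref 3) = ((3, 3), 3))
```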

Comment on lines +787 to +756
\item The "Bytes" module~\stdmoduleref{Bytes} permits mixed-mode accesses
where reads and writes may be of different sizes. Unsynchronized mixed-mode
accesses lead to tearing.
Contributor:

Maybe we should document the fact that some Bytes functions step out of the memory model. E.g.:

    \item Unsynchronized accesses to the same sub-range of "bytes" using
      functions from the \stdmoduleref{Bytes} module, such as "Bytes.blit" or
      "Bytes.fill", are an exception to the memory model. Concurrent writes or
      concurrent reads and writes to the same bytes invalidate any guarantees
      about the program. This warning extends to all standard functions that
      write to "bytes" directly, such as "Unix.recv".

Contributor:

First, the problem is about accesses to bytes of the same byte array, not only to the same bytes. We could even say that there is a problem only when accessing bytes in the same 64-bit (or 32-bit) block of a byte array.

Second, we should take care to bound the undefinedness to the array of bytes. It is in the general spirit of the memory model that a race is bounded in space and time. I.e., we could say that all bets are off when it comes to values read from that byte array until the next synchronized write.

Contributor:

Regarding your second point, it seems that we have a choice between accepting a possibility of C-level undefined behavior with this function, not bounded in space and time, or making it much slower by performing a byte-per-byte copy.

Contributor:

Or we could use relaxed atomic accesses in C, which do not suffer from UB but do not incur significant overhead, since they are compiled to plain accesses.

I think that breaking type safety because of this is not an option.
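
A hedged sketch (assuming OCaml ≥ 5.0; not from the PR) of the mixed-mode accesses under discussion:

```ocaml
(* Sketch, assuming OCaml >= 5.0: an 8-byte write raced against a 4-byte
   read of the same region of a byte array. The reader may observe old,
   new, or torn data, which is why such unsynchronised mixed-mode
   accesses fall outside the memory model's usual guarantees. *)
let () =
  let b = Bytes.make 8 '\000' in
  let writer = Domain.spawn (fun () ->
    Bytes.set_int64_le b 0 0x0102030405060708L)  (* 8-byte write *)
  in
  ignore (Bytes.get_int32_le b 0);  (* unsynchronised 4-byte read *)
  Domain.join writer;
  assert (Bytes.get_int64_le b 0 = 0x0102030405060708L)
```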

@kayceesrk (Contributor, Author) commented Sep 27, 2022

@jhjourdan I've addressed your major comment about the initialising writes being part of the happens-before. I have also addressed the minor comments. Would you be able to review this again please?

At some point, we will need to redo the memory model paper fixing this one bug, but also extending the model to include the

  1. compilation of the memory model to the Linux kernel memory model (LKMM) that we use in our C FFI / runtime.
  2. semantics with initialising writes and their compilation to hardware memory models and LKMM.

P.S.: As I was addressing the comments, I've marked as "resolved" those comments that I think need not be reviewed again. If you think this workflow makes reviewing hard, let me know and I shall unresolve all the resolved comments.

\subsection{s:datarace}{Data race}

In a given trace, two actions are said to be \emph{conflicting} if they access
the same non-atomic location, and at least one is a write.
Contributor:

There should not be a conflict between an initializing write and any further access (read or write) to the same memory location (these are not races). These should be removed from the definition of conflicting accesses.

@kayceesrk (Contributor, Author) commented Sep 28, 2022:

Good catch. I've changed the definition to:

In a given trace, two actions are said to be \emph{conflicting} if they access   
the same non-atomic location, at least one is a write and neither is an               
initialising write to that location.
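
A small sketch of what the revised definition permits (assuming OCaml ≥ 5.0; not from the PR): the initialising write does not conflict with later reads from another domain.

```ocaml
(* Sketch, assuming OCaml >= 5.0: the write performed by [ref 0] is an
   initialising write, so under the revised definition it does not
   conflict with the read in [d], and this program is race free. *)
let () =
  let r = ref 0 in                        (* initialising write *)
  let d = Domain.spawn (fun () -> !r) in  (* read of the same location *)
  assert (Domain.join d = 0)
```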

@jhjourdan (Contributor):

@kayceesrk I had a brief look at the current state and the changes you made. This seems good to me, except the few minor comments I made.

@kayceesrk (Contributor, Author):

Thanks everyone for the comments. I've squashed the commit history and added a Changes entry. This PR is ready to merge.

As future work, we need the following:

  1. Include semaphores and condition variables in the inter-domain actions (#11280 (comment))
  2. Extend the operational semantics to include allocations, mutexes, condition variables and semaphores (#11280 (comment))
  3. Describe the interaction with C FFI (that C FFI preserves local data-race freedom)

@Octachron (Member):

I have pushed a commit that transforms the remaining TODO comments into LaTeX comments (with @kayceesrk's approval). I will squash and merge the PR once the CI returns green.

@Octachron Octachron merged commit 75b8d68 into ocaml:trunk Oct 4, 2022
Octachron pushed a commit that referenced this pull request Oct 4, 2022
Add manual chapters on parallelism and memory model

(cherry picked from commit 75b8d68)
xavierleroy added a commit that referenced this pull request Dec 20, 2022
For the two new chapters "Parallel programming" and "Memory model".

Follow-up to #11280.
xavierleroy added a commit that referenced this pull request Dec 20, 2022
For the two new chapters "Parallel programming" and "Memory model".

Follow-up to #11280.

(cherry picked from commit e6340ce)
@gadmm mentioned this pull request on Jan 12, 2023.