Concurrency safety documentation, part one: concurrency unsafe types by Octachron · Pull Request #11193 · ocaml/ocaml

Octachron · 2022-04-15T15:45:11Z

This PR adds a warning about concurrency safety in the documentation of the modules:

Buffer
Ephemeron
Oo: only the copy function
Hashtbl
Queue
Stack
Scanf
Weak: only the weak hash set type.

This PR focuses on types for which module abstraction is broken on concurrent uses (excluding channels that are under work right now). I am planning to have a separate PR for at least modules with global mutable states, and a third one for more subtle types like arrays and Lazy.t.

Except for the Oo and Weak modules, I have added a new global concurrency_unsafe alert to the documented module. This alert is currently disabled by default to make it less invasive.

I have tried to have a standardized warning note on all modules, at the cost of a small rewrite of the warning note newly added in Scanf by @OlivierNicole .

stdlib/hashtbl.mli

stdlib/moreLabels.mli

kayceesrk · 2022-05-12T05:16:31Z

stdlib/buffer.mli

+(** {b Concurrency safety} *)
+
+[@@@alert concurrency_unsafe
+    "Concurrent use of buffers is a programming error."


Thanks for the PR to document the concurrency safety. Putting myself into the shoes of a user, I feel that the current documentation errs too much on the side of safety and makes the module unusable for reasonable use cases.

We are conflating concurrency and parallelism when we mean concurrency_unsafe. To be precise, we have three modes of concurrency in OCaml 5: fibers, systhreads and domains. OCaml 4 had 2: systhreads and monadic concurrency libraries such as Lwt and Async.

The observation is that if the user only uses fibers, with a library such as eio, then the Buffer module is safe. In particular, anything that is safe with Lwt (modulo the parts that use systhreads -- Lwt_preemptive and others?) should remain safe with fibers.

I could rename the section and alert to Parallelism safety or Domain safety? Thread safety might be confusing with systhreads being a concurrency mechanism.

Similarly, if we want to de-emphasize those warnings, I can remove the alert and move the section to the end of the module documentations (with an example of a potential issue?). That sounds sensible to me since I am not sure if people will ever enable the alert since it is such an imprecise tool. Note however, that the alert is disabled by default, thus the modules are still perfectly usable.

If we use Parallelism safety then eventually we would need to decide what Concurrency safety will mean. Is that fibers or systhreads? Hence, a better option might be to name it Domain safety. CC @avsm who raised the issue.

Also, would it be possible to document individual functions with the safety tags? I'm thinking for modules like Arrays, only a few functions are domain unsafe.

We should also define what safety means. Does "safety" mean no crashes or something stronger like linearizability? @fpottier @gasche who may have opinions on this.

It is possible to add alerts to individual functions (or documentation comments) when needed.
For arrays, do you mean the blit function?

In terms of user documentation, I was drawn to a hierarchy of the form:

no crashes

no broken abstractions

linearizability

since corrupting a data structure (breaking abstractions and expected invariants) is a surprising result for a sequential OCaml programmer.

If I am not mistaken, the first level of safety ought to be guaranteed by the language, so it is better documented in the future memory model section of the manual, leaving the module-by-module documentation to focus on module-specific issues.

For arrays, do you mean the blit function?

I did have the Array.blit function in mind. In particular, Array.blit on float arrays does not respect the memory model.

ocaml/runtime/array.c

Lines 345 to 356 in 66128ba

/* [MM] [TODO]: Not consistent with the memory model. See the discussion in
   https://github.com/ocaml-multicore/ocaml-multicore/pull/822. */
CAMLprim value caml_floatarray_blit(value a1, value ofs1, value a2, value ofs2,
                                    value n)
{
  /* See memory model [MM] notes in memory.c */
  atomic_thread_fence(memory_order_acquire);
  memmove((double *)a2 + Long_val(ofs2),
          (double *)a1 + Long_val(ofs1),
          Long_val(n) * sizeof(double));
  return Val_unit;
}

kayceesrk · 2022-11-01T09:46:27Z

I've re-reviewed the PR.

I feel that the recommendation such as this one:

(** {b Concurrency safety} *)

[@@@alert concurrency_unsafe
    "Concurrent use of buffers is a programming error."

is too coarse-grained and runs the risk of misinterpretation in multiple ways. It is unclear that the intended meaning of "concurrent use" is "accessing a buffer from multiple domains without a mutex protecting the buffer". In the parallel programming literature, "concurrent use" does not necessarily mean accessing without a mutex. I also agree with the earlier comment that "thread-safety" is not an appropriate term given that we have 3 kinds of threading -- domains, systhreads and user-level threads. The challenge here is that we do not have a widely accepted terminology for ready use. I don't have a good answer to this. What do you think @gasche @fpottier @stedolan?

What is the purpose of disabled-by-default alerts? When are the users likely to encounter these? If these are disabled by default, is their purpose to serve as documentation to be read by the users in the API docs? I don't know how alerts work in the compiler. Does building libraries using dune with warnings-as-errors trigger these alerts?

Octachron · 2022-11-01T10:20:57Z

Warnings-as-errors does not affect alerts. The alerts were intended both to serve as documentation and to give users the option to selectively enable the alert in a limited scope.

stdlib/hashtbl.mli

stdlib/moreLabels.mli

stdlib/queue.mli

kayceesrk · 2022-11-01T11:23:29Z

After discussing with @Octachron privately, I agree that trying to come up with precise documentation here is not the goal.
The original motivation of the PR is to warn the users that OCaml 5.0 somehow does not make all of these standard library modules safe for use with parallelism and concurrency. With that in mind, the current PR seems ok to me.

gasche · 2022-11-01T12:27:39Z

Re. "concurrent use": one could say "racy concurrent use" (precise for people familiar with concurrent races, weird for everyone else), or "non-synchronized concurrent use", or "unsafe concurrent use".

kayceesrk · 2022-11-01T15:13:07Z

"Unsynchronized concurrent access" would also work

fpottier · 2022-11-01T16:04:30Z

What do you think @gasche @fpottier @stedolan?

I would vote for "unsynchronized concurrent access".

Octachron · 2022-11-01T16:06:32Z

Unsynchronized concurrent access sounds good to me since it emphasizes that synchronization is left to the users of the mutable data structures.

dbuenzli · 2022-11-01T16:24:52Z

"You have an unsynchronized concurrent access here" feels a bit long-winded to me for discussing things.

What about establishing (un)synchronized access? That's the kind of vocabulary used by Java™.

kayceesrk · 2022-11-02T05:47:18Z

What about establishing (un)synchronized access? That's the kind of vocabulary used by Java™.

This sounds good to me.

avsm · 2022-11-02T14:24:24Z

"(Un)synchronized access" is also the term we use when teaching Concurrent & Distributed Systems in Cambridge. The "concurrent" is dropped as synchronisation is rarely useful in a sequential setting (with callback-driven code being the exception).

Octachron · 2022-11-07T09:17:14Z

I like that the unsynchronized access vocabulary puts the emphasis on the root issue and points towards a potential solution.
For the alert name, I am thinking of unsafe_unsychronized_access (or we could also drop the alert).

kayceesrk · 2022-11-10T08:50:50Z

stdlib/weak.mli


+(** {b Unsynchronized accesses}
+
+    Unsynchronized accesses of weak hash sets is a programming error.


The alert is missing here. Any particular reason for this?

The alert should be on the module produced by the Make functor which is not currently supported as far as I can see.

kayceesrk · 2022-11-10T08:57:34Z

stdlib/buffer.mli

+]
+
+ (**
+    Unsynchronized accesses might break module abstraction and create invalid


"Breaking module abstraction" sounds quite abstract. Is that needed?

Can we perhaps go with

Unsynchronized access to a buffer may lead to an invalid buffer state. Thus, concurrent uses of a buffer must be synchronized (for instance with a {!Mutex.t}).

I'm also avoiding the term "sequentialized" as it is unclear what that means.

A similar change needs to be made in other places.

Creating an invalid state is a better phrasing indeed. I would rather avoid switching between access and uses between the two sentences. It is also not clear to me whether there is a case where a single unsynchronized access can create an invalid buffer state, thus accesses seems better to me:

Unsynchronized accesses to a buffer may lead to an invalid buffer state. Thus, concurrent accesses to a buffer must be synchronized (for instance with a {!Mutex.t}).

?
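
To make the proposed wording concrete, here is a minimal sketch of a buffer shared between two domains, with every access guarded by a mutex. It assumes OCaml 5 (for Domain and the stdlib Mutex); the names buf, lock, with_lock and append_many are hypothetical and do not come from the PR.

```ocaml
(* Sketch only: one Buffer.t shared by two domains, every access
   synchronized with a Mutex.t. All names here are hypothetical. *)
let buf = Buffer.create 64
let lock = Mutex.create ()

(* Run [f] while holding [lock]; the lock is released even if [f] raises. *)
let with_lock f =
  Mutex.lock lock;
  Fun.protect ~finally:(fun () -> Mutex.unlock lock) f

(* Append [n] copies of [c], taking the lock around each access. *)
let append_many c n =
  for _ = 1 to n do
    with_lock (fun () -> Buffer.add_char buf c)
  done

let () =
  let d1 = Domain.spawn (fun () -> append_many 'a' 1_000) in
  let d2 = Domain.spawn (fun () -> append_many 'b' 1_000) in
  Domain.join d1;
  Domain.join d2;
  (* Both domains have been joined, so this read is no longer concurrent. *)
  Printf.printf "length = %d\n" (Buffer.length buf)
```

Without with_lock around Buffer.add_char, the two domains would perform unsynchronized accesses, which is the situation the documentation note warns about.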

kayceesrk · 2022-11-10T09:10:10Z

unsafe_unsychronized_access

We can go with unsynchronized_access. The term "unsynchronized" already has a negative connotation and hence I'd think that the prefix "unsafe" is not necessary.

Octachron · 2022-11-10T09:47:47Z

I don't know: a more explicit name for the alert would be unsafe_access_if_unsynchronized.

I fear that with unsynchronized_access the alert may give the impression of warning against a non-synchronized access, which is not information that an alert can ever have.

Maybe a better shorter name would be unsafe_if_unsynchronized?

kayceesrk · 2022-11-10T10:01:03Z

Hmm. In a concurrent setting, which is where the term unsynchronized makes sense, unsynchronized access is always unsafe. So unsafe_if_unsynchronized reads like a tautology to me.

Taking a step back, what is the purpose of the alert? Is it to enable users to optionally enable this alert to discover whether they are using any "concurrency unsafe" code? Presumably, the users are enabling this alert for their concurrent or parallel code. In this case, concurrency_unsafe seems like the right name for the alert. (we seem to have come full circle).

Octachron · 2022-11-10T10:19:02Z

The workflow that I am considering for the alert would look like:

enable the alert in a module with some shared mutable state
for each alert:
- disable the alert locally for false positives
- fix the code otherwise

But then, outside of false positives, the unsynchronized_access name makes sense.

I think that I was trying too hard to harden the name against the false positive cases by taking into account that when enabling the alert on a new module, there will always be some false positives.

I would thus propose to go with unsynchronized_access (to avoid splitting the terminology).
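
The workflow above could look roughly like the following sketch. It assumes OCaml 5 and uses a hypothetical Counter module carrying its own item-level unsynchronized_access alert, so that the example does not depend on how the stdlib annotations are exported; the names Counter, local, shared and synchronized_incr are illustrative only.

```ocaml
(* Hypothetical module with an item-level alert, mimicking the
   annotation style discussed in this PR. *)
module Counter : sig
  type t
  val create : unit -> t
  val incr : t -> unit
  [@@alert unsynchronized_access
      "Unsynchronized accesses to counters are a programming error."]
  val get : t -> int
end = struct
  type t = int ref
  let create () = ref 0
  let incr r = r := !r + 1
  let get r = !r
end

(* Step 1: enable the alert for the rest of this file. *)
[@@@alert "+unsynchronized_access"]

(* Step 2a: false positive. [local] is never shared between domains,
   so the alert is disabled locally for this use. *)
let local = Counter.create ()
let () = (Counter.incr [@alert "-unsynchronized_access"]) local

(* Step 2b: real issue. [shared] is used from two domains, so the code
   is fixed by adding a mutex. The alert cannot know that the access is
   now synchronized, so it is silenced here once the fix is in place. *)
let shared = Counter.create ()
let lock = Mutex.create ()

let synchronized_incr () =
  Mutex.lock lock;
  Fun.protect ~finally:(fun () -> Mutex.unlock lock)
    (fun () -> (Counter.incr [@alert "-unsynchronized_access"]) shared)

let () =
  let d = Domain.spawn synchronized_incr in
  synchronized_incr ();
  Domain.join d
```

Removing either local [@alert "-unsynchronized_access"] annotation should make the compiler report the alert at that use site, which is the signal the workflow relies on.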

kayceesrk · 2022-11-10T10:42:21Z

Sounds good to me. Let's go with unsynchronized_access.

Concurrency safety documentation, part one: concurrency unsafe types (cherry picked from commit db5949f)

Octachron · 2022-11-24T16:39:16Z

Cherry-picked on 5.0 as 616fa13 .

OlivierNicole · 2022-12-22T14:55:12Z

I get this alert in a CI build on macOS while running ocamldoc: https://github.com/ocaml-multicore/ocaml-tsan/actions/runs/3751404038/jobs/6372347665
Is this expected?

gasche · 2023-01-11T19:53:06Z

@gadmm made an excellent point on Discuss: when using a datastructure, it is not necessarily obvious which operations are read-only and which operations may mutate the structure, and this information is necessary to reason about whether a given usage is racy. (Read/read operations in parallel are not racy.)

I think that we should consider documenting explicitly which operations act as read-only on our datastructures. For example:

for Hashtbl, find* and mem are read-only
for Stack, top* is read-only
for Stack and Queue, iterators are read-only
Hashtbl iterators are currently not read-only (I wouldn't have guessed without reviewing the implementation)
for Lazy values, Lazy.force thunk is generally not read-only, but I believe that it is read-only when Lazy.is_val thunk holds
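
As an illustration of the read/read case, here is a small sketch in which two domains query the same table in parallel using only Hashtbl.mem, which the list above classifies as read-only. It assumes OCaml 5 and that no domain mutates the table while the readers run; the names table and count_hits are hypothetical.

```ocaml
(* Sketch only: the table is fully populated before any domain is
   spawned, and afterwards it is only read. *)
let table : (int, string) Hashtbl.t = Hashtbl.create 16

let () =
  for i = 0 to 99 do
    Hashtbl.replace table i (string_of_int i)
  done

(* Count how many keys in [lo..hi] are present, using only Hashtbl.mem. *)
let count_hits lo hi =
  let hits = ref 0 in
  for i = lo to hi do
    if Hashtbl.mem table i then incr hits
  done;
  !hits

(* Two readers run in parallel without any mutex. *)
let hits1, hits2 =
  let d1 = Domain.spawn (fun () -> count_hits 0 149) in
  let d2 = Domain.spawn (fun () -> count_hits 50 199) in
  (Domain.join d1, Domain.join d2)

let () = Printf.printf "hits1 = %d, hits2 = %d\n" hits1 hits2
```

If one of the domains called Hashtbl.replace, or used a Hashtbl iterator given the remark above, the accesses would need to be synchronized instead.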

bluddy · 2023-01-11T21:05:40Z

Hashtbl iterators are currently not read-only (I wouldn't have guessed without reviewing the implementation)

This seems like it should be fixed. It looks like a possibly premature optimization in the code.

gasche · 2023-02-24T09:35:05Z

There is something that I don't understand about the unsynchronized_access alert:

$ ocaml -alert +all
OCaml version 5.0.0
Enter #help;; for help.

# !Sys.interactive;;
Alert unsynchronized_access: Stdlib.Sys.interactive
The interactive status is a mutable global state.
- : bool = true
# Buffer.create 42;;
- : Buffer.t = <abstr>
# Queue.create ();;
- : '_weak1 Queue.t = <abstr>

Buffer and Queue have an @@@alert annotation in their .mli (added by the present PR), so I would think that any usage of one of their functions should warn just as Sys.interactive (which has an item-specific annotation instead of a module-global annotation), right?

Octachron · 2023-02-24T09:42:33Z

I fear that this is a rebasing mistake on my side: the alert should also be added to the module aliases exported by the Stdlib module (and that was done in a previous version of this PR that enabled the alert in the compiler before disabling it).

gasche · 2023-02-24T09:48:50Z

Ah, so this is basically #11867 making our life painful.

Octachron force-pushed the concurrency_alerts_part_one branch from 78350b9 to 0d86860 Compare April 15, 2022 15:45

Octachron mentioned this pull request Apr 25, 2022

Meta-issue for OCaml 5.0 release goals #11013

Closed

16 tasks

Octachron added the no-change-entry-needed label Apr 29, 2022

This was referenced Apr 29, 2022

stdlib concurrency safety documentation, part two: global states #11227

Merged

Audit stdlib for mutable state #10960

Closed

avsm reviewed May 4, 2022: stdlib/hashtbl.mli

avsm reviewed May 4, 2022: stdlib/moreLabels.mli

kayceesrk reviewed May 12, 2022

jmid mentioned this pull request Jun 13, 2022

add test for stdlib ephemeron module ocaml-multicore/multicoretests#27

Merged

dra27 added this to the 5.0 milestone Oct 25, 2022

kayceesrk self-assigned this Oct 25, 2022

gasche mentioned this pull request Oct 29, 2022

[multicore] does Hashtbl respect separation? #11681

Closed