Skip to content

Concurrency safety documentation, part one: concurrency unsafe types#11193

Merged
Octachron merged 2 commits intoocaml:trunkfrom
Octachron:concurrency_alerts_part_one
Nov 24, 2022
Merged

Concurrency safety documentation, part one: concurrency unsafe types#11193
Octachron merged 2 commits intoocaml:trunkfrom
Octachron:concurrency_alerts_part_one

Conversation

@Octachron
Copy link
Copy Markdown
Member

This PR adds a warning about concurrency safety in the documentation of the modules:

  • Buffer
  • Ephemeron
  • Oo: only the copy function
  • Hashtbl
  • Queue
  • Stack
  • Scanf
  • Weak: only the weak hash set type.

This PR focuses on types for which module abstraction is broken on concurrent uses (excluding channels that are under work right now). I am planning to have a separate PR for at least modules with global mutable states, and a third one for more subtle types like arrays and Lazy.t.

Except for the Oo and Weak module, I have added a new global concurrency_unsafe alert to the documented module. This alert is currently disabled by default to make it less invasive.

I have tried to have a standardized warning note on all modules, at the cost of a small rewrite of the warning note newly added in Scanf by @OlivierNicole .

(** {b Concurrency safety} *)

[@@@alert concurrency_unsafe
"Concurrent use of buffers is a programming error."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR to document the concurrency safety. Putting myself into the shoes of a user, I feel that the current documentation errs too much on the side of safety and makes the module unusable for reasonable use cases.

We are conflating concurrency and parallelism when we mean concurrency_unsafe. To be precise, we have three modes of concurrency in OCaml 5: fibers, systhreads and domains. OCaml 4 had 2: systhreads and monadic concurrency libraries such as Lwt and Async.

The observation is that if the user only uses fibers, with a library such as eio, then the Buffer module is safe. In particular, anything that is safe with Lwt (modulo the parts that uses systhreads -- Lwt_preemptive and others?) should remain safe with fibers.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could rename the section and alert to Parallelism safety or Domain safety? Thread safety might be confusing with systhreads being a concurrency mechanism.

Similarly, if we want to de-emphasize those warnings, I can remove the alert and move the section to the end of the module documentations (with an example of a potential issue?). That sounds sensible to me since I am not sure if people will ever enable the alert since it is such an imprecise tool. Note however, that the alert is disabled by default, thus the modules are still perfectly usable.

Copy link
Copy Markdown
Contributor

@kayceesrk kayceesrk May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use Parallelism safety then eventually we would need to decide what Concurrency safety will mean. Is that fibers or systhreads? Hence, a better option might be to name it Domain safety. CC @avsm who raised the issue.

Also, would it be possible to document individual functions with the safety tags? I'm thinking for modules like Arrays, only a few functions are domain unsafe.

We should also define what safety means. Does "safety" mean no crashes or something stronger like linearizability? @fpottier @gasche who may have opinions on this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to add alerts to individual functions (or documentation comments) when needed.
For arrays, do you meant the blit function?

In term of users documentation, iI was draw to a hierarchy of the form:

  1. no crashes
  2. no broken abstractions
  3. linearizability

since corrupting a data structure (breaking abstractions and expected invariants) is a surprising result for a sequential OCaml programmer.

If I am not mistaken, the first level of safety ought to be guarantee by the language, so it is better documented in the future memory model section of the manual to focus on module specific in the module-by-module documentation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For arrays, do you meant the blit function?

I did have the Array.blit function in mind. In particular, Array.blit on float arrays does not respect the memory model.

ocaml/runtime/array.c

Lines 345 to 356 in 66128ba

/* [MM] [TODO]: Not consistent with the memory model. See the discussion in
https://github.com/ocaml-multicore/ocaml-multicore/pull/822. */
CAMLprim value caml_floatarray_blit(value a1, value ofs1, value a2, value ofs2,
value n)
{
/* See memory model [MM] notes in memory.c */
atomic_thread_fence(memory_order_acquire);
memmove((double *)a2 + Long_val(ofs2),
(double *)a1 + Long_val(ofs1),
Long_val(n) * sizeof(double));
return Val_unit;
}

@kayceesrk
Copy link
Copy Markdown
Contributor

I've re-reviewed the PR.

I feel that the recommendation such as this one:

(** {b Concurrency safety} *)

[@@@alert concurrency_unsafe
    "Concurrent use of buffers is a programming error."

is too coarse-grained and suffers the risk of misinterpretation in multiple ways. It is unclear that the intended meaning of "concurrent use" is "accessing a buffer from multiple domains without a mutex protecting the buffer". In the parallel programming literature, "concurrent use" does not mean necessarily mean accessing without a mutex. I also agree with the earlier comment that "thread-safety" is not an appropriate term given that we have 3 kinds of threading -- domains, systhreads and user-level threads. The challenge here is that we do not have a widely accepted terminology for ready use. I don't have a good answer to this. What do you think @gasche @fpottier @stedolan?

What is the purpose of disabled-by-default alerts? When are the users likely to encounter these? If these are disabled by default, is their purpose to serve as documentation to be read by the users in the API docs? I don't know how alerts work in the compiler. Does building libraries using dune with warnings are errors trigger these alerts?

@Octachron
Copy link
Copy Markdown
Member Author

Error as warnings don't affect alerts. The intended purpose of the alerts was both for documentation purpose and to give users the option to selectively enable the alert in a limited scope.

@kayceesrk
Copy link
Copy Markdown
Contributor

kayceesrk commented Nov 1, 2022

After discussing with @Octachron privately, I agree that trying to come up with precise documentation here is not the goal.
The original motivation of the PR is to warn the users that OCaml 5.0 somehow does not make all of these standard library modules safe for use with parallelism and concurrency. With that in mind, the current PR seems ok to me.

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 1, 2022

Re. "concurrent use": one could say "racy concurrent use" (precise for people familiar with concurrent races, weird for everyone else), or "non-synchronized concurrent use", or "unsafe concurrent use".

@kayceesrk
Copy link
Copy Markdown
Contributor

"Unsynchronized concurrent access" would also work

@fpottier
Copy link
Copy Markdown
Contributor

fpottier commented Nov 1, 2022

What do you think @gasche @fpottier @stedolan?

I would vote for "unsynchronized concurrent access".

@Octachron
Copy link
Copy Markdown
Member Author

Unsynchronized concurrent access sounds good to me since it emphasizes that synchronization is left to the users of the mutable data structures.

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Nov 1, 2022

You have an unsynchronized concurrent access here feels a bit long winded to me for discussing things.

What about establishing (un)synchronized access ? That's the kind of vocabulary used by Java™.

@kayceesrk
Copy link
Copy Markdown
Contributor

What about establishing (un)synchronized access? That's the kind of vocabulary used by Java™.

This sounds good to me.

@avsm
Copy link
Copy Markdown
Member

avsm commented Nov 2, 2022

"(Un)synchronized access" is also the term we use when teaching Concurrent & Distributed Systems in Cambridge. The "concurrent" is dropped as synchronisation is rarely useful in a sequential setting (with callback-driven code being the exception).

@Octachron
Copy link
Copy Markdown
Member Author

I like that the unsynchronized access vocabulary puts the emphasis on the root issue and give towards potential solution.
For the alert name, I am thinking of unsafe_unsychronized_access (or we could also drop the alert).

@Octachron Octachron force-pushed the concurrency_alerts_part_one branch from 6aa3a7c to 96fb2d5 Compare November 7, 2022 12:29
stdlib/weak.mli Outdated

(** {b Unsynchronized accesses}

Unsynchronized accesses of weak hash sets is a programming error.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alert is missing here. Any particular reason for this?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alert should be on the module produced by the Make functor which is not currently supported as far as I can see.

]

(**
Unsynchronized accesses might break module abstraction and create invalid
Copy link
Copy Markdown
Contributor

@kayceesrk kayceesrk Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Breaking module abstraction" sounds quite abstract. Is that needed?

Can we perhaps go with

Unsynchronized access to a buffer may lead to an invalid buffer state. Thus, concurrent uses of a buffer must be synchronized (for instance with a {!Mutex.t}).

I'm also avoiding the term "sequentialized" as it is unclear what that means.

A similar change needs to be made in other places.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating an invalid state is a better phrasing indeed. I would rather avoid switching between access and uses between the two sentence. It is also not clear to me if there a case where a single unsynchronized access can create an invalid buffer state, thus accesses seems better to me:

Unsynchronized accesses to a buffer may lead to an invalid buffer state. Thus, concurrent accesses to a buffer must be synchronized (for instance with a {!Mutex.t}).

?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@kayceesrk
Copy link
Copy Markdown
Contributor

unsafe_unsychronized_access

We can go with unsynchronized_access. The term "unsynchronized" already has a negative connotation and hence I'd think that the prefix "unsafe" is not necessary.

@Octachron
Copy link
Copy Markdown
Member Author

I don't know: a more explicit name for the alert would unsafe_access_if_unsynchronized.

I fear that with unsynchronized_access the alert may give the impression to warn against a non-synchronized which is not an information that a alert can ever know.

Maybe a better shorter name would be unsafe_if_unsynchronized?

@kayceesrk
Copy link
Copy Markdown
Contributor

Hmm. In a concurrent setting, which is where the term unsynchronized makes sense, unsynchronized access is always unsafe. So unsafe_if_unsynchronized reads like a tautology to me.

Taking a step back, what is the purpose of the alert? Is it to enable users to optionally enable this alert to discover whether they are using any "concurrency unsafe" code? Presumably, the users are enabling this alert for their concurrent or parallel code. In this case, concurrency_unsafe seems like the right name for the alert. (we seem to have come full circle).

@Octachron
Copy link
Copy Markdown
Member Author

The workflow that I am considering for the alert would look like:

  • enable the alert in a module with some shared mutable state
  • for each alert:
    • disable the alert locally for false positives
    • fix the code otherwise

But then, outside of false positives, the unsynchronized_access name makes sense.

I think that I was trying too hard to harden the name against the false positive cases by taking in account that when enabling the alert on a new module, there will always be some false positives.

I would thus propose to go with unsynchronized_access (to avoid splitting the terminology).

@kayceesrk
Copy link
Copy Markdown
Contributor

Sounds good to me. Let's go with unsynchronized_access.

@Octachron Octachron force-pushed the concurrency_alerts_part_one branch from 96fb2d5 to 7c79659 Compare November 18, 2022 10:28
@Octachron Octachron merged commit db5949f into ocaml:trunk Nov 24, 2022
Octachron added a commit that referenced this pull request Nov 24, 2022
Concurrency safety documentation, part one: concurrency unsafe types

(cherry picked from commit db5949f)
@Octachron
Copy link
Copy Markdown
Member Author

Cherry-picked on 5.0 as 616fa13 .

@OlivierNicole
Copy link
Copy Markdown
Contributor

I get this alert in a CI build on macOS while running ocamldoc: https://github.com/ocaml-multicore/ocaml-tsan/actions/runs/3751404038/jobs/6372347665
Is this expected?

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 11, 2023

@gadmm made an excellent point on Discuss: when using a datastructure, it is not necessarily obvious which operations are read-only and which operations may mutate the structure, and this information is necessary to reason about whether a given usage is racy. (Read/read operations in parallel are not racy.)

I think that we should consider documenting explicitly which operations act as read-only on our datastructures. For example:

  • for Hashtbl, find* and mem are read-only
  • for Stack, top* is read-only
  • for Stack and Queue, iterators are read-only
  • Hashtbl iterators are currently not read-only (I wouldn't have guessed without reviewing the implementation)
  • for Lazy values, Lazy.force thunk is generally not read-only, but I believe that it is read-only when Lazy.is_val thunk holds

@bluddy
Copy link
Copy Markdown
Contributor

bluddy commented Jan 11, 2023

Hashtbl iterators are currently not read-only (I wouldn't have guessed without reviewing the implementation)

This seems like it should be fixed. It looks like a possibly premature optimization in the code.

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 24, 2023

There is something that I don't understand about the unsynchronized_access alert:

$ ocaml -alert +all
OCaml version 5.0.0
Enter #help;; for help.

# !Sys.interactive;;
Alert unsynchronized_access: Stdlib.Sys.interactive
The interactive status is a mutable global state.
- : bool = true
# Buffer.create 42;;
- : Buffer.t = <abstr>
# Queue.create ();;
- : '_weak1 Queue.t = <abstr>

Buffer and Queue have an @@@alert annotation in their .mli (added by the present PR), so I would think that any usage of one of their functino should warn just as Sys.interactive (which has an item-specific annotation instead of a module-global annotation), right?

@Octachron
Copy link
Copy Markdown
Member Author

I feart that this is a rebasing mistake on my side: the alert should also be added to the module aliases exported by the Stdlib module (and that was done in a previous version of this PR that enabled the alert in the compiler before disabling it).

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 24, 2023

Ah, so this is basically #11867 making our life painful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants