Skip to content

Domain.DLS is not thread-safe #12677

@polytypic

Description

@polytypic

The current Domain.DLS implementation is not (sys)thread-safe, because the implementations of the various state updating operations include safe points during which the runtime may switch between threads. I'll briefly highlight some specific scenarios of what can go wrong.

Consider the implementation of Domain.DLS.get:

ocaml/stdlib/domain.ml

Lines 120 to 127 in e397ed2

let get (idx, init) =
let st = maybe_grow idx in
let v = st.(idx) in
if v == unique_value then
let v' = Obj.repr (init ()) in
st.(idx) <- (Sys.opaque_identity v');
Obj.magic v'
else Obj.magic v

Two threads on a single domain that concurrently get with the same key before init has been called may both end up calling init (so init would be called twice for a single key). That can happen because maybe_grow (which potentially allocates) includes a safe point and because init (which often allocates) itself may include a safe point. What can happen then is that the two calls of get return two different results (e.g. two different mutable records), one of which is stored in DLS.

Consider the implementation of the internal Domain.DLS.maybe_grow:

ocaml/stdlib/domain.ml

Lines 98 to 111 in e397ed2

let maybe_grow idx =
let st = get_dls_state () in
let sz = Array.length st in
if idx < sz then st
else begin
let rec compute_new_size s =
if idx < s then s else compute_new_size (2 * s)
in
let new_sz = compute_new_size sz in
let new_st = Array.make new_sz unique_value in
Array.blit st 0 new_st 0 sz;
set_dls_state new_st;
new_st
end

Two threads on a single domain that concurrently get with two different keys both call maybe_grow and may both end up allocating a new array for the DLS. This is possible because maybe_grow includes a safe point. Only one of the two arrays eventually ends up being stored as the DLS and one of the get operations ends up being undone. Furthermore if the two threads also perform set operations, some of those may end up being undone.

Here is an example interleaving (variation of the above) where the effects of DLS operations performed by Thread B vanish:

  Thread A                 Thread B
  get x starts

    <--- switch at a safe point --->
  
                           get y starts
                           get y ends
                           set y starts
                           set y ends  

    <--- switch at a safe point --->
  
  set_dls_state
  get x ends

The effects of both get y and set y vanish. Even just having get y vanish is a problem, because get may call init, which may return a mutable object, which might be mutated or stored outside of the DLS.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions