Condition.broadcast must interlock as well#324
Merged
talex5 merged 1 commit intoocaml-multicore:mainfrom Sep 24, 2022
Merged
Conversation
There's a race where broadcast doesn't interlock and a in-flight waiter will
have a lost wakeup:
CPU0 at:
Eio.mutex_lock mutex;
check_condition -> maybe abort
Mutex.lock t.mutex; (inlined await for visibility)
B: Eio_mutex.unlock mutex;
---> HERE
match Waiters.await ~mutex:(Some t.mutex) t.waiters t.id with
| () -> Eio_mutex.lock mutex
| exception ex -> Eio_mutex.lock mutex; raise ex
CPU1 at:
Eio.Mutex.use_rw ~protect:false mutex (fun () -> foo := true)
---> HERE
Waiters.wake_all t.waiters (); (inlined broadcast for visibility)
1 - CPU0:HERE already tested the condition, it's false so it's ready to sleep,
it interlocked with the runtime Mutex and released the Eio one.
2 - CPU1 sets the condition just after CPU0:B and now drains the queue and
wakes the waiters, but CPU0 didn't make the waiting list yet, so it loses
the wakeup.
3 - CPU0 waits forever.
The fix is to lock t.mutex, we don't need the full dance:
LOCK(eio_mutex)->LOCK(mutex)->UNLOCK(eio_mutex), just locking the
inner mutex is enough.
disclaimer: It's late and I might have missed something.
talex5
approved these changes
Sep 24, 2022
Collaborator
talex5
left a comment
There was a problem hiding this comment.
The docs for Waiters.wake_all say:
If [t] is shared between domains, the caller must hold the mutex while calling this.
So, yeah, the old code is obviously wrong.
Thanks!
talex5
added a commit
to talex5/opam-repository
that referenced
this pull request
Oct 12, 2022
CHANGES: Changes: - Update to OCaml 5.0.0~beta1 (@anmonteiro @talex5 ocaml-multicore/eio#346). - Add API for seekable read/writes (@nojb ocaml-multicore/eio#307). - Add `Flow.write` (@haesbaert ocaml-multicore/eio#318). This provides an optimised alternative to `copy` in the case where you are writing from a buffer. - Add `Net.with_tcp_connect` (@bikallem ocaml-multicore/eio#302). Convenience function for opening a TCP connection. - Add `Eio.Time.Timeout` (@talex5 ocaml-multicore/eio#320). Makes it easier to pass timeouts around. - Add `Eio_mock.Clock` (@talex5 ocaml-multicore/eio#328). Control time in tests. - Add `Buf_read.take_while1` and `skip_while1` (@bikallem ocaml-multicore/eio#309). These fail if no characters match. - Make the type parameter for `Promise.t` covariant (@anmonteiro ocaml-multicore/eio#300). - Move list functions into a dedicated submodule (@raphael-proust ocaml-multicore/eio#315). - Direct implementation of `Flow.source_string` (@c-cube ocaml-multicore/eio#317). Slightly faster. Bug fixes: - `Condition.broadcast` must interlock as well (@haesbaert ocaml-multicore/eio#324). - Split the reads into no more than 2^32-1 for luv (@haesbaert @talex5 @EduardoRFS ocaml-multicore/eio#343). Luv uses a 32 bit int for buffer sizes and wraps if the value passed is too big. - eio_luv: allow `Net.connect` to be cancelled (@talex5 @nojb ocaml-multicore/eio#311). - eio_main: Use dedicated log source (@anmonteiro ocaml-multicore/eio#326). - linux_eio: fix kernel version number in log message (@talex5 @nojb ocaml-multicore/eio#314). - Account for stack differences in the socketpair test (issue ocaml-multicore/eio#312) (@haesbaert ocaml-multicore/eio#313). Documentation: - Add Best Practices section to README (@talex5 ocaml-multicore/eio#299). - Documentation improvements (@talex5 ocaml-multicore/eio#295 ocaml-multicore/eio#337).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
There's a race where broadcast doesn't interlock and a in-flight waiter will have a lost wakeup:
The fix is to lock t.mutex, we don't need the full dance: LOCK(eio_mutex)->LOCK(mutex)->UNLOCK(eio_mutex), just locking the inner mutex is enough.
disclaimer: It's late and I might have missed something.