Fix or document concurrency issues on channels in Multicore #11171
Octachron merged 1 commit into ocaml:trunk
Conversation
I need more time to review this, but I like the direction it is taking, thanks a lot! A general question in connection with #11177: once we properly lock channels before operating on them, do we still need the refcount field to be atomic?
14b9c58 to 6e575e1
Thanks for the feedback! Currently the channel mutex is used exclusively to protect I/O operations. It can probably be used to protect accesses to […]
```c
/* Ensure that every read or write on the channel will cause an
   immediate caml_flush_partial or caml_refill, thus raising a Sys_error
   exception */
channel->curr = channel->max = channel->end;
```
This deletion changes the behavior of runtime warnings: closing a channel no longer disables the warning about unflushed contents.
For instance, the following program only emits a warning once this PR is applied.
```ocaml
Sys.enable_runtime_warnings true;;
external close : out_channel -> unit = "caml_ml_close_channel"
let f s =
  let o = Out_channel.open_bin ("/tmp/" ^ s) in
  Out_channel.set_buffered o true;
  Out_channel.output_string o "test";
  close o
let () = f "a"
let () = for i = 0 to 1000 do let a = Array.make 1_000 () in () done
```
I'm not aware of these runtime warnings; what form do they take?
@Octachron I can't find a difference when running your example with and without this PR. Could you share your compilation commands?
It is simpler to enable the warnings in the code itself; I have fixed the test program along this line, sorry about that. With trunk, there is no warning emitted, whereas I see the following warning with this PR:

```
[ocaml] (use Sys.enable_runtime_warnings to control these warnings)
[ocaml] (moreover, it has unflushed data)
```
Thanks for the report, I hadn't seen this. I updated this function; it now resets `channel->curr` to `channel->buff` so that this false warning is not triggered. That seems clearer to me than re-introducing the deleted statement.
Resetting `channel->curr` has the disadvantage of making it possible to write to the buffer of the closed channel, without raising an error, for a while.
Taking into account that

- `putch` only flushes the channel when `channel->curr >= channel->end`,
- `getch` only refills the channel when `channel->curr >= channel->max`,

I am not sure there is a simpler option than the existing `channel->curr = channel->max = channel->end` to forbid all writes or reads on the disconnected buffer.
Ah yes, you're right, I misunderstood that comment. I went back to the existing

```c
channel->curr = channel->max = channel->end;
```

only now it is performed after acquiring the lock.
Looking at the channel functions used in […] and […].
And also […].
I mentioned those functions in my “unclear if action required” section at #10960 (comment). I suggested that they are probably only used in setup code, so concurrent accesses seem unlikely. But if that's not true, or if we prefer to be bulletproof, I'm happy to add locks in these functions.
Sorry, I had forgotten your comment! Since performance is not at risk for setup code, it seems better to err on the side of safety.
6e575e1 to 033642d
I have added locking where required.
runtime/io.c
Outdated
```c
if (result == -1) caml_sys_error (NO_ARG);
/* Resetting curr makes it clear that this is a closed channel without
 * unflushed data, so that no warning will be emitted by the channel's
 * finalizer. */
```
Isn't the comment inexact? We may still have unflushed data in the buffer, since `caml_ml_close_channel` does not flush the buffer.
It is a precondition that output channels be flushed before calling `caml_ml_close_channel` (as the comment at the beginning of the function says), and stdlib.ml abides by this: the only way to close an out_channel is `close_out`, which flushes it first.
However, I have removed this comment in light of your remarks above.
033642d to 31caadd
runtime/io.c
Outdated
```c
Lock(channel);
res = channel->offset - (file_offset)(channel->max - channel->curr);
Unlock(channel);
return res;
```
One last nitpick: for the other functions, only the caml_ml_* variant acquires the lock, whereas the raw C function does not (for instance, compare `caml_seek_in` with `caml_ml_seek_in`, or `caml_input_scan_line` with `caml_ml_input_scan_line`). Would it not be clearer to be homogeneous in terms of channel locking for the functions exposed in caml/io.h, and to move the lock to `caml_ml_pos_in` and `caml_ml_pos_out`?
That indeed seems sensible. Done in the latest version.
31caadd to 548d6f9
Octachron left a comment
The current state looks good except for a Windows-only data race.
runtime/io.c
Outdated
```c
  caml_sys_error(NO_ARG);
}
#endif
Lock(channel);
```
The WIN32-only access to `channel->flags` above is a potential data race; that lock should be moved before the #ifdef section.
My bad; fixed.
548d6f9 to 3c62eba
Merged, thanks for the PR!
This fixes a few channel-related concurrency issues, mentioned in #10960 (comment), and proposes to document the thread-unsafety or possibly surprising behavior of a few other IO functions.
- Lock the channel in `caml_ml_close_channel` and `caml_ml_pos_{in,out}`, and lock it earlier in `caml_ml_flush`. Without this, these functions show non-sequentially-consistent behaviour, unexpected `Sys_error` exceptions, and even segfaults in multicore.
- `really_input`, `really_input_string` and `input_all`, because they read by chunks and release the channel lock between chunks, may not read contiguous chunks when there are multiple concurrent readers.
- `Scanf` is thread-unsafe.