Fix or document concurrency issues on channels in Multicore #11171
Octachron merged 1 commit into ocaml:trunk
Conversation
I need more time to review this, but I like the direction it is taking, thanks a lot! A general question in connection with #11177: once we properly lock channels before operating on them, do we still need the refcount field to be atomic?
14b9c58 to 6e575e1
Thanks for the feedback! Currently the channel mutex is used exclusively to protect I/O operations. It can probably be used to protect accesses to […]
```c
/* Ensure that every read or write on the channel will cause an
   immediate caml_flush_partial or caml_refill, thus raising a Sys_error
   exception */
channel->curr = channel->max = channel->end;
```
This deletion changes the behavior of runtime warnings: closing a channel no longer disables the warning about unflushed contents.
For instance, the following program only emits a warning once this PR is applied.
```ocaml
Sys.enable_runtime_warnings true;;
external close : out_channel -> unit = "caml_ml_close_channel"
let f s =
  let o = Out_channel.open_bin ("/tmp/" ^ s) in
  Out_channel.set_buffered o true;
  Out_channel.output_string o "test";
  close o
let () = f "a"
let () = for i = 0 to 1000 do let a = Array.make 1_000 () in () done
```
I'm not aware of these runtime warnings; what form do they take?
@Octachron I can't find a difference when running your example with and without this PR. Could you share your compilation commands?
It is simpler to enable the warnings in the code itself; I have fixed the test program along this line, sorry about that. With trunk, there is no warning emitted, whereas I see the following warning with this PR:

```
[ocaml] (use Sys.enable_runtime_warnings to control these warnings)
[ocaml] (moreover, it has unflushed data)
```
Thanks for the report, I hadn't seen this. I updated this function; it now resets `channel->curr` to `channel->buff` so that this false warning is not triggered. That seems clearer to me than re-introducing the deleted statement.
Resetting `channel->curr` has the disadvantage of making it possible to write to the buffer of the closed channel, without raising an error, for a while.
Taking into account that

- `putch` only flushes the channel when `channel->curr >= channel->end`,
- `getch` only refills the channel when `channel->curr >= channel->max`,

I am not sure there is a simpler option than the existing `channel->curr = channel->max = channel->end` to forbid all writes or reads on the disconnected buffer.
Ah yes, you're right, I misunderstood that comment. I went back to the existing

```c
channel->curr = channel->max = channel->end;
```

only now it is performed after acquiring the lock.
Looking at the channel functions used in […] and […].
And also […].
I mentioned those functions in my “unclear if action required” section at #10960 (comment). I suggested that they are probably only used in setup code, so concurrent accesses seem unlikely. But if that's not true, or if we prefer to be bulletproof, I'm happy to add locks in these functions.
Sorry, I had forgotten your comment! Since performance is not at risk for setup code, it seems better to err on the side of safety.
6e575e1 to 033642d
I have added locking where required.
runtime/io.c
Outdated
```c
if (result == -1) caml_sys_error (NO_ARG);
/* Resetting curr makes it clear that this is a closed channel without
 * unflushed data, so that no warning will be emitted by the channel's
 * finalizer. */
```
Isn't the comment inexact? We may still have unflushed data in the buffer, since `caml_ml_close_channel` does not flush the buffer.
It is a precondition that output channels be flushed before calling `caml_ml_close_channel` (as the comment at the beginning of the function says), and stdlib.ml abides by this: the only way to close an out_channel is `close_out`, which flushes it first.
However, I have removed this comment in light of your remarks above.
033642d to 31caadd
runtime/io.c
Outdated
```c
Lock(channel);
res = channel->offset - (file_offset)(channel->max - channel->curr);
Unlock(channel);
return res;
```
One last nitpick: for the other functions, only the caml_ml_* variant acquires the lock, whereas the raw C function does not (for instance, compare `caml_seek_in` with `caml_ml_seek_in`, or `caml_input_scan_line` with `caml_ml_input_scan_line`). Would it not be clearer to be homogeneous in terms of channel locking for the functions exposed in caml/io.h, and to move the lock to `caml_ml_pos_in` and `caml_ml_pos_out`?
That indeed seems sensible. Done in the latest version.
31caadd to 548d6f9
Octachron left a comment
The current state looks good except for a Windows-only data race.
runtime/io.c
Outdated
```c
  caml_sys_error(NO_ARG);
}
#endif
Lock(channel);
```
The WIN32-only access to `channel->flags` above is a potential data race; that lock should be moved before the #ifdef section.
My bad; fixed.
548d6f9 to 3c62eba
Merged, thanks for the PR!
This fixes a few channel-related concurrency issues, mentioned in #10960 (comment), and proposes to document the thread-unsafety or possibly surprising behavior of a few other IO functions.
- Lock the channel in `caml_ml_close_channel` and `caml_ml_pos_{in,out}`, and lock it earlier in `caml_ml_flush`. Without this, these functions show non-sequentially-consistent behaviour, unexpected `Sys_error` exceptions, and even segfaults in multicore.
- `really_input`, `really_input_string` and `input_all`, because they read by chunks and release the channel lock between chunks, may not read contiguous chunks when there are multiple concurrent readers.
- `Scanf` is thread-unsafe.