Skip to content

Fix or document concurrency issues on channels in Multicore#11171

Merged
Octachron merged 1 commit intoocaml:trunkfrom
OlivierNicole:lock_channels
May 18, 2022
Merged

Fix or document concurrency issues on channels in Multicore#11171
Octachron merged 1 commit intoocaml:trunkfrom
OlivierNicole:lock_channels

Conversation

@OlivierNicole
Copy link
Copy Markdown
Contributor

This fixes a few channel-related concurrency issues, mentioned in #10960 (comment), and proposes to document the thread-unsafety or possibly surprising behavior of a few other IO functions.

  • Lock the channel in runtime functions caml_ml_close_channel and caml_ml_pos_{in,out}. Lock it earlier in caml_ml_flush. Without this, these functions show non sequentially consistent behaviour, unexpected Sys_error exceptions and even segfaults in multicore.
  • Document the fact that really_input, really_input_string and input_all, because they read by chunks and release the channel lock between chunks, may not read contiguous chunks when there are multiple concurrent readers.
  • Document the fact that Scanf is thread-unsafe.

@xavierleroy
Copy link
Copy Markdown
Contributor

I need more time to review this, but I like the direction it is taking, thanks a lot !

General question in connection with #11177: once we properly lock channels before operating over them, do we still need the refcount field to be atomic?

@OlivierNicole
Copy link
Copy Markdown
Contributor Author

Thanks for the feedback!

Currently the channel mutex is used exclusively to protect I/O operations. It can probably be used to protect accesses to refcount which would allow to make it non-atomic. Is that something we want?

/* Ensure that every read or write on the channel will cause an
immediate caml_flush_partial or caml_refill, thus raising a Sys_error
exception */
channel->curr = channel->max = channel->end;
Copy link
Copy Markdown
Member

@Octachron Octachron May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deletion changes the behavior of runtime warnings: closing a channel no longer disables the warning about unflushed contents.
For instance, this program only emits warnings once this PR applied.

Sys.enable_runtime_warnings true;;
external close : out_channel -> unit = "caml_ml_close_channel"
let f s =
  let o = Out_channel.open_bin ("/tmp/"^s) in
  Out_channel.set_buffered o true;
  Out_channel.output_string o "test";
  close o
let () = f "a"
let () = for i = 0 to 1000 do let a = Array.make 1_000 () in () done

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of these runtime warnings, what form do they take?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Octachron I can't find a difference when running your example with and without this PR. Could you share your compilation commands?

Copy link
Copy Markdown
Member

@Octachron Octachron May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is simpler to enable the warnings in the code itself, I have fixed the test program along this line, sorry about that. With trunk, there is no warning emitted, whereas I see the following warning with this PR:

[ocaml] (use Sys.enable_runtime_warnings to control these warnings)
[ocaml] (moreover, it has unflushed data)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the report, I hadn't seen this. I updated this function, it now resets channel->curr to channel->buff so that this false warning is not triggered. It seems clearer to me than re-introducing the deleted statement.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resetting channel->curr has the disadvantage of making it possible to write on the buffer of the closed channel without raising an error for a time.
Taking in account that

  • Putch only flushes the channel when channel->currentchannel->end.
  • Getch only refillls the channel when channel->currentchannel->max
    I am not sure if there is a simpler option than the existing
channel->curr = channel->max = channel->end

to forbids all writes or reads on the disconnected buffer.

Copy link
Copy Markdown
Contributor Author

@OlivierNicole OlivierNicole May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, you're right, I misunderstood that comment. I went back to the existing

channel->curr = channel->max = channel->end

only it's performed after acquiring the lock now.

@Octachron
Copy link
Copy Markdown
Member

Looking at the channels function used in Stdlib , it seems to me that both caml_ml_set_binary_mode
and caml_ml_set_name should also hold the channel lock.

@hhugo
Copy link
Copy Markdown
Contributor

hhugo commented May 9, 2022

Looking at the channels function used in Stdlib , it seems to me that both caml_ml_set_binary_mode and caml_ml_set_name should also hold the channel lock.

and caml_ml_set_buffered ?

@Octachron
Copy link
Copy Markdown
Member

And also caml_ml_set_buffered indeed.

@OlivierNicole
Copy link
Copy Markdown
Contributor Author

I mentioned those functions in my “unclear if action required” section at #10960 (comment). I suggested that they are probably only used in setup code, so concurrent accesses seem unlikely. But if that's not true, or if we prefer to be bulletproof, I'm happy to add locks in these functions.

@Octachron
Copy link
Copy Markdown
Member

Sorry, I had forgotten your comment! Since performance is not at risk for setup code, it seems better to err on the side of safety.

@OlivierNicole
Copy link
Copy Markdown
Contributor Author

I have added locking where required.

runtime/io.c Outdated
if (result == -1) caml_sys_error (NO_ARG);
/* Resetting curr makes it clear that this is a closed channel without
* unflushed data, so that no warning will be emitted by the channel's
* finalizer. */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is inexact? We may still have unflushed data on the buffer, since caml_ml_close_channel does not flush the buffer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a precondition that output channels should be flushed before calling caml_ml_close_channel (as says the comment at the beginning of the function), and stdlib.ml abides by this as the only way to close an out_channel is close_out which first flushes it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I have removed this comment in light of your remarks above.

runtime/io.c Outdated
Lock(channel);
res = channel->offset - (file_offset)(channel->max - channel->curr);
Unlock(channel);
return res;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last nitpicking: for other functions, only the caml_ml_* variant acquires the lock, whereas the raw C function does not do so (for instance see caml_seek_in vs caml_ml_seek_in, or caml_input_scan_line vs caml_ml_input_scan_line). Would it not clearer to be homogeneous in term of channel lock for the functions exposed in caml/io.h and move the lock to caml_ml_pos_in and caml_ml_pos_out?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems indeed sensible. Done in the last version.

Copy link
Copy Markdown
Member

@Octachron Octachron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current state looks good except for a windows-only data race.

runtime/io.c Outdated
caml_sys_error(NO_ARG);
}
#endif
Lock(channel);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WIN32 only access to channel->flags above is a potential data race, that lock should be moved before the ifdef section.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad; fixed.

@Octachron Octachron merged commit 2a87ced into ocaml:trunk May 18, 2022
@Octachron
Copy link
Copy Markdown
Member

Merged, thanks for the PR!

@OlivierNicole OlivierNicole deleted the lock_channels branch May 18, 2022 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants