Skip to content

add missing CAMLparam in caml_ml_close_channel#12436

Closed
damiendoligez wants to merge 1 commit intoocaml:trunkfrom
damiendoligez:fix-close-channel
Closed

add missing CAMLparam in caml_ml_close_channel#12436
damiendoligez wants to merge 1 commit intoocaml:trunkfrom
damiendoligez:fix-close-channel

Conversation

@damiendoligez
Copy link
Copy Markdown
Member

This fixes a random crash, which is made much more probable by a future PR of mine.

The crash is as follows:

  • Domain A wants to close a channel, which happens to go out of scope after the close. It locks the channel, then enters the blocking section.
  • While domain A is blocked, domain B requests a GC
  • A's backup thread takes over and completes the GC. The channel gets collected and its finalizer is called.
  • The finalizer calls pthread_mutex_destroy on a mutex that's still locked and triggers a fatal error.

The fix is to keep the channel alive until it's unlocked.

@yallop
Copy link
Copy Markdown
Member

yallop commented Jul 28, 2023

Is there a similar issue in caml_ml_set_channel_name, since Lock(channel) involves a blocking section?

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 28, 2023

Note that the bug was already a bug in OCaml 4; your explanation is in terms of domains but (non-parallel) threads can create the same problems.

Random remark: another way to keep a channel alive is to increase its refcount before and decrement it after the work has been done. I think that simply registering the root as proposed is a much better approach, of course.

As @yallop I started looking for other instances of related bugs in io.c. I suspect that there may be several others, for example:

  • caml_channel_size looks suspicious to me: we use the file descriptor within the critical section, which could have been closed by a finalizer at this point
  • caml_ml_set_channel_name is wrong as @yallop pointed
  • caml_ml_set_binary_mode and caml_ml_set_buffered look suspicious as well
  • caml_ml_pos_in looks suspicious

Basically it looks like we should always root the channel argument for safety reasons, and justify with a comment the rare cases where we intentionally decide not to.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 29, 2023

I propose a more systematic approach in #12445 which adds roots to many more functions.

@dra27
Copy link
Copy Markdown
Member

dra27 commented Jul 30, 2023

I don’t think these other functions are affected - the backup thread only picks up a GC if the domain’s thread has released the domain lock. In caml_ml_set_channel_name et al, the domain lock is never released, so the scenario can’t arise - the STW section won’t start until the C call finishes. (edit: this isn't true... see below)


CAMLprim value caml_ml_close_channel(value vchannel)
{
CAMLparam1 (vchannel);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s worth a comment about the need for the root, given how subtle the trigger is!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your proposal works well within the mindset that registering roots should be avoided unless necessary -- here we cannot avoid it for a subtle reason, so we should document it.

I think that we should think differently, and reflect this in our documentation practices: we should not require justification for registering values (the easy and safe thing), we should require justification for not registering values (the subtle and unsafe thing).

@xavierleroy
Copy link
Copy Markdown
Contributor

I don’t think these other functions are affected - the backup thread only picks up a GC if the domain’s thread has released the domain lock. In caml_ml_set_channel_name et al, the domain lock is never released, so the scenario can’t arise - the STW section won’t start until the C call finishes.

I don't know enough about the OCaml 5 GC to agree or disagree with this. However, if you have multiple threads, both in OCaml 4 and in OCaml 5, Lock(chan) can enter a blocking section (if the channel lock is already taken), and then another thread can run and trigger a GC. So, I agree with @yallop and @gasche that caml_ml_close_channel is not the only problematic function.

@dra27
Copy link
Copy Markdown
Member

dra27 commented Jul 30, 2023

Ah, I see (and I'd failed to see that @yallop had said that too) - the domain lock may be released in Lock.

@xavierleroy
Copy link
Copy Markdown
Contributor

Superseded by #12445, which is now merged.

@xavierleroy xavierleroy closed this Aug 4, 2023
@damiendoligez damiendoligez deleted the fix-close-channel branch March 7, 2024 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants