Skip to content

Fixes save_runtime_state logic in caml_thread_yield#12830

Closed
Johan511 wants to merge 3 commits intoocaml:trunkfrom
Johan511:save_runtime
Closed

Fixes save_runtime_state logic in caml_thread_yield#12830
Johan511 wants to merge 3 commits intoocaml:trunkfrom
Johan511:save_runtime

Conversation

@Johan511
Copy link
Copy Markdown
Contributor

@Johan511 Johan511 commented Dec 15, 2023

Modification from caml_state should be saved into this_thread only if this_thread is the active thread
Sometimes this is not the case, for example in tests/parallel/test_c_thread_register.ml (this issue)

let _ =
  let d =
    Domain.spawn begin fun () ->
      spawn_thread passed; (* creating a C thread *)
      Thread.delay 0.5
    end
  in
  let t = Thread.create (fun () -> Thread.delay 1.0) () in
  Thread.join t;
  Domain.join d

A simpler example is also

external spawn_thread : (unit -> unit) -> unit = "spawn_thread"
let passed () = Printf.printf "passed\n"
let _ =
  spawn_thread passed; (* creating a C thread *)
  let t = Thread.create (fun () -> Thread.delay 1.0) () in
  Thread.join t  

@Johan511 Johan511 changed the title fixed save_runtime_state logic in caml_thread_yield Fixes save_runtime_state logic in caml_thread_yield Dec 15, 2023
@gasche
Copy link
Copy Markdown
Member

gasche commented Dec 15, 2023

This is tricky code. I wonder if @Engil or @gadmm would be willing to look at this.

My gut feeling is that there may be something wrong elsewhere in the code, possibly with the semantics of registering foreign thread. It looks like a thread is running OCaml code while not being marked as the "Active" thread, which probably should not happen and probably would cause other issues than specifically this bug.

@Johan511
Copy link
Copy Markdown
Contributor Author

Johan511 commented Dec 15, 2023

In caml_c_thread_register we set the thread from which the function is called to Active_thread->next.
ref

Active_thread not being the current running thread seems to happen in case of C threads. The following PR also seems to have fixed another problem due to that.

code ref

CAMLexport int caml_c_thread_register(void)
   th->prev = Active_thread;
   Active_thread->next->prev = th;
   Active_thread->next = th;
+  restore_runtime_state(th);

I have tried the following change and it seems to pass all tests. It updates the C thread to Active_thread when registering the thread.

@gasche
Copy link
Copy Markdown
Member

gasche commented Dec 15, 2023

(Right, I should also have cc-ed @xavierleroy obviously as someone who would know about this.)

@xavierleroy
Copy link
Copy Markdown
Contributor

cc-ed @xavierleroy obviously as someone who would know about this.

The systhreads code has changed too much for me to guess what's going on. It's not obvious to me that the present issue is related to #5295.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Dec 15, 2023

I am trying at the moment to make sense of c thread (un)registering by factoring code in common with thread creation.

Anyway, thanks for the solution to #12773, as I quickly tried to find the issue inside "register c thread", but did not find it.

@OlivierNicole
Copy link
Copy Markdown
Contributor

@dustanddreams confirms that he can no longer reproduce the TSan use-after-free report (see #12773) with either of @Johan511's fixes. In contrast, the report still triggers on current trunk.
(For my part I can never trigger the TSan report on my machine for some reason.)

@Johan511 also tells me that on his machine the simpler example sometimes segfaults.

@Johan511
Copy link
Copy Markdown
Contributor Author

Johan511 commented Dec 15, 2023

It's not obvious to me that the present issue is related to #5295.

I am sorry @xavierleroy the issue does seem unrelated.

I based my statement off this comment which seems to make a similar check.

PR#5295 routes to issue #5295, maybe that's the source of the confusion

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Dec 15, 2023

I think the second route suggested at #12830 (comment) is the right one. This suggested the refactor at #12834 which fixes other bugs as well.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 12, 2024

If I understand correctly, #12834 should have fixed the present issue, thanks to the alternate approach suggested of @Johan511 above. I am thus closing the present PR/issue. Thanks a lot for the report, the proposed fix and the suggestion for the other fix!

@gasche gasche closed this Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants