Race in signal.c causes programs to hang #12253

@talex5

When a signal is pending, caml_process_pending_signals_exn first clears the pending flag and then calls caml_execute_signal_exn, which masks the signal while the handler is running.

If another instance of the same signal arrives after the flag is cleared but before the mask is set, the handler runs with that signal both pending and masked. This causes caml_enter_blocking_section to go into an infinite loop once the handler performs a blocking call.
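
The ordering problem can be illustrated with a small single-threaded model. This is only a sketch, not the code in signal.c: the names pending_flag, signal_masked, deliver_signal, handle_pending and the max_spins bound are hypothetical, and it assumes that a masked signal is skipped during processing while its pending flag stays set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; these names are not taken from signal.c. */
static bool pending_flag;   /* set when the signal has been recorded */
static bool signal_masked;  /* true while the handler is running */

/* The OS delivers a signal: the runtime only records it. */
static void deliver_signal(void) { pending_flag = true; }

/* Model of entering a blocking section: keep processing pending signals
   until none remain. Assumption: a masked signal is skipped and its flag
   is left set. Returns false if the loop would never terminate. */
static bool enter_blocking_section(int max_spins) {
  assert(max_spins > 0);
  for (int spin = 0; spin < max_spins; spin++) {
    if (pending_flag && !signal_masked)
      pending_flag = false;          /* the handler would run here */
    if (!pending_flag) return true;  /* nothing pending: safe to block */
  }
  return false;                      /* still pending and masked: livelock */
}

/* Model of handling one pending signal. If second_signal_in_window is
   true, another signal lands between clearing the flag and masking.
   Returns true if the handler's blocking call can proceed. */
static bool handle_pending(bool second_signal_in_window) {
  pending_flag = false;                           /* 1. clear the flag */
  if (second_signal_in_window) deliver_signal();  /* the race window */
  signal_masked = true;                           /* 2. mask the signal */
  bool ok = enter_blocking_section(1000);         /* handler blocks, e.g. sleeps */
  signal_masked = false;                          /* handler returns */
  return ok;
}
```

In this model handle_pending(false) returns true, while handle_pending(true) returns false: the flag set inside the window can never be cleared while the signal is masked, so the loop only stops because of the artificial max_spins bound.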

To reproduce the bug easily, add a delay to caml_execute_signal_exn, just before masking the signal. After adding a sleep(1) there, this program hangs reliably on 5.0.0 and on the current 5.1 branch (5.1.0-alpha1-11-g48e0f9f59f):

let handle (_ : int) =
  print_endline "-> signal handler";
  Unix.sleepf 1.5;
  let rec aux () =
    match Unix.(waitpid [WNOHANG]) (-1) with
    | _, _ -> print_endline "reaped one child"; aux ()
    | exception Unix.Unix_error (Unix.ECHILD, _, _) -> ()
  in
  aux ();
  print_endline "<- signal handler"

let () =
  Sys.(set_signal sigchld) (Signal_handle handle);
  let _ = Unix.(create_process "/usr/bin/sleep" [| "sleep"; "0.1" |] stdin stdout stderr) in
  let _ = Unix.(create_process "/usr/bin/sleep" [| "sleep"; "0.2" |] stdin stdout stderr) in
  Unix.sleep 2

strace shows the process querying its signal mask over and over, with SIGCHLD blocked each time:

rt_sigprocmask(SIG_BLOCK, NULL, [CHLD], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [CHLD], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [CHLD], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [CHLD], 8) = 0

Originally reported against Eio at ocaml-multicore/eio#495.
