Bad locking in io.c

**Original bug ID:** 7503
**Reporter:** @stedolan
**Status:** confirmed (set by @mshinwell on 2017-03-15T16:25:51Z)
**Resolution:** open
**Priority:** normal
**Severity:** minor
**Category:** runtime system and C interface
**Related to:** #5141
**Monitored by:** @stedolan @gasche

## Bug description

With ocamlopt, the locking in io.c is not sufficient in the presence of asynchronous operations like signal handlers and finalisers. Here's an example program badlock.ml:

    let () =
      Sys.set_signal Sys.sigint (Sys.Signal_handle (fun _ ->
        print_endline "signalled!"));
      while true do
        print_endline "hello"
      done

The exact behaviour varies depending on whether systhreads is enabled. With trunk and 4.05.0+beta2,

    ocamlopt badlock.ml -o badlock && ./badlock
    ^C

gets stuck in an infinite loop in caml_flush

    ocamlopt -runtime-variant d badlock.ml -o badlock && ./badlock
    ^C

crashes with an assertion failure (the same issue)

    ocamlopt -thread unix.cmxa threads.cmxa badlock.ml -o badlock && ./badlock
    ^C

deadlocks in pthread_mutex_lock

Versions 4.04 and earlier do not initialise systhreads when merely linked with threads.cmxa if the program does not use any thread features, so they behave as in the first case even when compiled with -thread.


I think I understand what's going on in the deadlock case, since I came across this issue when trying to fix a similar problem in the multicore branch (until KC pointed out that the same problem likely exists on trunk). Here's a backtrace from the deadlock in 4.05.0+beta2:

    #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
    #1  0x00007f2cd0d32bc5 in __GI___pthread_mutex_lock (mutex=0x563e7210ea60) at ../nptl/pthread_mutex_lock.c:80
    #2  0x0000563e70e55a6e in caml_io_mutex_lock ()
    #3  0x0000563e70e6920d in caml_ml_output_bytes (vchannel=<optimized out>, buff=<optimized out>, start=<optimized out>,
        length=<optimized out>) at io.c:660
    #4  0x0000563e70e2e0da in camlPervasives__output_string_1213 () at pervasives.ml:349
    #5  0x0000563e70e2e97a in camlPervasives__print_endline_1310 () at pervasives.ml:472
    #6  0x0000563e70e786f4 in caml_start_program ()
    #7  0x0000563e70e5db09 in caml_execute_signal (signal_number=signal_number@entry=2, in_signal_handler=in_signal_handler@entry=0)
        at signals.c:176
    #8  0x0000563e70e5dbda in caml_process_pending_signals () at signals.c:58
    #9  0x0000563e70e5dc87 in caml_process_pending_signals () at signals.c:53
    #10 caml_leave_blocking_section () at signals.c:131
    #11 0x0000563e70e71d2b in caml_write_fd (fd=1, flags=<optimized out>, buf=buf@entry=0x563e720ee7a0, n=n@entry=6) at unix.c:94
    #12 0x0000563e70e68480 in caml_flush_partial (channel=0x563e720ee750) at io.c:168
    #13 0x0000563e70e68f48 in caml_flush (channel=<optimized out>) at io.c:182
    #14 caml_ml_flush (vchannel=<optimized out>) at io.c:612
    #15 0x0000563e70e2e9bf in camlPervasives__print_endline_1310 () at pervasives.ml:472
    #16 0x0000563e70e24c47 in camlBadlock__entry ()
    #17 0x0000563e70e22369 in caml_program ()
    #18 0x0000563e70e786f4 in caml_start_program ()
    #19 0x0000563e70e5c9b5 in caml_main (argv=0x7ffff6bb5a18) at startup.c:145
    #20 0x0000563e70e2203c in main (argc=<optimized out>, argv=<optimized out>) at main.c:37

After performing the write, caml_write_fd calls caml_leave_blocking_section, which runs any pending signal handlers. However, the channel's lock is still held at that point because it was taken in caml_flush, so when the signal handler tries to acquire the same lock again it deadlocks. (Without -thread, I think it continues and corrupts the channel state, although I'm not sure of the details.)
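To make the re-entrancy concrete, here is a minimal standalone sketch of the same pattern in plain C. It is not the runtime's code: `toy_channel`, `toy_signal_handler` and `toy_flush` are hypothetical stand-ins for the channel, the OCaml signal handler and the flush path. The handler uses `pthread_mutex_trylock` so that the sketch reports the conflict instead of hanging the way the blocking `pthread_mutex_lock` in the backtrace does.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a channel guarded by one non-recursive mutex. */
struct toy_channel {
  pthread_mutex_t mutex;
  int handler_lock_result;   /* what the handler saw when it tried to lock */
};

/* Hypothetical stand-in for a signal handler that prints to the same
   channel. The real code blocks in pthread_mutex_lock; trylock is used
   here only so the sketch can report the problem rather than deadlock. */
static void toy_signal_handler(struct toy_channel *chan)
{
  int rc = pthread_mutex_trylock(&chan->mutex);
  chan->handler_lock_result = rc;
  if (rc == 0) pthread_mutex_unlock(&chan->mutex);
}

/* Hypothetical stand-in for the flush path: take the lock, "write",
   then run pending handlers while the lock is still held. */
int toy_flush(struct toy_channel *chan)
{
  pthread_mutex_lock(&chan->mutex);
  /* ... the write to the file descriptor would happen here ... */
  toy_signal_handler(chan);  /* models leaving the blocking section */
  pthread_mutex_unlock(&chan->mutex);
  return chan->handler_lock_result;
}
```

With a channel initialised as `{ PTHREAD_MUTEX_INITIALIZER, -1 }`, `toy_flush` should return `EBUSY` on a non-recursive mutex: the handler finds the lock already held by its own thread, which is the situation in frames #2 to #14 above.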

I think that it is unsafe in general to hold mutexes while interacting with the OCaml heap. Due to finalisers and signal handlers, several heap operations can run arbitrary user code (e.g. caml_alloc, caml_enter_blocking_section, and more in multicore), which is to be avoided in a critical section.
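One way to read that principle, sketched in plain C under the same caveats: note the pending handler while the lock is held, and run it only after the critical section has ended. This is an illustration of the idea, not a proposed patch for io.c, and `deferred_channel`, `deferred_handler` and `deferred_flush` are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical channel that records pending work instead of running it. */
struct deferred_channel {
  pthread_mutex_t mutex;
  int pending;               /* nonzero when a handler is waiting to run */
  int handler_lock_result;   /* what the handler saw when it tried to lock */
};

/* Hypothetical handler that needs the channel lock, as in the example above. */
static void deferred_handler(struct deferred_channel *chan)
{
  int rc = pthread_mutex_trylock(&chan->mutex);
  chan->handler_lock_result = rc;
  if (rc == 0) pthread_mutex_unlock(&chan->mutex);
}

/* Flush that never runs user code inside the critical section. */
int deferred_flush(struct deferred_channel *chan)
{
  int run_handler;

  pthread_mutex_lock(&chan->mutex);
  /* ... the write would happen here; a signal arrives and is only noted ... */
  chan->pending = 1;
  /* Channel state is consistent again: take ownership of the pending work. */
  run_handler = chan->pending;
  chan->pending = 0;
  pthread_mutex_unlock(&chan->mutex);

  /* User code runs with no lock held, so it may lock the channel itself. */
  if (run_handler) deferred_handler(chan);
  return chan->handler_lock_result;
}
```

Here the handler's `pthread_mutex_trylock` should succeed (return 0), because the lock has been released before any user code runs. The cost is that the handler is delayed until the end of the critical section.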

