Possible corruption bug in caml_master_lock in 4.14.1 on macOS #12636

@zbaylin

A couple of days ago, I upgraded my M1 MacBook Pro from macOS 13 to macOS 14 Sonoma. Since then, my LSP server seemingly at random stops responding and spins at 100% CPU usage.

I opened an issue (ocaml/ocaml-lsp#1194) thinking it was maybe a bug in the LSP server itself, but upon further investigation it appears that it might be a bug in the runtime.

Unfortunately, I haven't been able to find an easily reproducible example for this, but I'll continue to try to do so and outline my investigation here.


Once the LSP server froze, I attached to it with lldb to see if I could learn more that way. Immediately after attaching, a breakpoint was hit (even though I hadn't set any):

* thread #1, stop reason = EXC_BREAKPOINT (code=1, subcode=0x188124220)
    frame #0: 0x0000000188124220 libsystem_platform.dylib`_os_unfair_lock_corruption_abort + 52
libsystem_platform.dylib`:
->  0x188124220 <+52>: brk    #0x1
    0x188124224 <+56>: stp    x20, x21, [sp, #-0x10]!
    0x188124228 <+60>: adrp   x20, 0
    0x18812422c <+64>: add    x20, x20, #0x95d          ; "BUG IN CLIENT OF LIBPLATFORM: os_unfair_lock is corrupt"

The backtrace at this point looks like the following:

* thread #1, stop reason = EXC_BREAKPOINT (code=1, subcode=0x188124220)
  * frame #0: 0x0000000188124220 libsystem_platform.dylib`_os_unfair_lock_corruption_abort + 52
    frame #1: 0x000000018811f788 libsystem_platform.dylib`_os_unfair_lock_lock_slow + 332
    frame #2: 0x00000001880ee3f0 libsystem_pthread.dylib`pthread_mutex_destroy + 64
    frame #3: 0x00000001008144fc ocamllsp`caml_thread_reinitialize [inlined] st_mutex_destroy(m=0x00006000016dc040) at st_posix.h:228:8 [opt]
    frame #4: 0x00000001008144f4 ocamllsp`caml_thread_reinitialize at st_stubs.c:425:7 [opt]
    frame #5: 0x00000001880f6f7c libsystem_pthread.dylib`_pthread_atfork_child_handlers + 76
    frame #6: 0x0000000187faeb90 libsystem_c.dylib`fork + 112
    frame #7: 0x000000010081323c ocamllsp`spawn_unix(v_env=<unavailable>, v_cwd=<unavailable>, v_prog=<unavailable>, v_argv=<unavailable>, v_stdin=<unavailable>, v_stdout=<unavailable>, v_stderr=<unavailable>, v_use_vfork=<unavailable>, v_setpgid=1) at spawn_stubs.c:439:43 [opt]
    frame #8: 0x0000000100849fc0 ocamllsp`caml_c_call + 28
    frame #9: 0x0000000100661e38 ocamllsp`camlSpawn__spawn_inner_997 + 152
    frame #10: 0x00000001003ba90c ocamllsp`camlOcaml_lsp_server__run_in_directory_5864 + 524
    frame #11: 0x000000010044e728 ocamllsp`camlMerlin_kernel__Pparse__apply_rewriter_786 + 296
    frame #12: 0x000000010044ec28 ocamllsp`camlMerlin_kernel__Pparse__rewrite_957 + 144
    frame #13: 0x000000010044eca4 ocamllsp`camlMerlin_kernel__Pparse__apply_rewriters_str_inner_1399 + 68
    frame #14: 0x000000010044ed74 ocamllsp`camlMerlin_kernel__Pparse__apply_rewriters_1089 + 52
    frame #15: 0x0000000100464564 ocamllsp`camlMerlin_kernel__Mppx__fun_1203 + 60
    frame #16: 0x0000000100464428 ocamllsp`camlMerlin_kernel__Mppx__code_begin + 136
    frame #17: 0x000000010044f81c ocamllsp`camlMerlin_kernel__Phase_cache__apply_inner_528 + 852
    frame #18: 0x0000000100465ca0 ocamllsp`camlMerlin_kernel__Mpipeline__fun_1778 + 152
    frame #19: 0x000000010064bdd0 ocamllsp`camlMerlin_utils__Misc__try_finally_inner_3715 + 48
    frame #20: 0x00000001007b993c ocamllsp`camlCamlinternalLazy__force_lazy_block_362 + 140
    frame #21: 0x000000010046482c ocamllsp`camlMerlin_kernel__Mpipeline__fun_1613 + 172
    frame #22: 0x00000001007b993c ocamllsp`camlCamlinternalLazy__force_lazy_block_362 + 140
    frame #23: 0x0000000100465d44 ocamllsp`camlMerlin_kernel__Mpipeline__fun_1791 + 60
    frame #24: 0x00000001007b993c ocamllsp`camlCamlinternalLazy__force_lazy_block_362 + 140
    frame #25: 0x000000010046482c ocamllsp`camlMerlin_kernel__Mpipeline__fun_1613 + 172
    frame #26: 0x00000001007b993c ocamllsp`camlCamlinternalLazy__force_lazy_block_362 + 140
    frame #27: 0x000000010042acb0 ocamllsp`camlQuery_commands__dispatch_2992 + 7320
    frame #28: 0x0000000100646d24 ocamllsp`camlMerlin_utils__Std__let_ref_3289 + 60
    frame #29: 0x000000010064bdd0 ocamllsp`camlMerlin_utils__Misc__try_finally_inner_3715 + 48
    frame #30: 0x00000001007f8048 ocamllsp`camlStdlib__Fun__protect_317 + 96
    frame #31: 0x000000010045acd8 ocamllsp`camlMerlin_kernel__Mocaml__with_state_438 + 80
    frame #32: 0x000000010037dd44 ocamllsp`camlOcaml_lsp_server__Document__fun_4117 + 116
    frame #33: 0x00000001007a1d18 ocamllsp`camlStdune__Exn_with_backtrace__try_with_422 + 40
    frame #34: 0x00000001006645c8 ocamllsp`camlLev_fiber__do_no_raise_1549 + 24
    frame #35: 0x000000010066fd38 ocamllsp`camlLev_fiber_util__Worker__run_322 + 88
    frame #36: 0x0000000100671354 ocamllsp`camlThread__fun_850 + 44
    frame #37: 0x000000010084a030 ocamllsp`caml_start_program + 104
    frame #38: 0x000000010083ffa0 ocamllsp`caml_callback_exn(closure=<unavailable>, arg=1) at callback.c:111:10 [opt]
    frame #39: 0x0000000100814768 ocamllsp`caml_thread_start(arg=<unavailable>) at st_stubs.c:549:5 [opt]
    frame #40: 0x00000001880f3034 libsystem_pthread.dylib`_pthread_start + 136
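Reading the backtrace bottom-up: a thread started by caml_thread_start calls spawn_unix, which calls fork(); fork() then runs the registered pthread_atfork child handlers, and one of those is caml_thread_reinitialize. The minimal sketch below reproduces only that call pattern (fork from a non-main thread, child handler running inside fork() in the child). The names fork_from_worker_thread, worker and on_fork_child are hypothetical, and the sketch does not reproduce the corruption itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set only in the child process, by the atfork child handler. */
static volatile int child_handler_ran = 0;

/* Runs in the child, inside fork(), before fork() returns 0 there.
   In the backtrace above, caml_thread_reinitialize runs in this slot. */
static void on_fork_child(void) {
    child_handler_ran = 1;
}

static pthread_once_t register_once = PTHREAD_ONCE_INIT;

static void register_handlers(void) {
    pthread_atfork(NULL, NULL, on_fork_child);
}

/* Thread body: fork from a non-main thread, as spawn_unix does in frame #7. */
static void *worker(void *arg) {
    int *result = arg;
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here. Report via exit code. */
        _exit(child_handler_ran ? 0 : 1);
    }
    if (pid < 0) {
        *result = -1;
        return NULL;
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        *result = -1;
    else
        *result = WEXITSTATUS(status);
    return NULL;
}

/* Returns 0 if the child observed its atfork child handler, nonzero otherwise. */
int fork_from_worker_thread(void) {
    pthread_t t;
    int result = -1;
    pthread_once(&register_once, register_handlers);
    if (pthread_create(&t, NULL, worker, &result) != 0)
        return -1;
    pthread_join(t, NULL);
    return result;
}
```

A return value of 0 means the child process saw the handler run before fork() returned to it; the parent's copy of child_handler_ran stays 0.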

This seems to correspond to code similar to the following in libplatform, from the version of Darwin used by macOS 14: https://github.com/apple/darwin-libplatform/blob/215b09856ab5765b7462a91be7076183076600df/src/os/lock.c#L136

Dumping the register state here is also a bit informative:

General Purpose Registers:
        x0 = 0x00000000000007fa
        x1 = 0x0000000000000000
        x2 = 0x00000000000007fa
        x3 = 0x0000000000000000
        x4 = 0x00000d5200000000
        x5 = 0x0000040300000001
        x6 = 0x0000000000002000
        x7 = 0x0000000000000000
        x8 = 0x00000000000007fa
        x9 = 0x00000000000005fa
       x10 = 0x000000000003fe00
       x11 = 0x0000000100e33a38  ocamllsp`caml_master_lock + 24
       x12 = 0x0000000100e33a38  ocamllsp`caml_master_lock + 24
       x13 = 0x00000000ffff8128
       x14 = 0x0000000000000010
       x15 = 0x00000000ffff7dff
       x16 = 0x0000000000000203
       x17 = 0x00000001e85e8e58
       x18 = 0x0000000000000000
       x19 = 0x0000000000000303
       x20 = 0x0000000000050000
       x21 = 0x00006000016dc048
       x22 = 0x0000000001050002
       x23 = 0x00000000000007fa
       x24 = 0x0000000000000000
       x25 = 0x00000000ffffffff
       x26 = 0x0000000000000303
       x27 = 0x0000000000000000
       x28 = 0x000000016fc367b0
        fp = 0x000000016fc36610
        lr = 0x000000018811f788  libsystem_platform.dylib`_os_unfair_lock_lock_slow + 332
        sp = 0x000000016fc365d0
        pc = 0x0000000188124220  libsystem_platform.dylib`_os_unfair_lock_corruption_abort + 52
      cpsr = 0x80001000

It's unclear to me what exactly x11 and x12 are being used for here, but both hold pointers into caml_master_lock, and libplatform seems to believe the lock inside it is corrupt.
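Both registers hold caml_master_lock + 24. One way to check which member that offset lands in is to mirror the struct and compare against offsetof. The sketch below assumes a layout whose first member is a pthread_mutex_t; masterlock_mirror, member_at_offset and offset_is_in_mutex are hypothetical names, and the mirrored layout is an assumption, not copied from the 4.14.1 headers, so it should be checked against the real st_masterlock definition. Because pthread_mutex_t is larger than 24 bytes on common 64-bit platforms, offset 24 would fall inside the mutex under this assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL mirror of a master-lock struct whose first member is a
   pthread_mutex_t. The real st_masterlock layout in 4.14.1 may differ. */
struct masterlock_mirror {
    pthread_mutex_t lock;
    int busy;
    volatile int waiters;
    pthread_cond_t is_free;
};

/* Returns the name of the member of struct masterlock_mirror containing byte
   offset `off`, or NULL if `off` is in padding or past the end of the struct. */
const char *member_at_offset(size_t off) {
    struct { const char *name; size_t start, size; } members[] = {
        {"lock",    offsetof(struct masterlock_mirror, lock),    sizeof(pthread_mutex_t)},
        {"busy",    offsetof(struct masterlock_mirror, busy),    sizeof(int)},
        {"waiters", offsetof(struct masterlock_mirror, waiters), sizeof(int)},
        {"is_free", offsetof(struct masterlock_mirror, is_free), sizeof(pthread_cond_t)},
    };
    for (size_t i = 0; i < sizeof members / sizeof members[0]; i++) {
        if (off >= members[i].start && off < members[i].start + members[i].size)
            return members[i].name;
    }
    return NULL;
}

/* Convenience check: does `off` land inside the embedded mutex? */
int offset_is_in_mutex(size_t off) {
    const char *name = member_at_offset(off);
    return name != NULL && strcmp(name, "lock") == 0;
}
```

Under this assumed layout, offset_is_in_mutex(24) is true, while the first byte after the mutex belongs to busy.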

The backtrace also mentions caml_thread_reinitialize, which appears to call st_mutex_destroy on the mutex of every open channel:

for (chan = caml_all_opened_channels;
     chan != NULL;
     chan = chan->next) {
  if (chan->mutex != NULL) {
    st_mutex_destroy(chan->mutex);
    chan->mutex = NULL;
  }
}
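The loop above can be modelled in isolation as a minimal sketch. struct channel here is a hypothetical stand-in with only the two fields the loop touches, destroy_channel_mutexes is a hypothetical name, and pthread_mutex_destroy is called directly in place of st_mutex_destroy. Unlike the quoted loop, which discards st_mutex_destroy's result, the sketch checks the return code; freeing the mutex memory is left to the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* HYPOTHETICAL stand-in for the runtime's channel struct: only the two
   fields that the loop in caml_thread_reinitialize touches are modelled. */
struct channel {
    pthread_mutex_t *mutex;   /* NULL when the channel has no mutex */
    struct channel *next;     /* next entry in the list of open channels */
};

/* Walks the channel list, destroying each non-NULL mutex and clearing the
   pointer, as the quoted loop does. Returns the number of mutexes destroyed,
   or -1 as soon as pthread_mutex_destroy reports an error, in which case
   that channel's mutex pointer is left untouched. */
int destroy_channel_mutexes(struct channel *all_channels) {
    int destroyed = 0;
    for (struct channel *chan = all_channels; chan != NULL; chan = chan->next) {
        if (chan->mutex != NULL) {
            if (pthread_mutex_destroy(chan->mutex) != 0)
                return -1;
            chan->mutex = NULL;
            destroyed++;
        }
    }
    return destroyed;
}
```

For example, for a two-element list where only the head has an initialized, unlocked mutex, the function returns 1 and the head's mutex pointer is NULL afterwards; a second pass returns 0.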

Unfortunately, that's about where my investigation ends: I can't tell whether the lock is actually corrupt, let alone where the corruption would be happening if it is.

If anyone has any pointers for where I should look next, please let me know. I'll work on making a reproducible example.
