Possible corruption bug in caml_master_lock in 4.14.1 on macOS #12636
Description
A couple of days ago, I upgraded my M1 MacBook Pro from macOS 13 to macOS 14 Sonoma. After doing so, I noticed that my LSP server would stop working at seemingly random times and spin, causing 100% CPU usage.
I opened an issue (ocaml/ocaml-lsp#1194) thinking it was maybe a bug in the LSP server itself, but upon further investigation it appears that it might be a bug in the runtime.
Unfortunately, I haven't been able to find an easily reproducible example for this, but I'll continue to try to do so and outline my investigation here.
Once the LSP server froze, I attached to it with lldb to see if I could glean some more info that way. Immediately after I attached, a breakpoint was hit (even though I hadn't set any):
* thread #1, stop reason = EXC_BREAKPOINT (code=1, subcode=0x188124220)
frame #0: 0x0000000188124220 libsystem_platform.dylib`_os_unfair_lock_corruption_abort + 52
libsystem_platform.dylib`:
-> 0x188124220 <+52>: brk #0x1
0x188124224 <+56>: stp x20, x21, [sp, #-0x10]!
0x188124228 <+60>: adrp x20, 0
0x18812422c <+64>: add x20, x20, #0x95d ; "BUG IN CLIENT OF LIBPLATFORM: os_unfair_lock is corrupt"
The backtrace at this point looks like the following:
* thread #1, stop reason = EXC_BREAKPOINT (code=1, subcode=0x188124220)
* frame #0: 0x0000000188124220 libsystem_platform.dylib`_os_unfair_lock_corruption_abort + 52
frame #1: 0x000000018811f788 libsystem_platform.dylib`_os_unfair_lock_lock_slow + 332
frame #2: 0x00000001880ee3f0 libsystem_pthread.dylib`pthread_mutex_destroy + 64
frame #3: 0x00000001008144fc ocamllsp`caml_thread_reinitialize [inlined] st_mutex_destroy(m=0x00006000016dc040) at st_posix.h:228:8 [opt]
frame #4: 0x00000001008144f4 ocamllsp`caml_thread_reinitialize at st_stubs.c:425:7 [opt]
frame #5: 0x00000001880f6f7c libsystem_pthread.dylib`_pthread_atfork_child_handlers + 76
frame #6: 0x0000000187faeb90 libsystem_c.dylib`fork + 112
frame #7: 0x000000010081323c ocamllsp`spawn_unix(v_env=<unavailable>, v_cwd=<unavailable>, v_prog=<unavailable>, v_argv=<unavailable>, v_stdin=<unavailable>, v_stdout=<unavailable>, v_stderr=<unavailable>, v_use_vfork=<unavailable>, v_setpgid=1) at spawn_stubs.c:439:43 [opt]
frame #8: 0x0000000100849fc0 ocamllsp`caml_c_call + 28
frame #9: 0x0000000100661e38 ocamllsp`camlSpawn__spawn_inner_997 + 152
frame #10: 0x00000001003ba90c ocamllsp`camlOcaml_lsp_server__run_in_directory_5864 + 524
frame #11: 0x000000010044e728 ocamllsp`camlMerlin_kernel__Pparse__apply_rewriter_786 + 296
frame #12: 0x000000010044ec28 ocamllsp`camlMerlin_kernel__Pparse__rewrite_957 + 144
frame #13: 0x000000010044eca4 ocamllsp`camlMerlin_kernel__Pparse__apply_rewriters_str_inner_1399 + 68
frame #14: 0x000000010044ed74 ocamllsp`camlMerlin_kernel__Pparse__apply_rewriters_1089 + 52
frame #15: 0x0000000100464564 ocamllsp`camlMerlin_kernel__Mppx__fun_1203 + 60
frame #16: 0x0000000100464428 ocamllsp`camlMerlin_kernel__Mppx__code_begin + 136
frame #17: 0x000000010044f81c ocamllsp`camlMerlin_kernel__Phase_cache__apply_inner_528 + 852
frame #18: 0x0000000100465ca0 ocamllsp`camlMerlin_kernel__Mpipeline__fun_1778 + 152
frame #19: 0x000000010064bdd0 ocamllsp`camlMerlin_utils__Misc__try_finally_inner_3715 + 48
frame #20: 0x00000001007b993c ocamllsp`camlCamlinternalLazy__force_lazy_block_362 + 140
frame #21: 0x000000010046482c ocamllsp`camlMerlin_kernel__Mpipeline__fun_1613 + 172
frame #22: 0x00000001007b993c ocamllsp`camlCamlinternalLazy__force_lazy_block_362 + 140
frame #23: 0x0000000100465d44 ocamllsp`camlMerlin_kernel__Mpipeline__fun_1791 + 60
frame #24: 0x00000001007b993c ocamllsp`camlCamlinternalLazy__force_lazy_block_362 + 140
frame #25: 0x000000010046482c ocamllsp`camlMerlin_kernel__Mpipeline__fun_1613 + 172
frame #26: 0x00000001007b993c ocamllsp`camlCamlinternalLazy__force_lazy_block_362 + 140
frame #27: 0x000000010042acb0 ocamllsp`camlQuery_commands__dispatch_2992 + 7320
frame #28: 0x0000000100646d24 ocamllsp`camlMerlin_utils__Std__let_ref_3289 + 60
frame #29: 0x000000010064bdd0 ocamllsp`camlMerlin_utils__Misc__try_finally_inner_3715 + 48
frame #30: 0x00000001007f8048 ocamllsp`camlStdlib__Fun__protect_317 + 96
frame #31: 0x000000010045acd8 ocamllsp`camlMerlin_kernel__Mocaml__with_state_438 + 80
frame #32: 0x000000010037dd44 ocamllsp`camlOcaml_lsp_server__Document__fun_4117 + 116
frame #33: 0x00000001007a1d18 ocamllsp`camlStdune__Exn_with_backtrace__try_with_422 + 40
frame #34: 0x00000001006645c8 ocamllsp`camlLev_fiber__do_no_raise_1549 + 24
frame #35: 0x000000010066fd38 ocamllsp`camlLev_fiber_util__Worker__run_322 + 88
frame #36: 0x0000000100671354 ocamllsp`camlThread__fun_850 + 44
frame #37: 0x000000010084a030 ocamllsp`caml_start_program + 104
frame #38: 0x000000010083ffa0 ocamllsp`caml_callback_exn(closure=<unavailable>, arg=1) at callback.c:111:10 [opt]
frame #39: 0x0000000100814768 ocamllsp`caml_thread_start(arg=<unavailable>) at st_stubs.c:549:5 [opt]
frame #40: 0x00000001880f3034 libsystem_pthread.dylib`_pthread_start + 136
The abort seems to come from code similar to the following, found in libplatform in the version of Darwin used by macOS 14: https://github.com/apple/darwin-libplatform/blob/215b09856ab5765b7462a91be7076183076600df/src/os/lock.c#L136
Dumping the register state here is also a bit informative:
General Purpose Registers:
x0 = 0x00000000000007fa
x1 = 0x0000000000000000
x2 = 0x00000000000007fa
x3 = 0x0000000000000000
x4 = 0x00000d5200000000
x5 = 0x0000040300000001
x6 = 0x0000000000002000
x7 = 0x0000000000000000
x8 = 0x00000000000007fa
x9 = 0x00000000000005fa
x10 = 0x000000000003fe00
x11 = 0x0000000100e33a38 ocamllsp`caml_master_lock + 24
x12 = 0x0000000100e33a38 ocamllsp`caml_master_lock + 24
x13 = 0x00000000ffff8128
x14 = 0x0000000000000010
x15 = 0x00000000ffff7dff
x16 = 0x0000000000000203
x17 = 0x00000001e85e8e58
x18 = 0x0000000000000000
x19 = 0x0000000000000303
x20 = 0x0000000000050000
x21 = 0x00006000016dc048
x22 = 0x0000000001050002
x23 = 0x00000000000007fa
x24 = 0x0000000000000000
x25 = 0x00000000ffffffff
x26 = 0x0000000000000303
x27 = 0x0000000000000000
x28 = 0x000000016fc367b0
fp = 0x000000016fc36610
lr = 0x000000018811f788 libsystem_platform.dylib`_os_unfair_lock_lock_slow + 332
sp = 0x000000016fc365d0
pc = 0x0000000188124220 libsystem_platform.dylib`_os_unfair_lock_corruption_abort + 52
cpsr = 0x80001000
It's unclear to me what specifically x11 and x12 are being used for here, but both contain a pointer into caml_master_lock (caml_master_lock + 24), and libplatform seems to believe that the lock stored there is corrupt.
The backtrace also mentions caml_thread_reinitialize, which appears to call st_mutex_destroy on the mutex of every open channel:
ocaml/otherlibs/systhreads/st_stubs.c, lines 421 to 428 in 49bff4c:
for (chan = caml_all_opened_channels;
     chan != NULL;
     chan = chan->next) {
  if (chan->mutex != NULL) {
    st_mutex_destroy(chan->mutex);
    chan->mutex = NULL;
  }
}
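For anyone less familiar with the mechanism in frames #4 to #6, here is a minimal sketch of it, independent of the OCaml runtime: a handler registered with pthread_atfork runs in the child before fork returns there, and destroys a mutex. The names `demo_mutex`, `child_handler`, and `run_atfork_demo` are hypothetical and exist only for illustration. The sketch assumes the mutex is unlocked at fork time, so it does not reproduce the corruption abort; it only shows the call path.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a per-channel mutex; not the runtime's type. */
static pthread_mutex_t demo_mutex;
static int handler_ran = 0;
static int destroy_rc = -1;

/* Runs in the child only, before fork() returns there. */
static void child_handler(void)
{
  handler_ran = 1;
  destroy_rc = pthread_mutex_destroy(&demo_mutex);
}

/* Returns 0 if the child saw the handler run and the destroy succeed,
   a positive value if it did not, and -1 on a setup error. */
int run_atfork_demo(void)
{
  static int registered = 0;
  pid_t pid;
  int status;

  if (pthread_mutex_init(&demo_mutex, NULL) != 0) return -1;
  if (!registered) {
    /* Only a child handler is registered; prepare and parent are NULL. */
    if (pthread_atfork(NULL, NULL, child_handler) != 0) return -1;
    registered = 1;
  }

  pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    /* Child: report what the handler did through the exit status. */
    _exit((handler_ran && destroy_rc == 0) ? 0 : 1);
  }

  /* Parent: the handler did not run here, so the mutex is still valid. */
  if (waitpid(pid, &status, 0) < 0) return -1;
  pthread_mutex_destroy(&demo_mutex);
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_atfork_demo()` from a `main` and checking for a return value of 0 confirms that the child handler ran and that destroying an unlocked mutex succeeded. In the backtrace above the equivalent pthread_mutex_destroy call is the one that ends up in the os_unfair_lock abort.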
Unfortunately, that is where my investigation has stalled: I can't tell whether the lock is actually corrupt, let alone where the corruption would be happening.
If anyone has any pointers for where I should look next, please let me know. I'll work on making a reproducible example.