This repository was archived by the owner on Jun 21, 2024. It is now read-only.

Segfault using domains and effects together #770

Describe the issue

Sometimes when I run code in a domain, the GC segfaults. I first saw this in Eio, but I now have a test case that doesn't use it. The crash only happens if I also link with the threads library.

To reproduce

There is a gist with the code here:

https://gist.github.com/talex5/3852aeebd437436fc516e4ddc77a7e03

Building this Dockerfile produces the problem for me:

FROM ocaml/opam:debian-11-ocaml-4.12-domains
RUN sudo apt-get install wget
RUN wget https://gist.github.com/talex5/3852aeebd437436fc516e4ddc77a7e03/archive/70dca297f7ffce0498c07363be2569d79d971f9b.zip -O demo.zip
RUN unzip -j demo.zip
RUN opam install dune
RUN strings /home/opam/.opam/4.12/bin/ocamlc | grep OCAML_RUNTIME_BUILD_GIT_HASH_IS
RUN opam exec -- dune exec -- ./test.exe

I get this output:

Sending build context to Docker daemon  2.048kB
Step 1/7 : FROM ocaml/opam:debian-11-ocaml-4.12-domains
 ---> dfeb77d7ef7f
Step 2/7 : RUN sudo apt-get install wget
 ---> Using cache
 ---> f3f22235d2c2
Step 3/7 : RUN wget https://gist.github.com/talex5/3852aeebd437436fc516e4ddc77a7e03/archive/70dca297f7ffce0498c07363be2569d79d971f9b.zip -O demo.zip
 ---> Using cache
 ---> 15e20474ce4f
Step 4/7 : RUN unzip -j demo.zip
 ---> Using cache
 ---> 2d90c4738763
Step 5/7 : RUN opam install dune
 ---> Using cache
 ---> 323d01a88f43
Step 6/7 : RUN strings /home/opam/.opam/4.12/bin/ocamlc | grep OCAML_RUNTIME_BUILD_GIT_HASH_IS
 ---> Running in aa80cc86a82a
OCAML_RUNTIME_BUILD_GIT_HASH_IS_6be47af176
Removing intermediate container aa80cc86a82a
 ---> 28d9e3eb3f3e
Step 7/7 : RUN opam exec -- dune exec -- ./test.exe
 ---> Running in eaffd42e4615
Info: Creating file dune-project with this contents:
| (lang dune 2.9)
OK
OK
OK
Segmentation fault (core dumped)
The command '/bin/sh -c opam exec -- dune exec -- ./test.exe' returned a non-zero code: 139
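For readers unfamiliar with the convention: an exit status above 128 normally means the process was killed by signal number (status - 128), so 139 corresponds to signal 11, which is SIGSEGV on Linux. A minimal sketch for decoding such a status in a POSIX shell (the variable names are only illustrative):

```shell
# Exit status reported by the failing Docker step in the log above.
status=139

# By shell convention, status = 128 + signal number for a process killed by a signal.
signal=$((status - 128))

# `kill -l N` prints the name of signal N without the SIG prefix.
name=$(kill -l "$signal")

echo "exit status $status -> signal $signal (SIG$name)"
```

This only confirms that the non-zero code matches the "Segmentation fault" message; it says nothing about the cause.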

Multicore OCaml build version

OCAML_RUNTIME_BUILD_GIT_HASH_IS_6be47af176

Did you try running it with the debug runtime and heap verification ON?

Yes. That makes the crash less likely, but it still happens sometimes.

Backtrace

Thread 1 received signal SIGSEGV, Segmentation fault.
caml_darken (v=0, ignored=<optimized out>, state=<optimized out>) at major_gc.c:761
761	major_gc.c: No such file or directory.
(rr) t a a bt

Thread 3 (Thread 207106.207127 (mmap_hardlink_3_test.exe)):
#0  __lll_lock_wait (futex=futex@entry=0x55b832e56090 <all_domains+464>, private=0) at lowlevellock.c:52
#1  0x00007fb03ec89843 in __GI___pthread_mutex_lock (mutex=0x55b832e56090 <all_domains+464>) at ../nptl/pthread_mutex_lock.c:80
#2  0x000055b832dee151 in caml_plat_lock (m=0x55b832e56090 <all_domains+464>) at caml/platform.h:125
#3  backup_thread_func (v=0x55b832e55fe8 <all_domains+296>) at domain.c:623
#4  0x00007fb03ec86ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007fb03ea6cdef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 207106.207109 (mmap_hardlink_3_test.exe)):
#0  __lll_lock_wait (futex=futex@entry=0x55b832e55f68 <all_domains+168>, private=0) at lowlevellock.c:52
#1  0x00007fb03ec89843 in __GI___pthread_mutex_lock (mutex=0x55b832e55f68 <all_domains+168>) at ../nptl/pthread_mutex_lock.c:80
#2  0x000055b832dee151 in caml_plat_lock (m=0x55b832e55f68 <all_domains+168>) at caml/platform.h:125
#3  backup_thread_func (v=0x55b832e55ec0 <all_domains>) at domain.c:623
#4  0x00007fb03ec86ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007fb03ea6cdef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 207106.207106 (mmap_hardlink_3_test.exe)):
#0  caml_darken (v=0, ignored=<optimized out>, state=<optimized out>) at major_gc.c:761
#1  0x000055b832de7676 in caml_iterate_global_roots (rootlist=0x55b832e55580 <caml_global_roots_old>, rootlist=0x55b832e55580 <caml_global_roots_old>, fdata=0x0, f=0x55b832dcf570 <caml_darken>) at globroots.c:222
#2  caml_scan_global_roots (f=f@entry=0x55b832dcf570 <caml_darken>, fdata=fdata@entry=0x0) at globroots.c:233
#3  0x000055b832dd0008 in cycle_all_domains_callback (domain=domain@entry=0x7fb03e780000, unused=unused@entry=0x0, participating_count=<optimized out>, participating=participating@entry=0x55b832e5f360 <stw_request+64>) at major_gc.c:1093
#4  0x000055b832def4ee in caml_try_run_on_all_domains_with_spin_work (handler=handler@entry=0x55b832dcff40 <cycle_all_domains_callback>, data=data@entry=0x0, leader_setup=leader_setup@entry=0x0, enter_spin_callback=enter_spin_callback@entry=0x0, enter_spin_data=enter_spin_data@entry=0x0) at domain.c:985
#5  0x000055b832def60a in caml_try_run_on_all_domains (handler=handler@entry=0x55b832dcff40 <cycle_all_domains_callback>, data=data@entry=0x0, leader_setup=leader_setup@entry=0x0) at domain.c:999
#6  0x000055b832dd0d4f in major_collection_slice (howmuch=<optimized out>, participant_count=participant_count@entry=0, barrier_participants=barrier_participants@entry=0x0, mode=mode@entry=Slice_interruptible) at major_gc.c:1377
#7  0x000055b832dd0ec8 in caml_major_collection_slice (howmuch=howmuch@entry=-1) at major_gc.c:1396
#8  0x000055b832dedd91 in caml_poll_gc_work () at domain.c:1038
#9  0x000055b832dee038 in stw_handler (domain=<optimized out>) at domain.c:891
#10 0x000055b832dee0d1 in handle_incoming (s=<optimized out>) at domain.c:219
#11 caml_handle_incoming_interrupts () at domain.c:232
#12 handle_gc_interrupt () at domain.c:1060
#13 0x000055b832def667 in caml_handle_gc_interrupt () at domain.c:1077
#14 0x000055b832dce565 in caml_garbage_collection () at signals_nat.c:92
#15 0x000055b832df0a33 in caml_call_gc ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
