
Deadlock with multithreaded fork on OSX 10.12 #895

@alexcrichton

Description


Over at rust-lang/rust we recently upgraded jemalloc to 4.5.0, and I believe that since then I've been hitting a deadlock locally in our tests on OSX 10.12. Specifically, the following program deadlocks:

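#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that execs `sleep 1000`, then kill and reap it right away. */
static void work(void) {
  pid_t child = fork();
  if (child < 0) {
    perror("fork");
    abort();
  }
  if (child == 0) {
    char *args[] = {"sleep", "1000", NULL};
    execvp(args[0], args);
    /* The parent is multithreaded, so only async-signal-safe calls here. */
    static const char msg[] = "failed to spawn child\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(127);
  }
  /* Keep side effects out of assert() so they survive -DNDEBUG. */
  if (kill(child, SIGKILL) != 0 || waitpid(child, NULL, 0) != child)
    abort();
}

static void *worker(void *arg) {
  /* This malloc() can race with the fork() in main(). */
  void *a = malloc(4);
  assert(a != NULL);
  work();
  free(a);
  return arg;
}

int main(void) {
  pthread_t child;
  if (pthread_create(&child, NULL, worker, NULL) != 0)
    abort();
  work();
  if (pthread_join(child, NULL) != 0)
    abort();
  return 0;
}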

I built this via:

$ git clone https://github.com/jemalloc/jemalloc
$ cd jemalloc
$ git rev-parse HEAD
00869e39a334f3d869dfb9f8e651c2de3dded76f
$ ./autogen.sh
$ make -j10
# edit program above into `foo.c`
$ gcc foo.c lib/libjemalloc.dylib -o foo
$ DYLD_LIBRARY_PATH=lib ./foo

The deadlock isn't deterministic, but it happens frequently: it usually shows up within 10 or so runs. I'm running OSX 10.12.5 with, I believe, Xcode 8.3.2.

The backtraces at the time of the deadlock look like:

(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fffacf20f46 libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fffad00d6e5 libsystem_pthread.dylib`pthread_join + 425
    frame #2: 0x000000010e27dc0c foo`main + 124
    frame #3: 0x00007fffacdf2235 libdyld.dylib`start + 1
    frame #4: 0x00007fffacdf2235 libdyld.dylib`start + 1

  thread #2
    frame #0: 0x00007fffacf2131e libsystem_kernel.dylib`__ulock_wait + 10
    frame #1: 0x00007fffad004aff libsystem_platform.dylib`_os_ulock_wait + 25
    frame #2: 0x00007fffad0043d2 libsystem_platform.dylib`_os_unfair_lock_lock_slow + 130
    frame #3: 0x000000010e2b32f8 libjemalloc.2.dylib`je_malloc_mutex_lock_slow [inlined] malloc_mutex_lock_final(mutex=<unavailable>) at mutex.h:141 [opt]
    frame #4: 0x000000010e2b32f0 libjemalloc.2.dylib`je_malloc_mutex_lock_slow(mutex=0x000000010e530358) at mutex.c:83 [opt]
    frame #5: 0x000000010e29249c libjemalloc.2.dylib`je_arena_malloc_hard [inlined] malloc_mutex_lock at mutex.h:205 [opt]
    frame #6: 0x000000010e292480 libjemalloc.2.dylib`je_arena_malloc_hard [inlined] arena_malloc_small at arena.c:1495 [opt]
    frame #7: 0x000000010e292471 libjemalloc.2.dylib`je_arena_malloc_hard(tsdn=0x0000000000000000, arena=<unavailable>, size=<unavailable>, ind=30, zero=<unavailable>) at arena.c:1551 [opt]
    frame #8: 0x000000010e281b72 libjemalloc.2.dylib`a0ialloc [inlined] arena_malloc(tsdn=<unavailable>, arena=0x000000010e52a980, size=6144, slow_path=true) at arena_inlines_b.h:112 [opt]
    frame #9: 0x000000010e281b5e libjemalloc.2.dylib`a0ialloc [inlined] iallocztm(ind=30, slow_path=true) at jemalloc_internal_inlines_c.h:33 [opt]
    frame #10: 0x000000010e281b5e libjemalloc.2.dylib`a0ialloc(size=6144, zero=false, is_internal=true) at jemalloc.c:233 [opt]
    frame #11: 0x000000010e2833f8 libjemalloc.2.dylib`je_malloc at tsd_generic.h:73 [opt]
    frame #12: 0x000000010e2833d2 libjemalloc.2.dylib`je_malloc [inlined] tsd_get(init=true) at tsd_generic.h:140 [opt]
    frame #13: 0x000000010e2833d2 libjemalloc.2.dylib`je_malloc [inlined] tsd_fetch_impl(init=true) at tsd.h:254 [opt]
    frame #14: 0x000000010e2833d2 libjemalloc.2.dylib`je_malloc [inlined] tsd_fetch at tsd.h:272 [opt]
    frame #15: 0x000000010e2833d2 libjemalloc.2.dylib`je_malloc [inlined] imalloc at jemalloc.c:1907 [opt]
    frame #16: 0x000000010e28319e libjemalloc.2.dylib`je_malloc(size=4) at jemalloc.c:1944 [opt]
    frame #17: 0x00007fffacf74282 libsystem_malloc.dylib`malloc_zone_malloc + 107
    frame #18: 0x00007fffacf73200 libsystem_malloc.dylib`malloc + 24
    frame #19: 0x000000010e27dc7b foo`worker + 27
    frame #20: 0x00007fffad00b93b libsystem_pthread.dylib`_pthread_body + 180
    frame #21: 0x00007fffad00b887 libsystem_pthread.dylib`_pthread_start + 286
    frame #22: 0x00007fffad00b08d libsystem_pthread.dylib`thread_start + 13

This led me to #843, which isn't quite the same issue. Perhaps the fix for that would fix this one too?
