Deadlock with mallocx + fork #315

@alexcrichton

We recently upgraded Rust to jemalloc 4.0.4 and have started seeing intermittent deadlocks on our buildbots. After reproducing one locally, the two deadlocked threads have the following stack traces:

Thread 12 (Thread 0x7f35667fc700 (LWP 26304)):                             
#0  __lll_lock_wait ()                                                     
#1  0x00007f357079368d in _L_lock_1082 ()                                  
#2  0x00007f3570793607 in __GI___pthread_mutex_lock ()
#3  0x00007f357808c0a7 in je_arena_prefork ()                              
#4  0x00007f3578078c7f in je_jemalloc_prefork ()                           
#5  0x00007f3577ca3fc2 in __libc_fork ()                                   

Thread 4 (Thread 0x7f3567dff700 (LWP 26341)):                              
#0  __lll_lock_wait ()                                                     
#1  0x00007f357079368d in _L_lock_1082 ()                                  
#2  0x00007f3570793607 in __GI___pthread_mutex_lock ()
#3  0x00007f357807a8df in je_arena_get_hard ()                             
#4  0x00007f357808cba4 in chunk_purge_default ()                           
#5  0x00007f357808e042 in je_chunk_dalloc_arena ()                         
#6  0x00007f35780848d2 in arena_unstash_purged ()                          
#7  0x00007f35780849e8 in je_arena_maybe_purge ()                          
#8  0x00007f35780840ab in arena_run_dalloc ()                              
#9  0x00007f3578084b0b in arena_dalloc_bin_run ()                          
#10 0x00007f357808784a in arena_dalloc_bin_locked_impl ()                  
#11 0x00007f35780a7cc1 in je_tcache_bin_flush_small ()                     
#12 0x00007f35780a8238 in je_tcache_event_hard ()                          
#13 0x00007f357807ce82 in je_mallocx ()                                    

Printing the arguments to pthread_mutex_lock in GDB shows that each thread is trying to acquire a mutex that is held by the other.
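
For readers less familiar with this failure mode, here is a minimal sketch of the lock-order inversion described above. It is not jemalloc code: `lock_a`, `lock_b`, and `lock_inversion_detected` are hypothetical names, and it assumes a Linux/glibc system where `pthread_barrier_t` is available. It uses `pthread_mutex_trylock` instead of a blocking lock so that it reports the inversion rather than hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Two hypothetical mutexes standing in for the two locks seen in GDB. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t sync_point;

/* Worker: takes lock_b first, then tries lock_a (the opposite order). */
static void *worker(void *arg) {
    int *rc = arg;
    pthread_mutex_lock(&lock_b);
    pthread_barrier_wait(&sync_point);   /* both threads hold their first lock */
    *rc = pthread_mutex_trylock(&lock_a);
    pthread_barrier_wait(&sync_point);   /* both threads have made their attempt */
    if (*rc == 0)
        pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

/* Returns 1 if each thread found its second lock held by the other,
 * which is the state that deadlocks when blocking locks are used. */
int lock_inversion_detected(void) {
    pthread_t t;
    int worker_rc = 0;
    int main_rc;

    pthread_barrier_init(&sync_point, NULL, 2);
    pthread_mutex_lock(&lock_a);         /* main takes lock_a first */
    pthread_create(&t, NULL, worker, &worker_rc);

    pthread_barrier_wait(&sync_point);
    main_rc = pthread_mutex_trylock(&lock_b);
    pthread_barrier_wait(&sync_point);

    if (main_rc == 0)
        pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    pthread_join(t, NULL);
    pthread_barrier_destroy(&sync_point);

    return main_rc == EBUSY && worker_rc == EBUSY;
}
```

Calling `lock_inversion_detected()` from a `main` built with `-pthread` should return 1. Replacing either `pthread_mutex_trylock` with `pthread_mutex_lock` would leave both threads parked in `__lll_lock_wait`, as in the traces above.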

Unfortunately, I haven't been able to reproduce this in a smaller example yet. A simple "fork and mallocx concurrently" program doesn't exhibit the deadlock, and I haven't managed to trigger the same stack traces that appeared when the bug was reproduced.
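
To make the shape of such an attempt concrete, here is a rough sketch of a "fork while other threads allocate" stress loop. It is not the program that was actually tried: `fork_alloc_stress` and `alloc_worker` are hypothetical names, and plain `malloc`/`free` stand in for jemalloc's `mallocx`/`dallocx` so that it builds without jemalloc. As noted above, a simple program along these lines did not reproduce the deadlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_THREADS 16
#define BATCH 64

static atomic_int keep_running;

/* Allocate and free batches of small blocks until told to stop.
 * With jemalloc linked, mallocx(size, 0) and dallocx(ptr, 0) could be
 * substituted for malloc and free (an assumption, not tested here). */
static void *alloc_worker(void *arg) {
    (void)arg;
    while (atomic_load(&keep_running)) {
        void *blocks[BATCH];
        for (int i = 0; i < BATCH; i++) {
            blocks[i] = malloc(16 + (size_t)i * 8);
            if (blocks[i] != NULL)
                ((volatile char *)blocks[i])[0] = 1; /* touch so the allocation is kept */
        }
        for (int i = 0; i < BATCH; i++)
            free(blocks[i]);
    }
    return NULL;
}

/* Forks n_forks times while n_threads workers allocate concurrently.
 * Returns the number of children that exited with status 0, or -1 if
 * n_threads is out of range. A deadlock would show up as this call
 * never returning. */
int fork_alloc_stress(int n_threads, int n_forks) {
    pthread_t threads[MAX_THREADS];
    int started = 0;
    int ok = 0;

    if (n_threads < 1 || n_threads > MAX_THREADS)
        return -1;

    atomic_store(&keep_running, 1);
    for (; started < n_threads; started++) {
        if (pthread_create(&threads[started], NULL, alloc_worker, NULL) != 0)
            break;
    }

    for (int i = 0; i < n_forks; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                    /* child exits immediately */
        if (pid < 0)
            continue;                    /* fork failed; skip this round */
        int status;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }

    atomic_store(&keep_running, 0);
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return ok;
}
```

Calling `fork_alloc_stress(4, 1000)` from a `main` built with `-pthread` would run four allocating threads alongside a thousand forks. Hitting the traces above presumably also requires a deallocation that triggers a tcache flush and purge at the moment of the fork, which a loop this simple does not force.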

Does this sound like something that may have been introduced recently? Could we help by trying to reduce it further? Let me know if you need anything else from us!
