Skip to content

Get latest changes from master#24

Closed
imaginary-person wants to merge 0 commit intoimaginary-person:argmax_argminfrom
pytorch:master
Closed

Get latest changes from master#24
imaginary-person wants to merge 0 commit intoimaginary-person:argmax_argminfrom
pytorch:master

Conversation

@imaginary-person
Copy link
Copy Markdown
Owner

Get latest changes from master

@imaginary-person
Copy link
Copy Markdown
Owner Author

Will have to rebase

imaginary-person pushed a commit that referenced this pull request Jul 2, 2021
Summary:
Pull Request resolved: pytorch#60987

We were seeing deadlocks as follows during shutdown:

```
Thread 1 (LWP 2432101):
#0  0x00007efca470190b in __pause_nocancel () from /lib64/libc.so.6
#1  0x00007efca49de485 in __pthread_mutex_lock_full () from /lib64/libpthread.so.0
#2  0x00007ef91d4c42c6 in __cuda_CallJitEntryPoint () from /lib64/libnvidia-ptxjitcompiler.so.1
#3  0x00007efc651ac8f1 in ?? () from /lib64/libcuda.so
#4  0x00007efc651aee03 in ?? () from /lib64/libcuda.so
#5  0x00007efc64f76b84 in ?? () from /lib64/libcuda.so
#6  0x00007efc64f77f5d in ?? () from /lib64/libcuda.so
#7  0x00007efc64eac858 in ?? () from /lib64/libcuda.so
#8  0x00007efc64eacfbc in ?? () from /lib64/libcuda.so
#9  0x00007efc7810a924 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#10 0x00007efc780fa2be in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#11 0x00007efc78111044 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#12 0x00007efc7811580a in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#13 0x00007efc78115aa4 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#14 0x00007efc781079ec in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#15 0x00007efc780e6a7a in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#16 0x00007efc7811cfa5 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#17 0x00007efc777ea98c in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#18 0x00007efc777ebd80 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#19 0x00007efc777ea2c9 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#20 0x00007efc778c2e2d in cublasDestroy_v2 () from /usr/local/cuda/lib64/libcublas.so.11
#21 0x00007efc51a3fb56 in std::_Sp_counted_ptr_inplace<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle>, std::allocator<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so
#22 0x00007efc51a3fc5f in std::shared_ptr<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >::~shared_ptr() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so
#23 0x00007efca4648b0c in __run_exit_handlers () from /lib64/libc.so.6
#24 0x00007efca4648c40 in exit () from /lib64/libc.so.6
#25 0x0000558c8852e5f9 in Py_Exit (sts=0) at /tmp/build/80754af9/python_1614362349910/work/Python/pylifecycle.c:2292
#26 0x0000558c8852e6a7 in handle_system_exit () at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:636
#27 0x0000558c8852e742 in PyErr_PrintEx (set_sys_last_vars=<optimized out>, set_sys_last_vars=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:646
#28 0x0000558c88540dd6 in PyRun_SimpleStringFlags (command=0x7efca4dc9050 "from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=9, pipe_handle=13)\n", flags=0x7ffe3a986110) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:457
#29 0x0000558c88540ead in pymain_run_command (cf=0x7ffe3a986110, command=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:420
#30 pymain_run_python (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:2907
#31 pymain_main (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3460
#32 0x0000558c8854122c in _Py_UnixMain (argc=<optimized out>, argv=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3495
#33 0x00007efca4632493 in __libc_start_main () from /lib64/libc.so.6
#34 0x0000558c884e5e90 in _start () at ../sysdeps/x86_64/elf/start.S:103
```

This was likely caused due to a static singleton that wasn't leaky. Following
the guidance in https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2 to
use a leaky singleton instead.
ghstack-source-id: 132847448

Test Plan: Verified locally.

Reviewed By: malfet

Differential Revision: D29468866

fbshipit-source-id: 89250594c5cd2643417b1da584c658b742dc5a5c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant