Closed
Description
Steps to reproduce:
- Write a patch that eliminates globalContext() initialization from the static initializers of libcaffe2.so. Here is one sample branch: https://github.com/ezyang/pytorch/tree/issue/deadlocks
- Build and run
It deadlocks in the following trace:
(gdb) bt
#0 0x00007ffff78eaec9 in syscall () from /lib64/libc.so.6
#1 0x00007fffcbf4c57e in __cxxabiv1::__cxa_guard_acquire (g=0x7fffdf70fb08 <guard variable for at::globalContext()::globalContext_>) at /opt/conda/conda-bld/compilers_linux-64_1520532893746/work/.build/src/gcc-7.2.0/libstdc++-v3/libsupc++/guard.cc:307
#2 0x00007fffddd812d0 in at::globalContext () at ../aten/src/ATen/Context.cpp:41
#3 0x00007fffccef089b in torch::autograd::register_variable_type_for (baseType=0x5555566f78b0) at ../torch/csrc/autograd/generated/VariableType.cpp:171
#4 0x00007fffcce46588 in torch::autograd::VariableHooks::registerVariableTypeFor (this=0x5555566f7920, context=0x7fffdf70fb20 <at::globalContext()::globalContext_>, backend=at::Backend::CPU, scalar_type=at::ScalarType::Byte) at ../torch/csrc/autograd/aten_variable_hooks.cpp:21
#5 0x00007fffddfb3e43 in at::Type::registerCPU (context=0x7fffdf70fb20 <at::globalContext()::globalContext_>) at aten/src/ATen/Type.cpp:40
#6 0x00007fffddd8120f in at::Context::Context (this=0x7fffdf70fb20 <at::globalContext()::globalContext_>) at ../aten/src/ATen/Context.cpp:37
#7 0x00007fffddd812eb in at::globalContext () at ../aten/src/ATen/Context.cpp:41
#8 0x00007fffcd04c2f5 in torch::autograd::VariableTypeRegistry::VariableTypeRegistry (this=0x7fffcdce60e8 <torch::autograd::registry>) at ../torch/csrc/autograd/generated/VariableType.cpp:176
#9 0x00007fffcd0491ef in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at ../torch/csrc/autograd/generated/VariableType.cpp:188
#10 0x00007fffcd049225 in _GLOBAL__sub_I_VariableType.cpp(void) () at ../torch/csrc/autograd/generated/VariableType.cpp:31075
#11 0x00007ffff7deab03 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#12 0x00007ffff7def6de in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#13 0x00007ffff7dea914 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#14 0x00007ffff7deeccb in _dl_open () from /lib64/ld-linux-x86-64.so.2
#15 0x00007ffff75eefbb in dlopen_doit () from /lib64/libdl.so.2
#16 0x00007ffff7dea914 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#17 0x00007ffff75ef5bd in _dlerror_run () from /lib64/libdl.so.2
#18 0x00007ffff75ef051 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
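So there is a circular call of globalContext(): the Context constructor (frame #6) runs while the function-local static globalContext_ is still being initialized, and the registration it triggers (frames #5 to #3) calls globalContext() again, which then blocks in __cxa_guard_acquire on the guard the same thread is already holding. This happens ONLY if the variable registration static initializer is called before the Context static initializer. Ugh.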
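To make the cycle concrete, here is a minimal sketch of one way to avoid the re-entry. It assumes the registration hook can use the context pointer it is handed (as frame #4 suggests it is) rather than calling the accessor again. All names below (`Context`, `global_context`, `register_variable_type_for`, `"CPUByteType"`) are hypothetical stand-ins, not the real ATen or autograd code, and this is not necessarily how the issue was fixed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for at::Context; not the real ATen class.
struct Context;

// Hypothetical registration hook. It receives the context that is currently
// being constructed, so it never has to call global_context() itself.
void register_variable_type_for(Context* context, const std::string& name);

struct Context {
  std::vector<std::string> registered;

  Context() {
    // This constructor runs inside the initialization of the function-local
    // static in global_context(). Calling global_context() from here, even
    // indirectly, would re-enter that initialization. C++ leaves that
    // undefined; in the trace above it shows up as a hang in
    // __cxa_guard_acquire.
    register_variable_type_for(this, "CPUByteType");
  }
};

Context& global_context() {
  static Context instance;  // guarded, initialized on the first call
  return instance;
}

void register_variable_type_for(Context* context, const std::string& name) {
  // Safe: uses the pointer passed in by the constructor instead of
  // re-entering global_context().
  context->registered.push_back(name);
}
```

With this shape, the first call to `global_context()` completes regardless of which static initializer triggers it, because nothing on the constructor's call path asks for the singleton again.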