Disable test_join_running_workers for TSAN. #46966
pritamdamania87 wants to merge 1 commit into gh/pritamdamania87/178/base
These tests had false positives in TSAN for modifying thread local
variables:
```
WARNING: ThreadSanitizer: data race (pid=5364)
Write of size 8 at 0x7b2c0004ff70 by thread T2:
#0 free <null> (libtools_build_sanitizers_tsan-py.so+0xde6ad)
#1 __GI__dl_deallocate_tls
Previous write of size 1 at 0x7b2c0004ff71 by thread T3:
#0 at::GradMode::set_enabled(bool) caffe2/aten/src/ATen/core/grad_mode.cpp:20 (libcaffe2_ATen-core.so+0x40e013)
#1 torch::autograd::set_grad_enabled(_object*, _object*) caffe2/torch/csrc/autograd/init.cpp:143 (libcaffe2__C_impl_cuda.so+0x115ef0e)
#2 _PyMethodDef_RawFastCallKeywords
Thread T3 (tid=5385, finished) created by main thread at:
#0 pthread_create <null> (libtools_build_sanitizers_tsan-py.so+0xc5a86)
#1 PyThread_start_new_thread
```
Differential Revision: [D24584411](https://our.internmc.facebook.com/intern/diff/D24584411/)
ghstack-source-id: 115330433
Pull Request resolved: #46966
💊 CI failures summary and remediations
As of commit 094694f (more details on the Dr. CI page):
XLA failure: Job pytorch_xla_linux_bionic_py3_6_clang9_test is failing.
Why do we think TSAN is misreporting this error? In my experience TSAN is pretty accurate. Could this issue perhaps be resolved by using atomic loads/stores?
TSAN is complaining about this line:
Why shouldn't it have any race? My understanding is that unless you explicitly mark it as atomic, the compiler is allowed to split any store/load into multiple low-level operations. (I guess in practice this doesn't happen on modern machines, but it's not something we should rely on.)

For example, we had a very similar issue with TSAN reporting a race in libuv when setting global static variables. The maintainers there agreed this was a bug and fixed it by using atomic stores: libuv/libuv#2886
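To illustrate the atomic-store suggestion above, here is a minimal C++ sketch. The names `shared_flag`, `set_flag`, `get_flag`, and `run_writer_and_reader` are hypothetical and not taken from PyTorch or libuv; the sketch only shows that a flag shared between threads can be written and read without a data race when it is declared `std::atomic`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical flag shared by several threads (an assumption for this sketch).
std::atomic<bool> shared_flag{false};

// An atomic store cannot be torn or split by the compiler.
void set_flag(bool value) {
    shared_flag.store(value, std::memory_order_relaxed);
}

// The matching atomic load.
bool get_flag() {
    return shared_flag.load(std::memory_order_relaxed);
}

// Runs one writer and one reader concurrently, then returns the value
// observed after both threads have been joined.
bool run_writer_and_reader() {
    std::thread writer([] { set_flag(true); });
    std::thread reader([] { (void)get_flag(); });  // may observe either value
    writer.join();
    reader.join();
    return get_flag();
}
```

With a plain `bool` instead of `std::atomic<bool>`, the same concurrent write and read would be a data race under the C++ memory model, which is the kind of access TSAN is designed to flag.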
As I understand it, a thread-local variable's storage is exclusive to its thread. Two separate threads will never operate on the same thread-local memory unless we actually pass a pointer to that thread-local to another thread (which I don't think is happening here).
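The per-thread storage argument can be sketched in C++ as follows. `tls_enabled` is a hypothetical stand-in for a per-thread flag and is not the actual variable in `grad_mode.cpp`; the sketch only shows that a write made in a worker thread does not touch the calling thread's copy.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-thread flag (an assumption for this sketch).
// Every thread gets its own copy, initialized to true.
thread_local bool tls_enabled = true;

// Flips the flag inside a worker thread and returns the value the
// calling thread still sees afterwards.
bool caller_value_after_worker_write() {
    bool worker_saw = true;
    std::thread worker([&worker_saw] {
        tls_enabled = false;       // writes the worker's own copy only
        worker_saw = tls_enabled;  // reads the worker's own copy
    });
    worker.join();                 // join makes worker_saw safe to read here
    assert(worker_saw == false);
    return tls_enabled;            // the caller's copy is unchanged
}
```

This sketch does not cover the case raised by the trace, where the runtime frees a finished thread's thread-local block from a different thread.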
Oh yes, right, I had missed that. The trace above, though, looks like the store races with the destruction of the variable, and TSAN claims that the latter is performed by another thread. I also don't know how atomics handle races between loads/stores and destruction... So, well, I don't have much more to contribute on this, sorry for the hold-up.
This pull request has been merged in ad260ae.
Summary:
Pull Request resolved: pytorch#46966

These tests had false positives in TSAN for modifying thread local
variables:
```
WARNING: ThreadSanitizer: data race (pid=5364)
Write of size 8 at 0x7b2c0004ff70 by thread T2:
#0 free <null> (libtools_build_sanitizers_tsan-py.so+0xde6ad)
#1 __GI__dl_deallocate_tls
Previous write of size 1 at 0x7b2c0004ff71 by thread T3:
#0 at::GradMode::set_enabled(bool) caffe2/aten/src/ATen/core/grad_mode.cpp:20 (libcaffe2_ATen-core.so+0x40e013)
#1 torch::autograd::set_grad_enabled(_object*, _object*) caffe2/torch/csrc/autograd/init.cpp:143 (libcaffe2__C_impl_cuda.so+0x115ef0e)
#2 _PyMethodDef_RawFastCallKeywords
Thread T3 (tid=5385, finished) created by main thread at:
#0 pthread_create <null> (libtools_build_sanitizers_tsan-py.so+0xc5a86)
#1 PyThread_start_new_thread
```
ghstack-source-id: 115330433

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D24584411

fbshipit-source-id: e35f704dfcb7b161a13a4902beaf8b1e41ccd596