Release GIL as much as possible in dist_autograd pybind. #61593
pritamdamania87 wants to merge 1 commit into gh/pritamdamania87/249/base
Following the pattern in #61588 to avoid deadlocks as much as possible. Differential Revision: [D29683451](https://our.internmc.facebook.com/intern/diff/D29683451/)
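For readers unfamiliar with the hazard, here is a minimal, self-contained sketch of why releasing the GIL before taking native locks avoids a lock-order deadlock. It is a model, not the code from this PR: `fake_gil`, `native_lock`, `ScopedGilRelease`, `binding_entry`, and `native_callback` are all hypothetical names, and a plain `std::mutex` stands in for the real GIL. In actual pybind11 bindings the same shape is usually expressed with `py::gil_scoped_release` or `py::call_guard<py::gil_scoped_release>()`; whether this change uses exactly those forms is not shown on this page.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Assumption: a plain mutex is a good enough stand-in for the Python GIL here.
std::mutex fake_gil;
// Hypothetical internal lock held by native (C++) code.
std::mutex native_lock;
// Shared state; every write happens while native_lock is held.
int counter = 0;

// RAII guard with the same shape as pybind11's py::gil_scoped_release:
// drop the "GIL" on construction, take it back on destruction.
class ScopedGilRelease {
 public:
  ScopedGilRelease() { fake_gil.unlock(); }
  ~ScopedGilRelease() { fake_gil.lock(); }
  ScopedGilRelease(const ScopedGilRelease&) = delete;
  ScopedGilRelease& operator=(const ScopedGilRelease&) = delete;
};

// Models a binding that is entered with the "GIL" held.
// It releases the "GIL" before touching native_lock, so this thread
// never waits for native_lock while holding the "GIL".
void binding_entry() {
  ScopedGilRelease release;
  std::lock_guard<std::mutex> guard(native_lock);
  ++counter;
}  // guard is destroyed first (native_lock freed), then the "GIL" is re-taken

// Models a native thread that holds native_lock and then needs the "GIL",
// for example to invoke a Python callback.
void native_callback() {
  std::lock_guard<std::mutex> guard(native_lock);
  std::lock_guard<std::mutex> gil(fake_gil);
  ++counter;
}

// Runs both threads to completion and returns the final counter value.
int run_demo() {
  std::thread python_thread([] {
    std::lock_guard<std::mutex> gil(fake_gil);  // the "Python" thread owns the GIL
    for (int i = 0; i < 1000; ++i) binding_entry();
  });
  std::thread native_thread([] {
    for (int i = 0; i < 1000; ++i) native_callback();
  });
  python_thread.join();
  native_thread.join();
  return counter;
}
```

If `binding_entry` kept the "GIL" while waiting for `native_lock`, the two threads would acquire the same pair of locks in opposite orders and could block each other forever. Releasing first removes that inversion.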
💊 CI failures summary and remediations
As of commit 1bb9e92 (more details on the Dr. CI page and at hud.pytorch.org/pr/61593):
❄️ 1 failure tentatively classified as flaky, but reruns have not yet been triggered to confirm.
Following the pattern in #61588 to avoid deadlocks as much as possible. Differential Revision: [D29683451](https://our.internmc.facebook.com/intern/diff/D29683451/) ghstack-source-id: 133497897 Pull Request resolved: #61593
rohan-varma left a comment:
I wonder if we've looked into building some tool or automatic analysis to detect these sorts of issues? Currently this seems pretty manual, and it is easy to forget to release the GIL when calling into pybind functions.
Also, do we know why our internal TSAN tests don't catch these sorts of deadlocks (at least it seems that way to me)?
@rohan-varma @ezyang pointed me to this: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html, which does seem to be very useful.
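To give a rough idea of what that analysis looks like in practice, here is a small sketch using the attributes described on the linked Clang page. The `TSA` macro, `Mutex`, `MutexLock`, and `ContextTable` are hypothetical names invented for illustration; nothing here is taken from PyTorch. The attributes are checked only when compiling with Clang and `-Wthread-safety`, and they expand to nothing on other compilers. Applying this to the GIL itself would additionally require modeling the GIL as a capability, which is an assumption and not something this thread confirms.

```cpp
#include <cassert>
#include <mutex>

// Thread-safety attributes are Clang-specific; make them no-ops elsewhere.
#if defined(__clang__)
#define TSA(x) __attribute__((x))
#else
#define TSA(x)
#endif

// A thin mutex wrapper declared as a "capability" so Clang can track it.
class TSA(capability("mutex")) Mutex {
 public:
  void lock() TSA(acquire_capability()) { mu_.lock(); }
  void unlock() TSA(release_capability()) { mu_.unlock(); }

 private:
  std::mutex mu_;
};

// RAII holder; scoped_lockable tells Clang the capability is held
// for the lifetime of this object.
class TSA(scoped_lockable) MutexLock {
 public:
  explicit MutexLock(Mutex& mu) TSA(acquire_capability(mu)) : mu_(mu) {
    mu_.lock();
  }
  ~MutexLock() TSA(release_capability()) { mu_.unlock(); }

 private:
  Mutex& mu_;
};

// Hypothetical class whose state must only be touched under mu_.
class ContextTable {
 public:
  void add(int n) {
    MutexLock lock(mu_);
    addLocked(n);  // OK: mu_ is held here
  }

  int total() {
    MutexLock lock(mu_);
    return total_;
  }

 private:
  // Callers must already hold mu_; calling this without the lock
  // produces a -Wthread-safety warning under Clang.
  void addLocked(int n) TSA(requires_capability(mu_)) { total_ += n; }

  Mutex mu_;
  int total_ TSA(guarded_by(mu_)) = 0;
};
```

With these annotations, removing the `MutexLock` line from `add` would be flagged at compile time rather than surfacing as a race or deadlock at run time, which is the kind of automatic check asked about above.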
This pull request has been merged in 7d2ea9a.