[DistAutograd x JIT] Capture global state, dist autograd current context id, before thread switching triggered by JIT future.wait()#36395
Differential Revision: D7857991 Differential Version: 101928130
pritamdamania87 left a comment:
Can we also add a unit test that would've failed without this fix?
@xush6528 - thanks for fixing this! Re: a test case - if it's straightforward to construct something, let's do it. My only concern is making this a hard requirement and delaying a straightforward correctness fix like this for a week. The existing C++ test coverage is frankly fairly weak. That should be fixed, but it's not unreasonable to do it out-of-band.
It's an easy test case: make two RPCs in TorchScript and check each time that we get gradients back. Without this fix, only the tensors passed in the first RPC get gradients back.
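The failure mode described above can be illustrated without PyTorch. The following is a minimal sketch in plain Python, assuming the current context id behaves like a thread-local value that is lost when work resumes on a different thread. The names `get_context_id`, `set_context_id`, `run_with_captured_context`, and `demo` are hypothetical and are not part of the PyTorch API; this is an analogy for the capture-and-restore idea in the PR title, not the actual fix.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a thread-local "current context id".
_state = threading.local()


def get_context_id():
    """Return the context id visible to the calling thread, or None."""
    return getattr(_state, "context_id", None)


def set_context_id(value):
    """Set the context id for the calling thread only."""
    _state.context_id = value


def run_with_captured_context(fn):
    """Capture the caller's context id now and restore it around fn,
    in whichever thread ends up running the returned wrapper."""
    captured = get_context_id()  # read in the submitting thread

    def wrapper(*args, **kwargs):
        previous = get_context_id()  # whatever the worker thread had
        set_context_id(captured)
        try:
            return fn(*args, **kwargs)
        finally:
            set_context_id(previous)  # do not leak state into the worker

    return wrapper


def demo():
    set_context_id(42)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Without capturing, the worker thread sees no context id.
        without_capture = pool.submit(get_context_id).result()
        # With capturing, the worker sees the submitter's context id.
        with_capture = pool.submit(
            run_with_captured_context(get_context_id)
        ).result()
        # The worker's own state is restored afterwards.
        after = pool.submit(get_context_id).result()
    return without_capture, with_capture, after
```

In this sketch the second call plays the role of the second RPC: the value is only visible on the other thread because it was captured before the switch and restored after it.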
Differential Revision: D7857991 Differential Version: 101948084
This pull request has been merged in ae452a8.
…ext id, before thread switching triggered by JIT future.wait() (pytorch#36395)

Summary: Pull Request resolved: pytorch#36395

titled

Test Plan:

# Unit tests

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_fork -- test_restore_context_after_swtich_to_jit_thread
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_fork
buck-out/gen/caffe2/test/distributed/rpc/jit/dist_autograd_fork\#binary.par -r test_restore_context_after_swtich_to_jit_thread
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork -- test_backward_simple_script_call
buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork
buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```

Differential Revision: D7857991

fbshipit-source-id: 168e0e3846a50ea92d4f9450a30ccc6c13e2fcec
Stack:
#36395 [DistAutograd x JIT] Capture global state, dist autograd current context id, before thread switching triggered by JIT future.wait()
titled
Differential Revision: D7857991