Support send/recv for the gloo process group#11387
Support send/recv for the gloo process group#11387pietern wants to merge 1 commit intopytorch:masterfrom
Conversation
torch/lib/c10d/ProcessGroupGloo.cpp
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
torch/lib/c10d/ProcessGroupGloo.cpp
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
torch/lib/c10d/ProcessGroupGloo.hpp
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
@teng-li In the MPI backend this is not needed as we capture it in the lambda. For the gloo backend we need to capture it because we don't have a "run" lambda. The operation gets kicked off and completes on the gloo device thread, not on the c10d thread pool. Then the returned work object can call Updated to use |
2a9777e to
dee3c39
Compare
|
build still failing |
|
The gloo bump surfaces a build issue with GCC 7. It's missing the include. Submitted pytorch/gloo#131 to fix. |
dee3c39 to
1c1b782
Compare
|
Updated gloo submodule to include the GCC 7 fix. |
teng-li
left a comment
There was a problem hiding this comment.
Looks good. Watch for the CI
facebook-github-bot
left a comment
There was a problem hiding this comment.
pietern has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
facebook-github-bot
left a comment
There was a problem hiding this comment.
pietern is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This change removes the skips for the existing send/recv tests in the backwards compatibility layer. Pull Request resolved: pytorch#11387 Reviewed By: teng-li Differential Revision: D9729330 Pulled By: pietern fbshipit-source-id: f8899219a94d806386d03e9ef53bff622d8658a3
This change removes the skips for the existing send/recv tests in the backwards compatibility layer.