[c10d] Added Reduce, AllGather, Gather, Scatter Ops for NCCL and MPI process groups #10058
teng-li wants to merge 4 commits into pytorch:master
| install(TARGETS c10d ARCHIVE DESTINATION lib) | ||
|
|
||
| option(BUILD_EXAMPLES "Build examples" OFF) | ||
| option(BUILD_EXAMPLES "Build examples" ON) |
This comment was marked as off-topic.
torch/lib/c10d/ProcessGroupMPI.cpp
Outdated
| throw std::runtime_error("Tensors are not equal in size or data type"); | ||
| } | ||
| std::vector<at::Tensor> temp{tensors[i]}; | ||
| checkSingleTensor(temp); |
This comment was marked as off-topic.
|
|
||
| std::function<void(std::unique_ptr<WorkEntry>&)> runFunc = | ||
| [opts, this](std::unique_ptr<WorkEntry>& entry) { | ||
| auto data = (*entry->src)[0]; |
This comment was marked as off-topic.
| if (outputTensors.size() != 1) { | ||
| throw std::runtime_error( | ||
| "MPI process group only supports a single " | ||
| "tensor op"); |
This comment was marked as off-topic.
torch/lib/c10d/ProcessGroupMPI.cpp
Outdated
| } | ||
| } else { | ||
| if (outputTensors.size() != 1) { | ||
| throw std::runtime_error("Gather: only single tensor op supported"); |
This comment was marked as off-topic.
facebook-github-bot
left a comment
teng-li has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…ts (#10159)

Summary: Provided Python bindings for these four ops. Also provided an NCCL binding test.

Based on #10058.

Please only review init.cpp and the test file.

Pull Request resolved: #10159
Reviewed By: yf225
Differential Revision: D9323192
Pulled By: teng-li
fbshipit-source-id: b03822009d3a785ec36fecce2fc3071d23f9994e
…oups (pytorch#10058)

Summary: Added

- Reduce (both NCCL and MPI)
- AllGather (both NCCL and MPI)
- Gather (MPI)
- Scatter (MPI)

for c10d process groups. This basically finalizes all supported ops for C10d to match THD.

All ops are tested as well.

```
mpirun -np 8 ./ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```

```
./ProcessGroupNCCLTest
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```

Pull Request resolved: pytorch#10058
Reviewed By: yf225
Differential Revision: D9316312
Pulled By: teng-li
fbshipit-source-id: 6a6253268d34332327406b1f87335d1402f7133f
Added

- Reduce (both NCCL and MPI)
- AllGather (both NCCL and MPI)
- Gather (MPI)
- Scatter (MPI)

for c10d process groups. This basically finalizes all supported ops for C10d to match THD.
All ops are tested as well.
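
For readers less familiar with the four collectives named in the summary, the following is a minimal in-process sketch of their semantics. It is not the c10d implementation and does not use MPI or NCCL. `Buffer`, `PerRank`, `checkSameSize`, `reduceSum`, `allGather`, `gather`, and `scatter` are hypothetical names introduced only for illustration, with a plain `std::vector<int>` standing in for each rank's tensor and sum assumed as the reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one rank's tensor: a flat buffer of ints.
using Buffer = std::vector<int>;
// PerRank[r] lists the buffers that rank r holds after the collective.
using PerRank = std::vector<std::vector<Buffer>>;

// All participating buffers must have the same length.
void checkSameSize(const std::vector<Buffer>& bufs) {
  for (const Buffer& b : bufs) {
    if (b.size() != bufs.front().size()) {
      throw std::runtime_error("Tensors are not equal in size");
    }
  }
}

// Reduce (sum assumed): only `root` ends up with the elementwise sum.
PerRank reduceSum(const std::vector<Buffer>& inputs, std::size_t root) {
  assert(root < inputs.size());
  checkSameSize(inputs);
  Buffer sum(inputs.front().size(), 0);
  for (const Buffer& in : inputs) {
    for (std::size_t i = 0; i < sum.size(); ++i) {
      sum[i] += in[i];
    }
  }
  PerRank out(inputs.size());
  out[root].push_back(sum);
  return out;
}

// AllGather: every rank ends up with every rank's input, in rank order.
PerRank allGather(const std::vector<Buffer>& inputs) {
  checkSameSize(inputs);
  return PerRank(inputs.size(), inputs);
}

// Gather: only `root` ends up with every rank's input, in rank order.
PerRank gather(const std::vector<Buffer>& inputs, std::size_t root) {
  assert(root < inputs.size());
  checkSameSize(inputs);
  PerRank out(inputs.size());
  out[root] = inputs;
  return out;
}

// Scatter: `root` owns one chunk per rank; rank r receives rootChunks[r].
// In this simulation `root` only identifies the sender and is range-checked.
PerRank scatter(const std::vector<Buffer>& rootChunks, std::size_t root) {
  assert(root < rootChunks.size());
  checkSameSize(rootChunks);
  PerRank out(rootChunks.size());
  for (std::size_t r = 0; r < rootChunks.size(); ++r) {
    out[r].push_back(rootChunks[r]);
  }
  return out;
}
```

With three simulated ranks holding `{1, 2}`, `{3, 4}`, and `{5, 6}`, `reduceSum(inputs, 0)` leaves `{9, 12}` on rank 0 and nothing on the other ranks, while `allGather(inputs)` leaves all three buffers on every rank.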