[c10d] Added Reduce,AllGather,Gather,Scatter Ops for NCCL and MPI process groups #10058

Closed
teng-li wants to merge 4 commits into pytorch:master from teng-li:pg_nccl_ops

@teng-li teng-li commented Jul 31, 2018

Added

  • Reduce (both NCCL and MPI)
  • AllGather (both NCCL and MPI)
  • Gather (MPI)
  • Scatter (MPI)

for c10d process groups. This essentially completes the set of ops c10d needs to match THD.
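To make the data movement of the four added collectives concrete, here is a toy in-process model. It is a sketch under stated assumptions, not the c10d `ProcessGroup` API: each rank's tensor is modeled as a `std::vector<float>`, the reduction op is fixed to summation, and `reduceSum`, `allGather`, `gather`, and `scatter` are hypothetical helper names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy model (hypothetical, not c10d): ranks[r] is rank r's buffer.
using Buffer = std::vector<float>;

// Reduce (sum): the element-wise sum of every rank's buffer is written to
// `root` only; non-root buffers are left untouched in this model.
void reduceSum(std::vector<Buffer>& ranks, std::size_t root) {
  if (ranks.empty() || root >= ranks.size())
    throw std::invalid_argument("reduceSum: bad root or empty group");
  Buffer acc(ranks[0].size(), 0.0f);
  for (const Buffer& b : ranks)
    for (std::size_t i = 0; i < acc.size(); ++i) acc[i] += b[i];
  ranks[root] = acc;
}

// AllGather: every rank receives all inputs concatenated in rank order.
std::vector<Buffer> allGather(const std::vector<Buffer>& inputs) {
  Buffer all;
  for (const Buffer& b : inputs) all.insert(all.end(), b.begin(), b.end());
  return std::vector<Buffer>(inputs.size(), all);
}

// Gather: same concatenation as AllGather, but only `root` receives it.
std::vector<Buffer> gather(const std::vector<Buffer>& inputs, std::size_t root) {
  std::vector<Buffer> out(inputs.size());
  for (const Buffer& b : inputs)
    out.at(root).insert(out.at(root).end(), b.begin(), b.end());
  return out;
}

// Scatter: the root's buffer is cut into equal chunks; rank r gets chunk r.
std::vector<Buffer> scatter(const Buffer& rootData, std::size_t worldSize) {
  if (worldSize == 0 || rootData.size() % worldSize != 0)
    throw std::invalid_argument("scatter: size not divisible by world size");
  const std::size_t chunk = rootData.size() / worldSize;
  std::vector<Buffer> out(worldSize);
  for (std::size_t r = 0; r < worldSize; ++r) {
    auto first = rootData.begin() + static_cast<std::ptrdiff_t>(r * chunk);
    out[r].assign(first, first + static_cast<std::ptrdiff_t>(chunk));
  }
  return out;
}
```

In the real backends these exchanges happen across processes (MPI) or devices (NCCL); the model only captures which rank ends up holding which data.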

All ops are tested as well.

```
mpirun -np 8 ./ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```

```
./ProcessGroupNCCLTest
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```

@teng-li teng-li requested a review from apaszke July 31, 2018 06:06
@teng-li teng-li requested a review from pietern as a code owner July 31, 2018 06:06
@teng-li teng-li changed the title [c10d] Added Reduce/Gather/Scatter/AllGather Ops for NCCL and MPI process groups [c10d] Added Reduce,AllGather,Gather,Scatter Ops for NCCL and MPI process groups Jul 31, 2018
@teng-li teng-li added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Jul 31, 2018
```diff
 install(TARGETS c10d ARCHIVE DESTINATION lib)

-option(BUILD_EXAMPLES "Build examples" OFF)
+option(BUILD_EXAMPLES "Build examples" ON)
```


```cpp
    throw std::runtime_error("Tensors are not equal in size or data type");
  }
  std::vector<at::Tensor> temp{tensors[i]};
  checkSingleTensor(temp);
```
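The fragment rejects tensor lists whose members differ in size or data type. A minimal sketch of that kind of check follows; `TensorMeta` is a hypothetical stand-in for tensor metadata (not an ATen type), dtypes are modeled as strings, and `checkTensorsMatch` is an illustrative name. Only the error message is taken from the fragment.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a tensor's shape and dtype; not an ATen type.
struct TensorMeta {
  std::vector<std::int64_t> sizes;
  std::string dtype;
};

// Throws if any tensor differs from the first one in shape or dtype.
void checkTensorsMatch(const std::vector<TensorMeta>& tensors) {
  if (tensors.empty()) {
    throw std::runtime_error("Expected at least one tensor");
  }
  for (std::size_t i = 1; i < tensors.size(); ++i) {
    if (tensors[i].sizes != tensors[0].sizes ||
        tensors[i].dtype != tensors[0].dtype) {
      throw std::runtime_error("Tensors are not equal in size or data type");
    }
  }
}
```

Comparing every entry against the first one is enough: if all match the first, they all match each other.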



```cpp
std::function<void(std::unique_ptr<WorkEntry>&)> runFunc =
    [opts, this](std::unique_ptr<WorkEntry>& entry) {
      auto data = (*entry->src)[0];
```
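The fragment packages an op's body as a `runFunc` that receives a `WorkEntry` holding source and destination tensors. A minimal sketch of one way such entries could be executed, assuming a queue drained by a single background thread so that ops run serially in submission order; `WorkEntry`'s fields here and `SerialExecutor` are hypothetical and do not reproduce ProcessGroupMPI's actual classes.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical work item: input/output buffers plus the function to run.
struct WorkEntry {
  std::vector<int>* src = nullptr;
  std::vector<int>* dst = nullptr;
  std::function<void(std::unique_ptr<WorkEntry>&)> run;
};

// One worker thread drains the queue, so entries run one at a time, in order.
class SerialExecutor {
 public:
  SerialExecutor() : worker_([this] { loop(); }) {}

  // Signals shutdown; the worker finishes any queued entries before exiting.
  ~SerialExecutor() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  void enqueue(std::unique_ptr<WorkEntry> entry) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push_back(std::move(entry));
    }
    cv_.notify_one();
  }

 private:
  void loop() {
    std::unique_lock<std::mutex> lock(mu_);
    while (true) {
      cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
      if (queue_.empty()) return;  // stop_ is set and nothing is left to run
      auto entry = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();  // run the op without holding the queue lock
      entry->run(entry);
      lock.lock();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::unique_ptr<WorkEntry>> queue_;
  bool stop_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};
```

Funneling every op through one thread is one common way to avoid concurrent calls into a library with limited thread support; whether that was the motivation here is not stated in the PR.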


```cpp
if (outputTensors.size() != 1) {
  throw std::runtime_error(
      "MPI process group only supports a single "
      "tensor op");
```
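The fragment enforces that the MPI process group is given exactly one tensor per call. A minimal sketch of that guard as a reusable helper; `checkSingleTensorList` is a hypothetical name and the element type is left generic so the sketch does not depend on ATen.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical guard: reject any tensor list whose length is not exactly 1,
// mirroring the single-tensor restriction in the fragment.
template <typename T>
void checkSingleTensorList(const std::vector<T>& tensors) {
  if (tensors.size() != 1) {
    throw std::runtime_error(
        "MPI process group only supports a single tensor op");
  }
}
```

Checking the list length up front keeps the error close to the caller instead of surfacing later inside the MPI call.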


```cpp
  }
} else {
  if (outputTensors.size() != 1) {
    throw std::runtime_error("Gather: only single tensor op supported");
```
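The fragment shows Gather validating its output list inside a branch. In MPI's gather the receive buffer is only significant at the root, so one plausible shape for this check is sketched below. This assumes the `else` branch is the root rank, which the fragment does not show; `checkGatherArgs` and every error message except the last-quoted one are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical gather argument check. Assumption: only the root supplies
// outputs, as outputs[0] = one receive buffer per rank.
template <typename T>
void checkGatherArgs(const std::vector<std::vector<T>>& outputs, int rank,
                     int root, std::size_t worldSize) {
  if (rank != root) {
    // Non-root ranks only send, so they pass no output tensors.
    if (!outputs.empty()) {
      throw std::runtime_error("Gather: non-root ranks take no outputs");
    }
  } else {
    if (outputs.size() != 1) {
      throw std::runtime_error("Gather: only single tensor op supported");
    }
    // The root needs one receive buffer per participating rank.
    if (outputs[0].size() != worldSize) {
      throw std::runtime_error("Gather: root needs one output per rank");
    }
  }
}
```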


@facebook-github-bot facebook-github-bot left a comment

teng-li has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Aug 14, 2018
…ts (#10159)

Summary:
Provided python binding for these four ops. Also provided nccl binding test.

Based on #10058

Please only review init.cpp, and test file.
Pull Request resolved: #10159

Reviewed By: yf225

Differential Revision: D9323192

Pulled By: teng-li

fbshipit-source-id: b03822009d3a785ec36fecce2fc3071d23f9994e
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
…oups (pytorch#10058)

Summary:
Added
- Reduce (both NCCL and MPI)
- AllGather (both NCCL and MPI)
- Gather (MPI)
- Scatter (MPI)

for c10d process groups. This basically finalizes all supported ops for C10d to match THD.

All ops are tested as well.

```
mpirun -np 8 ./ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```

```
./ProcessGroupNCCLTest
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```
Pull Request resolved: pytorch#10058

Reviewed By: yf225

Differential Revision: D9316312

Pulled By: teng-li

fbshipit-source-id: 6a6253268d34332327406b1f87335d1402f7133f
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
…ts (pytorch#10159)

Summary:
Provided python binding for these four ops. Also provided nccl binding test.

Based on pytorch#10058

Please only review init.cpp, and test file.
Pull Request resolved: pytorch#10159

Reviewed By: yf225

Differential Revision: D9323192

Pulled By: teng-li

fbshipit-source-id: b03822009d3a785ec36fecce2fc3071d23f9994e
@ezyang ezyang added the merged label Jun 26, 2019