
Move nccl scatter and gather to C++ #9117

Closed
goldsborough wants to merge 5 commits into pytorch:master from goldsborough:scatter-gather

Conversation

@goldsborough
Contributor

As I try to replicate DataParallel (DP) in C++, I need to move some functions from Python into C++. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since @apaszke had already rewritten broadcast in C++.

I'm not very familiar with this code, so let me know if I'm doing something wrong. For the most part I translated the code literally.

I don't know how "public" `torch.cuda.comm` is, but I feel the `destination_index` parameter for `gather` should be changed so that `None` indicates the CPU and `-1` indicates the default CUDA device, rather than `-1` indicating the CPU. That would make the code clearer IMO.

@apaszke @colesbury @teng-li @pietern
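For reference, here is a minimal sketch of the Python-level API being ported, assuming a machine with at least two CUDA devices (at the Python level the parameter is named `destination`); the `-1`-means-CPU convention discussed above shows up in the `gather` call:

```python
import torch
import torch.cuda.comm as comm

x = torch.randn(8, 4, device="cuda:0")

# Split x along dim 0 and place one chunk on each listed device.
chunks = comm.scatter(x, devices=[0, 1], dim=0)

# Concatenate the chunks back along dim 0. Under the current convention,
# destination=-1 gathers to the CPU; an index >= 0 gathers to that
# CUDA device.
on_cpu = comm.gather(chunks, dim=0, destination=-1)
assert on_cpu.device.type == "cpu"
```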

@ssnl
Collaborator

ssnl commented Jul 2, 2018

We used to (and probably still do) have a lot of things (public or not) that use -1 as the CPU device and >= 0 as a CUDA device index. So I'm not sure about that change.

@ssnl
Collaborator

ssnl commented Jul 3, 2018

It might be a good opportunity to switch to using device objects, though :)
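To illustrate what that suggestion could mean, here is a hypothetical helper (`index_to_device` is not part of this PR or of torch) that maps the existing integer convention onto `torch.device` objects:

```python
import torch

# Hypothetical helper: translate the integer convention used across
# the codebase (-1 -> CPU, idx >= 0 -> CUDA device idx) into a
# torch.device object.
def index_to_device(destination_index):
    if destination_index == -1:
        return torch.device("cpu")
    return torch.device("cuda", destination_index)

assert index_to_device(-1) == torch.device("cpu")
assert index_to_device(1) == torch.device("cuda", 1)
```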

@goldsborough
Contributor Author

@pytorchbot retest this please

@facebook-github-bot
Contributor

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@apaszke
Contributor

Mostly LGTM, some small comments that would be good to fix (especially the GIL thing).

I disagree about changing the convention for the device argument. That would make it inconsistent with everything else in our codebase, which would be very confusing and error-prone.
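For context on the GIL remark: the comm primitives are typically driven from one Python thread per device (as in DataParallel), so the C++ implementation should release the GIL around the device copies, or the threads will serialize. A rough sketch of that call pattern, assuming two CUDA devices (`work` is a stand-in for a real per-replica computation, not part of this PR):

```python
import threading

import torch
import torch.cuda.comm as comm

x = torch.randn(8, 4, device="cuda:0")
chunks = comm.scatter(x, devices=[0, 1], dim=0)
results = [None] * len(chunks)

def work(i, chunk):
    # Stand-in for a per-replica forward pass; these threads only run
    # concurrently if the underlying ops release the GIL.
    results[i] = chunk * 2

threads = [threading.Thread(target=work, args=(i, c))
           for i, c in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Gather the per-device results back onto device 0.
out = comm.gather(results, dim=0, destination=0)
```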

Ten inline review comments were marked as off-topic.

@goldsborough force-pushed the scatter-gather branch 2 times, most recently from d045170 to 6764e0d on July 3, 2018 at 22:36.

Two more comments were marked as off-topic.

@goldsborough
Contributor Author

@apaszke is this good to go?

@soumith changed the title from "Move scatter and gather to C++" to "Move nccl scatter and gather to C++" on Jul 5, 2018.

@facebook-github-bot
Contributor

@goldsborough has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request on Jul 6, 2018:
Summary:
As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had rewritten broadcast in C++ already.

I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code.

I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO.

cc: apaszke, colesbury, teng-li, pietern
Closes pytorch/pytorch#9117

Differential Revision: D8721729

Pulled By: goldsborough

fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68
zdevito pushed a commit to zdevito/ATen that referenced this pull request on Jul 13, 2018, with the same summary and Differential Revision (D8721729) as above.
goodlux pushed a commit to goodlux/pytorch that referenced this pull request on Aug 15, 2018, again with the same summary (Closes pytorch#9117, Differential Revision: D8721729).