Move nccl scatter and gather to C++ #9117
goldsborough wants to merge 5 commits into pytorch:master
Conversation
We used to (and probably still do) have a lot of things (public or not) that use -1 as the CPU device and >= 0 as a CUDA device index. So I'm not sure about that change.
It might be a good opportunity to switch to using the device objects though :)
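For context on the two conventions being discussed, here is a minimal pure-Python sketch (no torch dependency) contrasting the legacy integer-index convention with an explicit device descriptor in the spirit of `torch.device`. The helper name `legacy_index_to_device` is hypothetical, for illustration only:

```python
# Legacy convention discussed above: -1 means CPU, >= 0 means a CUDA
# device index. An explicit (type, index) pair, analogous to
# torch.device, avoids the magic number entirely.

def legacy_index_to_device(idx):
    """Translate a legacy device index into an explicit (type, index) pair.

    Hypothetical helper; mirrors the -1 / >= 0 convention only.
    """
    if idx == -1:
        return ("cpu", None)
    if idx >= 0:
        return ("cuda", idx)
    raise ValueError(f"invalid device index: {idx}")

print(legacy_index_to_device(-1))  # ('cpu', None)
print(legacy_index_to_device(2))   # ('cuda', 2)
```

The appeal of device objects is that the "is this CPU or CUDA?" question becomes an explicit field rather than a sentinel value that every call site must remember to special-case.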
@pytorchbot retest this please
facebook-github-bot left a comment
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
apaszke left a comment
Mostly LGTM, some small comments that would be good to fix (especially the GIL thing).
I disagree about changing the convention on the device argument. That would make it inconsistent with everything else in our codebase, which would be very confusing and error-prone.
Review threads (comments marked as off-topic):
torch/csrc/cuda/comm.cpp
torch/csrc/cuda/python_comm.cpp
Force-pushed from d045170 to 6764e0d
Force-pushed from 6764e0d to 90a105a
@apaszke is this good to go?
facebook-github-bot left a comment
@goldsborough has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had rewritten broadcast in C++ already. I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code. I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO. apaszke colesbury teng-li pietern

Closes pytorch/pytorch#9117

Differential Revision: D8721729

Pulled By: goldsborough

fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68
As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since @apaszke had rewritten broadcast in C++ already.
I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code.
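For readers unfamiliar with these primitives, here is a simplified pure-Python sketch of the scatter/gather semantics being ported — chunking a sequence into nearly-equal contiguous pieces and concatenating them back. This operates on plain lists rather than CUDA tensors (no device copies or streams), so it only illustrates the splitting arithmetic, not the actual implementation in torch/csrc/cuda/comm.cpp:

```python
def scatter(seq, num_chunks):
    """Split seq into num_chunks nearly-equal contiguous chunks
    (larger chunks first), mirroring how scatter splits a tensor
    along a dimension across devices. Illustrative sketch only."""
    base, extra = divmod(len(seq), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(seq[start:start + size])
        start += size
    return chunks

def gather(chunks):
    """Concatenate chunks back together, mirroring how gather
    concatenates per-device tensors along a dimension."""
    return [x for chunk in chunks for x in chunk]

data = list(range(10))
chunks = scatter(data, 3)      # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
assert gather(chunks) == data  # scatter/gather round-trip
```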
I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO.

@apaszke @colesbury @teng-li @pietern
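To make the proposed convention concrete, here is a hedged pure-Python sketch of the `None`/`-1` destination semantics suggested above. The helper name `resolve_destination` and the `DEFAULT_CUDA_DEVICE` stand-in (which would really be a query like `torch.cuda.current_device()`) are hypothetical, illustration only:

```python
DEFAULT_CUDA_DEVICE = 0  # stand-in for querying the current CUDA device

def resolve_destination(destination):
    """Proposed convention from the PR description:
    None -> CPU, -1 -> default CUDA device, >= 0 -> that CUDA device.
    Hypothetical helper sketching the suggestion, not actual PyTorch API."""
    if destination is None:
        return ("cpu", None)
    if destination == -1:
        return ("cuda", DEFAULT_CUDA_DEVICE)
    if destination >= 0:
        return ("cuda", destination)
    raise ValueError(f"invalid destination: {destination}")

print(resolve_destination(None))  # ('cpu', None)
print(resolve_destination(-1))    # ('cuda', 0)
print(resolve_destination(3))     # ('cuda', 3)
```

Under this scheme `None` is unambiguously "host", freeing `-1` to mean "whatever CUDA device is current" — the ambiguity the summary is flagging in the existing -1-means-CPU convention.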