Move nccl scatter and gather to C++ #9117
goldsborough wants to merge 5 commits into pytorch:master
Conversation
We used to (and probably still do) have a lot of things (public or not) that use -1 as the CPU device and >= 0 as a CUDA device index. So I'm not sure about that change.
It might be a good opportunity to switch to using the device objects though :)
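For context on the two conventions being discussed, here is a minimal pure-Python sketch (no torch dependency) contrasting the legacy integer-index convention with an explicit device descriptor in the spirit of `torch.device`. The helper name `legacy_index_to_device` is hypothetical, for illustration only:

```python
# Legacy convention discussed above: -1 means CPU, >= 0 means a CUDA
# device index. An explicit (type, index) pair, analogous to
# torch.device, avoids the magic number entirely.

def legacy_index_to_device(idx):
    """Translate a legacy device index into an explicit (type, index) pair.

    Hypothetical helper; mirrors the -1 / >= 0 convention only.
    """
    if idx == -1:
        return ("cpu", None)
    if idx >= 0:
        return ("cuda", idx)
    raise ValueError(f"invalid device index: {idx}")

print(legacy_index_to_device(-1))  # ('cpu', None)
print(legacy_index_to_device(2))   # ('cuda', 2)
```

The appeal of device objects is that the "is this CPU or CUDA?" question becomes an explicit field rather than a sentinel value that every call site must remember to special-case.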
@pytorchbot retest this please
facebook-github-bot left a comment
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
apaszke left a comment
Mostly LGTM, some small comments that would be good to fix (especially the GIL thing).
I disagree about changing the convention on the device argument. That would make it inconsistent with everything else in our codebase, which would be very confusing and error-prone.
Review threads (comments marked as off-topic):
torch/csrc/cuda/comm.cpp
torch/csrc/cuda/python_comm.cpp
Force-pushed from d045170 to 6764e0d
Force-pushed from 6764e0d to 90a105a
@apaszke is this good to go?
facebook-github-bot left a comment
@goldsborough has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had rewritten broadcast in C++ already. I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code. I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO. apaszke colesbury teng-li pietern

Closes pytorch/pytorch#9117

Differential Revision: D8721729

Pulled By: goldsborough

fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68
As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since @apaszke had rewritten broadcast in C++ already.
I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code.
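For readers unfamiliar with these primitives, here is a simplified pure-Python sketch of the scatter/gather semantics being ported — chunking a sequence into nearly-equal contiguous pieces and concatenating them back. This operates on plain lists rather than CUDA tensors (no device copies or streams), so it only illustrates the splitting arithmetic, not the actual implementation in torch/csrc/cuda/comm.cpp:

```python
def scatter(seq, num_chunks):
    """Split seq into num_chunks nearly-equal contiguous chunks
    (larger chunks first), mirroring how scatter splits a tensor
    along a dimension across devices. Illustrative sketch only."""
    base, extra = divmod(len(seq), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(seq[start:start + size])
        start += size
    return chunks

def gather(chunks):
    """Concatenate chunks back together, mirroring how gather
    concatenates per-device tensors along a dimension."""
    return [x for chunk in chunks for x in chunk]

data = list(range(10))
chunks = scatter(data, 3)      # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
assert gather(chunks) == data  # scatter/gather round-trip
```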
I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO.

@apaszke @colesbury @teng-li @pietern
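To make the proposed convention concrete, here is a hedged pure-Python sketch of the `None`/`-1` destination semantics suggested above. The helper name `resolve_destination` and the `DEFAULT_CUDA_DEVICE` stand-in (which would really be a query like `torch.cuda.current_device()`) are hypothetical, illustration only:

```python
DEFAULT_CUDA_DEVICE = 0  # stand-in for querying the current CUDA device

def resolve_destination(destination):
    """Proposed convention from the PR description:
    None -> CPU, -1 -> default CUDA device, >= 0 -> that CUDA device.
    Hypothetical helper sketching the suggestion, not actual PyTorch API."""
    if destination is None:
        return ("cpu", None)
    if destination == -1:
        return ("cuda", DEFAULT_CUDA_DEVICE)
    if destination >= 0:
        return ("cuda", destination)
    raise ValueError(f"invalid destination: {destination}")

print(resolve_destination(None))  # ('cpu', None)
print(resolve_destination(-1))    # ('cuda', 0)
print(resolve_destination(3))     # ('cuda', 3)
```

Under this scheme `None` is unambiguously "host", freeing `-1` to mean "whatever CUDA device is current" — the ambiguity the summary is flagging in the existing -1-means-CPU convention.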