Skip to content

[Don't Review][ci-all Test]Enable GPU-to-GPU comm in TensorPipeAgent#50494

Closed
mrshenli wants to merge 1 commit intomasterfrom
ci-all/mrshenli
Closed

[Don't Review][ci-all Test]Enable GPU-to-GPU comm in TensorPipeAgent#50494
mrshenli wants to merge 1 commit intomasterfrom
ci-all/mrshenli

Conversation

@mrshenli
Copy link
Copy Markdown
Contributor

ci-all test for #44418

@facebook-github-bot facebook-github-bot added cla signed oncall: distributed Add this issue/PR to distributed oncall triage queue labels Jan 13, 2021
@facebook-github-bot
Copy link
Copy Markdown
Contributor

facebook-github-bot commented Jan 13, 2021

💊 CI failures summary and remediations

As of commit 120f934 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Pull Request resolved: #44418


This commit uses TensorPipe's cuda_ipc channel to conduct
cross-process same-machine GPU-to-GPU communication. On the sender
side, `TensorPipeAgent` grabs a stream to each device used by the
message, let these streams wait for current streams, and passes
the streams to TensorPipe `CudaBuffer`. On the receiver side, it
also grabs a stream for each device used in the message, and uses
these streams to receive tensors and run user functions. After that,
these streams are then used for sending the response back to the
sender. When receiving the response, the sender will grab a new set
of streams and use them for TensorPipe's `CudaBuffer`.

If device maps are provided, `TensorPipeAgent::send` will return a
derived class of `CUDAFuture`, which is specifically tailored for
RPC Messages.

TODOs:
1. Enable sending CUDA RPC to the same process.
2. Add a custom CUDA stream pool.
3. When TensorPipe addressed the error for `cudaPointerGetAttributes()`,
remove `cuda:0` context initialization code in `backend_registry.py`.
4. When TensorPipe can detect availability of peer access, enable all
tests on platforms without peer access.

Differential Revision: [D23626207](https://our.internmc.facebook.com/intern/diff/D23626207/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23626207/)!
ghstack-source-id: 119821241
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed oncall: distributed Add this issue/PR to distributed oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants