[Don't Review][ci-all Test]Enable GPU-to-GPU comm in TensorPipeAgent by mrshenli · Pull Request #50494 · pytorch/pytorch

mrshenli · 2021-01-13T19:26:05Z

ci-all test for #44418

facebook-github-bot · 2021-01-13T19:26:24Z

💊 CI failures summary and remediations

As of commit 120f934 (more details on the Dr. CI page):

1/1 failures possibly* introduced in this PR
- 1/1 non-CircleCI failure(s)

Extra GitHub checks: 1 failed

Failed: GitHub Actions - clang-tidy

This comment was automatically generated by Dr. CI (expand for details).

Pull Request resolved: #44418

This commit uses TensorPipe's cuda_ipc channel to conduct cross-process, same-machine GPU-to-GPU communication. On the sender side, `TensorPipeAgent` grabs a stream for each device used by the message, lets these streams wait for the current streams, and passes them to TensorPipe's `CudaBuffer`. On the receiver side, it also grabs a stream for each device used in the message and uses these streams to receive tensors and run user functions. The same streams are then used to send the response back to the sender. When receiving the response, the sender grabs a new set of streams and uses them for TensorPipe's `CudaBuffer`.

If device maps are provided, `TensorPipeAgent::send` returns a derived class of `CUDAFuture` that is specifically tailored for RPC messages.

TODOs:
1. Enable sending CUDA RPC to the same process.
2. Add a custom CUDA stream pool.
3. Once TensorPipe addresses the error for `cudaPointerGetAttributes()`, remove the `cuda:0` context initialization code in `backend_registry.py`.
4. Once TensorPipe can detect the availability of peer access, enable all tests on platforms without peer access.

Differential Revision: [D23626207](https://our.internmc.facebook.com/intern/diff/D23626207/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23626207/)!

ghstack-source-id: 119821241
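For readers following the stream handling described above, here is a toy, CPU-only model of the sender-side step (one stream per device used by the message, each made to wait on that device's current stream). It is only a sketch: `Stream` and `reserve_streams` are hypothetical names invented for illustration, not part of `TensorPipeAgent`, TensorPipe, or the PyTorch API, and no real CUDA work happens here.

```python
from dataclasses import dataclass, field
from itertools import count

_uids = count()  # hypothetical id source so each toy stream is distinguishable


@dataclass
class Stream:
    """Toy stand-in for a CUDA stream (assumption: not a real CUDA object)."""
    device: int
    uid: int = field(default_factory=lambda: next(_uids))
    waits_on: list = field(default_factory=list)

    def wait_stream(self, other):
        # Record that work queued on this stream must follow `other`.
        self.waits_on.append(other.uid)


def reserve_streams(tensor_devices, current_streams):
    """Grab one new stream per device used by the message.

    Each new stream waits on that device's current stream, so pending
    work that produced the tensors finishes before the transfer starts.
    """
    reserved = {}
    for device in sorted(set(tensor_devices)):
        stream = Stream(device)
        stream.wait_stream(current_streams[device])
        reserved[device] = stream
    return reserved


# A message holding three tensors spread over two devices.
current = {0: Stream(0), 1: Stream(1)}
streams = reserve_streams([1, 0, 1], current)
```

In this model the message touches two devices, so `streams` holds exactly two entries, and each reserved stream records a dependency on the matching entry of `current`. The receiver-side step in the description follows the same pattern, with the reserved streams reused for the response.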

mrshenli requested review from mingzhe09088, pritamdamania87, rohan-varma and zhaojuanmao as code owners January 13, 2021 19:26

facebook-github-bot added the `cla signed` and `oncall: distributed` labels Jan 13, 2021

mrshenli mentioned this pull request Jan 13, 2021

Enable GPU-to-GPU comm in TensorPipeAgent #44418

Closed

mrshenli force-pushed the ci-all/mrshenli branch from cea0297 to 120f934 January 14, 2021 18:33

mrshenli closed this Jan 18, 2021

mrshenli wants to merge 1 commit into master from ci-all/mrshenli

2 participants