[distributed] Handle object collectives and NCCL. #79034
kumpera wants to merge 2 commits into pytorch:master
✅ Dr. CI: No failures (0 pending) as of commit 655d136.
rohan-varma left a comment:
Awesome refactoring! Looks good overall - only suggestion is around merging all of the tests.
This reminds me, we should probably figure out the root cause of #65696, because we really don't want to be using ByteTensor- and LongTensor-style APIs anymore.
Force-pushed from 658c7f9 to 655d136 (Compare)
@pytorchmergebot merge please

❌ 🤖 pytorchbot command failed: Try

@pytorchmergebot merge

@pytorchbot successfully started a merge job. Check the current status here

Hey @kumpera.
Summary: This fixes all object collectives under NCCL and adds some automated tests for them. This PR *does not* fix sending tensors using object collectives. It simplifies device handling by computing the appropriate device earlier and then ensuring all tensor ops happen on it.
Pull Request resolved: #79034
Approved by: https://github.com/rohan-varma
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4ebb326b75024af600a966308965803cdc3437d9
Reviewed By: malfet
Differential Revision: D37153126
Pulled By: kumpera
fbshipit-source-id: caf01d830545b40d2827c336c350da3cd8ab552c
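For context on what the object collectives do: they pickle each Python object into a byte buffer, ship that buffer as a byte tensor through a regular collective, and unpickle it on the receiving rank. A minimal, torch-free sketch of that serialization round trip (the helper names here are illustrative, not the actual torch.distributed internals):

```python
import io
import pickle

def object_to_bytes(obj):
    # Serialize an arbitrary picklable object into raw bytes; in the real
    # implementation these bytes back a byte tensor that the collective ships.
    buf = io.BytesIO()
    pickle.dump(obj, buf)
    return buf.getvalue()

def bytes_to_object(data):
    # Inverse step on the receiving rank: turn the shipped bytes back into
    # the original Python object.
    return pickle.loads(data)

payload = {"epoch": 3, "tags": ["a", "b"]}
roundtripped = bytes_to_object(object_to_bytes(payload))
```

Because the payload travels as bytes, anything picklable can be broadcast or scattered, at the cost of a serialization step on each rank.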
@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).

@pytorchbot successfully started a revert job. Check the current status here
This reverts commit 4ebb326. Reverted #79034 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
…79034)
Test Plan: revert-hammer
Differential Revision: D37153126
Original commit changeset: caf01d830545
Original Phabricator Diff: D37153126
fbshipit-source-id: fe023b2eaa028f0677997c63a0472d70df381253

…ctives and NCCL. (#79034)
Test Plan: revert-hammer
Differential Revision: D37170190
Original commit changeset: fe023b2eaa02
Original Phabricator Diff: D37153126
fbshipit-source-id: 3f28eefe13ed0867f76c39d5866430393af2c153
The docstring for scatter_object_list says it doesn't work with NCCL, but this was fixed in #79034.
Pull Request resolved: #84596
Approved by: https://github.com/H-Huang
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/e96fb5d58c2accd717f0859b510ae7facb6d6aac
Reviewed By: izaitsevfb
Differential Revision: D39312639
Pulled By: kumpera
fbshipit-source-id: dc1b57b7ad464cf00b44ac6dbfca5349e9fd41b1
This reverts commit 9c6d2d8. Reverted pytorch#79034 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
…torch#79034)"" This reverts commit f2cac96.
This fixes all object collectives under NCCL and adds some automated tests for them.
This PR does not fix sending tensors using object collectives.
It simplifies device handling by computing the appropriate one earlier and then ensuring all tensor ops happen on it.
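The device-handling simplification described above can be sketched without torch. This toy stand-in (`pick_device`, `FakeTensor`, and `broadcast_object_sketch` are hypothetical names, not the real torch.distributed code) illustrates the idea of resolving the device once up front and routing every subsequent tensor op through it:

```python
def pick_device(backend: str, local_rank: int) -> str:
    # NCCL collectives only operate on CUDA tensors, so under NCCL the
    # serialized object tensors must live on this rank's GPU; other
    # backends (e.g. gloo) can keep them on CPU.
    return f"cuda:{local_rank}" if backend == "nccl" else "cpu"

class FakeTensor:
    # Minimal stand-in for a tensor that remembers which device it lives on.
    def __init__(self, data, device="cpu"):
        self.data = data
        self.device = device

    def to(self, device):
        return FakeTensor(self.data, device)

def broadcast_object_sketch(obj_bytes: bytes, backend: str, local_rank: int):
    device = pick_device(backend, local_rank)        # computed once, early
    tensor = FakeTensor(list(obj_bytes)).to(device)  # every op uses `device`
    # ... the real code would now run the byte-size and byte-content
    # collectives on `tensor`, which is already on the right device ...
    return tensor

t = broadcast_object_sketch(b"abc", "nccl", 0)
```

Centralizing the device decision means no individual tensor op has to guess where it should run, which is what previously broke the object collectives under NCCL.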