
[Gradient Compression] Explicitly restrict the scope of torch.cuda.synchronize to the current device #49711

Closed
wayi1 wants to merge 3 commits into gh/SciPioneer/41/base from gh/SciPioneer/41/head

Conversation


@wayi1 (Contributor) commented Dec 21, 2020

Stack from ghstack:

`torch.cuda.synchronize` waits on the current device by default. This change specifies the device explicitly for better readability.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: D25672267
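
As an illustration only (not the exact diff from this PR), the change amounts to something like the sketch below; `device` is a hypothetical stand-in for the device of the gradient tensor the communication hook is processing:

import torch

# Before: implicitly waits on whatever the current CUDA device is.
torch.cuda.synchronize()

# After: pass the device explicitly so the scope of the wait is
# obvious at the call site. Behavior is unchanged when `device` is
# the current device; the point is readability.
device = torch.device("cuda", torch.cuda.current_device())
torch.cuda.synchronize(device)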

[Gradient Compression] Explicitly restrict the scope of torch.cuda.synchronize to the current device

`torch.cuda.synchronize` uses the current device by default. Explicitly specify this device for better readability.

Differential Revision: [D25672267](https://our.internmc.facebook.com/intern/diff/D25672267/)

[ghstack-poisoned]

@facebook-github-bot (Contributor) commented Dec 21, 2020

💊 CI failures summary and remediations

As of commit b288aee (more details on the Dr. CI page):


  • 4/5 failures possibly* introduced in this PR
    • 1/4 non-CircleCI failure(s)
  • 1/5 broken upstream at merge base 838d1f6 on Dec 21 from 1:01pm to 7:47pm

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (1/3)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at b288aee57d Update on "[Gradient Compression] Explicitly restrict the scope of torch.cuda.synchronize to the current device"
+ git reset --hard b288aee57d1e940572635509a8133c59e82f82ad
HEAD is now at b288aee57d Update on "[Gradient Compression] Explicitly restrict the scope of torch.cuda.synchronize to the current device"
+ git merge --allow-unrelated-histories --no-edit --no-ff 7b4a7661d6de659c8423015a2f3e93308eb83850
Auto-merging torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py
CONFLICT (content): Merge conflict in torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build (2/3)

Step: "(Optional) Merge target branch"

Automatic merge failed with the same conflict in torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py; exited with code 1.

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (3/3)

Step: "(Optional) Merge target branch"

Automatic merge failed with the same conflict in torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py; exited with code 1.


1 job timed out:

  • pytorch_linux_bionic_py3_8_gcc9_coverage_test1

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

This comment has been revised 9 times.

Update on "[Gradient Compression] Explicitly restrict the scope of torch.cuda.synchronize to the current device"


`torch.cuda.synchronize` uses the current device by default. Explicitly specify this device for better readability.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: [D25672267](https://our.internmc.facebook.com/intern/diff/D25672267/)

[ghstack-poisoned]
Update on "[Gradient Compression] Explicitly restrict the scope of torch.cuda.synchronize to the current device"


`torch.cuda.synchronize` uses the current device by default. Explicitly specify this device for better readability.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: [D25672267](https://our.internmc.facebook.com/intern/diff/D25672267/)

[ghstack-poisoned]
@facebook-github-bot (Contributor)

This pull request has been merged in 88c33ff.

@facebook-github-bot deleted the gh/SciPioneer/41/head branch December 26, 2020 15:18
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
[Gradient Compression] Explicitly restrict the scope of torch.cuda.synchronize to the current device (pytorch#49711)

Summary:
Pull Request resolved: pytorch#49711

`torch.cuda.synchronize` uses the current device by default. Explicitly specify this device for better readability.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202
ghstack-source-id: 119017654

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D25672267

fbshipit-source-id: 62a2266727a2ea76175f3c438daf20951091c771

Labels

cla signed · Merged · oncall: distributed


3 participants