[Gradient Compression] Warm-start of PowerSGD #49451
wayi1 wants to merge 8 commits into gh/SciPioneer/38/base
Conversation
Reuse the low-rank tensors P(s) and Q(s) from the previous iteration if possible. This warm start can improve compression performance in terms of both accuracy and speed. Also add a unit test for batched PowerSGD to test_c10d.py. Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202. Differential Revision: [D25583086](https://our.internmc.facebook.com/intern/diff/D25583086/)
rohan-varma
left a comment
LGTM. I suppose we'll follow up with data on the accuracy/speed tradeoffs for all of these configurations? It would be great to see how the different flags trade off.
I have collected experimental results for only one experiment so far, and still need to experiment on more training cases before drawing a conclusion.
This pull request has been merged in c348fae.
Summary: Pull Request resolved: pytorch#49451
Reuse the low-rank tensors P(s) and Q(s) from the previous iteration if possible. This can give a better compression performance in terms of both accuracy and speed. Also add a unit test for batched PowerSGD to test_c10d.py.
Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202
ghstack-source-id: 119014132
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook
Reviewed By: rohan-varma
Differential Revision: D25583086
fbshipit-source-id: a757df3c4cfcc0ead4647f7de2f43198f1e063ee
Stack from ghstack:
Reuse the low-rank tensors P(s) and Q(s) from the previous iteration if possible.
This warm start can improve compression performance in terms of both accuracy and speed.
Also add a unit test for batched PowerSGD to test_c10d.py.
Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
Differential Revision: D25583086
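The warm-start idea described above can be sketched as follows. This is a hypothetical, simplified single-process version (names like `PowerSGDState` and `compress` are illustrative, not the PR's actual API; the real communication hook operates on bucketed gradients and all-reduces P and Q across workers): instead of re-initializing the low-rank factor Q randomly every iteration, the state caches Q so the next power iteration starts from the previous approximation.

```python
import numpy as np

def orthogonalize(mat):
    # Orthonormalize the columns of `mat` (the PR uses Gram-Schmidt;
    # QR gives the same column space here).
    q, _ = np.linalg.qr(mat)
    return q

class PowerSGDState:
    """Per-process state that caches Q between iterations (warm start)."""

    def __init__(self, rank=1, warm_start=True, seed=0):
        self.rank = rank
        self.warm_start = warm_start
        self.rng = np.random.default_rng(seed)
        self.q = None  # low-rank factor reused across iterations

    def compress(self, grad):
        # grad: 2-D gradient matrix of shape (n, m)
        n, m = grad.shape
        if self.q is None or not self.warm_start:
            # Cold start: draw a fresh random Q.
            self.q = self.rng.standard_normal((m, self.rank))
        # One step of power iteration, warm-started from the previous Q.
        p = grad @ self.q          # (n, rank); all-reduced across workers in DDP
        p = orthogonalize(p)
        self.q = grad.T @ p        # (m, rank); all-reduced across workers in DDP
        return p, self.q

    def decompress(self, p, q):
        # Low-rank reconstruction of the gradient.
        return p @ q.T
```

Because Q carries the previous iteration's approximation forward, successive gradients that change slowly need fewer effective power-iteration steps to reach a good rank-r approximation, which is where the accuracy/speed gain comes from.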