
Commit adde81d
Committed by wayi
Update on "[Gradient Compression] Allow BatchedPowerSGD to run vanilla allreduce for the first K iterations"
Similar to #50973, allow the batched version to run vanilla allreduce for the first K iterations. This may be useful for use cases where the accuracy requirement is not very strict, so the batched version can be applied.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: [D26077709](https://our.internmc.facebook.com/intern/diff/D26077709/)

[ghstack-poisoned]
2 parents: b75bc7c + a2a5115
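For context, a minimal sketch of how this warm-up phase would be used: the hook is registered on a DDP model with a `PowerSGDState` whose `start_powerSGD_iter` controls how many initial iterations fall back to vanilla allreduce. The backend, toy model, and `start_powerSGD_iter=1000` below are illustrative assumptions, not part of this commit.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

# Assumes the usual one-process-per-GPU setup; "nccl" and the linear model
# are placeholders for illustration.
dist.init_process_group("nccl")
rank = dist.get_rank()
model = DDP(nn.Linear(1024, 1024).to(rank), device_ids=[rank])

state = powerSGD.PowerSGDState(
    process_group=None,         # None selects the default process group
    matrix_approximation_rank=1,
    start_powerSGD_iter=1000,   # run vanilla allreduce for the first 1K iterations
)
# After `start_powerSGD_iter` iterations, each gradient bucket is compressed
# as a single flattened tensor instead of being allreduced directly.
model.register_comm_hook(state, powerSGD.batched_powerSGD_hook)
```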

1 file changed: torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py (1 addition, 1 deletion)
```diff
@@ -430,7 +430,7 @@ def batched_powerSGD_hook(state: PowerSGDState, bucket) -> torch.futures.Future:
     # Run vanilla allreduce in the first `start_powerSGD_iter` iterations.
     if state.iter < state.start_powerSGD_iter:
         state.maybe_increase_iter(bucket)
-        return default.allreduce_fut(group_to_use, input_tensor)
+        return default._allreduce_fut(group_to_use, input_tensor)
 
     # Apply PowerSGD after `start_powerSGD_iter` iterations.
     device = input_tensor.device
```
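The one-line fix appears to follow a rename: the vanilla-allreduce helper in the `default` hooks module is the private `_allreduce_fut`, so the warm-up branch must call the new name. As a rough sketch (an illustration of the idea, not the library's exact code), such a helper averages the tensor across the process group and returns a future that resolves to the result:

```python
import torch
import torch.distributed as dist

def allreduce_fut_sketch(
    process_group: dist.ProcessGroup, tensor: torch.Tensor
) -> torch.futures.Future:
    # Fall back to the default (world) group if no group is given.
    group_to_use = process_group if process_group is not None else dist.group.WORLD
    # Average rather than sum, matching DDP's gradient semantics.
    tensor.div_(group_to_use.size())
    fut = dist.all_reduce(tensor, group=group_to_use, async_op=True).get_future()
    # The future's value is a list of tensors; unwrap the single result.
    return fut.then(lambda f: f.value()[0])
```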
