Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

@wayi1

Description

🚀 Feature

PowerSGD can potentially be used for gradient compression: https://arxiv.org/abs/1905.13727. Investigate this algorithm in the context of the DDP communication hook.

Motivation

PowerSGD preserves the linearity of gradients after compression, so the compressed gradients can still be aggregated efficiently with all-reduce using a native communication library like NCCL. It compresses each M * N tensor representing a variable into two smaller tensors of sizes M * rank and N * rank for communication. Note that 3D or higher-rank tensors can also be supported; the compression ratio can be computed by viewing the higher-rank tensor as a 2D tensor.
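To make the compression ratio concrete, here is a minimal sketch of one power-iteration round of rank-r compression on a 2D gradient, using NumPy. The function names (`powersgd_compress`, `powersgd_decompress`) are illustrative only and do not reflect the eventual hook API; the sketch also omits error feedback and warm-start details from the paper.

```python
import numpy as np

def powersgd_compress(M, Q):
    """One power-iteration step of rank-r compression (sketch, no error feedback).

    M: (m, n) gradient matrix; Q: (n, r) carried over from the previous step.
    Returns P (m, r) and an updated Q (n, r): the two small tensors that are
    communicated (and all-reduced) instead of the full (m, n) gradient.
    """
    P = M @ Q               # (m, r)
    P, _ = np.linalg.qr(P)  # orthonormalize columns
    Q = M.T @ P             # (n, r)
    return P, Q

def powersgd_decompress(P, Q):
    # Rank-r approximation of the original gradient.
    return P @ Q.T

rng = np.random.default_rng(0)
m, n, r = 256, 128, 4
# A gradient of exactly rank r, so one iteration recovers it almost perfectly.
grad = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
Q = rng.standard_normal((n, r))

P, Q = powersgd_compress(grad, Q)
approx = powersgd_decompress(P, Q)

full, compressed = m * n, r * (m + n)
print(f"compression ratio: {full / compressed:.1f}x")  # 32768 / 1536 ≈ 21.3x
print(f"relative error: {np.linalg.norm(grad - approx) / np.linalg.norm(grad):.2e}")
```

Because both `P` and `Q` are linear in `M` (up to the orthonormalization), averaging the compressed tensors across workers with a plain all-reduce approximates averaging the gradients themselves, which is what makes the scheme NCCL-friendly.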

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @xush6528

Labels

module: ddp — Issues/PRs related to distributed data parallel training
triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
