🚀 Feature
PowerSGD can potentially be used for gradient compression: https://arxiv.org/abs/1905.13727. Investigate this algorithm in the context of the DDP communication hook.
Motivation
PowerSGD preserves the associativity/linearity of gradient aggregation after compression: the compressed representations of per-worker gradients can be summed directly, so the algorithm can still be implemented efficiently on top of a native communication library like NCCL via allreduce. It compresses each M x N tensor representing a variable into two smaller tensors of sizes M x rank and N x rank for communication, so the compression ratio is roughly (M * N) / ((M + N) * rank). Note that 3D and higher-rank tensors can also be supported; their compression ratio is computed by viewing the higher-rank tensor as a 2D tensor.
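A minimal sketch of one rank-r PowerSGD round may make the linearity point concrete. This is illustrative only, not a proposal for the hook API: the function name and the externally supplied `q` matrix are assumptions, and error feedback and warm start from the paper are omitted. It assumes a 2D gradient and an already-initialized `torch.distributed` process group.

```python
import torch
import torch.distributed as dist

def powersgd_allreduce(grad: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """One rank-r PowerSGD round for a 2D gradient `grad` of shape M x N.

    `q` (shape N x r) must be identical on all workers, e.g. generated from a
    shared random seed or warm-started from the previous iteration. Higher-rank
    tensors would first be viewed as 2D.
    """
    world_size = dist.get_world_size()

    p = grad @ q                # compress: M x r
    # Because p is linear in grad, allreduce-ing p is equivalent to
    # compressing the sum of the per-worker gradients -- this is why
    # PowerSGD composes with NCCL allreduce.
    dist.all_reduce(p)
    p /= world_size
    p, _ = torch.linalg.qr(p)   # orthogonalize P before the second projection

    q_new = grad.t() @ p        # compress: N x r
    dist.all_reduce(q_new)
    q_new /= world_size

    # Decompress: low-rank approximation of the averaged gradient.
    return p @ q_new.t()
```

Only the M x r and N x r matrices travel over the wire, which is where the (M * N) / ((M + N) * rank) compression ratio above comes from.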
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @xush6528