
Use allgatherv for sparse allreduce #22226

@pietern

Description


The current sparse allreduce in ProcessGroupGloo pads the indices and values tensors to the maximum length across all processes and then performs a regular allgather, which works because the padded tensors have equal size on every process. Once #22225 is merged, we can use allgatherv instead and avoid the padding trick. This is mostly a memory win when there is severe size imbalance between processes. The runtime likely won't change much, due to the nature of the underlying allgather implementation: it takes N steps, and each step takes time proportional to the size of the largest contribution.
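To make the memory and runtime argument concrete, here is a minimal single-process simulation in plain Python. It does not call ProcessGroupGloo, Gloo, or `torch.distributed`; every function name below is hypothetical. Each rank's sparse contribution is modeled as an `(indices, values)` pair, "allreduce" means concatenate everything and sum values that share an index, and "received" counts the entries each rank would get back from the allgather. The runtime part assumes a lockstep ring allgather in which every step lasts as long as its largest send; Gloo's real algorithm may differ in detail.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One rank's sparse contribution: (indices, values) of equal length.
Sparse = Tuple[List[int], List[float]]


def coalesce(indices: List[int], values: List[float]) -> Dict[int, float]:
    """Sum values that share an index (the 'reduce' half of sparse allreduce)."""
    out: Dict[int, float] = defaultdict(float)
    for i, v in zip(indices, values):
        out[i] += v
    return dict(sorted(out.items()))


def sparse_allreduce_padded(inputs: List[Sparse]) -> Tuple[Dict[int, float], int]:
    """Hypothetical pad-to-max scheme; returns (result, entries received per rank)."""
    lengths = [len(idx) for idx, _ in inputs]  # would come from a first allgather
    max_len = max(lengths)
    # Pad every contribution to max_len so a fixed-size allgather can be used.
    padded = [
        (idx + [0] * (max_len - len(idx)), val + [0.0] * (max_len - len(val)))
        for idx, val in inputs
    ]
    received = len(padded) * max_len  # world_size * max_len entries per rank
    # Strip the padding again using the gathered lengths, then reduce.
    all_idx: List[int] = []
    all_val: List[float] = []
    for (idx, val), n in zip(padded, lengths):
        all_idx.extend(idx[:n])
        all_val.extend(val[:n])
    return coalesce(all_idx, all_val), received


def sparse_allreduce_allgatherv(inputs: List[Sparse]) -> Tuple[Dict[int, float], int]:
    """Hypothetical allgatherv scheme: concatenate variable-size contributions."""
    lengths = [len(idx) for idx, _ in inputs]  # still needed as allgatherv counts
    flat_idx = [i for idx, _ in inputs for i in idx]
    flat_val = [v for _, val in inputs for v in val]
    received = sum(lengths)  # no padding: only real entries are transferred
    return coalesce(flat_idx, flat_val), received


def ring_allgather_time(block_sizes: List[int]) -> int:
    """Assumed lockstep ring model: each step lasts as long as its largest send."""
    n = len(block_sizes)
    total = 0
    for step in range(n - 1):
        # At this step, rank r forwards the block that originated at rank (r - step) % n.
        sent = [block_sizes[(r - step) % n] for r in range(n)]
        total += max(sent)
    return total


# Four ranks with a severe size imbalance: rank 0 has 8 nonzeros, the rest have 1.
inputs: List[Sparse] = [
    (list(range(8)), [1.0] * 8),
    ([0], [1.0]),
    ([3], [2.0]),
    ([7], [0.5]),
]
padded_result, padded_recv = sparse_allreduce_padded(inputs)
v_result, v_recv = sparse_allreduce_allgatherv(inputs)
print(padded_result == v_result)  # True
print(padded_recv, v_recv)  # 32 11
print(ring_allgather_time([8, 8, 8, 8]), ring_allgather_time([8, 1, 1, 1]))  # 24 24
```

Both schemes produce the same reduced result, but the padded one moves 32 entries per rank versus 11 for allgatherv. The modeled ring time is 24 units either way: at every step some rank is forwarding the 8-entry block, so each step is bounded by the largest contribution whether or not the others are padded.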

Metadata



Labels: enhancement, oncall: distributed, triaged
