[RFC] Enable Copy Engine all-gather in FSDP #176418

@kwen2501

🚀 The feature, motivation and pitch

Motivation

Recipe

  • Allocate all-gather buffer from symmetric memory
  • Init NCCL with zero-CTA policy
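The two recipe steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the FSDP integration proposed here. It assumes the installed NCCL honors the `NCCL_CTA_POLICY` environment variable with the value `ZERO`, and that the PyTorch build exposes `torch.distributed._symmetric_memory` with `set_backend`, `empty`, and `rendezvous`. The helpers `zero_cta_env`, `allgather_numel`, and `run` are hypothetical names introduced for this example only. Whether the all-gather actually lands on the copy engine depends on the NCCL version and hardware.

```python
import os


def zero_cta_env() -> dict:
    # Assumption: the NCCL build reads NCCL_CTA_POLICY and accepts "ZERO".
    return {"NCCL_CTA_POLICY": "ZERO"}


def allgather_numel(shard_numel: int, world_size: int) -> int:
    # The all-gather output holds one shard from every rank.
    return shard_numel * world_size


def run(shard_numel: int = 1024) -> None:
    # The policy must be set before the NCCL communicator is created.
    os.environ.update(zero_cta_env())

    # Imported lazily so the helpers above work without torch installed.
    import torch
    import torch.distributed as dist
    import torch.distributed._symmetric_memory as symm_mem

    dist.init_process_group("nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    device = torch.device("cuda", int(os.environ.get("LOCAL_RANK", 0)))
    torch.cuda.set_device(device)

    # Assumption: this PyTorch build supports the NCCL symmetric-memory backend.
    symm_mem.set_backend("NCCL")

    # Step 1: allocate the all-gather output buffer from symmetric memory.
    out = symm_mem.empty(
        allgather_numel(shard_numel, world_size),
        dtype=torch.bfloat16,
        device=device,
    )
    symm_mem.rendezvous(out, dist.group.WORLD.group_name)

    # Step 2: issue a regular all-gather into the symmetric buffer.
    shard = torch.full(
        (shard_numel,), float(rank), dtype=torch.bfloat16, device=device
    )
    dist.all_gather_into_tensor(out, shard)
    torch.cuda.synchronize()
    dist.destroy_process_group()
```

To try it, call `run()` from a script launched with `torchrun` on a multi-GPU node. A profiler trace is one way to check whether the all-gather still launches a compute kernel.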

Alternatives

No response

Additional context

No response

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @xmfan @zhaojuanmao @mrshenli @rohan-varma @chauhang @mori360 @ppwwyyxx

Metadata

Labels

  • bot-triaged (This is a label only to be used by the auto triage bot)
  • module: fsdp
  • module: symm_mem (Issues and PRs of Symmetric Memory)
  • oncall: distributed (Add this issue/PR to distributed oncall triage queue)
