
Add reduce-scatter coalescing for FSDP/ZeRO1 #5938

Closed
jeffhataws wants to merge 7 commits into pytorch:master from jeffhataws:cc_coalesce_reducescatter

Conversation

@jeffhataws
Collaborator

@jeffhataws jeffhataws commented Nov 29, 2023

(Replaced by #5956)

This PR adds reduce-scatter coalescing support and uses it in FSDP/ZeRO1. It also enables using reduce-scatter's scale param in FSDP. This PR is a companion to #5624 and is to be used in conjunction with openxla/xla#5740.
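For readers unfamiliar with the term, here is a minimal pure-Python sketch of what coalescing a reduce-scatter means. It is a single-process simulation, not the code in this PR: the function names `reduce_scatter` and `coalesced_reduce_scatter` are hypothetical, tensors are modeled as flat Python lists, and packing shards into one buffer is just one assumed way to get a single collective call. The point is only that one coalesced call over several tensors should give each rank the same shards as one call per tensor, with `scale` applied to the reduced result.

```python
def reduce_scatter(per_rank_inputs, scale=1.0):
    """Simulate a sum reduce-scatter (hypothetical helper, not the torch_xla API).

    per_rank_inputs[r] is the flat input held by rank r. Returns a list where
    entry r is the r-th shard of the scaled elementwise sum across ranks.
    """
    world_size = len(per_rank_inputs)
    length = len(per_rank_inputs[0])
    assert length % world_size == 0, "input must split evenly across ranks"
    shard = length // world_size
    summed = [scale * sum(inp[i] for inp in per_rank_inputs) for i in range(length)]
    return [summed[r * shard:(r + 1) * shard] for r in range(world_size)]


def coalesced_reduce_scatter(per_rank_tensor_lists, scale=1.0):
    """Reduce-scatter several tensors with one simulated collective call.

    per_rank_tensor_lists[r] is the list of flat tensors held by rank r.
    Assumption: coalescing is modeled by packing shards into one buffer.
    """
    world_size = len(per_rank_tensor_lists)
    shard_sizes = [len(t) // world_size for t in per_rank_tensor_lists[0]]

    # Pack each rank's tensors so that the slice destined for rank d is
    # contiguous: [t0 shard d, t1 shard d, ...] for d = 0..world_size-1.
    packed = []
    for tensors in per_rank_tensor_lists:
        buf = []
        for d in range(world_size):
            for t, s in zip(tensors, shard_sizes):
                buf.extend(t[d * s:(d + 1) * s])
        packed.append(buf)

    outputs = reduce_scatter(packed, scale)  # the single collective

    # Unpack each rank's output back into one shard per original tensor.
    results = []
    for out in outputs:
        pieces, offset = [], 0
        for s in shard_sizes:
            pieces.append(out[offset:offset + s])
            offset += s
        results.append(pieces)
    return results


# Two ranks, each holding two gradient-like tensors of different sizes.
rank0 = [[1, 2, 3, 4], [10, 20]]
rank1 = [[5, 6, 7, 8], [30, 40]]
coalesced = coalesced_reduce_scatter([rank0, rank1], scale=0.5)
print(coalesced)
```

With `scale=0.5` and two ranks, the scaling turns the sum into a mean, which is the usual reason a data-parallel wrapper would want the scale applied inside the collective rather than as a separate op.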

This is a revival of #4145. The comments still need to be addressed.

@JackCaoG
Collaborator

JackCaoG commented Nov 29, 2023

The build seems to fail with OOM, so let's use a larger machine. Can you modify

https://github.com/pytorch/xla/blob/master/.github/workflows/build_and_test.yml#L22

in your PR and add

runner: linux.12xlarge

I think we use a larger machine to run the tests, but the default machine to build.

