Use reduce-scatter coalescing for FSDP #6024
Merged
Collaborator
I guess we need to rebase onto master once the dependent PR has landed?
dd3bdac to
1bfae0e
Collaborator
Author
I have rebased onto the dependent PR #5956.
1bfae0e to
66636c8
alanwaketan
reviewed
Dec 14, 2023
Collaborator
alanwaketan
left a comment
The change makes a lot of sense for supporting coalesced reduce-scatter. Just one question: what if I don't need this feature and want to preserve the original behavior, where the reduce-scatter is fired immediately?
I wish I had the resources to run thorough performance tests on TPU, but...
Would it therefore be possible to add this as an optional feature?
f06545d to
3dce325
Collaborator
Let me know when it's ready for review.
3dce325 to
aac286b
Collaborator
@alanwaketan, can you take a look at this one?
This PR uses reduce-scatter coalescing in FSDP, in addition to reduce-scatter's scale param. It is a companion to #5950 and #5956 and is meant to be used in conjunction with openxla/xla#5740.
This is a revival of #4145.
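
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of what coalescing reduce-scatter calls means. It is not the code from this PR and does not use torch_xla: ranks are simulated as nested lists, and `GradBucket`, `bucket_cap`, and `launches` are hypothetical names chosen for illustration. Setting `bucket_cap=0` is one assumed way to keep the original fire-immediately behavior that the review asks about.

```python
from typing import List

# Hypothetical stand-in for a gradient: Tensor[r] is rank r's flat local copy.
Tensor = List[List[float]]


def _reduce_scatter_one(per_rank: Tensor, scale: float) -> List[List[float]]:
    """Sum one tensor across simulated ranks, scale it, return one shard per rank."""
    world = len(per_rank)
    numel = len(per_rank[0])
    assert numel % world == 0, "length must divide evenly across ranks"
    shard = numel // world
    reduced = [scale * sum(column) for column in zip(*per_rank)]
    return [reduced[r * shard:(r + 1) * shard] for r in range(world)]


class GradBucket:
    """Illustrative bucket: queues gradients and reduces them in one launch.

    bucket_cap is an assumed knob (in elements). With bucket_cap=0 every
    gradient is flushed as soon as it is added, i.e. no coalescing.
    """

    def __init__(self, bucket_cap: int, scale: float = 1.0):
        self.bucket_cap = bucket_cap
        self.scale = scale
        self.pending: List[Tensor] = []
        self.pending_numel = 0
        self.launches = 0  # number of simulated collective launches
        self.results: List[List[List[float]]] = []

    def add(self, grad: Tensor) -> None:
        self.pending.append(grad)
        self.pending_numel += len(grad[0])
        if self.pending_numel >= self.bucket_cap:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # One launch covers every queued tensor; each is still reduced on its own.
        self.launches += 1
        self.results.extend(
            _reduce_scatter_one(g, self.scale) for g in self.pending
        )
        self.pending, self.pending_numel = [], 0


def make_grad(world: int, numel: int, base: float) -> Tensor:
    """Build a toy gradient where rank r holds the constant value base + r."""
    return [[base + rank for _ in range(numel)] for rank in range(world)]


grads = [make_grad(2, 4, 1.0), make_grad(2, 2, 10.0), make_grad(2, 6, 100.0)]
immediate = GradBucket(bucket_cap=0, scale=0.5)
coalesced = GradBucket(bucket_cap=1000, scale=0.5)
for g in grads:
    immediate.add(g)
    coalesced.add(g)
immediate.flush()
coalesced.flush()  # end of backward pass: flush whatever is still queued

print(immediate.launches, coalesced.launches)
```

In this sketch both buckets produce identical shards; the only difference is how many simulated launches are issued, which is the cost coalescing aims to reduce.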