Add all-gather and reduce-scatter coalescence support and use that in… #4145

Open
hjm-aws wants to merge 1 commit into pytorch:master from hjm-aws:cc_coalesce

hjm-aws commented Nov 1, 2022

… FSDP. Allow using reduce-scatter's scale param in FSDP.

This PR depends on tensorflow PR tensorflow/tensorflow#58377
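
For readers unfamiliar with the scale parameter mentioned above, here is a minimal pure-Python sketch of what a sum reduce-scatter with a post-reduction scale computes. It simulates all ranks in one process; `reduce_scatter_sum` and its arguments are hypothetical names for illustration only, not the torch_xla API or the code in this PR.

```python
from typing import List, Sequence


def reduce_scatter_sum(per_rank_inputs: Sequence[Sequence[float]],
                       scale: float = 1.0) -> List[List[float]]:
    """Simulate a sum reduce-scatter across ranks, scaling the reduced result.

    Hypothetical illustration only: per_rank_inputs[r] is the full input held
    by rank r, and the return value at index r is the shard rank r receives.
    """
    world_size = len(per_rank_inputs)
    length = len(per_rank_inputs[0])
    if any(len(x) != length for x in per_rank_inputs):
        raise ValueError("all ranks must contribute inputs of the same length")
    if length % world_size != 0:
        raise ValueError("input length must be divisible by the number of ranks")
    shard = length // world_size
    # Element-wise sum across ranks, with the scale applied to the reduced value.
    reduced = [sum(vals) * scale for vals in zip(*per_rank_inputs)]
    # Rank r keeps only its contiguous slice of the reduced result.
    return [reduced[r * shard:(r + 1) * shard] for r in range(world_size)]


# Two ranks; scale = 1 / world_size turns the sum into a mean.
shards = reduce_scatter_sum([[1, 2, 3, 4], [10, 20, 30, 40]], scale=0.5)
```

Passing `scale = 1 / world_size` folds gradient averaging into the collective, which is presumably why exposing the parameter to FSDP is useful; that reading is an assumption, since the description above does not spell it out.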

jeffhataws added a commit to jeffhataws/xla that referenced this pull request Oct 18, 2023
Also allow using reduce-scatter's scale param in FSDP.
(revived pytorch#4145)
jeffhataws added a commit to jeffhataws/xla that referenced this pull request Oct 19, 2023
Also allow using reduce-scatter's scale param in FSDP.
(revived pytorch#4145)
jeffhataws added a commit to jeffhataws/xla that referenced this pull request Oct 20, 2023
Also allow using reduce-scatter's scale param in FSDP.
(revived pytorch#4145)
jeffhataws added a commit to jeffhataws/xla that referenced this pull request Nov 16, 2023
Also allow using reduce-scatter's scale param in FSDP.
(revived pytorch#4145)
jeffhataws added a commit to jeffhataws/xla that referenced this pull request Nov 18, 2023
Also allow using reduce-scatter's scale param in FSDP.
(revived pytorch#4145)
jeffhataws added a commit to jeffhataws/xla that referenced this pull request Nov 19, 2023
Also allow using reduce-scatter's scale param in FSDP.
(revived pytorch#4145)
JackCaoG pushed a commit that referenced this pull request Dec 2, 2023
* Add all-gather and reduce-scatter coalescence support for FSDP.

Also allow using reduce-scatter's scale param in FSDP.
(revived #4145)

* clang-format-7 and python lint fixes

* Fix "SyntaxError: 'return' outside function" error

* Code/test fixes to get run_tests.sh to run on CPU

* Fix allgather to be compatible with openxla allgather tuple change without token

* Fix reduce-scatter-coalesce to be compatible with openxla reduce-scatter tuple change without token

* Separate out the reduce-scatter-coalesce changes into a separate PR

* Some cleanups

* Add separate BuildAllGatherCoalesced builder and AllGatherCoalesced class

* Use token_handler.GetInput to capture token

* Clean up

* Clean up

* Switch to GetOperandListWithToken naming for func GetOperandList
jeffhataws added a commit that referenced this pull request Dec 5, 2023
Also allow using reduce-scatter's scale param in FSDP.
(revived #4145)

Fix reduce-scatter-coalesce to be compatible with openxla reduce-scatter tuple change without token

Switch to GetOperandListWithToken naming for func GetOperandList

Add separate BuildReduceScatterCoalesced builder

Use token_handler.GetInput to consume the token

If bucket_size_mb is default 0, reduce-scatter every tensor rather than coalesce

Fix error checking in xm.reduce_scatter

Move FSDP changes to another PR
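
The `bucket_size_mb` behavior noted in the commit message above (a default of 0 means every tensor is reduce-scattered on its own rather than coalesced) can be sketched roughly as follows. This is an illustrative sketch under assumptions, not the code from the commit: `bucket_by_size` is a hypothetical helper, and the greedy in-order grouping is one plausible policy.

```python
from typing import List, Sequence

MB = 1024 * 1024


def bucket_by_size(sizes_bytes: Sequence[int],
                   bucket_size_mb: float = 0) -> List[List[int]]:
    """Group tensor indices into buckets for coalesced collectives.

    Hypothetical helper for illustration. With bucket_size_mb == 0 (the
    default), each tensor gets its own bucket, so nothing is coalesced.
    Otherwise tensors are packed greedily, in order, until adding the next
    one would push the bucket past the size cap.
    """
    if bucket_size_mb <= 0:
        return [[i] for i in range(len(sizes_bytes))]
    cap = bucket_size_mb * MB
    buckets: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for i, nbytes in enumerate(sizes_bytes):
        # Flush the open bucket before it would exceed the cap.
        if current and current_bytes + nbytes > cap:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets
```

Each returned bucket would then be handed to a single coalesced collective call. A tensor larger than the cap still lands in a bucket of its own in this sketch.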
jeffhataws added a commit to jeffhataws/xla that referenced this pull request Dec 8, 2023
jeffhataws added a commit to jeffhataws/xla that referenced this pull request Dec 11, 2023
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
golechwierowicz pushed a commit that referenced this pull request Jan 12, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024