[functional_collective] remove the logic that forces torch-xla to use legacy funcol#123776

Closed
yifuwang wants to merge 3 commits into gh/yifuwang/77/base from gh/yifuwang/77/head

Conversation

@yifuwang
Collaborator

@yifuwang yifuwang commented Apr 10, 2024

… legacy funcol

After pytorch/xla#6887, torch-xla now also uses
the all_reduce from native funcol. So we can remove this logic.

[ghstack-poisoned]
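As a rough illustration of the kind of gating this PR removes (all names below are hypothetical, not PyTorch's actual internals):

```python
# Hypothetical sketch of the gating removed by this PR; names are
# illustrative. Before pytorch/xla#6887, tracing under torch-xla had to be
# routed to the legacy funcol all_reduce; now both paths can use native funcol.

def pick_all_reduce(tracing_with_xla, native_all_reduce, legacy_all_reduce):
    # Old behavior: force legacy funcol whenever torch-xla was tracing.
    if tracing_with_xla:
        return legacy_all_reduce
    return native_all_reduce

def pick_all_reduce_after(native_all_reduce):
    # After this PR: torch-xla also consumes the native funcol all_reduce,
    # so no special case is needed.
    return native_all_reduce
```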
@pytorch-bot

pytorch-bot Bot commented Apr 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/123776

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 2be8459 with merge base 585cd11:

FLAKY - The following job failed, but was likely due to flakiness present on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added the `oncall: distributed` label Apr 10, 2024
@yifuwang yifuwang marked this pull request as ready for review April 10, 2024 22:13
@yifuwang yifuwang requested review from wanchaol and wconstab April 10, 2024 22:13
…-xla to use legacy funcol"
@yifuwang yifuwang requested a review from yf225 April 11, 2024 21:15
Collaborator

@wanchaol wanchaol left a comment


Nice! I wonder if we should update the xla pin to include the xla side of changes? cc @alanwaketan

@alanwaketan
Collaborator

> Nice! I wonder if we should update the xla pin to include the xla side of changes? cc @alanwaketan

Let's do it. Just grep xla.txt and use the hash of the head of torch-xla.
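The suggested pin update might look like the following shell sketch. The pin file path (`.github/ci_commit_pins/xla.txt`) and the helper function are assumptions based on the "grep xla.txt" hint, not the exact workflow:

```shell
# Hedged sketch of updating the torch-xla CI pin; the pin file path is an
# assumption. In practice the hash would come from the head of torch-xla,
# e.g. via: git ls-remote https://github.com/pytorch/xla.git HEAD | cut -f1
update_xla_pin() {
    pin_file="$1"   # e.g. .github/ci_commit_pins/xla.txt
    new_hash="$2"   # commit hash of the torch-xla head
    echo "$new_hash" > "$pin_file"
    echo "pinned torch-xla to $new_hash"
}
```

Usage would then be something like `update_xla_pin .github/ci_commit_pins/xla.txt <hash>` followed by committing the changed pin file.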

…-xla to use legacy funcol"
@yifuwang yifuwang added the `topic: not user facing` label Apr 13, 2024
pytorchmergebot pushed a commit that referenced this pull request Apr 13, 2024
… funcol ops (#123777)

## Summary

After this PR, the functional collective Python APIs will stop honoring `TORCH_DISABLE_NATIVE_FUNCOL` and only use native funcol ops. Specifically, this PR:
- Removed `use_native_funcol()`.
- Removed the code path in the Python APIs when `use_native_funcol()` is `False`.
- Changed the CI tests that run on both native funcol and legacy funcol through the Python API to run only with native funcol.
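The dispatch change described above can be sketched as follows (hypothetical simplified code; the function and parameter names are illustrative, not PyTorch's actual implementation):

```python
import os

# Hypothetical before/after sketch of the funcol Python API dispatch.

def all_reduce_before(tensor, reduce_op, group, native_impl, legacy_impl):
    # Before this PR: each call consulted TORCH_DISABLE_NATIVE_FUNCOL
    # (via use_native_funcol()) and could fall back to legacy funcol.
    if os.environ.get("TORCH_DISABLE_NATIVE_FUNCOL") == "1":
        return legacy_impl(tensor, reduce_op, group)
    return native_impl(tensor, reduce_op, group)

def all_reduce_after(tensor, reduce_op, group, native_impl):
    # After this PR: the env var is ignored and only native funcol ops run.
    return native_impl(tensor, reduce_op, group)
```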

## Test Changes

`test_functional_api.py`
- Removed the tests where only one of output_split_sizes or input_split_sizes is specified. This behavior is unreliable and has been removed from the native funcol.
- Removed `TestWaitiness` which tests an implementation detail of the legacy funcol. We have equivalent tests for native funcol in `test/distributed/test_c10d_functional_native.py` https://github.com/pytorch/pytorch/blob/b7fac76fc259394136bc77b3e39d5705919e5c4c/test/distributed/test_c10d_functional_native.py#L114-L116

`test/distributed/_tensor/test_dtensor.py`
`test/distributed/_tensor/test_dtensor_compile.py`
`test/distributed/test_device_mesh.py`
`test/distributed/_tensor/experimental/test_tp_transform.py`
`test/distributed/_tensor/test_matrix_ops.py`
`test/distributed/test_inductor_collectives.py`
- All these tests previously ran twice, once with native funcol and once with legacy funcol. Changed them to run only with native funcol.

`test/distributed/test_c10d_functional_native.py`
- Removed the `run_with_native_funcol` decorators.

Pull Request resolved: #123777
Approved by: https://github.com/wanchaol
ghstack dependencies: #123776
sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
@github-actions github-actions Bot deleted the gh/yifuwang/77/head branch May 14, 2024 01:52

Labels

Merged · oncall: distributed · topic: not user facing


4 participants