Add Comm-Compute Preserving Bucketer by eellison · Pull Request #163960 · pytorch/pytorch

eellison · 2025-09-26T14:49:43Z

Stack from ghstack (oldest at bottom):

tl;dr: this pass buckets collectives while preserving comm-compute overlap.

With comm-compute overlap, we will have a graph like:

def foo(...):
    ag = all_gather(...)
    hiding_compute = mm(...)
    wait(ag)

There is no explicit dependency between the hiding compute and the collective, but to preserve overlap we want to add implicit dependencies: wait->hiding_compute (the wait runs after the hiding compute) and hiding_compute->all_gather (the hiding compute runs after the all_gather is issued).
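
For illustration, the extra edges can be modeled on a toy dependency map. This is a minimal sketch, not the pass itself: plain strings stand in for FX nodes, and the `add_overlap_deps` helper is hypothetical.

```python
from graphlib import TopologicalSorter

# Data dependencies only: node -> set of nodes it depends on.
# Strings are stand-ins for FX nodes (illustrative names).
data_deps = {
    "ag": set(),
    "hiding_compute": set(),
    "wait": {"ag"},
}


def add_overlap_deps(deps, start, compute, wait):
    """Return a copy of `deps` with the overlap-preserving edges added."""
    augmented = {node: set(d) for node, d in deps.items()}
    augmented[compute].add(start)  # hiding compute runs after the collective starts
    augmented[wait].add(compute)   # wait runs after the hiding compute
    return augmented


augmented = add_overlap_deps(data_deps, "ag", "hiding_compute", "wait")

# With the extra edges there is exactly one valid schedule.
order = list(TopologicalSorter(augmented).static_order())
print(order)
```

Without the two extra edges, a scheduler would be free to place `hiding_compute` before `ag` or after `wait`, which would lose the overlap.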

Additionally, while bucketing, we merge collective starts with each other, and likewise collective waits. In this case, we want to treat the merged nodes as a single subgraph: each node in the merged set will have the union of all deps in the set.
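
A rough sketch of the "union of deps" idea on the same kind of toy dependency map follows. The `merge_deps` and `is_schedulable` helpers are hypothetical, and the cycle check is only an assumption about how an unsafe merge could be detected, not necessarily how the pass does it.

```python
from graphlib import CycleError, TopologicalSorter


def merge_deps(deps, group):
    """Give every node in `group` the union of the group's deps."""
    group = set(group)
    # Union of every member's deps, minus edges internal to the group.
    union = set().union(*(deps[n] for n in group)) - group
    merged = {n: set(d) for n, d in deps.items()}
    for n in group:
        merged[n] = set(union)
    return merged


def is_schedulable(deps):
    """True if the dependency map is still acyclic."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError:
        return False
    return True


# Toy graph, already augmented with overlap edges (start -> compute -> wait).
deps = {
    "ag1": set(),
    "mm1": {"ag1"},
    "wait1": {"ag1", "mm1"},
    "ag2": set(),
    "mm2": {"ag2"},
    "wait2": {"ag2", "mm2"},
    "ag3": {"wait1"},  # consumes the result of ag1
}

# Bucketing ag1 with ag2: merge the starts, then the waits.
merged = merge_deps(deps, ["ag1", "ag2"])
merged = merge_deps(merged, ["wait1", "wait2"])

# Bucketing ag1 with ag3 would make ag1 depend on its own wait.
bad = merge_deps(deps, ["ag1", "ag3"])

print(sorted(merged["wait1"]), is_schedulable(merged), is_schedulable(bad))
```

After the merge, both waits depend on both hiding computes, so the bucketed wait cannot be placed before either `mm1` or `mm2`.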

We perform bucketing while augmenting the graph with these relationships. This can be done separately from comm-compute overlap scheduling, as long as the hiding-compute relationships are passed in.

TODO:

- need to instrument the fx graph so inductor respects these relationships
- the compile time of the bucketing search can be sped up significantly by limiting what portion of the graph we traverse
- more memory-aware handling

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

[ghstack-poisoned]

pytorch-bot · 2025-09-26T14:49:48Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163960

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1072d8a with merge base 3a7db34:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 63342e8 Pull Request resolved: #163960

ghstack-source-id: 16b9e45 Pull Request resolved: #163960

ghstack-source-id: 9fe12da Pull Request resolved: #163960

ghstack-source-id: b9532ae Pull Request resolved: #163960

ghstack-source-id: 81a45da Pull Request resolved: #163960

ruisizhang123

LGTM, thank you!

torch/_inductor/fx_passes/overlap_preserving_bucketer.py

ghstack-source-id: 1ce95b2 Pull Request resolved: #163960

ezyang · 2025-09-29T03:37:52Z

torch/_inductor/fx_passes/bucketing.py

    insert_before: Optional[torch.fx.Node] = None,
    wait_insertion_point: Optional[torch.fx.Node] = None,
-) -> dict[torch.fx.Node, torch.fx.Node]:
+) -> tuple[list[torch.fx.Node], dict[torch.fx.Node, torch.fx.Node]]:


severe tuple blindness lol
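
One common way to address tuple blindness is a named return type. The sketch below is purely illustrative: `BucketResult`, its field names, and the toy `bucket` function are hypothetical and are not taken from this PR, and strings stand in for `torch.fx.Node`.

```python
from typing import NamedTuple


class BucketResult(NamedTuple):
    """Hypothetical named return type; the field names are guesses."""

    new_nodes: list[str]
    replacements: dict[str, str]


def bucket(nodes: list[str]) -> BucketResult:
    # Toy stand-in: "merge" all nodes into one and map each old node to it.
    merged = "+".join(nodes)
    return BucketResult(
        new_nodes=[merged],
        replacements={n: merged for n in nodes},
    )


result = bucket(["ag1", "ag2"])
print(result.new_nodes)

# A NamedTuple still unpacks like a plain tuple, so existing callers keep working.
new_nodes, replacements = result
```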

ezyang · 2025-09-29T03:39:21Z

I didn't do a detailed review, but ACKing the high level approach

torch/_inductor/fx_passes/overlap_scheduling.py

ghstack-source-id: 1ce95b2 Pull Request resolved: pytorch#163960

ghstack-source-id: cbeac24 Pull Request resolved: #163960

eellison · 2025-09-29T20:46:07Z

@pytorchbot merge

pytorchmergebot · 2025-09-29T20:48:09Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

ghstack-source-id: bcb9860 Pull Request resolved: #163960

pytorchmergebot · 2025-09-29T22:38:41Z

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team

Raised by workflow job

eellison · 2025-09-30T01:30:46Z

@pytorchbot merge

pytorchmergebot · 2025-09-30T01:32:46Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

This PR adds the autobucketing pass at the aten level to simplefsdp. It runs autobucketing with the aot_eager backend, without inductor. The aten fx autobucketing pass can be found in this PR: pytorch/pytorch#163960. Key updates are:
1. Support a customized `aot_eger_autobucketing` backend to perform the autobucketing optimization.
2. In simplefsdp, the model_backend can be replaced by the user's customized passes using `compile.model_backend_override`.

Add Comm-Compute Preserving Bucketer

ef55c54

[ghstack-poisoned]

This was referenced Sep 24, 2025

[inductor] do comm compute overlap at aten fx level #163215

Closed

refactor bucketing #163754

Closed

pytorch-bot bot added ciflow/inductor module: inductor labels Sep 26, 2025

eellison mentioned this pull request Sep 26, 2025

Helper to augment graph with additional deps #163959

Closed

eellison added a commit that referenced this pull request Sep 26, 2025

Add Comm-Compute Preserving Bucketer

5a07135

ghstack-source-id: 63342e8 Pull Request resolved: #163960

Update on "Add Comm-Compute Preserving Bucketer"

2cc5a4b

eellison added a commit that referenced this pull request Sep 26, 2025

Add Comm-Compute Preserving Bucketer

96cc176

ghstack-source-id: 16b9e45 Pull Request resolved: #163960

Update on "Add Comm-Compute Preserving Bucketer"

7f81d21

eellison added a commit that referenced this pull request Sep 26, 2025

Add Comm-Compute Preserving Bucketer

17e3d9c

ghstack-source-id: 9fe12da Pull Request resolved: #163960

eellison requested review from IvanKobzarev and ruisizhang123 September 26, 2025 16:20

Update on "Add Comm-Compute Preserving Bucketer"

9747e66

eellison requested a review from fmassa September 26, 2025 16:36

eellison added a commit that referenced this pull request Sep 26, 2025

Add Comm-Compute Preserving Bucketer

945aee5

ghstack-source-id: b9532ae Pull Request resolved: #163960

eellison added the topic: not user facing topic category label Sep 26, 2025

eellison requested a review from ezyang September 26, 2025 17:11

eellison added a commit that referenced this pull request Sep 26, 2025

Add Comm-Compute Preserving Bucketer

21fe553

ghstack-source-id: 81a45da Pull Request resolved: #163960

ruisizhang123 approved these changes Sep 26, 2025

torch/_inductor/fx_passes/overlap_preserving_bucketer.py (2 resolved comments)

eellison added a commit that referenced this pull request Sep 26, 2025

Add Comm-Compute Preserving Bucketer

1fb1d1f

ghstack-source-id: 1ce95b2 Pull Request resolved: #163960

v0i0 approved these changes Sep 27, 2025

ezyang reviewed Sep 29, 2025

IvanKobzarev approved these changes Sep 29, 2025

torch/_inductor/fx_passes/overlap_scheduling.py (outdated, resolved)

IvanKobzarev pushed a commit to IvanKobzarev/pytorch that referenced this pull request Sep 29, 2025

Add Comm-Compute Preserving Bucketer

69f412d

ghstack-source-id: 1ce95b2 Pull Request resolved: pytorch#163960

pytorchmergebot removed the merging label Sep 29, 2025

eellison added a commit that referenced this pull request Sep 29, 2025

Add Comm-Compute Preserving Bucketer

54e2b4b

ghstack-source-id: cbeac24 Pull Request resolved: #163960

pytorchmergebot added the merging label Sep 29, 2025

eellison added a commit that referenced this pull request Sep 29, 2025

Add Comm-Compute Preserving Bucketer

5bd0f61

ghstack-source-id: bcb9860 Pull Request resolved: #163960

pytorch-bot bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Sep 29, 2025

pytorchmergebot removed the merging label Sep 29, 2025

ruisizhang123 mentioned this pull request Sep 29, 2025

add aten/inudctor autobucketing pass meta-pytorch/autoparallel#173

Merged

pytorchmergebot added the merging label Sep 30, 2025

pytorchmergebot added the Merged label Sep 30, 2025

pytorchmergebot closed this in 7d59e37 Sep 30, 2025

pytorchmergebot removed the merging label Sep 30, 2025

ruisizhang123 mentioned this pull request Sep 30, 2025

[autoparallel] add aten autobucketing pass pytorch/torchtitan#1774

Closed

This was referenced Oct 3, 2025

Add hop for additional control dependencies #164568

Closed

respect aten planned overlap in inductor #164569

Closed

ruisizhang123 mentioned this pull request Oct 9, 2025

add auto_eager_graph_pass pytorch/torchtitan#1813

Merged

github-actions bot deleted the gh/eellison/831/head branch October 31, 2025 02:17

Add Comm-Compute Preserving Bucketer #163960
eellison wants to merge 10 commits into gh/eellison/831/base from gh/eellison/831/head