Skip to content

refactor bucketing#163754

Closed
eellison wants to merge 4 commits intogh/eellison/829/basefrom
gh/eellison/829/head
Closed

refactor bucketing#163754
eellison wants to merge 4 commits intogh/eellison/829/basefrom
gh/eellison/829/head

Conversation

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Sep 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163754

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 59fe555 with merge base 3a7db34 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

eellison added a commit that referenced this pull request Sep 24, 2025
ghstack-source-id: 1c7e04a
Pull Request resolved: #163754
Preparatory refactory

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
IvanKobzarev pushed a commit to IvanKobzarev/pytorch that referenced this pull request Sep 29, 2025
ghstack-source-id: 69c5e34
Pull Request resolved: pytorch#163754
Preparatory refactory

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
@eellison eellison added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 29, 2025
@eellison
Copy link
Contributor Author

@pytorchbot merge -i

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: inductor / unit-test / inductor-test / test (inductor_distributed, 1, 1, linux.g5.12xlarge.nvidia.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Sep 29, 2025
In comm-compute overlap we will have a graph with:

```
def foo(...):
     ag = all_gather(...)
     hiding_compute = mm(...)
     wait(ag)
```

There is no explicit dependency between the hiding compute and the collectives, but we want to add implicit dependencies from wait->hiding_compute, and from hiding_compute->all_gather to preserve overlap.

Additionally, while bucketing, we will merge collective starts and collective waits together. In this case, we will want to treat the two nodes as a single subgraph - each node in the merged set will have the union of all deps in the set.

This pr adds `AugmentedGraphHelper` that adds the apis, and allows querying for dependency with this augmented graph.

Pull Request resolved: #163959
Approved by: https://github.com/v0i0, https://github.com/IvanKobzarev
ghstack dependencies: #163215, #163754
pytorchmergebot added a commit that referenced this pull request Sep 29, 2025
This reverts commit e1bd5b6.

Reverted #163754 on behalf of https://github.com/yangw-dev due to seems fails inductor/test_aten_comm_compute_reordering for macos test, see https://hud.pytorch.org/pytorch/pytorch/commit/c9b5af9a384e7ef5f95613abe1622f5f55133c3a#51526707590-box ([comment](#163215 (comment)))
@pytorchmergebot
Copy link
Collaborator

@eellison your PR has been reverted as part of the stack under #163215.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Sep 29, 2025
Preparatory refactory

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
@pytorchmergebot
Copy link
Collaborator

Starting merge as part of PR stack under #163960

pytorchmergebot pushed a commit that referenced this pull request Sep 30, 2025
In comm-compute overlap we will have a graph with:

```
def foo(...):
     ag = all_gather(...)
     hiding_compute = mm(...)
     wait(ag)
```

There is no explicit dependency between the hiding compute and the collectives, but we want to add implicit dependencies from wait->hiding_compute, and from hiding_compute->all_gather to preserve overlap.

Additionally, while bucketing, we will merge collective starts and collective waits together. In this case, we will want to treat the two nodes as a single subgraph - each node in the merged set will have the union of all deps in the set.

This pr adds `AugmentedGraphHelper` that adds the apis, and allows querying for dependency with this augmented graph.

Pull Request resolved: #163959
Approved by: https://github.com/v0i0, https://github.com/IvanKobzarev
ghstack dependencies: #163215, #163754
pytorchmergebot pushed a commit that referenced this pull request Sep 30, 2025
tl;dr performs bucketing while preserving comm-compute overlap.

In comm-compute overlap we will have a graph with:

```
def foo(...):
     ag = all_gather(...)
     hiding_compute = mm(...)
     wait(ag)
```

There is no explicit dependency between the hiding compute and the collectives, but we want to add implicit dependencies from wait->hiding_compute, and from hiding_compute->all_gather to preserve overlap.

Additionally, while bucketing, we will merge collective starts and collective waits together. In this case, we will want to treat the two nodes as a single subgraph - each node in the merged set will have the union of all deps in the set.

We perform bucketing while augmenting the graph with these relationships. This can be done separably from comm-compute overlap, so long as the hiding compute relationships are passed in.

TODO:
- need to instrument fx graph so inductor respects these relationships.
- the compile time of the bucketing search can be sped up significantly by limiting what portion of the graph we traverse through
- more memory aware handling

Pull Request resolved: #163960
Approved by: https://github.com/ruisizhang123, https://github.com/v0i0, https://github.com/IvanKobzarev
ghstack dependencies: #163215, #163754, #163959
@github-actions github-actions bot deleted the gh/eellison/829/head branch October 31, 2025 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td Do not run TD on this PR ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged module: inductor Reverted topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants