[inductor_collectives] Make reorder_collectives_preserve_peak pass grouping nodes#157706
[inductor_collectives] Make reorder_collectives_preserve_peak pass grouping nodes#157706IvanKobzarev wants to merge 4 commits intogh/IvanKobzarev/121/basefrom
Conversation
…ouping nodes [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157706
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit c2f642c with merge base d27d361 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
…eak pass grouping nodes" cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
|
@IvanKobzarev has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
…eak pass grouping nodes" cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov Differential Revision: [D77861765](https://our.internmc.facebook.com/intern/diff/D77861765) [ghstack-poisoned]
|
@IvanKobzarev has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
| def _temp_group_visit_leaves(snode, fn): | ||
| from torch._inductor.scheduler import GroupedSchedulerNode | ||
|
|
||
| if isinstance(snode, GroupedSchedulerNode) and snode.temp_grouping: |
There was a problem hiding this comment.
is temp_grouping a new concept you created? Does this imply that some other GroupedSchedulerNodes that are not temporary could exist at this time? If so, is calling 'fn(node)' the right thing to do for them?
There was a problem hiding this comment.
Yes, need to deduplicate this.
torch/_inductor/comms.py
Outdated
| compute_time += runtimes[snode] | ||
| if isinstance(snode, GroupedSchedulerNode) and snode.temp_grouping: | ||
|
|
||
| def _fn(_snode): |
There was a problem hiding this comment.
nit, move this fn def to above next to def of compute_time int, and rename it to def accumulate_time then call it in both branches (temp_visit(accumulate)) vs accumulate(snode))
torch/_inductor/comms.py
Outdated
| if contains_collective(prev_gsnode): | ||
| return False | ||
|
|
||
| if contains_ungroupable(prev_gsnode): |
There was a problem hiding this comment.
nit: a bit arbitrary that "contains_collective" is a separate catetory than '"contains_ungroupable". Would it be more readable to rename "contains_ungroupable" to "contains_gemm_like" ?
wconstab
left a comment
There was a problem hiding this comment.
LGTM! the grouping method is quite unintrusive.
…eak pass grouping nodes" cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov Differential Revision: [D77861765](https://our.internmc.facebook.com/intern/diff/D77861765) [ghstack-poisoned]
|
@IvanKobzarev has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
|
@IvanKobzarev has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
| return self.initial_exposed - self.final_exposed | ||
|
|
||
|
|
||
| def is_gemm_like(node: Optional[Union[IRNode, Operation]]) -> bool: |
There was a problem hiding this comment.
is "gemm-like" here a proxy for "has meaningful flops?" (which would include other fallback ops like e.g. custom matmul triton kernels from users).
I wonder if there are any other inductor utils to tell us this info with higher fidelity... Maybe @eellison will have ideas when he's back
There was a problem hiding this comment.
Hi, yes. At the moment this is super basic "heuristic" to not group collectives with ops with which we want to reorder. Agree, that will be good to have something based on "flops".
Differential Revision: [D77861763](https://our.internmc.facebook.com/intern/diff/D77861763) Pull Request resolved: #157708 Approved by: https://github.com/wconstab ghstack dependencies: #157706
Stack from ghstack (oldest at bottom):
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov
Differential Revision: D77861765