[inductor] FlexibleLayout for ExternKernelChoice for mms #161351
Closed
coconutruben wants to merge 38 commits into gh/coconutruben/52/base
Conversation
# why
- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout here, and provide deeper passes more chances to change it

# what
- if all the kernel template choices (KTC) are with an ExternKernelChoice template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for only extern, and FixedLayout if Triton is involved)
- caveats: because CPP, CUTLASS, and CK are not using V.choices.get_mm_configs yet, we turn off the optimization if any of those backends are in use. This will be relaxed once they support this too

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

[ghstack-poisoned]
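The layout flip described above can be sketched in a few lines. This is a minimal illustration, not the real inductor code: `KernelTemplateChoice`, `ExternKernelChoice`, `TritonTemplate`, and the layout classes are stand-ins for the actual types, and `maybe_flex_layout` is a hypothetical helper.

```python
from dataclasses import dataclass

# Stand-ins for the real inductor types (hypothetical, for illustration).
@dataclass
class ExternKernelChoice:
    name: str

@dataclass
class TritonTemplate:
    name: str

@dataclass
class KernelTemplateChoice:
    template: object  # ExternKernelChoice or TritonTemplate

class FixedLayout: ...
class FlexibleLayout: ...

def maybe_flex_layout(choices, layout):
    """If every choice is an extern kernel (no codegen), a FlexibleLayout
    gives deeper passes freedom to change strides later; any codegen
    template (e.g. Triton) bakes strides in, so the layout stays fixed."""
    if choices and all(isinstance(c.template, ExternKernelChoice) for c in choices):
        return FlexibleLayout()
    return layout

fixed = FixedLayout()
only_extern = [KernelTemplateChoice(ExternKernelChoice("aten_mm"))]
mixed = only_extern + [KernelTemplateChoice(TritonTemplate("mm"))]
print(type(maybe_flex_layout(only_extern, fixed)).__name__)  # FlexibleLayout
print(type(maybe_flex_layout(mixed, fixed)).__name__)        # FixedLayout
```

This mirrors what the test checks: FlexibleLayout when only extern choices are present, FixedLayout as soon as Triton is involved.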
This was referenced Aug 23, 2025

🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161351
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e08c924 with merge base 468c1f9.
This comment was automatically generated by Dr. CI and updates every 15 minutes.

This was referenced Aug 21, 2025
coconutruben added a commit that referenced this pull request Aug 23, 2025

ghstack-source-id: 725eb11
Pull Request resolved: #161351
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

…orch#161351)" This reverts commit f08487a. Reverted pytorch#161351 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](pytorch#161351 (comment)))
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

# why
- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout here, and provide deeper passes more chances to change it

# what
- if all the kernel template choices (KTC) are with an ExternKernelChoice template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for only extern, and FixedLayout if Triton is involved)
- caveats:
  - because CPP, CUTLASS, and CK are not using V.choices.get_mm_configs yet, we turn off the optimization if any of those backends are in use. This will be relaxed once they support this too
  - because Triton templates are still using their own calls (not a single call) to get_mm_configs, it's also turned off there. The next diff unifies Triton + ATEN to a single call to get_mm_configs and that in turn allows the optimization there too

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520584](https://our.internmc.facebook.com/intern/diff/D81520584)
Pull Request resolved: pytorch#161351
Approved by: https://github.com/eellison, https://github.com/jansel
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

…torch#161350)

# why
- now everything is in place to just gather templates and run the V.choices.get_mm_configs once per op
- enables any overrides inside V.choices.get_mm_configs to have a full view of the options for an op, not just for one template

# what
- replace multiple calls to V.choices.get_mm_configs with calls to gather the active templates, and then using those in a single call

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520571](https://our.internmc.facebook.com/intern/diff/D81520571)
Pull Request resolved: pytorch#161350
Approved by: https://github.com/eellison, https://github.com/jansel
ghstack dependencies: pytorch#161351
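The gather-then-single-call refactor can be sketched as follows. The names here are illustrative stand-ins, not the real `V.choices` API: the point is that one call receives every active template, so an override can reason about the full set at once.

```python
# Hypothetical sketch (not the real inductor API): instead of asking for
# configs once per template, gather the active templates first and make a
# single call that sees them all.

def get_mm_configs(templates, m, n, k):
    # An override placed here can see every template for the op at once,
    # e.g. to drop extern choices whenever a codegen template is present.
    return [(t, {"m": m, "n": n, "k": k}) for t in templates]

# before: one call per template
per_template = []
for t in ["aten_mm", "triton_mm"]:
    per_template += get_mm_configs([t], 64, 64, 32)

# after: gather templates, then a single call
single_call = get_mm_configs(["aten_mm", "triton_mm"], 64, 64, 32)

print(per_template == single_call)  # True
```

The two forms produce the same choices; the difference is the visibility the single call gives to per-op overrides.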
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

…torch#162293)

# why
- eventually we want all templates to go through this
- we're exposing this through diode as a sort of interface/API
- avoid later renaming

# what
- rename get_mm_configs to get_template_configs
- rename _finalize_mm_configs to _finalize_template_configs

# testing
- lintrunner
- ci

Differential Revision: [D81820641](https://our.internmc.facebook.com/intern/diff/D81820641)
Pull Request resolved: pytorch#162293
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

# why
enable caching/overriding/filtering based on src hash later

# what
- KernelTemplate has a src_hash that is None by default
- sha256 on TritonTemplate of the template src code
- None on ExternKernelChoice to have same API

# testing
n/a (not in use in this change)

Differential Revision: [D81821149](https://our.internmc.facebook.com/intern/diff/D81821149)
Pull Request resolved: pytorch#161468
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350, pytorch#162293
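The src_hash scheme above can be sketched with a few small classes. This is a simplified stand-in for the real `KernelTemplate` hierarchy, assuming only what the commit message states: a `None` default, a sha256 of the template source on Triton templates, and `None` on extern choices for a uniform API.

```python
import hashlib

class KernelTemplate:
    """Base: no source to hash, so src_hash defaults to None."""
    src_hash = None

class TritonTemplate(KernelTemplate):
    def __init__(self, src: str):
        self.src = src
        # sha256 of the template source enables caching/overriding/
        # filtering keyed on the exact template text
        self.src_hash = hashlib.sha256(src.encode()).hexdigest()

class ExternKernelChoice(KernelTemplate):
    # no template source is generated; keeping src_hash = None
    # gives extern choices the same attribute surface
    pass

t = TritonTemplate("tl.dot(a, b)")
print(len(t.src_hash))                # 64 (hex digest length)
print(ExternKernelChoice().src_hash)  # None
```

Because the hash is derived from the source text, two templates with identical source share a cache key, and any edit to the template invalidates it.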
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025

…figs (pytorch#162293)" This reverts commit 30191fc. Reverted pytorch#162293 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](pytorch#161351 (comment)))
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025

…figs (pytorch#161350)" This reverts commit 623e623. Reverted pytorch#161350 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](pytorch#161351 (comment)))
Stack from ghstack (oldest at bottom):

# why
- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout here, and provide deeper passes more chances to change it

# what
- if all the kernel template choices (KTC) are with an ExternKernelChoice template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for only extern, and FixedLayout if Triton is involved)
- caveats:
  - because CPP, CUTLASS, and CK are not using V.choices.get_mm_configs yet, we turn off the optimization if any of those backends are in use. This will be relaxed once they support this too
  - because Triton templates are still using their own calls (not a single call) to get_mm_configs, it's also turned off there. The next diff unifies Triton + ATEN to a single call to get_mm_configs and that in turn allows the optimization there too

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Differential Revision: D81520584