[inductor] FlexibleLayout for ExternKernelChoice for mms #161351
Closed
coconutruben wants to merge 38 commits into gh/coconutruben/52/base
Conversation
# why
- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout here, and provide deeper passes more chances to change it

# what
- if all the kernel template choices (KTC) are with an ExternKernelChoice template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for only extern, and FixedLayout if Triton is involved)
- caveats: because CPP, CUTLASS, and CK are not using V.choices.get_mm_configs yet, we turn off the optimization if any of those backends are in use. This will be relaxed once they support this too

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

[ghstack-poisoned]
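The layout flip described above can be sketched in a few lines. This is a minimal illustration, not the real inductor code: `KernelTemplateChoice`, `ExternKernelChoice`, `TritonTemplate`, and the layout classes are stand-ins for the actual types, and `maybe_flex_layout` is a hypothetical helper.

```python
from dataclasses import dataclass

# Stand-ins for the real inductor types (hypothetical, for illustration).
@dataclass
class ExternKernelChoice:
    name: str

@dataclass
class TritonTemplate:
    name: str

@dataclass
class KernelTemplateChoice:
    template: object  # ExternKernelChoice or TritonTemplate

class FixedLayout: ...
class FlexibleLayout: ...

def maybe_flex_layout(choices, layout):
    """If every choice is an extern kernel (no codegen), a FlexibleLayout
    gives deeper passes freedom to change strides later; any codegen
    template (e.g. Triton) bakes strides in, so the layout stays fixed."""
    if choices and all(isinstance(c.template, ExternKernelChoice) for c in choices):
        return FlexibleLayout()
    return layout

fixed = FixedLayout()
only_extern = [KernelTemplateChoice(ExternKernelChoice("aten_mm"))]
mixed = only_extern + [KernelTemplateChoice(TritonTemplate("mm"))]
print(type(maybe_flex_layout(only_extern, fixed)).__name__)  # FlexibleLayout
print(type(maybe_flex_layout(mixed, fixed)).__name__)        # FixedLayout
```

This mirrors what the test checks: FlexibleLayout when only extern choices are present, FixedLayout as soon as Triton is involved.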
This was referenced Aug 23, 2025

🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161351
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e08c924 with merge base 468c1f9.
This comment was automatically generated by Dr. CI and updates every 15 minutes.

This was referenced Aug 21, 2025
coconutruben added a commit that referenced this pull request Aug 23, 2025

ghstack-source-id: 725eb11
Pull Request resolved: #161351
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

…orch#161351)" This reverts commit f08487a. Reverted pytorch#161351 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](pytorch#161351 (comment)))
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

# why
- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout here, and provide deeper passes more chances to change it

# what
- if all the kernel template choices (KTC) are with an ExternKernelChoice template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for only extern, and FixedLayout if Triton is involved)
- caveats:
  - because CPP, CUTLASS, and CK are not using V.choices.get_mm_configs yet, we turn off the optimization if any of those backends are in use. This will be relaxed once they support this too
  - because Triton templates are still using their own calls (not a single call) to get_mm_configs, it's also turned off there. The next diff unifies Triton + ATEN to a single call to get_mm_configs and that in turn allows the optimization there too

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520584](https://our.internmc.facebook.com/intern/diff/D81520584)
Pull Request resolved: pytorch#161351
Approved by: https://github.com/eellison, https://github.com/jansel
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

…torch#161350)

# why
- now everything is in place to just gather templates and run the V.choices.get_mm_configs once per op
- enables any overrides inside V.choices.get_mm_configs to have a full view of the options for an op, not just for one template

# what
- replace multiple calls to V.choices.get_mm_configs with calls to gather the active templates, and then using those in a single call

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D81520571](https://our.internmc.facebook.com/intern/diff/D81520571)
Pull Request resolved: pytorch#161350
Approved by: https://github.com/eellison, https://github.com/jansel
ghstack dependencies: pytorch#161351
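The gather-then-single-call refactor can be sketched as follows. The names here are illustrative stand-ins, not the real `V.choices` API: the point is that one call receives every active template, so an override can reason about the full set at once.

```python
# Hypothetical sketch (not the real inductor API): instead of asking for
# configs once per template, gather the active templates first and make a
# single call that sees them all.

def get_mm_configs(templates, m, n, k):
    # An override placed here can see every template for the op at once,
    # e.g. to drop extern choices whenever a codegen template is present.
    return [(t, {"m": m, "n": n, "k": k}) for t in templates]

# before: one call per template
per_template = []
for t in ["aten_mm", "triton_mm"]:
    per_template += get_mm_configs([t], 64, 64, 32)

# after: gather templates, then a single call
single_call = get_mm_configs(["aten_mm", "triton_mm"], 64, 64, 32)

print(per_template == single_call)  # True
```

The two forms produce the same choices; the difference is the visibility the single call gives to per-op overrides.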
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

…torch#162293)

# why
- eventually we want all templates to go through this
- we're exposing this through diode as a sort of interface/API
- avoid later renaming

# what
- rename get_mm_configs to get_template_configs
- rename _finalize_mm_configs to _finalize_template_configs

# testing
- lintrunner
- ci

Differential Revision: [D81820641](https://our.internmc.facebook.com/intern/diff/D81820641)
Pull Request resolved: pytorch#162293
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

# why
enable caching/overriding/filtering based on src hash later

# what
- KernelTemplate has a src_hash that is None by default
- sha256 on TritonTemplate of the template src code
- None on ExternKernelChoice to have same API

# testing
n/a (not in use in this change)

Differential Revision: [D81821149](https://our.internmc.facebook.com/intern/diff/D81821149)
Pull Request resolved: pytorch#161468
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#161351, pytorch#161350, pytorch#162293
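The src_hash scheme above can be sketched with a few small classes. This is a simplified stand-in for the real `KernelTemplate` hierarchy, assuming only what the commit message states: a `None` default, a sha256 of the template source on Triton templates, and `None` on extern choices for a uniform API.

```python
import hashlib

class KernelTemplate:
    """Base: no source to hash, so src_hash defaults to None."""
    src_hash = None

class TritonTemplate(KernelTemplate):
    def __init__(self, src: str):
        self.src = src
        # sha256 of the template source enables caching/overriding/
        # filtering keyed on the exact template text
        self.src_hash = hashlib.sha256(src.encode()).hexdigest()

class ExternKernelChoice(KernelTemplate):
    # no template source is generated; keeping src_hash = None
    # gives extern choices the same attribute surface
    pass

t = TritonTemplate("tl.dot(a, b)")
print(len(t.src_hash))                # 64 (hex digest length)
print(ExternKernelChoice().src_hash)  # None
```

Because the hash is derived from the source text, two templates with identical source share a cache key, and any edit to the template invalidates it.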
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025

…figs (pytorch#162293)" This reverts commit 30191fc. Reverted pytorch#162293 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](pytorch#161351 (comment)))
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025

…figs (pytorch#161350)" This reverts commit 623e623. Reverted pytorch#161350 on behalf of https://github.com/huydhn due to Check with @coconutruben and the internal failures look real ([comment](pytorch#161351 (comment)))
Stack from ghstack (oldest at bottom):

# why
- if we only use ExternKernelChoice we're not doing any codegen
- if we're not doing any codegen, we can use a FlexibleLayout here, and provide deeper passes more chances to change it

# what
- if all the kernel template choices (KTC) are with an ExternKernelChoice template, we switch to a FlexibleLayout before generating the choice
- add a test to make sure that works as intended (FlexibleLayout for only extern, and FixedLayout if Triton is involved)
- caveats:
  - because CPP, CUTLASS, and CK are not using V.choices.get_mm_configs yet, we turn off the optimization if any of those backends are in use. This will be relaxed once they support this too
  - because Triton templates are still using their own calls (not a single call) to get_mm_configs, it's also turned off there. The next diff unifies Triton + ATEN to a single call to get_mm_configs and that in turn allows the optimization there too

# testing
```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Differential Revision: D81520584