[inductor][mm] restructure decompose k #161026
coconutruben wants to merge 19 commits into gh/coconutruben/29/base
# why

- make it easier to integrate into lookup table later

# what

- current version generates templates on the fly and uses them to generate a single choice
- lookup table and performance model work best when there is a stable set of templates (with predictable names) and those are then parametrized
- this change makes it so that there is a single DecomposeK template with a stable name, and the k split is the only parametrization we do

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```
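A minimal sketch of the idea in the "what" section, in plain Python rather than inductor code. It assumes that decompose-k means splitting the K dimension into `k_split` chunks, multiplying chunk-wise, and summing the partial products. `DecomposeKTemplate`, `Choice`, the name `"decompose_k"`, and `lookup_table` are hypothetical names for illustration, not the identifiers used in this PR.

```python
from dataclasses import dataclass

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain reference matmul: (M, K) @ (K, N) -> (M, N)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def decompose_k_matmul(a: Matrix, b: Matrix, k_split: int) -> Matrix:
    """Split K into k_split chunks, multiply chunk-wise, then sum the partials."""
    k = len(b)
    assert k % k_split == 0, "this illustration only handles even splits"
    step = k // k_split
    m, n = len(a), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for start in range(0, k, step):
        a_chunk = [row[start:start + step] for row in a]
        b_chunk = b[start:start + step]
        partial = matmul(a_chunk, b_chunk)
        for i in range(m):
            for j in range(n):
                out[i][j] += partial[i][j]
    return out


@dataclass(frozen=True)
class Choice:
    template_name: str  # stable across runs, so it can key a lookup table
    k_split: int        # the only parameter that varies between choices


class DecomposeKTemplate:
    """Hypothetical stand-in: one template object with one stable name."""

    name = "decompose_k"  # assumed name, for illustration only

    def choices(self, k_splits: list[int]) -> list[Choice]:
        return [Choice(self.name, ks) for ks in k_splits]


template = DecomposeKTemplate()
choices = template.choices([2, 4])
# A lookup table can use (template name, parameters) as a predictable key.
lookup_table = {(c.template_name, c.k_split): c for c in choices}
```

Because the template name no longer depends on the split, a lookup table or performance model only has to reason about one template and one integer parameter.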
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161026

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 23d847e with merge base 2efcf9d:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Starting merge as part of PR stack under #161098
# why

- simplify the expansion of heuristics beyond just triton (e.g. decomposeK)

# what

- move template heuristics and registry into their own folder
- adjust imports accordingly

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D80670917](https://our.internmc.facebook.com/intern/diff/D80670917)
Pull Request resolved: #161097
Approved by: https://github.com/PaulZhang12, https://github.com/jansel
ghstack dependencies: #161026
# why

- enable it to go through the common template heuristics point
- make it easier to use in a common extension point, e.g. lookup table

# what

- break template heuristic into base + triton
- move k_split generation logic into a template heuristic for decompose k
- register through normal mechanism
- to make testing work, add a context manager to temporarily set template heuristics for a template/op to empty (effectively skipping it). This is used for the decompose k test to disable triton choices

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D80670918](https://our.internmc.facebook.com/intern/diff/D80670918)
Pull Request resolved: #161098
Approved by: https://github.com/jansel
ghstack dependencies: #161026, #161097
Pull Request resolved: pytorch#161026
Approved by: https://github.com/PaulZhang12, https://github.com/jansel
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov
Differential Revision: D80670913