Skip to content

[inductor][mm] restructure decompose k#161026

Closed
coconutruben wants to merge 19 commits intogh/coconutruben/29/basefrom
gh/coconutruben/29/head
Closed

[inductor][mm] restructure decompose k#161026
coconutruben wants to merge 19 commits intogh/coconutruben/29/basefrom
gh/coconutruben/29/head

Conversation

@coconutruben
Copy link
Copy Markdown
Contributor

@coconutruben coconutruben commented Aug 20, 2025

Stack from ghstack (oldest at bottom):

why

  • make it easier to integrate into lookup table later

what

  • current version generates templates on the fly and uses them
    to generate a single choice
  • lookup table and performance model work best when there is a
    stable set of templates (with predictable names) and those
    are then parametrized
  • this change makes it so that there is a single DecomposeK template
    with a stable name, and the k split is the only parametrization we do

testing

python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Differential Revision: D80670913

\# why

- make it easier to integrate into lookup table later

\# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

\# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

[ghstack-poisoned]
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented Aug 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161026

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 23d847e with merge base 2efcf9d (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

coconutruben added a commit that referenced this pull request Aug 20, 2025
\# why

- make it easier to integrate into lookup table later

\# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

\# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

ghstack-source-id: cf790e5
Pull Request resolved: #161026
@coconutruben coconutruben added the topic: not user facing topic category label Aug 20, 2025
# why

- make it easier to integrate into lookup table later

# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
@coconutruben
Copy link
Copy Markdown
Contributor Author

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 21, 2025
# why

- make it easier to integrate into lookup table later

# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

Differential Revision: [D80670913](https://our.internmc.facebook.com/intern/diff/D80670913)

[ghstack-poisoned]
@coconutruben
Copy link
Copy Markdown
Contributor Author

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Copy Markdown
Contributor

@PaulZhang12 PaulZhang12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

# why

- make it easier to integrate into lookup table later

# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

Differential Revision: [D80670913](https://our.internmc.facebook.com/intern/diff/D80670913)

[ghstack-poisoned]
@coconutruben
Copy link
Copy Markdown
Contributor Author

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

# why

- make it easier to integrate into lookup table later

# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

Differential Revision: [D80670913](https://our.internmc.facebook.com/intern/diff/D80670913)

[ghstack-poisoned]
# why

- make it easier to integrate into lookup table later

# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

Differential Revision: [D80670913](https://our.internmc.facebook.com/intern/diff/D80670913)

[ghstack-poisoned]
# why

- make it easier to integrate into lookup table later

# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

Differential Revision: [D80670913](https://our.internmc.facebook.com/intern/diff/D80670913)

[ghstack-poisoned]
# why

- make it easier to integrate into lookup table later

# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

Differential Revision: [D80670913](https://our.internmc.facebook.com/intern/diff/D80670913)

[ghstack-poisoned]
# why

- make it easier to integrate into lookup table later

# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

Differential Revision: [D80670913](https://our.internmc.facebook.com/intern/diff/D80670913)

[ghstack-poisoned]
@coconutruben
Copy link
Copy Markdown
Contributor Author

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

5 similar comments
@coconutruben
Copy link
Copy Markdown
Contributor Author

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@coconutruben
Copy link
Copy Markdown
Contributor Author

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@coconutruben
Copy link
Copy Markdown
Contributor Author

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@coconutruben
Copy link
Copy Markdown
Contributor Author

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@coconutruben
Copy link
Copy Markdown
Contributor Author

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

# why

- make it easier to integrate into lookup table later

# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

Differential Revision: [D80670913](https://our.internmc.facebook.com/intern/diff/D80670913)

[ghstack-poisoned]
@coconutruben
Copy link
Copy Markdown
Contributor Author

@coconutruben has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Starting merge as part of PR stack under #161098

pytorchmergebot pushed a commit that referenced this pull request Aug 28, 2025
# why

- simplify the expansion of heuristics beyond just triton (e.g.
  decomposeK)

# what

- move template heuristics and registry into its own folder
- adjust imports accordingly

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D80670917](https://our.internmc.facebook.com/intern/diff/D80670917)
Pull Request resolved: #161097
Approved by: https://github.com/PaulZhang12, https://github.com/jansel
ghstack dependencies: #161026
pytorchmergebot pushed a commit that referenced this pull request Aug 28, 2025
# why

- enable it to go through commont template heuristics point
- make easier to use in common extension point e.g. lookup table

# what

- break template heuristic into base + triton
- move k_split generation logic into a templateheuristic for decompose k
- register through normal mechanism

- to make testing work, add a context manager to temporarily set
  template heuristics for a template/op to empty (effectively skipping
  it). This is used for decompose k test to disable triton choices

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D80670918](https://our.internmc.facebook.com/intern/diff/D80670918)
Pull Request resolved: #161098
Approved by: https://github.com/jansel
ghstack dependencies: #161026, #161097
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
# why

- make it easier to integrate into lookup table later

# what

- current version generates templates on the fly and uses them
  to generate a single choice
- lookup table and performance model work best when there is a
  stable set of templates (with predictable names) and those
  are then parametrized
- this change makes it so that there is a single DecomposeK template
  with a stable name, and the k split is the only parametrization we do

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1 -v
```

Differential Revision: [D80670913](https://our.internmc.facebook.com/intern/diff/D80670913)
Pull Request resolved: pytorch#161026
Approved by: https://github.com/PaulZhang12, https://github.com/jansel
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
# why

- simplify the expansion of heuristics beyond just triton (e.g.
  decomposeK)

# what

- move template heuristics and registry into its own folder
- adjust imports accordingly

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D80670917](https://our.internmc.facebook.com/intern/diff/D80670917)
Pull Request resolved: pytorch#161097
Approved by: https://github.com/PaulZhang12, https://github.com/jansel
ghstack dependencies: pytorch#161026
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
)

# why

- enable it to go through commont template heuristics point
- make easier to use in common extension point e.g. lookup table

# what

- break template heuristic into base + triton
- move k_split generation logic into a templateheuristic for decompose k
- register through normal mechanism

- to make testing work, add a context manager to temporarily set
  template heuristics for a template/op to empty (effectively skipping
  it). This is used for decompose k test to disable triton choices

# testing

```
python3 -bb -m pytest test/inductor/test_max_autotune.py -v
```

Differential Revision: [D80670918](https://our.internmc.facebook.com/intern/diff/D80670918)
Pull Request resolved: pytorch#161098
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#161026, pytorch#161097
@github-actions github-actions Bot deleted the gh/coconutruben/29/head branch September 28, 2025 02:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants