[Helion + torch.compile] Refactor TemplateBuffer as extensible base class by yf225 · Pull Request #177063 · pytorch/pytorch

yf225 · 2026-03-10T20:08:41Z

Stack from ghstack (oldest at bottom):

[Helion + torch.compile] Add unit test for ExternalTritonTemplateKernel fusion #177065
[Helion + torch.compile] Refactor template codegen pipeline for extensibility #177064
-> [Helion + torch.compile] Refactor TemplateBuffer as extensible base class #177063
[Helion + torch.compile] Fix MultiOutput write deps to eliminate fusion workarounds #177062

Move common fields and methods up from TritonTemplateBuffer to TemplateBuffer so that all template subclasses (Triton, CuteDSL, external backends) share them:

- Add mutated_inputs, allowed_prologue_inps to TemplateBuffer.__init__
- Move mutation_outputs setup from TritonTemplateBuffer to base class
- Move get_outputs(), get_allowed_prologue_inps() up
- Extract _read_deps_from_inputs() helper from extract_read_writes()
- Remove can_fuse_multi_output_epilogue() (unused)
- Simplify TritonTemplateBuffer to delegate to super().__init__()
- Remove redundant self.outputs from CppTemplateBuffer
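The shape of this refactor can be illustrated with a minimal, self-contained sketch. It is not the Inductor implementation: the class names mirror the ones in this PR, but the constructor signatures, the toy `MutationOutput` class, and the hypothetical `ExternalTemplateBuffer` subclass are assumptions made purely for illustration.

```python
from __future__ import annotations

from typing import Iterable, Optional


class MutationOutput:
    """Toy stand-in (assumption): records that `target` is mutated in place."""

    def __init__(self, target: str, owner: "TemplateBuffer") -> None:
        self.target = target
        self.owner = owner


class TemplateBuffer:
    """Simplified base class that owns the state every template backend shares."""

    def __init__(
        self,
        name: str,
        inputs: Iterable[str],
        mutated_inputs: Optional[Iterable[str]] = None,
        allowed_prologue_inps: Optional[Iterable[str]] = None,
    ) -> None:
        self.name = name
        self.inputs = list(inputs)
        self.mutated_inputs = list(mutated_inputs or [])
        self.allowed_prologue_inps = set(allowed_prologue_inps or [])
        # Built once in the base class so that no subclass has to repeat it.
        self.mutation_outputs = [
            MutationOutput(target, self) for target in self.mutated_inputs
        ]

    def get_allowed_prologue_inps(self) -> set[str]:
        return self.allowed_prologue_inps


class TritonTemplateBuffer(TemplateBuffer):
    """In this sketch the subclass only forwards its arguments to the base."""

    def __init__(self, name, inputs, mutated_inputs=None, allowed_prologue_inps=None):
        super().__init__(name, inputs, mutated_inputs, allowed_prologue_inps)


class ExternalTemplateBuffer(TemplateBuffer):
    """Hypothetical external backend: inherits the shared setup unchanged."""


buf = ExternalTemplateBuffer(
    "buf0", ["a", "b"], mutated_inputs=["b"], allowed_prologue_inps=["a"]
)
print([m.target for m in buf.mutation_outputs])
print(buf.get_allowed_prologue_inps())
```

In this sketch the external subclass gets mutation tracking and the prologue allow-list without defining any code of its own, which is the kind of reuse the description is aiming for.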

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

pytorch-bot · 2026-03-10T20:08:45Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177063

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit ddfde89 with merge base 59b048f:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

inductor / inductor-cpu-test / test (cpu_inductor_torchbench, 1, 2, linux.2xlarge.amx, unstable) (gh) (#174929)
detectron2_maskrcnn_r_50_fpn

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot · 2026-03-10T20:08:48Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

…lass

Move common fields and methods from TritonTemplateBuffer up to TemplateBuffer so that external template backends (e.g. Helion) can reuse the same mutation-tracking and prologue-fusion infrastructure:

- Add mutated_inputs, allowed_prologue_inps params to TemplateBuffer.__init__
- Build mutation_outputs list in base class (parallel to ExternKernel.mutation_outputs)
- Move get_allowed_prologue_inps() to base class
- Extract _read_deps_from_inputs() helper from extract_read_writes()
- Remove can_fuse_multi_output_epilogue() (always returned False, unused)
- Simplify TritonTemplateBuffer.__init__() to delegate to super()

get_outputs() stays on TritonTemplateBuffer since it is the only subclass that currently passes mutated_inputs; other subclasses (CppTemplateBuffer, CuteDSLTemplateBuffer, etc.) manage their own output lists independently.

ghstack-source-id: 91e9bd0
Pull Request resolved: #177063
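The helper extraction and the decision to keep get_outputs() on the Triton subclass can be sketched roughly as follows. This is a toy model under stated assumptions, not the code from the PR: buffers and dependencies are represented by plain strings and a small `Dep` dataclass, and the method bodies are invented for illustration; only the method and class names come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dep:
    """Toy read dependency (assumption): just the name of the buffer read."""

    name: str


@dataclass
class TemplateBuffer:
    name: str
    inputs: list[str]
    mutated_inputs: list[str] = field(default_factory=list)

    def _read_deps_from_inputs(self) -> set[Dep]:
        # Extracted helper: one read dependency per input buffer.
        return {Dep(inp) for inp in self.inputs}

    def extract_read_writes(self) -> tuple[set[Dep], set[str]]:
        # Reads come from the helper; the template writes its own buffer.
        return self._read_deps_from_inputs(), {self.name}

    def get_outputs(self) -> list[str]:
        # Base behaviour in this sketch: the buffer itself is the only output.
        return [self.name]


class TritonTemplateBuffer(TemplateBuffer):
    def get_outputs(self) -> list[str]:
        # Only this subclass passes mutated_inputs, so only it appends them.
        return [self.name, *self.mutated_inputs]


plain = TemplateBuffer("buf0", ["a", "b"])
triton = TritonTemplateBuffer("buf1", ["a", "b"], mutated_inputs=["b"])
reads, writes = triton.extract_read_writes()
print(sorted(dep.name for dep in reads), writes)
print(plain.get_outputs(), triton.get_outputs())
```

Keeping the override on the one subclass that needs it leaves the other subclasses free to manage their own output lists, as the commit message notes.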

yf225 · 2026-03-11T05:30:35Z

@pytorchbot merge -f "unrelated failures"

pytorchmergebot · 2026-03-11T05:32:23Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…e base class (#177063)" (#177360) This reverts commit f72b01e. Pull Request resolved: #177360 Approved by: https://github.com/huydhn

…ble base class (#177367)

This is a reland of #177063.

Move common fields and methods from TritonTemplateBuffer up to TemplateBuffer so that external template backends (e.g. Helion) can reuse the same mutation-tracking and prologue-fusion infrastructure:

- Add mutated_inputs, allowed_prologue_inps params to TemplateBuffer.__init__
- Build mutation_outputs list in base class (parallel to ExternKernel.mutation_outputs)
- Move get_allowed_prologue_inps() to base class
- Extract _read_deps_from_inputs() helper from extract_read_writes()
- Remove can_fuse_multi_output_epilogue() (always returned False, unused)
- Simplify TritonTemplateBuffer.__init__() to delegate to super()

get_outputs() stays on TritonTemplateBuffer since it is the only subclass that currently passes mutated_inputs; other subclasses (CppTemplateBuffer, CuteDSLTemplateBuffer, etc.) manage their own output lists independently.

Pull Request resolved: #177367
Approved by: https://github.com/shunting314
ghstack dependencies: #177302

…lass (pytorch#177063)

Move common fields and methods up from TritonTemplateBuffer to TemplateBuffer so that all template subclasses (Triton, CuteDSL, external backends) share them:

- Add mutated_inputs, allowed_prologue_inps to TemplateBuffer.__init__
- Move mutation_outputs setup from TritonTemplateBuffer to base class
- Move get_outputs(), get_allowed_prologue_inps() up
- Extract _read_deps_from_inputs() helper from extract_read_writes()
- Remove can_fuse_multi_output_epilogue() (unused)
- Simplify TritonTemplateBuffer to delegate to super().__init__()
- Remove redundant self.outputs from CppTemplateBuffer

Pull Request resolved: pytorch#177063
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#177062

pytorch-bot bot added the ciflow/inductor, ciflow/torchtitan, and module: inductor labels Mar 10, 2026

yf225 changed the title ~~Refactor TemplateBuffer as extensible base class~~ [Inductor] Refactor TemplateBuffer as extensible base class Mar 10, 2026

yf225 changed the title ~~[Inductor] Refactor TemplateBuffer as extensible base class~~ [Helion + torch.compile] Refactor TemplateBuffer as extensible base class Mar 10, 2026

yf225 added the topic: not user facing label Mar 10, 2026

yf225 requested review from eellison, jansel, oulgen and shunting314 March 10, 2026 21:09

jansel approved these changes Mar 11, 2026

pytorchmergebot added the merging label Mar 11, 2026

pytorchmergebot closed this in f72b01e Mar 11, 2026

pytorchmergebot added Merged and removed merging labels Mar 11, 2026

yf225 mentioned this pull request Mar 12, 2026

[Helion + torch.compile] Fix MultiOutput write deps and extend fusion score matching #177302

Closed

yf225 mentioned this pull request Mar 12, 2026

[Helion + torch.compile] Normalize MultiOutput write deps for consistent fusion matching #177315

Closed

yf225 added a commit that referenced this pull request Mar 13, 2026

Revert "[Helion + torch.compile] Refactor TemplateBuffer as extensibl…

ecfd1c2

…e base class (#177063)" This reverts commit f72b01e.

yf225 mentioned this pull request Mar 13, 2026

[Re-land] [Helion + torch.compile] Refactor TemplateBuffer as extensible base class #177367

Closed

github-actions bot deleted the gh/yf225/135/head branch April 11, 2026 02:24

yf225 wants to merge 5 commits into gh/yf225/135/base from gh/yf225/135/head
