[Helion + torch.compile] Refactor TemplateBuffer as extensible base class#177063

Closed
yf225 wants to merge 5 commits into gh/yf225/135/base from gh/yf225/135/head
@yf225 yf225 commented Mar 10, 2026

Stack from ghstack (oldest at bottom):

Move common fields and methods up from TritonTemplateBuffer to
TemplateBuffer so that all template subclasses (Triton, CuteDSL,
external backends) share them:

  • Add mutated_inputs, allowed_prologue_inps to TemplateBuffer.__init__
  • Move mutation_outputs setup from TritonTemplateBuffer to base class
  • Move get_outputs(), get_allowed_prologue_inps() up
  • Extract _read_deps_from_inputs() helper from extract_read_writes()
  • Remove can_fuse_multi_output_epilogue() (unused)
  • Simplify TritonTemplateBuffer to delegate to super().__init__()
  • Remove redundant self.outputs from CppTemplateBuffer
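
As a rough illustration of the shape this refactor describes, the sketch below uses plain Python stand-ins rather than the real Inductor IR classes. The constructor signatures, the string-based buffer names, and HelionLikeTemplateBuffer are assumptions made for the example; only the attribute and method names (mutated_inputs, allowed_prologue_inps, mutation_outputs, get_outputs(), get_allowed_prologue_inps()) come from the list above.

```python
from __future__ import annotations

from typing import Iterable, Optional


class TemplateBuffer:
    """Simplified stand-in for the base class; not the real Inductor signature."""

    def __init__(
        self,
        name: str,
        inputs: list[str],
        mutated_inputs: Optional[Iterable[str]] = None,
        allowed_prologue_inps: Optional[Iterable[str]] = None,
    ) -> None:
        self.name = name
        self.inputs = list(inputs)
        self.mutated_inputs = list(mutated_inputs or [])
        self.allowed_prologue_inps = set(allowed_prologue_inps or ())
        # Mutation bookkeeping is set up once here, so every subclass inherits it.
        # Real code would build IR nodes; strings keep the sketch self-contained.
        self.mutation_outputs = [
            f"{name}_mutates_{inp}" for inp in self.mutated_inputs
        ]

    def get_allowed_prologue_inps(self) -> set[str]:
        return self.allowed_prologue_inps

    def get_outputs(self) -> list[str]:
        # The buffer itself, followed by one output per mutated input.
        return [self.name, *self.mutation_outputs]


class TritonTemplateBuffer(TemplateBuffer):
    def __init__(self, name, inputs, mutated_inputs=None, allowed_prologue_inps=None):
        # After the refactor the subclass only delegates to the base class.
        super().__init__(name, inputs, mutated_inputs, allowed_prologue_inps)


class HelionLikeTemplateBuffer(TemplateBuffer):
    """Hypothetical external backend; it reuses the base-class behaviour as-is."""


buf = TritonTemplateBuffer(
    "buf0", ["a", "b"], mutated_inputs=["b"], allowed_prologue_inps={"a"}
)
print(buf.get_outputs())
print(HelionLikeTemplateBuffer("buf1", ["x"]).get_outputs())
```

The point of the design is that a new backend subclass needs no mutation-tracking code of its own; whether the real classes expose exactly these arguments should be checked against the PR diff.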

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

pytorch-bot bot commented Mar 10, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177063

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit ddfde89 with merge base 59b048f:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot commented Mar 10, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@yf225 yf225 changed the title Refactor TemplateBuffer as extensible base class [Inductor] Refactor TemplateBuffer as extensible base class Mar 10, 2026
@yf225 yf225 changed the title [Inductor] Refactor TemplateBuffer as extensible base class [Helion + torch.compile] Refactor TemplateBuffer as extensible base class Mar 10, 2026
@yf225 yf225 added the topic: not user facing topic category label Mar 10, 2026
yf225 added a commit that referenced this pull request Mar 10, 2026
…lass

ghstack-source-id: 91e9bd0
Pull Request resolved: #177063
yf225 added a commit that referenced this pull request Mar 11, 2026
…lass

Move common fields and methods from TritonTemplateBuffer up to
TemplateBuffer so that external template backends (e.g. Helion) can
reuse the same mutation-tracking and prologue-fusion infrastructure:

- Add mutated_inputs, allowed_prologue_inps params to TemplateBuffer.__init__
- Build mutation_outputs list in base class (parallel to ExternKernel.mutation_outputs)
- Move get_allowed_prologue_inps() to base class
- Extract _read_deps_from_inputs() helper from extract_read_writes()
- Remove can_fuse_multi_output_epilogue() (always returned False, unused)
- Simplify TritonTemplateBuffer.__init__() to delegate to super()

get_outputs() stays on TritonTemplateBuffer since it is the only
subclass that currently passes mutated_inputs; other subclasses
(CppTemplateBuffer, CuteDSLTemplateBuffer, etc.) manage their own
output lists independently.

ghstack-source-id: 91e9bd0
Pull Request resolved: #177063
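
The revised commit message above differs from the original description in one respect: get_outputs() stays on TritonTemplateBuffer, while mutation_outputs and _read_deps_from_inputs() live in the base class. A minimal sketch of that split follows. The ReadDep record, the string buffer names, the return type of extract_read_writes(), and the CppTemplateBuffer constructor are simplifications assumed for the example, not the real Inductor types.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReadDep:
    """Toy dependency record (assumed for this sketch): 'reads this buffer'."""

    buffer_name: str


class TemplateBuffer:
    def __init__(self, name, inputs, mutated_inputs=None):
        self.name = name
        self.inputs = list(inputs)
        self.mutated_inputs = list(mutated_inputs or [])
        # Built in the base class, one entry per mutated input.
        self.mutation_outputs = [f"{name}_mut_{inp}" for inp in self.mutated_inputs]

    def _read_deps_from_inputs(self) -> set[ReadDep]:
        # Extracted helper: one read dependency per input, reusable by subclasses.
        return {ReadDep(inp) for inp in self.inputs}

    def extract_read_writes(self) -> tuple[set[ReadDep], set[str]]:
        # Reads come from the helper; the only write here is the buffer itself.
        return self._read_deps_from_inputs(), {self.name}


class TritonTemplateBuffer(TemplateBuffer):
    def get_outputs(self) -> list[str]:
        # Only this subclass passes mutated_inputs, so only it merges them in.
        return [self.name, *self.mutation_outputs]


class CppTemplateBuffer(TemplateBuffer):
    def __init__(self, name, inputs, outputs):
        super().__init__(name, inputs)
        # This subclass manages its own output list independently.
        self.outputs = list(outputs)

    def get_outputs(self) -> list[str]:
        return self.outputs


triton_buf = TritonTemplateBuffer("buf0", ["a", "b"], mutated_inputs=["b"])
reads, writes = triton_buf.extract_read_writes()
print(sorted(dep.buffer_name for dep in reads), writes)
print(triton_buf.get_outputs())
```

Keeping get_outputs() out of the base class means a subclass with its own output list is unaffected by the mutation bookkeeping, which matches the reasoning given in the commit message.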
yf225 commented Mar 11, 2026

@pytorchbot merge -f "unrelated failures"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

yf225 added a commit to yf225/pytorch that referenced this pull request Mar 11, 2026
ghstack-source-id: 64bafb1
Pull Request resolved: pytorch#177063
sandy-gags pushed a commit to sandy-gags/pytorch that referenced this pull request Mar 12, 2026
…lass

ghstack-source-id: 64bafb1
Pull Request resolved: pytorch/pytorch#177063
yf225 added a commit that referenced this pull request Mar 13, 2026
pytorchmergebot pushed a commit that referenced this pull request Mar 13, 2026
pytorchmergebot pushed a commit that referenced this pull request Mar 13, 2026
…ble base class (#177367)

This is a reland of #177063.

Pull Request resolved: #177367
Approved by: https://github.com/shunting314
ghstack dependencies: #177302
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 24, 2026
…ble base class (pytorch#177367)

This is a reland of pytorch#177063.

Pull Request resolved: pytorch#177367
Approved by: https://github.com/shunting314
ghstack dependencies: pytorch#177302
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…lass (pytorch#177063)

Pull Request resolved: pytorch#177063
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#177062
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…ble base class (pytorch#177367)

This is a reland of pytorch#177063.

Pull Request resolved: pytorch#177367
Approved by: https://github.com/shunting314
ghstack dependencies: pytorch#177302
@github-actions github-actions bot deleted the gh/yf225/135/head branch April 11, 2026 02:24