
[Helion + torch.compile] Add ExternalTritonTemplateKernel for external template prologue/epilogue fusion#176571

Closed
yf225 wants to merge 2 commits into main from helion_inductor_fusion_pr1

Conversation


@yf225 yf225 commented Mar 5, 2026

Add support for external template backends (e.g. Helion) to fuse prologue and epilogue pointwise ops into their kernels via Inductor's existing fusion infrastructure.
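As background, prologue/epilogue fusion means pointwise ops on a template's inputs and outputs execute inside the template's own loop instead of as separate kernels. A toy Python sketch of the idea (all names here are illustrative, not Inductor APIs):

```python
import math

def matmul_template(a, b, prologue=None, epilogue=None):
    """Toy 'template kernel': a matmul whose input loads and output stores
    can have pointwise ops fused in, mimicking how Inductor splices
    prologue/epilogue subgraphs into a Triton template's load/store sites."""
    prologue = prologue or (lambda x: x)
    epilogue = epilogue or (lambda x: x)
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # prologue fused into the load of `a`
                acc += prologue(a[i][p]) * b[p][j]
            # epilogue fused into the store of the output
            out[i][j] = epilogue(acc)
    return out

a = [[1.0, -2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [0.0, 1.0]]
# fused: exp prologue on `a`, relu epilogue on the result -- one pass
fused = matmul_template(a, b, prologue=math.exp, epilogue=lambda x: max(x, 0.0))
# unfused reference: three separate passes over memory
pre = [[math.exp(x) for x in row] for row in a]
ref = [[max(sum(pre[i][p] * b[p][j] for p in range(2)), 0.0)
        for j in range(2)] for i in range(2)]
```

The fused version touches each input/output element once, which is the memory-traffic win the fusion infrastructure is after.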

Template Codegen Lifecycle
==========================

_codegen_single_template (simd.py)
├── Build prologue groups from prologue_nodes
├── Remove prologue-fused inputs from kernel.args.input_buffers
├── kernel.codegen_template_body(...)  →  dispatches by kernel type:
│   │
│   ├── [Standard] TritonTemplateKernel
│   │   ├── with self:
│   │   │   ├── render()  →  PartialRender with hook placeholders
│   │   │   ├── Codegen ALL epilogues into each store subgraph
│   │   │   └── codegen_prologues_in_subgraphs
│   │   └── Finalize hooks (<DEF_KERNEL>, <ARGDEFS>, <LOAD_INPUT_*>, <STORE_OUTPUT_*>)
│   │   └── return src_code
│   │
│   └── [External] ExternalTritonTemplateKernel
│       ├── Build prologue list from prologue groups
│       ├── _find_eligible_epilogues  →  epilogues reading exactly 1 template output
│       ├── Compute _unfused_epilogues  →  everything else (non-MultiOutput)
│       ├── Build prologue_sources dict: buf_name → source_bufs
│       ├── with self:
│       │   ├── _setup_epilogue_hook  →  one per epilogue-fusable output
│       │   ├── _setup_prologue_hook  →  one per named input with a prologue
│       │   ├── Codegen each eligible epilogue into its store subgraph
│       │   └── codegen_prologues_in_subgraphs
│       ├── _build_fusion_spec  →  TemplateFusionSpec
│       ├── tb.fuse(spec)  →  TemplateFusionOutput (backend splices into Triton AST)
│       ├── Finalize hook placeholders (_STORE_OUTPUT_*, _LOAD_INPUT_*)
│       └── return src_code
│
├── kernel.get_unfused_epilogues()
│   ├── [Standard]  →  []
│   └── [External]  →  self._unfused_epilogues
│
├── mark_run on template_node, fused epilogues, prologues
├── define_kernel(src_code)
└── return kernel

call_kernel
├── [Standard] Emit kernel call + workspace deallocation
└── [External] Emit kernel call + multi-output unpacking + codegen_node for each unfused epilogue
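The eligibility split in the [External] path above can be sketched as follows (a simplified model; the node and class names are hypothetical stand-ins, not the actual Inductor scheduler types):

```python
from dataclasses import dataclass

@dataclass
class EpilogueNode:
    """Toy stand-in for a scheduler node that reads template outputs."""
    name: str
    reads: frozenset            # names of template output buffers this node reads
    is_multi_output: bool = False  # stand-in for ir.MultiOutput unpacking nodes

def partition_epilogues(epilogues, template_outputs):
    """Mimics the _find_eligible_epilogues idea: an epilogue is fusable only
    if it reads exactly one template output; everything else (except the
    MultiOutput unpacking nodes themselves) stays unfused and is codegen'd
    as a separate kernel after the template call."""
    eligible, unfused = [], []
    for node in epilogues:
        read_outputs = node.reads & template_outputs
        if len(read_outputs) == 1:
            eligible.append(node)
        elif not node.is_multi_output:
            unfused.append(node)
    return eligible, unfused

outs = frozenset({"buf0", "buf1"})
eps = [
    EpilogueNode("relu", frozenset({"buf0"})),         # reads 1 output -> fusable
    EpilogueNode("add", frozenset({"buf0", "buf1"})),  # reads 2 outputs -> unfused
    EpilogueNode("getitem", frozenset({"buf0", "buf1"}),
                 is_multi_output=True),                # MultiOutput -> excluded from both
]
eligible, unfused = partition_epilogues(eps, outs)
```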

Helion-side changes are in pytorch/helion#1520.
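The spec/fuse handshake between Inductor and the external backend (the `_build_fusion_spec` and `tb.fuse(spec)` steps above) might look roughly like this; the dataclass fields and the textual splicing are illustrative guesses at the shape of the contract, not the actual definitions (a real backend splices into its Triton AST, not strings):

```python
from dataclasses import dataclass

@dataclass
class TemplateFusionSpec:
    """What Inductor hands the external backend: epilogue source per
    output buffer and prologue source per input buffer (hypothetical)."""
    epilogue_sources: dict  # output buf name -> code to splice at the store
    prologue_sources: dict  # input buf name -> code to splice at the load

@dataclass
class TemplateFusionOutput:
    """What the backend returns after splicing into its kernel source."""
    src_code: str

class ToyBackend:
    """A trivial 'backend' that splices by placeholder substitution."""
    def __init__(self, template_src: str):
        self.template_src = template_src

    def fuse(self, spec: TemplateFusionSpec) -> TemplateFusionOutput:
        src = self.template_src
        for buf, code in spec.epilogue_sources.items():
            src = src.replace(f"<STORE_OUTPUT_{buf}>", code)
        for buf, code in spec.prologue_sources.items():
            src = src.replace(f"<LOAD_INPUT_{buf}>", code)
        return TemplateFusionOutput(src_code=src)

tb = ToyBackend("acc = <LOAD_INPUT_a> * b\n<STORE_OUTPUT_buf0>")
out = tb.fuse(TemplateFusionSpec(
    epilogue_sources={"buf0": "tl.store(out_ptr, tl.maximum(acc, 0.0))"},
    prologue_sources={"a": "tl.exp(tl.load(a_ptr))"},
))
```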

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo


pytorch-bot bot commented Mar 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176571

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 9 Unrelated Failures

As of commit 6ccaafd with merge base 6ad9c43:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


pytorch-bot bot commented Mar 5, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@yf225 yf225 changed the title to [Helion + torch.compile] Add ExternalTritonTemplateKernel for external template prologue/epilogue fusion Mar 5, 2026
@yf225 yf225 added the topic: not user facing topic category label Mar 5, 2026
@yf225 yf225 force-pushed the helion_inductor_fusion_pr1 branch 6 times, most recently from 971cd8d to 2229e9d Compare March 6, 2026 06:52
@@ -463,7 +463,7 @@ def simplify_indexing(index: sympy.Expr):
self.rsplit_size = 0
self.saved_partial_accumulate: list[PartialAccumulate] = []

- def codegen_template_override(
+ def codegen_template_body(
@yf225 yf225 Mar 6, 2026


The overall codegen lifecycle is laid out in the PR description above.

@@ -463,7 +463,7 @@ def simplify_indexing(index: sympy.Expr):
self.rsplit_size = 0
self.saved_partial_accumulate: list[PartialAccumulate] = []

- def codegen_template_override(
@yf225 yf225 Mar 6, 2026


This is intentionally removed in favor of codegen_template_body() as the new extension point.
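The extension-point shape this comment describes can be sketched as below; everything except the codegen_template_body / get_unfused_epilogues method names is invented for illustration:

```python
class TritonTemplateKernelSketch:
    """Toy base class: codegen_template_body is the single dispatch point
    that _codegen_single_template calls, replacing the old
    codegen_template_override hook (bodies are illustrative only)."""

    def codegen_template_body(self, prologue_groups) -> str:
        # [Standard] path: render, fuse all epilogues, finalize hooks
        return "standard template src"

    def get_unfused_epilogues(self):
        # standard templates fuse every epilogue they accept
        return []

class ExternalTritonTemplateKernelSketch(TritonTemplateKernelSketch):
    def __init__(self):
        self._unfused_epilogues = []

    def codegen_template_body(self, prologue_groups) -> str:
        # [External] path: partition epilogues, build a fusion spec,
        # and let the backend splice into its own Triton AST
        self._unfused_epilogues = ["tanh_epilogue"]  # illustrative leftover
        return "external template src (post tb.fuse)"

    def get_unfused_epilogues(self):
        # leftover epilogues get codegen'd after call_kernel
        return self._unfused_epilogues
```

The caller in simd.py only touches these two methods, which is why a single overridable body method can subsume the old override hook.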

@yf225 yf225 force-pushed the helion_inductor_fusion_pr1 branch 12 times, most recently from 7b775e9 to 6529c27 Compare March 6, 2026 19:48
Contributor Author


This is intentionally removed as it's no longer needed; instead, we directly add the ir.MultiOutput handling logic in can_fuse_multi_outputs_template().
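A rough sketch of what such a check might do (simplified; the real function signature and node types in the scheduler differ):

```python
class Node:
    """Minimal stand-in for a scheduler node."""
    def __init__(self, name, is_multi_output=False, parent=None):
        self.name = name
        self.is_multi_output = is_multi_output  # stand-in for ir.MultiOutput
        self.parent = parent  # the multi-output buffer this getitem unpacks, if any

def can_fuse_multi_outputs_template(template_node, consumer):
    """Allow fusion across the MultiOutput indirection: a consumer that is
    a getitem of a multi-output template is treated as fusable with the
    template itself (illustrative logic only)."""
    if consumer.is_multi_output and consumer.parent is template_node:
        return True
    return False

tmpl = Node("helion_template")
getitem0 = Node("getitem0", is_multi_output=True, parent=tmpl)
other = Node("pointwise")
```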

@yf225 yf225 marked this pull request as ready for review March 6, 2026 21:27
@yf225 yf225 force-pushed the helion_inductor_fusion_pr1 branch 2 times, most recently from 545a723 to 8040ee5 Compare March 6, 2026 22:18
@yf225 yf225 marked this pull request as draft March 7, 2026 02:03
@yf225 yf225 force-pushed the helion_inductor_fusion_pr1 branch 3 times, most recently from b93b7c4 to 0b9593c Compare March 7, 2026 04:40
@yf225 yf225 marked this pull request as ready for review March 7, 2026 05:29
@yf225 yf225 force-pushed the helion_inductor_fusion_pr1 branch from 0b9593c to f4dba7f Compare March 7, 2026 07:27

yf225 commented Mar 8, 2026

@pytorchbot rebase

@pytorchmergebot

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot

Successfully rebased helion_inductor_fusion_pr1 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout helion_inductor_fusion_pr1 && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the helion_inductor_fusion_pr1 branch from f4dba7f to 8217bdf Compare March 8, 2026 04:00
@yf225 yf225 marked this pull request as draft March 9, 2026 01:42
@yf225 yf225 force-pushed the helion_inductor_fusion_pr1 branch from 8217bdf to 6ccaafd Compare March 9, 2026 21:54
@yf225

yf225 commented Mar 10, 2026

Closing this PR in favor of the PR stack at #177062.

@yf225 yf225 closed this Mar 10, 2026