[Helion + torch.compile] Add ExternalTritonTemplateKernel for external template prologue/epilogue fusion #176571
@@ -463,7 +463,7 @@ def simplify_indexing(index: sympy.Expr):
         self.rsplit_size = 0
         self.saved_partial_accumulate: list[PartialAccumulate] = []

-    def codegen_template_override(
+    def codegen_template_body(
The overall codegen lifecycle:
_codegen_single_template (simd.py)
├── Build prologue groups from prologue_nodes
├── Remove prologue-fused inputs from kernel.args.input_buffers
├── kernel.codegen_template_body(...) → dispatches by kernel type:
│ │
│ ├── [Standard] TritonTemplateKernel
│ │ ├── with self:
│ │ │ ├── render() → PartialRender with hook placeholders
│ │ │ ├── Codegen ALL epilogues into each store subgraph
│ │ │ └── codegen_prologues_in_subgraphs
│ │ └── Finalize hooks (<DEF_KERNEL>, <ARGDEFS>, <LOAD_INPUT_*>, <STORE_OUTPUT_*>)
│ │ └── return src_code
│ │
│ └── [External] ExternalTritonTemplateKernel
│ ├── Build prologue list from prologue groups
│ ├── _find_eligible_epilogues → epilogues reading exactly 1 template output
│ ├── Compute _unfused_epilogues → everything else (non-MultiOutput)
│ ├── Build prologue_sources dict: buf_name → source_bufs
│ ├── with self:
│ │ ├── _setup_epilogue_hook → one per epilogue-fusable output
│ │ ├── _setup_prologue_hook → one per named input with a prologue
│ │ ├── Codegen each eligible epilogue into its store subgraph
│ │ └── codegen_prologues_in_subgraphs
│ ├── _build_fusion_spec → TemplateFusionSpec
│ ├── tb.fuse(spec) → TemplateFusionOutput (backend splices into Triton AST)
│ ├── Finalize hook placeholders (_STORE_OUTPUT_*, _LOAD_INPUT_*)
│ └── return src_code
│
├── kernel.get_unfused_epilogues()
│ ├── [Standard] → []
│ └── [External] → self._unfused_epilogues
│
├── mark_run on template_node, fused epilogues, prologues
├── define_kernel(src_code)
└── return kernel
call_kernel
├── [Standard] Emit kernel call + workspace deallocation
└── [External] Emit kernel call + multi-output unpacking + codegen_node for each unfused epilogue
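The dispatch in the lifecycle above can be illustrated with a minimal, self-contained sketch. This is not the Inductor implementation: `Epilogue`, `partition_epilogues`, `StandardKernelSketch`, `ExternalKernelSketch`, and `codegen_single_template` are hypothetical stand-ins, and the returned "source" is just a string. Only the method names `codegen_template_body` and `get_unfused_epilogues` and the "reads exactly 1 template output" rule come from the outline. Skipping MultiOutput nodes in both lists is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Epilogue:
    """Hypothetical stand-in for a scheduler node that follows a template."""
    name: str
    reads: frozenset          # names of the buffers this node reads
    is_multi_output: bool = False


def partition_epilogues(epilogues, template_outputs):
    """Split epilogues into (eligible, unfused) following the outline's rule."""
    eligible, unfused = [], []
    for ep in epilogues:
        if ep.is_multi_output:
            continue  # assumption: MultiOutput nodes are handled elsewhere
        if len(ep.reads & template_outputs) == 1:
            eligible.append(ep)   # reads exactly one template output
        else:
            unfused.append(ep)    # everything else is emitted after the kernel call
    return eligible, unfused


class StandardKernelSketch:
    def codegen_template_body(self, epilogues, template_outputs):
        # The standard path fuses all epilogues it is given.
        names = ", ".join(ep.name for ep in epilogues)
        return f"# standard template, fused epilogues: {names}"

    def get_unfused_epilogues(self):
        return []


class ExternalKernelSketch:
    def __init__(self):
        self._unfused_epilogues = []

    def codegen_template_body(self, epilogues, template_outputs):
        eligible, self._unfused_epilogues = partition_epilogues(
            epilogues, template_outputs
        )
        names = ", ".join(ep.name for ep in eligible)
        return f"# external template, fused epilogues: {names}"

    def get_unfused_epilogues(self):
        return self._unfused_epilogues


def codegen_single_template(kernel, epilogues, template_outputs):
    """Caller side: one body call, then ask the kernel what was left unfused."""
    src_code = kernel.codegen_template_body(epilogues, template_outputs)
    return src_code, kernel.get_unfused_epilogues()


outputs = frozenset({"out0", "out1"})
nodes = [
    Epilogue("relu", frozenset({"out0"})),
    Epilogue("add", frozenset({"out0", "out1"})),
    Epilogue("unpack", frozenset({"out0"}), is_multi_output=True),
]
src, leftover = codegen_single_template(ExternalKernelSketch(), nodes, outputs)
print(src)
print([ep.name for ep in leftover])
```

In this sketch the caller never branches on the kernel type. It calls the same two methods on either kernel, which is the role the outline gives to `codegen_template_body` as the extension point.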
codegen_template_override() is intentionally removed in favor of codegen_template_body() as the new extension point.
This is intentionally removed because it is no longer needed; instead, the ir.MultiOutput handling logic is added directly in can_fuse_multi_outputs_template().
Closing this PR in favor of this PR stack: #177062
Add support for external template backends (e.g. Helion) to fuse prologue and epilogue pointwise ops into their kernels via Inductor's existing fusion infrastructure.
Helion-side changes are in pytorch/helion#1520.
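As a rough illustration of what prologue and epilogue fusion means semantically, the toy example below uses plain Python lists instead of Triton or Helion kernels. The `matmul_template` function and its `load_a` and `store` hooks are hypothetical names chosen for this sketch. They only mimic the idea of applying a pointwise op while loading an input (prologue) or while storing the output (epilogue), so that no intermediate buffer is materialized.

```python
def matmul_template(a, b, load_a=lambda v: v, store=lambda v: v):
    """Toy 'template': matmul with optional hooks on input load and output store."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += load_a(a[i][k]) * b[k][j]  # prologue applied at load time
            out[i][j] = store(acc)                # epilogue applied at store time
    return out


def scale(v):
    return v * 2.0          # pointwise prologue


def relu(v):
    return max(v, 0.0)      # pointwise epilogue


a = [[1.0, -2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [0.0, 1.0]]

# Unfused: three separate passes with two intermediate buffers.
scaled = [[scale(v) for v in row] for row in a]
mm = matmul_template(scaled, b)
unfused = [[relu(v) for v in row] for row in mm]

# Fused: a single pass, with the pointwise ops spliced into the template.
fused = matmul_template(a, b, load_a=scale, store=relu)

print(fused == unfused)
```

Both paths compute the same values. The fused path avoids writing and re-reading `scaled` and `mm`, which is the motivation for fusing pointwise ops into a template kernel.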
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo