[Helion + torch.compile] Add unit test for ExternalTritonTemplateKernel fusion #177065
yf225 wants to merge 34 commits into gh/yf225/137/base
Add ExternalTritonTemplateKernel class that subclasses TritonTemplateKernel for external template backends (e.g. Helion). Key methods: _compute_fusion_metadata(), _setup_fusion_hooks(), _find_eligible_epilogues(), _setup_epilogue_hook(), _setup_prologue_hook(), call_kernel(), emit_kernel_override().

Extend TemplateBuffer base class with fields needed by external backends: epilogue_fusable_outputs, _multi_output_children, _named_inputs, and add realize_template_input() and build_multi_outputs() class methods. Add MultiOutputLayout handling to extract_read_writes(). Set epilogue_fusable_outputs in TritonTemplateBuffer.

Scheduler changes:
- Generalize prologue fusion: check allowed_prologue_inps instead of isinstance(TritonTemplateBuffer)
- Add multi-output template epilogue guard requiring ComputedBuffer
- Replace can_fuse_multi_output_epilogue delegation with inline MultiOutput parent check

[ghstack-poisoned]
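The first scheduler change above swaps a concrete-type check for a capability check. A minimal sketch of that idea follows, using toy classes rather than the Inductor ones: `ToyTemplateBuffer`, `ToyPlainBuffer`, and `can_fuse_prologue` are hypothetical names introduced only for illustration, and only the attribute name `allowed_prologue_inps` comes from the description.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTemplateBuffer:
    """Hypothetical stand-in for a template buffer; not the Inductor class."""
    name: str
    # Names of inputs whose producers may be fused in as prologues.
    allowed_prologue_inps: set = field(default_factory=set)


@dataclass
class ToyPlainBuffer:
    """A buffer type that has no notion of prologue fusion."""
    name: str


def can_fuse_prologue(template, input_name):
    # Capability check: any buffer exposing a non-empty allowed_prologue_inps
    # qualifies, regardless of its concrete class.
    allowed = getattr(template, "allowed_prologue_inps", None)
    return bool(allowed) and input_name in allowed


external = ToyTemplateBuffer("helion_like", allowed_prologue_inps={"b"})
plain = ToyPlainBuffer("buf0")

print(can_fuse_prologue(external, "b"))
print(can_fuse_prologue(external, "a"))
print(can_fuse_prologue(plain, "b"))
```

Under this assumption, a backend opts in to prologue fusion by populating the attribute instead of by inheriting from one specific buffer class.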
…teKernel fusion

Add test_external_template_prologue_epilogue_fusion that exercises:
- Prologue fusion: sigmoid(b) fused into template as <LOAD_INPUT_B>
- Epilogue fusion: relu(...) * bias fused into template as <STORE_OUTPUT_0>
- Extra inputs: bias is read by the epilogue but is not among the template's original inputs, exercising kernel._extra_inputs

Uses a _MockExternalTemplateBuffer that subclasses TemplateBuffer and creates an ExternalTritonTemplateKernel, testing the full render-based fusion pipeline with a mock triton template.

ghstack-source-id: c4d94e0
Pull Request resolved: #177065
@pytorchbot merge -f "unrelated failures"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

@yf225 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
…tput templates (#177597) TemplateBuffer subclasses with MultiOutputLayout (e.g. Helion kernels) don't have a single dtype. Add an explicit error in TemplateBuffer.dtype for this case, and guard the scheduler's low-precision heuristic with is_multi_outputs_template() so it skips the check rather than crashing. Pull Request resolved: #177597 Approved by: https://github.com/shunting314 ghstack dependencies: #177492, #177065
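The two parts of that follow-up (an explicit error on `dtype`, plus a guard in the heuristic) can be sketched with toy classes. Everything below is a hypothetical illustration, not the Inductor code: the class names, the choice of `NotImplementedError`, the string dtypes, and the placement of `is_multi_outputs_template()` on the buffer itself are all assumptions.

```python
class ToyMultiOutputLayout:
    """Hypothetical marker layout for a buffer that produces several outputs."""


class ToyLayout:
    """Hypothetical single-output layout carrying one dtype."""

    def __init__(self, dtype):
        self.dtype = dtype


class ToyTemplateBuffer:
    def __init__(self, layout):
        self.layout = layout

    def is_multi_outputs_template(self):
        return isinstance(self.layout, ToyMultiOutputLayout)

    @property
    def dtype(self):
        # Fail loudly instead of with an obscure attribute error.
        if self.is_multi_outputs_template():
            raise NotImplementedError(
                "multi-output template has no single dtype; query each output"
            )
        return self.layout.dtype


def is_low_precision(buf):
    # Guard first, so the heuristic skips multi-output templates
    # rather than touching buf.dtype and crashing.
    if buf.is_multi_outputs_template():
        return False
    return buf.dtype in ("float16", "bfloat16")


single = ToyTemplateBuffer(ToyLayout("bfloat16"))
multi = ToyTemplateBuffer(ToyMultiOutputLayout())

print(is_low_precision(single))
print(is_low_precision(multi))
```

The design point is that the property stays strict while callers that can tolerate "no answer" check the predicate before asking.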
…el fusion (pytorch#177065)

Add test_external_template_prologue_epilogue_fusion that exercises:
- Prologue fusion: sigmoid(b) fused into template as <LOAD_INPUT_B>
- Epilogue fusion: relu(...) * bias fused into template as <STORE_OUTPUT_0>
- Extra inputs: bias is read by the epilogue but is not among the template's original inputs, exercising kernel._extra_inputs

Uses a _MockExternalTemplateBuffer that subclasses TemplateBuffer and creates an ExternalTritonTemplateKernel. The _render() method calls kernel._setup_fusion_hooks() to set up all fusion hooks in one call, then reads kernel._prologue_source_buffers and kernel._extra_store_targets to build the template source with the appropriate placeholders.

Pull Request resolved: pytorch#177065
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#177492
Stack from ghstack (oldest at bottom):
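Add test_external_template_prologue_epilogue_fusion that exercises:
- Prologue fusion: sigmoid(b) fused into template as <LOAD_INPUT_B>
- Epilogue fusion: relu(...) * bias fused into template as <STORE_OUTPUT_0>
- Extra inputs: bias is read by the epilogue but is not among the
  template's original inputs, exercising kernel._extra_inputs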
Uses a _MockExternalTemplateBuffer that subclasses TemplateBuffer and
creates an ExternalTritonTemplateKernel. The _render() method calls
kernel._setup_fusion_hooks() to set up all fusion hooks in one call,
then reads kernel._prologue_source_buffers and kernel._extra_store_targets
to build the template source with the appropriate placeholders.
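As a rough illustration of placeholder-based rendering, the sketch below fills a toy template string with fused load and store expressions. It is plain string substitution, not the Inductor or Triton machinery: `render_template`, the pseudo-kernel text, and the `load`/`store`/`compute` names are hypothetical, and only the placeholder names <LOAD_INPUT_B> and <STORE_OUTPUT_0> come from the description.

```python
def render_template(prologues, epilogues):
    """Fill a toy template; unfused placeholders fall back to a plain load/store."""
    source = (
        "a = load(A)\n"
        "b = <LOAD_INPUT_B>\n"
        "acc = compute(a, b)\n"
        "<STORE_OUTPUT_0>\n"
    )
    # A fused prologue replaces the raw load of input B.
    load_b = prologues.get("B", "load(B)")
    # A fused epilogue replaces the raw store of output 0.
    store_0 = epilogues.get(0, "store(OUT0, acc)")
    return source.replace("<LOAD_INPUT_B>", load_b).replace("<STORE_OUTPUT_0>", store_0)


fused = render_template(
    prologues={"B": "sigmoid(load(B))"},
    # BIAS is an extra input: the epilogue reads it, the template itself does not.
    epilogues={0: "store(OUT0, relu(acc) * load(BIAS))"},
)
unfused = render_template({}, {})

print(fused)
print(unfused)
```

In the fused output the sigmoid is applied as B is loaded and the relu-times-bias is applied as the result is stored, which mirrors the three cases the test is described as covering.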
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo
Differential Revision: D96849203