[inductor] Use custom triton kernel subclass when available #167456

Closed

kundaMwiza wants to merge 22 commits into pytorch:main from
graphcore:mwizak/use-custom-triton-kernel-subclass-if-available

Conversation

@kundaMwiza
Collaborator

@kundaMwiza kundaMwiza commented Nov 10, 2025

This refactor replaces direct uses of TritonKernel with a subclass type when one is available, since out-of-tree / custom backends can:

  • have their own configs that they would like to place in inductor_meta via a TritonKernel subclass, for the autotuner to handle
  • have their own triton heuristics for the different types of operations (pointwise, reduction, etc.). These heuristics can currently only be reached by patching; this change allows custom backends to inject their own imports directly via a subclass
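As a rough sketch of the first point, the pattern looks like the following. These are plain-Python stand-ins, not PyTorch APIs: `MyBackendTritonKernel`, the `"my_backend"` key, and the base-class dict contents are all illustrative.

```python
# Hypothetical, simplified mimic of the subclassing pattern this PR enables.
class TritonKernelBase:
    """Stand-in for inductor's TritonKernel."""

    @classmethod
    def inductor_meta_common(cls):
        # Metadata shared by all generated kernels, consumed by the autotuner.
        return {"backend_hash": "default", "store_cubin": False}


class MyBackendTritonKernel(TritonKernelBase):
    """A custom backend's subclass adding its own autotuner config."""

    @classmethod
    def inductor_meta_common(cls):
        meta = super().inductor_meta_common()
        # Backend-specific knob placed in inductor_meta for the autotuner.
        meta["my_backend"] = {"num_warps_override": 4}
        return meta


print(MyBackendTritonKernel.inductor_meta_common())
```

Because codegen would call the subclass wherever it previously referenced TritonKernel directly, the extra entries flow to the autotuner without any patching.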

Example out of tree backends with their own heuristic modules:

Ascend NPU: https://github.com/Ascend/pytorch/blob/045a034dbcec287a5997aa13fd129a1cd6b1e215/torch_npu/_inductor/npu_triton_heuristics.py#L4

Intel XPU: https://github.com/intel/intel-extension-for-pytorch/blob/5dcc9d57e5422cf295e1a1ee97896d6b6a554a85/intel_extension_for_pytorch/_inductor/xpu/triton_ops/autotune.py

It also adds a triton_meta_common method, analogous to inductor_meta_common, that is overridable so that compile options can be provided directly.
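A minimal sketch of that override point follows. The dict keys, the `"compile_options"` entry, and the `NPUTritonKernel` name are hypothetical stand-ins, not the real inductor schema:

```python
# Hypothetical sketch of an overridable triton_meta_common.
class TritonKernelBase:
    @classmethod
    def triton_meta_common(cls):
        # Options passed through to triton compilation (illustrative keys).
        return {"constants": {}, "device_type": "cuda"}


class NPUTritonKernel(TritonKernelBase):
    @classmethod
    def triton_meta_common(cls):
        meta = super().triton_meta_common()
        # A custom backend can supply its compile options directly here.
        meta["device_type"] = "npu"
        meta["compile_options"] = {"multibuffer": True}
        return meta
```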

Test plan:

Added unit tests to test_triton_extension_backend.py

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @chenyang78

@pytorch-bot

pytorch-bot bot commented Nov 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167456

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit b342108 with merge base ed18b31:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@kundaMwiza kundaMwiza changed the title [inductor] Use custom triton kernel subclass if available [inductor] Use custom triton kernel subclass when available Nov 10, 2025
@kundaMwiza
Collaborator Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the "topic: not user facing" (topic category) label Nov 10, 2025
@bdhirsh bdhirsh requested review from eellison and jansel November 11, 2025 14:27
@bdhirsh bdhirsh added the "triaged" label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Nov 11, 2025
Contributor

@jansel jansel left a comment


Failing tests?

Is there a test we could add to check this new behavior?

@kundaMwiza kundaMwiza force-pushed the mwizak/use-custom-triton-kernel-subclass-if-available branch 2 times, most recently from e7d7329 to 04cc3ca Compare November 19, 2025 10:48
filename=None,
inductor_meta=None,
custom_kernel=False,
caching_autotuner_cls: type[CachingAutotuner] = CachingAutotuner,
Collaborator Author


Allows custom heuristics modules to pass in their subclasses
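The mechanism can be mimicked in plain Python as below. `MyBackendAutotuner` and the simplified `cached_autotune` signature are illustrative, not the real torch._inductor API:

```python
# Hypothetical mimic of the new `caching_autotuner_cls` parameter: the
# heuristics entry point instantiates whichever autotuner class it is handed,
# defaulting to the base class so existing callers are unchanged.
class CachingAutotuner:
    def __init__(self, fn, configs):
        self.fn = fn
        self.configs = configs


class MyBackendAutotuner(CachingAutotuner):
    """Illustrative custom-backend autotuner with extra bookkeeping."""

    def __init__(self, fn, configs):
        super().__init__(fn, configs)
        self.backend = "my_backend"


def cached_autotune(fn, configs, caching_autotuner_cls=CachingAutotuner):
    return caching_autotuner_cls(fn, configs)


tuner = cached_autotune(lambda x: x, [], caching_autotuner_cls=MyBackendAutotuner)
print(type(tuner).__name__)  # → MyBackendAutotuner
```

The default argument keeps the existing behavior, while a custom heuristics module passes its own subclass at the call site.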

def define_subgraph_launcher_fn(self, name: str, subgraph_code):
self.subgraph_definitions.splice(subgraph_code.value)

@classmethod
Collaborator Author


I couldn't put the kernel type on the class because there would be an import cycle. Happy to hear of other alternatives to this.
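One cycle-avoiding shape for this (class and imported names are placeholders; `fractions.Fraction` merely stands in for the kernel class that can't be imported at module level):

```python
# Hypothetical illustration of resolving the kernel type lazily in a
# classmethod instead of a module-level class attribute, so the two modules
# involved never import each other at import time.
class SchedulingBase:
    _kernel_cls = None  # cannot be set at class-definition time due to the cycle

    @classmethod
    def get_kernel_cls(cls):
        if cls._kernel_cls is None:
            # Deferred import runs on first use, after both modules exist.
            from fractions import Fraction  # stand-in for the kernel module
            cls._kernel_cls = Fraction
        return cls._kernel_cls
```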



@unittest.skipIf(IS_FBCODE, "cpp_extension doesn't work in fbcode right now")
@test_torchinductor.skip_if_cpp_wrapper(
Collaborator Author


This decorator only works on test methods, so this test class is currently not discoverable on main

)

@requires_cuda_and_triton
def test_codegen_with_custom_heuristics_module(self):
Collaborator Author


@jansel Added some tests

@kundaMwiza kundaMwiza requested a review from jansel November 19, 2025 10:56
Construct @triton.heuristics() based on size_hints.
"""
configs = [triton_heuristics.Config({"XBLOCK": 32})]
return triton_heuristics.cached_autotune(

@kundaMwiza
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 21, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@yangw-dev
Contributor

@pytorchbot revert -m "failed internal test Diff D87660150, error: ModuleNotFoundError: No module named 'extension_backends'" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@kundaMwiza
Collaborator Author

@jansel The failing jobs are also broken on trunk; they are just the dynamic-shapes variants of the tests. I had to rebase this PR onto main rather than viable/strict because of merge conflicts with main.

@kundaMwiza kundaMwiza force-pushed the mwizak/use-custom-triton-kernel-subclass-if-available branch from 289a5a3 to 7fe953c Compare January 27, 2026 16:27
@kundaMwiza
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.


Labels

  • ci-no-td (Do not run TD on this PR)
  • ciflow/inductor
  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • Merged
  • module: inductor
  • open source
  • Reverted
  • topic: not user facing (topic category)
  • triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

6 participants