[inductor] Use custom triton kernel subclass when available #167456

Closed

kundaMwiza wants to merge 22 commits into pytorch:main from
graphcore:mwizak/use-custom-triton-kernel-subclass-if-available

Conversation

@kundaMwiza
Collaborator

@kundaMwiza kundaMwiza commented Nov 10, 2025

This refactor replaces direct uses of TritonKernel with a subclass type when one is available, since out-of-tree / custom backends can:

  • have their own configs that they would like to place in inductor_meta via a TritonKernel subclass, for the autotuner to handle
  • have their own triton heuristics for the different types of operations (pointwise, reduction, etc.). These heuristics can currently only be reached by patching; this change allows custom backends to inject their own imports directly via a subclass
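As a rough sketch of the first point, the pattern looks like the following. These are plain-Python stand-ins, not PyTorch APIs: `MyBackendTritonKernel`, the `"my_backend"` key, and the base-class dict contents are all illustrative.

```python
# Hypothetical, simplified mimic of the subclassing pattern this PR enables.
class TritonKernelBase:
    """Stand-in for inductor's TritonKernel."""

    @classmethod
    def inductor_meta_common(cls):
        # Metadata shared by all generated kernels, consumed by the autotuner.
        return {"backend_hash": "default", "store_cubin": False}


class MyBackendTritonKernel(TritonKernelBase):
    """A custom backend's subclass adding its own autotuner config."""

    @classmethod
    def inductor_meta_common(cls):
        meta = super().inductor_meta_common()
        # Backend-specific knob placed in inductor_meta for the autotuner.
        meta["my_backend"] = {"num_warps_override": 4}
        return meta


print(MyBackendTritonKernel.inductor_meta_common())
```

Because codegen would call the subclass wherever it previously referenced TritonKernel directly, the extra entries flow to the autotuner without any patching.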

Example out of tree backends with their own heuristic modules:

Ascend NPU: https://github.com/Ascend/pytorch/blob/045a034dbcec287a5997aa13fd129a1cd6b1e215/torch_npu/_inductor/npu_triton_heuristics.py#L4

Intel XPU: https://github.com/intel/intel-extension-for-pytorch/blob/5dcc9d57e5422cf295e1a1ee97896d6b6a554a85/intel_extension_for_pytorch/_inductor/xpu/triton_ops/autotune.py

It also adds a triton_meta_common method, analogous to inductor_meta_common, that is overridable so that compile options can be provided directly.
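A minimal sketch of that override point follows. The dict keys, the `"compile_options"` entry, and the `NPUTritonKernel` name are hypothetical stand-ins, not the real inductor schema:

```python
# Hypothetical sketch of an overridable triton_meta_common.
class TritonKernelBase:
    @classmethod
    def triton_meta_common(cls):
        # Options passed through to triton compilation (illustrative keys).
        return {"constants": {}, "device_type": "cuda"}


class NPUTritonKernel(TritonKernelBase):
    @classmethod
    def triton_meta_common(cls):
        meta = super().triton_meta_common()
        # A custom backend can supply its compile options directly here.
        meta["device_type"] = "npu"
        meta["compile_options"] = {"multibuffer": True}
        return meta
```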

Test plan:

Added unit tests to test_triton_extension_backend.py

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @chenyang78

@pytorch-bot

pytorch-bot bot commented Nov 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167456

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit b342108 with merge base ed18b31:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@kundaMwiza kundaMwiza changed the title [inductor] Use custom triton kernel subclass if available [inductor] Use custom triton kernel subclass when available Nov 10, 2025
@kundaMwiza
Collaborator Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the "topic: not user facing" (topic category) label Nov 10, 2025
@bdhirsh bdhirsh requested review from eellison and jansel November 11, 2025 14:27
@bdhirsh bdhirsh added the "triaged" label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Nov 11, 2025
Contributor

@jansel jansel left a comment


Failing tests?

Is there a test we could add to check this new behavior?

@kundaMwiza kundaMwiza force-pushed the mwizak/use-custom-triton-kernel-subclass-if-available branch 2 times, most recently from e7d7329 to 04cc3ca Compare November 19, 2025 10:48
filename=None,
inductor_meta=None,
custom_kernel=False,
caching_autotuner_cls: type[CachingAutotuner] = CachingAutotuner,
Collaborator Author


Allows custom heuristics modules to pass in their subclasses
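The mechanism can be mimicked in plain Python as below. `MyBackendAutotuner` and the simplified `cached_autotune` signature are illustrative, not the real torch._inductor API:

```python
# Hypothetical mimic of the new `caching_autotuner_cls` parameter: the
# heuristics entry point instantiates whichever autotuner class it is handed,
# defaulting to the base class so existing callers are unchanged.
class CachingAutotuner:
    def __init__(self, fn, configs):
        self.fn = fn
        self.configs = configs


class MyBackendAutotuner(CachingAutotuner):
    """Illustrative custom-backend autotuner with extra bookkeeping."""

    def __init__(self, fn, configs):
        super().__init__(fn, configs)
        self.backend = "my_backend"


def cached_autotune(fn, configs, caching_autotuner_cls=CachingAutotuner):
    return caching_autotuner_cls(fn, configs)


tuner = cached_autotune(lambda x: x, [], caching_autotuner_cls=MyBackendAutotuner)
print(type(tuner).__name__)  # → MyBackendAutotuner
```

The default argument keeps the existing behavior, while a custom heuristics module passes its own subclass at the call site.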

def define_subgraph_launcher_fn(self, name: str, subgraph_code):
self.subgraph_definitions.splice(subgraph_code.value)

@classmethod
Collaborator Author


I couldn't put the kernel type on the class because there would be an import cycle. Happy to hear of other alternatives to this.
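One cycle-avoiding shape for this (class and imported names are placeholders; `fractions.Fraction` merely stands in for the kernel class that can't be imported at module level):

```python
# Hypothetical illustration of resolving the kernel type lazily in a
# classmethod instead of a module-level class attribute, so the two modules
# involved never import each other at import time.
class SchedulingBase:
    _kernel_cls = None  # cannot be set at class-definition time due to the cycle

    @classmethod
    def get_kernel_cls(cls):
        if cls._kernel_cls is None:
            # Deferred import runs on first use, after both modules exist.
            from fractions import Fraction  # stand-in for the kernel module
            cls._kernel_cls = Fraction
        return cls._kernel_cls
```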



@unittest.skipIf(IS_FBCODE, "cpp_extension doesn't work in fbcode right now")
@test_torchinductor.skip_if_cpp_wrapper(
Collaborator Author


This decorator only works on test methods, so this test class is currently not discoverable on main

)

@requires_cuda_and_triton
def test_codegen_with_custom_heuristics_module(self):
Collaborator Author


@jansel Added some tests

@kundaMwiza kundaMwiza requested a review from jansel November 19, 2025 10:56
Construct @triton.heuristics() based on size_hints.
"""
configs = [triton_heuristics.Config({"XBLOCK": 32})]
return triton_heuristics.cached_autotune(

@kundaMwiza
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 21, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@yangw-dev
Contributor

@pytorchbot revert -m "failed internal test Diff D87660150, error: ModuleNotFoundError: No module named 'extension_backends'" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@kundaMwiza
Collaborator Author

@jansel The failing jobs are also broken on trunk; they are just the dynamic-shapes variants of the tests. I had to rebase this PR onto main rather than viable/strict because of merge conflicts with main.

@kundaMwiza kundaMwiza force-pushed the mwizak/use-custom-triton-kernel-subclass-if-available branch from 289a5a3 to 7fe953c Compare January 27, 2026 16:27
@kundaMwiza
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.


Labels

  • ci-no-td (Do not run TD on this PR)
  • ciflow/inductor
  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • Merged
  • module: inductor
  • open source
  • Reverted
  • topic: not user facing (topic category)
  • triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

6 participants