[Inductor][NVGEMM] Add infrastructure for registering custom kernels with NVGEMM Cutlass API #176543

Closed
NikhilAPatel wants to merge 3 commits into gh/NikhilAPatel/112/base from gh/NikhilAPatel/112/head

Conversation

[Inductor][NVGEMM] Add infrastructure for vendored CuTeDSL kernel wrappers

Authored with Claude.

[ghstack-poisoned]

pytorch-bot bot commented Mar 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176543

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job, 3 Unrelated Failures

As of commit ff29833 with merge base 572f0d0:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


pytorch-bot bot commented Mar 5, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Update on "[Inductor][NVGEMM] Add infrastructure for vendored CuTeDSL kernel wrappers"

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy kadeng muchulee8 amjames chauhang aakhundov coconutruben jataylo

[ghstack-poisoned]
@NikhilAPatel NikhilAPatel changed the title [Inductor][NVGEMM] Add infrastructure for vendored CuTeDSL kernel wrappers [Inductor][NVGEMM] Add infrastructure for registering custom kernels with NVGEMM Cutlass API Mar 5, 2026
@NikhilAPatel NikhilAPatel marked this pull request as ready for review March 6, 2026 00:03
@NikhilAPatel NikhilAPatel requested a review from mlazos March 6, 2026 00:03
@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #176549

pytorchmergebot pushed a commit that referenced this pull request Mar 9, 2026
Instead of cloning this directly from the Cutlass repo via `setup.py`, we need to own it ourselves inside Inductor so that we can do some tensor mode reordering, since Inductor and this kernel expect the dims ordered differently.

Pull Request resolved: #176546
Approved by: https://github.com/mlazos
ghstack dependencies: #176543, #176544, #176545
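
For illustration, here is a minimal sketch of the kind of tensor mode (dim) reordering this commit describes, assuming a simple permute suffices; the `reorder_modes` helper and the concrete orderings are hypothetical, not the actual Inductor code:

```python
# Hypothetical sketch only: reordering tensor modes (dims) between
# Inductor's layout and the layout a vendored kernel expects.
import torch

def reorder_modes(t: torch.Tensor, kernel_order: tuple) -> torch.Tensor:
    # permute returns a strided view, so no data is copied; only the
    # logical mode order changes.
    return t.permute(kernel_order)

# Example: suppose Inductor hands the kernel a (batch, M, K) operand but
# the kernel wants (M, K, batch).
a = torch.randn(8, 128, 64)
a_kernel = reorder_modes(a, (1, 2, 0))
assert a_kernel.shape == (128, 64, 8)
```
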
pytorchmergebot pushed a commit that referenced this pull request Mar 9, 2026
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Revert "[Inductor][NVGEMM] Add infrastructure for registering custom kernels with NVGEMM Cutlass API (pytorch#176543)"

This reverts commit 9e49f44.

Reverted pytorch#176543 on behalf of https://github.com/zou3519 due to broke CI ([comment](pytorch#176543 (comment)))
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Needed to remove some docstrings from pytorch#176546 in order to fit in the 2000 LoC limit. This PR adds them back.

Pull Request resolved: pytorch#176549
Approved by: https://github.com/mlazos
ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
Shared helpers (module-level; a hedged sketch of these follows this commit message):
- _round_up: deduplicated from two test methods
- _prep_k: deduplicated from two test methods
- _create_tensor_with_layout: unified layout creation for all dtypes (float16, bf16, fp8, fp4)
- _nvgemm_config: standard config dict with autotune_fallback_to_aten: False always set

Bug fixes:
- Added missing torch._dynamo.reset() to test_scaled_gemm_mxfp8, test_scaled_gemm_nvf4, test_grouped_gemm, and test_grouped_gemm_varying_offsets
- Moved ceildiv to a top-level import (it was previously imported inside two test methods)

New test coverage:
- test_matmul: added ("contiguous", "aligned_offset") and ("contiguous", "padded") layout combos (7 combos, up from 5)
- test_scaled_gemm_mxfp8: added shape parametrization (4 shapes, was 1)
- test_grouped_gemm: added layout_a parametrization (contiguous, aligned_offset, view, padded)
- test_grouped_gemm_varying_offsets: split out from the original test_grouped_gemm; tests different offset distributions separately
- test_fp8_heuristic_configs: new heuristics integration test for FP8 precision strings

Consistency fixes:
- All tests now use _nvgemm_config() with autotune_fallback_to_aten: False
- TestNVUniversalGemmDynamicShapes also uses _nvgemm_config()

Pull Request resolved: pytorch#176847
Approved by: https://github.com/mlazos
ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548, pytorch#176549, pytorch#176845
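
To make the test refactor above concrete, here is a minimal, hypothetical sketch of one shared helper and the torch._dynamo.reset() fix. The helper names and the autotune_fallback_to_aten: False key come from the commit message; everything else (the bodies, the other config keys, and the usage pattern) is an assumption, not the actual test code:

```python
# Hypothetical sketch: helper names taken from the commit message,
# bodies and usage assumed.
import torch
import torch._dynamo
import torch._inductor.config as inductor_config

def _round_up(x: int, multiple: int) -> int:
    # Round x up to the nearest multiple (useful when building the
    # padded/aligned tensor layouts the tests exercise).
    return ((x + multiple - 1) // multiple) * multiple

def _nvgemm_config(**overrides) -> dict:
    # Standard config dict for these tests. Per the commit message,
    # autotune_fallback_to_aten is always False so a test fails loudly
    # if the NVGEMM kernel is not selected, instead of silently falling
    # back to ATen.
    cfg = {
        "max_autotune": True,
        "autotune_fallback_to_aten": False,
    }
    cfg.update(overrides)
    return cfg

# Assumed usage pattern inside a test method:
def test_matmul_sketch():
    torch._dynamo.reset()  # the bug fix: clear compile caches between runs
    with inductor_config.patch(_nvgemm_config()):
        compiled = torch.compile(torch.mm)
        a = torch.randn(128, 64, device="cuda", dtype=torch.float16)
        b = torch.randn(64, 128, device="cuda", dtype=torch.float16)
        torch.testing.assert_close(compiled(a, b), a @ b)
```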