[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics matching by NikhilAPatel · Pull Request #176845 · pytorch/pytorch

NikhilAPatel · 2026-03-08T21:43:54Z

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

[ghstack-poisoned]

pytorch-bot · 2026-03-08T21:43:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176845

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit 0112f05 with merge base 572f0d0:

NEW FAILURES - The following jobs have failed:

inductor / unit-test / inductor-test / test (inductor_cpp_wrapper, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
test/inductor/test_triton_kernels.py::TestUserKernelEpilogueFusion::test_fusion_custom_kernel_with_linebreaks
inductor / unit-test / inductor-test / test (inductor_cpp_wrapper, 2, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
test/inductor/test_triton_kernels.py::TestUserKernelEpilogueFusion::test_no_fusion_for_multiple_reads_on_mutated_tensor

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-jammy-py3.14t-clang15 / test (dynamo_wrapped, 2, 3, lf.linux.2xlarge) (gh) (disabled by #162677 but the issue was closed recently and a rebase is needed to make it pass)
test_overrides.py::TestTorchFunctionMode::test_reentrant_mode_idiom

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

inductor / inductor-cpu-test / test (cpu_inductor_torchbench, 1, 2, linux.2xlarge.amx, unstable) (gh) (#174929)
detectron2_maskrcnn_r_50_fpn

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot · 2026-03-08T21:44:01Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

ghstack-source-id: 88e59c6 Pull Request resolved: #176845

…ching" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy kadeng muchulee8 amjames chauhang aakhundov coconutruben jataylo [ghstack-poisoned]

ghstack-source-id: 8a04297 Pull Request resolved: pytorch/pytorch#176845

pytorchmergebot · 2026-03-17T17:36:01Z

Starting merge as part of PR stack under #176859

pytorchmergebot · 2026-03-22T23:10:59Z

Starting merge as part of PR stack under #176859

Shared helpers (module-level):
- _round_up: deduplicated from two test methods
- _prep_k: deduplicated from two test methods
- _create_tensor_with_layout: unified layout creation for all dtypes (float16, bf16, fp8, fp4)
- _nvgemm_config: standard config dict with autotune_fallback_to_aten: False always set

Bug fixes:
- Added missing torch._dynamo.reset() to test_scaled_gemm_mxfp8, test_scaled_gemm_nvf4, test_grouped_gemm, test_grouped_gemm_varying_offsets
- ceildiv moved to top-level import (was imported inside two test methods)

New test coverage:
- test_matmul: added ("contiguous", "aligned_offset") and ("contiguous", "padded") layout combos (7 combos, up from 5)
- test_scaled_gemm_mxfp8: added shape parametrization (4 shapes, was 1)
- test_grouped_gemm: added layout_a parametrization (contiguous, aligned_offset, view, padded)
- test_grouped_gemm_varying_offsets: split out from the original test_grouped_gemm; tests different offset distributions separately
- test_fp8_heuristic_configs: new heuristics integration test for FP8 precision strings

Consistency fixes:
- All tests now use _nvgemm_config() with autotune_fallback_to_aten: False
- TestNVUniversalGemmDynamicShapes also uses _nvgemm_config()

Pull Request resolved: #176847
Approved by: https://github.com/mlazos
ghstack dependencies: #176543, #176544, #176545, #176546, #176547, #176548, #176549, #176845

…176859) Pull Request resolved: #176859 Approved by: https://github.com/mlazos ghstack dependencies: #176543, #176544, #176545, #176546, #176547, #176548, #176549, #176845, #176847

…rch#176845) Pull Request resolved: pytorch#176845 Approved by: https://github.com/mlazos ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548, pytorch#176549

[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics matching

fc37522

[ghstack-poisoned]
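The title describes matching nvMatmulHeuristics suggestions without comparing tile_k. As a hypothetical illustration only (`TileConfig`, `match_key`, and `match_kernels` are invented names, not Inductor's NVGEMM code), a match key built from `(tile_m, tile_n)` lets one suggestion select every available kernel config that differs from it only in tile_k:

```python
from typing import Iterable, List, NamedTuple, Tuple


class TileConfig(NamedTuple):
    """Hypothetical tile shape; not Inductor's actual config type."""
    tile_m: int
    tile_n: int
    tile_k: int


def match_key(cfg: TileConfig) -> Tuple[int, int]:
    # tile_k is intentionally left out of the key.
    return (cfg.tile_m, cfg.tile_n)


def match_kernels(
    suggested: Iterable[TileConfig], available: Iterable[TileConfig]
) -> List[TileConfig]:
    """Return available configs whose (tile_m, tile_n) matches a suggestion.

    Results follow the order of `suggested`, then the order of `available`,
    with duplicates removed.
    """
    by_key = {}
    for cfg in available:
        by_key.setdefault(match_key(cfg), []).append(cfg)

    matched: List[TileConfig] = []
    seen = set()
    for suggestion in suggested:
        for cfg in by_key.get(match_key(suggestion), []):
            if cfg not in seen:
                seen.add(cfg)
                matched.append(cfg)
    return matched


suggested = [TileConfig(128, 128, 64)]
available = [TileConfig(128, 128, 128), TileConfig(128, 256, 64), TileConfig(128, 128, 32)]
print(match_kernels(suggested, available))
```

Here the suggestion (128, 128, 64) selects the two (128, 128, *) configs. With tile_k included in the key, the same call would return an empty list, since no available config is exactly (128, 128, 64).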

This was referenced Mar 6, 2026

[Inductor][NVGEMM] Add infrastructure for registering custom kernels with NVGEMM Cutlass API #176543

Closed

[Inductor] Add FP4 tensor creation support for autotuning #176544

Closed

NikhilAPatel mentioned this pull request Mar 6, 2026

[Inductor][NVGEMM] Patch cutlass_api FP4 dtype mapping #176545

Closed

pytorch-bot added the ciflow/inductor and module: inductor labels Mar 8, 2026

This was referenced Mar 6, 2026

[Inductor][NVGEMM] Add CuTeDSL Blockscaled GEMM Kernel #176546

Closed

[Inductor][NVGEMM] Register CuTeDSL Blockscaled GEMM with NVGEMM Backend #176547

Closed

This was referenced Mar 8, 2026

[Inductor][NVGEMM] Enable nvMatmulHeuristics for FP4 blockscaled GEMM #176548

Closed

[Inductor][NVGEMM] Restore fp4 kernel docstrings #176549

Closed

NikhilAPatel added a commit that referenced this pull request Mar 8, 2026

[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics matching

1a28c88

NikhilAPatel added the topic: not user facing (topic category) and ciflow/b200 labels Mar 8, 2026

Update on "[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics mat…

e05ec31

NikhilAPatel mentioned this pull request Mar 8, 2026

[Inductor][NVGEMM] Refactor tests #176847

Closed

Update on "[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics mat…

5cabf02

NikhilAPatel mentioned this pull request Mar 9, 2026

[Inductor] Fix benchmark_example_value losing dtype on view unwrap #176859

Closed

Update on "[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics mat…

4e647e8

NikhilAPatel added the ciflow/trunk (Trigger trunk jobs on your pull request) and ci-no-td (Do not run TD on this PR) labels Mar 9, 2026

NikhilAPatel added 3 commits March 9, 2026 13:29

Update on "[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics mat…

ab0b0b3

Update on "[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics mat…

d09123e

Update on "[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics mat…

4d7101e

pytorch-bot added the ciflow/torchtitan (Run TorchTitan integration tests) label Mar 10, 2026

Update on "[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics mat…

490e736

NikhilAPatel requested a review from mlazos March 11, 2026 00:03

pytorch deleted a comment from pytorch-bot Mar 11, 2026

sandy-gags pushed a commit to sandy-gags/pytorch that referenced this pull request Mar 12, 2026

[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics matching

6a81be7

Update on "[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics mat…

0112f05

pytorchmergebot added the Merged label Mar 22, 2026

pytorchmergebot closed this in c20451d Mar 22, 2026

NikhilAPatel wants to merge 12 commits into gh/NikhilAPatel/122/base from gh/NikhilAPatel/122/head

3 participants
