[Inductor][NVGEMM] Refactor tests #176847
Closed
NikhilAPatel wants to merge 10 commits into gh/NikhilAPatel/123/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176847
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 1 Unrelated Failure
As of commit df2f946 with merge base 572f0d0:
NEW FAILURES - The following jobs have failed:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This was referenced Mar 8, 2026
Shared helpers (module-level):
- _round_up — deduplicated from two test methods
- _prep_k — deduplicated from two test methods
- _create_tensor_with_layout — unified layout creation for all dtypes (float16, bf16, fp8, fp4)
- _nvgemm_config — standard config dict that always sets autotune_fallback_to_aten: False (see the sketch after this list)
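A minimal sketch of what two of these helpers might look like. The helper names and the autotune_fallback_to_aten key come from this PR; the bodies and the other config keys are assumptions, not the PR's actual code:

```python
def _round_up(x: int, multiple: int) -> int:
    # Round x up to the nearest multiple, e.g. for aligning K to a block size.
    return ((x + multiple - 1) // multiple) * multiple


def _nvgemm_config(**overrides) -> dict:
    # Standard Inductor config for these tests. autotune_fallback_to_aten=False makes
    # a missed NVGEMM lowering fail loudly instead of silently falling back to ATen.
    cfg = {
        "max_autotune": True,                    # assumed key
        "max_autotune_gemm_backends": "NVGEMM",  # assumed backend string
        "autotune_fallback_to_aten": False,      # always set, per the refactor
    }
    cfg.update(overrides)
    return cfg
```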
Bug fixes:
- Added missing torch._dynamo.reset() to test_scaled_gemm_mxfp8, test_scaled_gemm_nvf4, test_grouped_gemm, test_grouped_gemm_varying_offsets
- ceildiv moved to a top-level import (was previously imported inside two test methods); see the test skeleton after this list
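An illustrative test skeleton (not the PR's exact code) showing both fixes together: ceildiv imported once at module level, and torch._dynamo.reset() called at the start of the test so compiled graphs cached by earlier tests cannot leak into this one. The class and test names are placeholders, and the ceildiv import path is an assumption:

```python
import torch
from torch._inductor.utils import ceildiv  # assumption: this is the ceildiv the tests use
from torch.testing._internal.common_utils import TestCase, run_tests


class ExampleScaledGemmTest(TestCase):
    def test_scaled_gemm_mxfp8_like(self):
        torch._dynamo.reset()  # previously missing: clears compile caches left by earlier tests
        K = 256
        self.assertEqual(ceildiv(K, 32), 8)  # e.g. number of 32-wide MX scale blocks along K


if __name__ == "__main__":
    run_tests()
```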
New test coverage:
- test_matmul: added the ("contiguous", "aligned_offset") and ("contiguous", "padded") layout combos (7 combos, up from 5); see the parametrization sketch after this list
- test_scaled_gemm_mxfp8: added shape parametrization (4 shapes, was 1)
- test_grouped_gemm: added layout_a parametrization (contiguous, aligned_offset, view, padded)
- test_grouped_gemm_varying_offsets: split out from original test_grouped_gemm — tests different offset distributions separately
- test_fp8_heuristic_configs: new heuristics integration test for FP8 precision strings
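A hypothetical sketch of the layout parametrization pattern, using PyTorch's own parametrize helpers. The layout strings match the list above, but the class name, test body, and shapes are illustrative only:

```python
import torch
from torch.testing._internal.common_utils import (
    TestCase,
    instantiate_parametrized_tests,
    parametrize,
    run_tests,
)


@instantiate_parametrized_tests
class ExampleMatmulLayoutTest(TestCase):
    @parametrize(
        "layout_a,layout_b",
        [
            ("contiguous", "contiguous"),
            ("contiguous", "aligned_offset"),  # newly added combo
            ("contiguous", "padded"),          # newly added combo
        ],
    )
    def test_matmul_like(self, layout_a, layout_b):
        # Placeholder body: a real test would build tensors with the requested
        # layouts (e.g. via _create_tensor_with_layout) and compare compiled vs eager.
        a = torch.randn(64, 32)
        b = torch.randn(32, 16)
        torch.testing.assert_close(a @ b, torch.matmul(a, b))


if __name__ == "__main__":
    run_tests()
```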
Consistency fixes:
- All tests now use _nvgemm_config() with autotune_fallback_to_aten: False
- TestNVUniversalGemmDynamicShapes also uses _nvgemm_config() (usage sketched after this list)
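A sketch of how the shared config might be applied consistently across the static and dynamic-shape tests. torch._inductor.config.patch and torch.compile(dynamic=True) are real APIs; _nvgemm_config here is the simplified stand-in from the earlier sketch, and autotune_fallback_to_aten is the key named in this PR:

```python
import torch
from torch._inductor import config as inductor_config


def _nvgemm_config(**overrides) -> dict:
    # Simplified stand-in; see the fuller sketch under "Shared helpers" above.
    return {"autotune_fallback_to_aten": False, **overrides}


def run_matmul_static_and_dynamic(a, b):
    with inductor_config.patch(_nvgemm_config()):
        static_fn = torch.compile(torch.matmul)                  # static-shape path
        dynamic_fn = torch.compile(torch.matmul, dynamic=True)   # TestNVUniversalGemmDynamicShapes-style path
        return static_fn(a, b), dynamic_fn(a, b)
```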
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy kadeng muchulee8 amjames chauhang aakhundov coconutruben jataylo
sandy-gags pushed a commit to sandy-gags/pytorch that referenced this pull request Mar 12, 2026
ghstack-source-id: c3b4ae3
Pull Request resolved: pytorch/pytorch#176847
mlazos approved these changes Mar 16, 2026
Collaborator
Starting merge as part of PR stack under #176859
1 similar comment
pytorchmergebot pushed a commit that referenced this pull request Mar 22, 2026
dshi7 pushed a commit to dshi7/pytorch that referenced this pull request Mar 23, 2026
Pull Request resolved: pytorch#176847
Approved by: https://github.com/mlazos
ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548, pytorch#176549, pytorch#176845
dshi7 pushed a commit to dshi7/pytorch that referenced this pull request Mar 23, 2026
…ytorch#176859)
Pull Request resolved: pytorch#176859
Approved by: https://github.com/mlazos
ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548, pytorch#176549, pytorch#176845, pytorch#176847
cyyever pushed a commit to cyyever/pytorch that referenced this pull request Mar 24, 2026
…ytorch#176859)
Pull Request resolved: pytorch#176859
Approved by: https://github.com/mlazos
ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548, pytorch#176549, pytorch#176845, pytorch#176847
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
Pull Request resolved: pytorch#176847
Approved by: https://github.com/mlazos
ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548, pytorch#176549, pytorch#176845
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
…ytorch#176859)
Pull Request resolved: pytorch#176859
Approved by: https://github.com/mlazos
ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548, pytorch#176549, pytorch#176845, pytorch#176847
This was referenced Apr 1, 2026
nklshy-aws pushed a commit to nklshy-aws/pytorch that referenced this pull request Apr 7, 2026
Pull Request resolved: pytorch#176847
Approved by: https://github.com/mlazos
ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548, pytorch#176549, pytorch#176845
nklshy-aws pushed a commit to nklshy-aws/pytorch that referenced this pull request Apr 7, 2026
…ytorch#176859)
Pull Request resolved: pytorch#176859
Approved by: https://github.com/mlazos
ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548, pytorch#176549, pytorch#176845, pytorch#176847
Stack from ghstack (oldest at bottom):
[Inductor] Fix benchmark_example_value losing dtype on view unwrap #176859
-> [Inductor][NVGEMM] Refactor tests #176847
[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics matching #176845
[Inductor][NVGEMM] Restore fp4 kernel docstrings #176549
[Inductor][NVGEMM] Enable nvMatmulHeuristics for FP4 blockscaled GEMM #176548
[Inductor][NVGEMM] Register CuTeDSL Blockscaled GEMM with NVGEMM Backend #176547
[Inductor][NVGEMM] Add CuTeDSL Blockscaled GEMM Kernel #176546
[Inductor][NVGEMM] Patch cutlass_api FP4 dtype mapping #176545
[Inductor] Add FP4 tensor creation support for autotuning #176544
[Inductor][NVGEMM] Add infrastructure for registering custom kernels with NVGEMM Cutlass API #176543