[Inductor][NVGEMM] Drop tile_k from nvMatmulHeuristics matching#176845

Closed
NikhilAPatel wants to merge 12 commits into gh/NikhilAPatel/122/base from gh/NikhilAPatel/122/head

Conversation

@pytorch-bot

pytorch-bot bot commented Mar 8, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176845

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit 0112f05 with merge base 572f0d0:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot

pytorch-bot bot commented Mar 8, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

…ching"

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy kadeng muchulee8 amjames chauhang aakhundov coconutruben jataylo

[ghstack-poisoned]
@NikhilAPatel NikhilAPatel added ciflow/trunk Trigger trunk jobs on your pull request ci-no-td Do not run TD on this PR labels Mar 9, 2026
@pytorch-bot pytorch-bot bot added the ciflow/torchtitan Run TorchTitan integration tests label Mar 10, 2026
@NikhilAPatel NikhilAPatel requested a review from mlazos March 11, 2026 00:03
@pytorch pytorch deleted a comment from pytorch-bot bot Mar 11, 2026
sandy-gags pushed a commit to sandy-gags/pytorch that referenced this pull request Mar 12, 2026
@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #176859

1 similar comment
pytorchmergebot pushed a commit that referenced this pull request Mar 22, 2026
  Shared helpers (module-level):
  - _round_up — deduplicated from two test methods
  - _prep_k — deduplicated from two test methods
  - _create_tensor_with_layout — unified layout creation for all dtypes (float16, bf16, fp8, fp4)
  - _nvgemm_config — standard config dict with autotune_fallback_to_aten: False always set

  Bug fixes:
  - Added missing torch._dynamo.reset() to test_scaled_gemm_mxfp8, test_scaled_gemm_nvf4, test_grouped_gemm, test_grouped_gemm_varying_offsets
  - ceildiv moved to top-level import (was imported inside two test methods)

  New test coverage:
  - test_matmul: added ("contiguous", "aligned_offset") and ("contiguous", "padded") layout combos (7 combos, up from 5)
  - test_scaled_gemm_mxfp8: added shape parametrization (4 shapes, was 1)
  - test_grouped_gemm: added layout_a parametrization (contiguous, aligned_offset, view, padded)
  - test_grouped_gemm_varying_offsets: split out from original test_grouped_gemm — tests different offset distributions separately
  - test_fp8_heuristic_configs: new heuristics integration test for FP8 precision strings

  Consistency fixes:
  - All tests now use _nvgemm_config() with autotune_fallback_to_aten: False
  - TestNVUniversalGemmDynamicShapes also uses _nvgemm_config()

Pull Request resolved: #176847
Approved by: https://github.com/mlazos
ghstack dependencies: #176543, #176544, #176545, #176546, #176547, #176548, #176549, #176845
dshi7 pushed a commit to dshi7/pytorch that referenced this pull request Mar 23, 2026
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
nklshy-aws pushed a commit to nklshy-aws/pytorch that referenced this pull request Apr 7, 2026
Labels

- ci-no-td (Do not run TD on this PR)
- ciflow/b200
- ciflow/inductor
- ciflow/torchtitan (Run TorchTitan integration tests)
- ciflow/trunk (Trigger trunk jobs on your pull request)
- Merged
- module: inductor
- topic: not user facing (topic category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants