[Inductor][NVGEMM] Enable nvMatmulHeuristics for FP4 blockscaled GEMM #176548
NikhilAPatel wants to merge 9 commits into gh/NikhilAPatel/117/base
Conversation
Authored with Claude. [ghstack-poisoned]
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/176548
Note: links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 3 Unrelated Failures as of commit af585a9 with merge base 572f0d0:
NEW FAILURES: the following jobs have failed.
FLAKY: the following jobs failed but were likely due to flakiness present on trunk.
UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…scaled GEMM" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy kadeng muchulee8 amjames chauhang aakhundov coconutruben jataylo [ghstack-poisoned]
mlazos left a comment:
Should we test this as well?
Starting merge as part of PR stack under #176549
…led GEMM (#176548)" This reverts commit 9db6179. Reverted #176548 on behalf of https://github.com/zou3519 because it broke CI ([comment](#176543 (comment)))
Shared helpers (module-level):
- _round_up — deduplicated from two test methods
- _prep_k — deduplicated from two test methods
- _create_tensor_with_layout — unified layout creation for all dtypes (float16, bf16, fp8, fp4)
- _nvgemm_config — standard config dict with autotune_fallback_to_aten: False always set
Bug fixes:
- Added missing torch._dynamo.reset() to test_scaled_gemm_mxfp8, test_scaled_gemm_nvf4, test_grouped_gemm, test_grouped_gemm_varying_offsets
- ceildiv moved to top-level import (was imported inside two test methods)
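`ceildiv` is the standard ceiling-division helper; hoisting it to a top-level import just means it is defined once instead of inside each test method. For reference, the identity it implements is:

```python
def ceildiv(a: int, b: int) -> int:
    # Ceiling division without floats: ceil(a / b) for positive b.
    # Equivalent to (a + b - 1) // b.
    return -(-a // b)
```

In blockscaled-GEMM tests this typically computes how many scale blocks cover a K dimension, e.g. `ceildiv(100, 16)` blocks of 16 elements (illustrative use, not taken from the PR).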
New test coverage:
- test_matmul: added ("contiguous", "aligned_offset") and ("contiguous", "padded") layout combos (7 combos, up from 5)
- test_scaled_gemm_mxfp8: added shape parametrization (4 shapes, was 1)
- test_grouped_gemm: added layout_a parametrization (contiguous, aligned_offset, view, padded)
- test_grouped_gemm_varying_offsets: split out from original test_grouped_gemm — tests different offset distributions separately
- test_fp8_heuristic_configs: new heuristics integration test for FP8 precision strings
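The layout coverage above can be driven by a plain parametrized list. In the sketch below, only the two `("contiguous", ...)` pairs are confirmed by the PR text; the other five combos and the helper are hypothetical, included just to show the 5-to-7 expansion:

```python
# Illustrative parametrization; all combos except the two marked
# "added in this PR" are guesses at the pre-existing five.
LAYOUTS = ["contiguous", "aligned_offset", "view", "padded"]

MATMUL_COMBOS = [
    ("contiguous", "contiguous"),
    ("aligned_offset", "contiguous"),
    ("view", "contiguous"),
    ("padded", "contiguous"),
    ("view", "view"),
    ("contiguous", "aligned_offset"),  # added in this PR
    ("contiguous", "padded"),          # added in this PR
]


def iter_cases():
    # Yield each (layout_a, layout_b) pair, sanity-checking the names.
    for layout_a, layout_b in MATMUL_COMBOS:
        assert layout_a in LAYOUTS and layout_b in LAYOUTS
        yield layout_a, layout_b
```

A flat list like this keeps the test body a single loop (or `parametrize` decorator) while making the added combos visible in the diff.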
Consistency fixes:
- All tests now use _nvgemm_config() with autotune_fallback_to_aten: False
- TestNVUniversalGemmDynamicShapes also uses _nvgemm_config()
Pull Request resolved: #176847
Approved by: https://github.com/mlazos
ghstack dependencies: #176543, #176544, #176545, #176546, #176547, #176548, #176549, #176845
…rch#176845) Pull Request resolved: pytorch#176845 Approved by: https://github.com/mlazos ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548, pytorch#176549
…ytorch#176859) Pull Request resolved: pytorch#176859 Approved by: https://github.com/mlazos ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548, pytorch#176549, pytorch#176845, pytorch#176847
…pytorch#176548) Pull Request resolved: pytorch#176548 Approved by: https://github.com/mlazos ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547
Needed to remove some docstrings from pytorch#176546 in order to fit in the 2000 LoC limit. This PR adds them back. Pull Request resolved: pytorch#176549 Approved by: https://github.com/mlazos ghstack dependencies: pytorch#176543, pytorch#176544, pytorch#176545, pytorch#176546, pytorch#176547, pytorch#176548
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo