
[ROCm][CI] Fix failing FP8 tests on RDNA4 (#174873) #3090

Merged
mstankov-amd merged 1 commit into release/2.11 from fix_faling_fp8_on_gfx120x on Mar 20, 2026
Conversation


@mstankov-amd mstankov-amd commented Mar 20, 2026

Summary

This PR fixes FP8 inductor test failures that occur on AMD RDNA4 GPUs when testing matrix multiplications with small M dimensions (M < 16).

Problem

On gfx120x GPUs, FP8 scaled matrix multiplication tests fail with:

  • 92.4% NaN outputs when M < BLOCK_M (typically 16)
  • Large numerical mismatches between eager and compiled results
  • Failures occur only in `max-autotune` mode

Root cause: Autotuned Triton kernels on gfx120x generate incorrect tensor indexing for small M values, using partial indices instead of full computed indices in load/store operations.

Solution

  • Added GPU-specific compile mode selection for small M values
  • gfx120x with M < 16: use compile_mode="default"
  • All other cases: use compile_mode="max-autotune"
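The selection logic above can be sketched as a small helper. This is an illustrative assumption of how such a fallback might look, not the PR's actual test code; the function name and threshold parameter are made up for the sketch:

```python
def select_compile_mode(gfx_arch: str, m: int, block_m: int = 16) -> str:
    """Pick a torch.compile mode for an FP8 scaled-matmul test.

    On gfx120x (RDNA4), autotuned Triton kernels generate incorrect
    tensor indexing when M < BLOCK_M, so fall back to the default
    compile mode there; use max-autotune everywhere else.
    """
    if gfx_arch.startswith("gfx120") and m < block_m:
        return "default"
    return "max-autotune"
```

On ROCm builds the architecture string can be read from `torch.cuda.get_device_properties(0).gcnArchName`, and the test would then compile with `torch.compile(fn, mode=select_compile_mode(arch, m))`.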

Pull Request resolved: pytorch#174873
Approved by: https://github.com/jeffdaily

(cherry picked from commit d667ffe)

Cherry-picked to release/2.10 branch via #3091

Cherry-picked to release/2.9 branch via #3092

@mstankov-amd mstankov-amd merged commit c92f998 into release/2.11 Mar 20, 2026
1 check passed
@mstankov-amd mstankov-amd deleted the fix_faling_fp8_on_gfx120x branch March 20, 2026 10:29
@mstankov-amd
Author

!cherry-pick --onto release/2.10

@mstankov-amd
Author

!cherry-pick --onto release/2.9

rocm-repo-management-api-6 bot pushed a commit that referenced this pull request Mar 20, 2026
rocm-repo-management-api-6 bot added a commit that referenced this pull request Mar 20, 2026
Cherry-pick of #3090

Co-authored-by: Milica Stankovic <milica.stankovic@amd.com>
@rocm-repo-management-api-6

Created branch autogenerated/release/2.10_cherry-pick_pr-3090 and #3091


rocm-repo-management-api-6 bot pushed a commit that referenced this pull request Mar 20, 2026
@rocm-repo-management-api-6

Created branch autogenerated/release/2.9_cherry-pick_pr-3090 and #3092


rocm-repo-management-api-6 bot added a commit that referenced this pull request Mar 20, 2026
Cherry-pick of #3090

Co-authored-by: Milica Stankovic <milica.stankovic@amd.com>
jithunnair-amd pushed a commit that referenced this pull request Mar 25, 2026
