[release/2.7][ROCm][TunableOp] UTs, submatrix offline tuning, and ScaledGEMM rowwise fix by naromero77amd · Pull Request #2106 · ROCm/pytorch

naromero77amd · 2025-05-08T19:04:46Z

Align TunableOp UTs, features, and bug fixes with upstream PyTorch main

UTs:
pytorch#148982
pytorch#149930
pytorch#150142
pytorch#150463

Feature: offline tuning for submatrices:
pytorch#151138

Bug Fix: ScaledGEMM rowwise
pytorch#152403

Add unit test for new TunableOp BLAS logging feature. Requires this PR to be merged in first: pytorch#148979 Pull Request resolved: pytorch#148982 Approved by: https://github.com/jeffdaily

…49930) This PR is cleanup only. There are no feature changes or bug fixes. We create a TunableOp context manager for setup and cleanup, and rewrite the TunableOp unit tests in terms of this context manager. This reduces the amount of copy-paste code. Pull Request resolved: pytorch#149930 Approved by: https://github.com/jeffdaily (cherry picked from commit 45b1173)
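For readers unfamiliar with the pattern, here is a minimal sketch of a setup/cleanup context manager. It is not the helper added upstream: the name `tunableop_env` is hypothetical, and it drives TunableOp through its `PYTORCH_TUNABLEOP_*` environment variables (which only take effect if set before PyTorch reads them) so that the sketch runs without a GPU.

```python
import contextlib
import os

# Baseline settings for a test; the helper below is a hypothetical illustration.
_TUNABLEOP_ENV = {
    "PYTORCH_TUNABLEOP_ENABLED": "1",
    "PYTORCH_TUNABLEOP_TUNING": "1",
}


@contextlib.contextmanager
def tunableop_env(results_file, extra=None):
    """Set TunableOp environment variables, then restore them and delete the results file."""
    settings = dict(_TUNABLEOP_ENV, PYTORCH_TUNABLEOP_FILENAME=results_file)
    settings.update(extra or {})
    # Remember the previous values so the test leaves the environment untouched.
    saved = {name: os.environ.get(name) for name in settings}
    os.environ.update(settings)
    try:
        yield settings
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
        # Remove any results file the test produced so later tests start clean.
        with contextlib.suppress(FileNotFoundError):
            os.remove(results_file)
```

A test body would then sit inside `with tunableop_env("some_results.csv"):`, and the cleanup runs even if an assertion fails.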

…ytorch#150142) Improvements to unit tests and warnings for unsupported cases in offline tuning. Here are more details:
- Previously we only compared the OpSig for the untuned vs. tuned entries. This was not strict enough, so we now compare OpSig+ParamSig.
- The main offline and online UTs are now stricter to make sure we exercise the code paths for the four combinations of transA and transB.
- Offline tuning does not support some tensor shapes. Emit a warning and skip tuning.

Pull Request resolved: pytorch#150142
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
(cherry picked from commit ca2ffc2)
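The stricter OpSig+ParamSig comparison can be sketched as follows. This is an illustration only, assuming each CSV row starts with an OpSig column followed by a ParamSig column and that header rows begin with `Validator`. The function names are hypothetical and the signatures in the usage note are simplified placeholders, not real tuning output.

```python
import csv
import io


def signature_pairs(csv_text):
    """Return the set of (OpSig, ParamSig) pairs in CSV text, skipping Validator rows."""
    pairs = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or row[0] == "Validator":
            continue
        pairs.add((row[0].strip(), row[1].strip()))
    return pairs


def missing_tuned_entries(untuned_text, tuned_text):
    """Return untuned (OpSig, ParamSig) pairs that have no matching tuned entry."""
    return signature_pairs(untuned_text) - signature_pairs(tuned_text)
```

With two untuned rows that share the OpSig `GemmTunableOp_float_NT` but differ in ParamSig, and a tuned file that contains only one of them, an OpSig-only comparison would report a match, while `missing_tuned_entries` returns the untuned pair that was never tuned.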

…rch#150463) This PR fixes two race conditions that occur when UTs are run:
- In a particular order within a single shard.
- Concurrently in multiple shards.

Each test now gets a unique filename that depends on the test name.

There were two other minor improvements to the UTs:
- matmul_offline_mgpu could occasionally fail if run on 8 GPUs. The criteria were relaxed.
- bmm_tunableop_rocm checks that the rotating buffer is not zero. Otherwise, the test is not useful.

Additionally, several UTs took over 1 minute to run. Their duration was reduced by a combination of setting max tuning iterations to one, setting the rotating buffer size to zero, and/or reducing the matrix dimensions.

Pull Request resolved: pytorch#150463
Approved by: https://github.com/jeffdaily
(cherry picked from commit d0026fa)
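One way to derive a per-test filename is sketched below. The helper name, the `tunableop_results_` prefix, and the ordinal suffix are assumptions for illustration; the upstream tests may build the name differently. The input is the string returned by `unittest.TestCase.id()`.

```python
import os
import re


def results_filename(test_id, ordinal=0, directory="."):
    """Build a results-file path unique to one test from its unittest id."""
    # unittest ids look like "module.Class.test_method"; keep only the method name.
    test_name = test_id.split(".")[-1]
    # Replace anything that is unsafe in a filename.
    safe = re.sub(r"[^A-Za-z0-9_]", "_", test_name)
    return os.path.join(directory, f"tunableop_results_{safe}_{ordinal}.csv")
```

Because the name depends only on the test, two tests never write to the same file, whether they run back to back in one shard or at the same time in different shards.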

This PR adds support for submatrices in offline tuning for:
- GEMM
- GEMM and bias
- ScaledGEMM
- Batch Strided GEMM

New UTs were added to cover submatrices. Submatrices for the strided batch API are not part of this PR and will be done separately.

There is also a bug fix for offline tuning of the full matrix for GEMM and bias in the `NT` case. Offline and online UTs were updated to cover this corner case.

To improve code readability, the definitions of transA and transB were swapped.

Pull Request resolved: pytorch#151138
Approved by: https://github.com/jeffdaily
(cherry picked from commit f6c1cf0)
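As background on why submatrices need separate handling, the sketch below computes the memory layout of a block taken from a larger row-major matrix. It assumes that "submatrix" here means such a block, whose leading dimension is inherited from the parent rather than equal to its own width. The function name and the returned fields are hypothetical and are not taken from the TunableOp code.

```python
def submatrix_layout(parent_cols, row_start, col_start, rows, cols):
    """Describe a rows x cols block of a row-major matrix that has parent_cols columns."""
    if col_start < 0 or row_start < 0 or col_start + cols > parent_cols:
        raise ValueError("block does not fit inside the parent matrix")
    return {
        # Index of the block's first element in the parent's flat storage.
        "offset": row_start * parent_cols + col_start,
        "shape": (rows, cols),
        # Stepping one row down still skips a full parent row.
        "strides": (parent_cols, 1),
        "leading_dim": parent_cols,
        # The block is dense in memory only if it spans full rows or has at most one row.
        "contiguous": cols == parent_cols or rows <= 1,
    }
```

For a 4 x 2 block starting at row 2, column 3 of a matrix with 8 columns, the leading dimension stays 8 even though the block is only 2 wide, so any recorded problem description has to carry the leading dimension separately from the shape.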

@mawong-amd

Fixes the TunableOp ScaledGEMM regression for rowwise scaling caused by pytorch#147548. Credit goes to @mawong-amd for the fix. Pull Request resolved: pytorch#152403 Approved by: https://github.com/jeffdaily (cherry picked from commit ece1658)

rocm-repo-management-api · 2025-05-08T19:06:03Z

Jenkins build for 63bd33ca042db3cd6b471cd71c34f19107d693d4 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

naromero77amd · 2025-05-08T19:06:33Z

TunableOp UTs were run locally on MI300 and there were no regressions.

naromero77amd and others added 6 commits May 8, 2025 16:42

[ROCm][TunableOp] Unit test for TunableOp BLAS logging. (pytorch#148982)

bdfb31b


naromero77amd changed the title ~~[release /2.7][ROCm][TunableOp] UTs, submatrix offline tuning, and ScaledGEMM rowwise fix~~ [release/2.7][ROCm][TunableOp] UTs, submatrix offline tuning, and ScaledGEMM rowwise fix May 8, 2025

naromero77amd requested review from jeffdaily, jithunnair-amd and pruthvistony May 8, 2025 19:07

pruthvistony approved these changes May 8, 2025


pruthvistony merged commit 3f73e8a into release/2.7 May 8, 2025
0 of 2 checks passed

pruthvistony deleted the release_/2.7_tunableop_up_cherrypicks branch May 8, 2025 19:23
