
[release/2.7][ROCm][TunableOp] UTs, submatrix offline tuning, and ScaledGEMM rowwise fix#2106

Merged
pruthvistony merged 6 commits into release/2.7 from release_/2.7_tunableop_up_cherrypicks
May 8, 2025


@naromero77amd naromero77amd commented May 8, 2025

Align TunableOp UTs, features, and bug fixes with upstream PyTorch main

UTs:
pytorch#148982
pytorch#149930
pytorch#150142
pytorch#150463

Feature: offline tuning for submatrices:
pytorch#151138

Bug Fix: ScaledGEMM rowwise
pytorch#152403

naromero77amd and others added 6 commits May 8, 2025 16:42
Add unit test for new TunableOp BLAS logging feature.

Requires this PR to be merged in first: pytorch#148979

Pull Request resolved: pytorch#148982
Approved by: https://github.com/jeffdaily
…49930)

This PR is cleanup only. There are no feature changes or bug fixes.

We create a TunableOp context manager that handles setup and cleanup, and rewrite the TunableOp unit tests in terms of it. This reduces the amount of copy-pasted code.
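For readers unfamiliar with the pattern, here is a minimal sketch of a setup/cleanup context manager. It is not the helper added in this PR: the name `tunableop_env` is hypothetical, and it only touches the documented `PYTORCH_TUNABLEOP_ENABLED` and `PYTORCH_TUNABLEOP_FILENAME` environment variables rather than the `torch.cuda.tunable` API that the real tests may use.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def tunableop_env(result_file, enabled=True):
    """Hypothetical helper: set TunableOp env vars, then restore them and clean up."""
    overrides = {
        "PYTORCH_TUNABLEOP_ENABLED": "1" if enabled else "0",
        "PYTORCH_TUNABLEOP_FILENAME": result_file,
    }
    # Remember the previous values so the test leaves no trace behind.
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield result_file
    finally:
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
        # Remove the results file if the test body created one.
        with contextlib.suppress(FileNotFoundError):
            os.remove(result_file)


path = os.path.join(tempfile.gettempdir(), "tunableop_sketch.csv")
with tunableop_env(path):
    pass  # the GEMM under test would run here
```

Putting the cleanup in `finally` means it runs even when an assertion inside the `with` body fails, which is the main advantage over hand-written setup and teardown in each test.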

Pull Request resolved: pytorch#149930
Approved by: https://github.com/jeffdaily

(cherry picked from commit 45b1173)
…ytorch#150142)

Improvements to unit tests and warnings for unsupported cases in offline tuning. Here are more details:
- Previously we only compared the OpSig for the untuned vs. tuned entries. This was not strict enough so we now compare OpSig+ParamSig.
- The main offline and online UTs are now stricter to make sure we exercise the code paths for the four combinations of transA and transB.
- Offline tuning does not support some tensor shapes. For these shapes, a warning is now emitted and tuning is skipped.
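The stricter OpSig+ParamSig comparison from the first bullet can be sketched as follows. This is an illustration only: the helper `signatures` is hypothetical, the sample lines are made up, and the comma-separated layout (OpSig first, ParamSig second, `Validator` header rows in the tuned file) is an assumption about the file format, not output copied from a real run.

```python
def signatures(lines):
    """Return the set of (OpSig, ParamSig) pairs found in CSV-style lines."""
    pairs = set()
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        # Skip blank lines and header rows (assumed to start with "Validator").
        if len(fields) < 2 or fields[0] == "Validator":
            continue
        pairs.add((fields[0], fields[1]))
    return pairs


# Made-up sample entries, for illustration only.
untuned = ["GemmTunableOp_float_NT,nt_64_32_16_ld_64_32_64"]
tuned_ok = [
    "Validator,PT_VERSION,2.7.0",
    "GemmTunableOp_float_NT,nt_64_32_16_ld_64_32_64,Default,0.01",
]
tuned_wrong = ["GemmTunableOp_float_NT,nt_128_32_16_ld_128_32_128,Default,0.01"]


def op_sigs(lines):
    """The older, looser check: compare OpSig only."""
    return {op for op, _ in signatures(lines)}


assert signatures(untuned) == signatures(tuned_ok)
# OpSig alone cannot tell these apart; OpSig+ParamSig can.
assert op_sigs(untuned) == op_sigs(tuned_wrong)
assert signatures(untuned) != signatures(tuned_wrong)
```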

Pull Request resolved: pytorch#150142
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
(cherry picked from commit ca2ffc2)
…rch#150463)

This PR fixes two race conditions that occur when the UTs are run:
- In a particular order within a single shard.
- Concurrently in multiple shards. Each test now gets a unique filename that depends on the test name.
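One way to derive a per-test filename is sketched below. The helper name `result_filename` and the `tunableop_results_` prefix are hypothetical, and the actual scheme used in the tests may differ. Note that the sanitization step could map two different test names to the same string, so a real implementation might need something stricter.

```python
import re


def result_filename(test_id, suffix=".csv"):
    """Hypothetical helper: build a results filename from a test name."""
    # Replace anything that is awkward in a filename with an underscore.
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", test_id)
    return f"tunableop_results_{safe}{suffix}"


# Two tests running concurrently no longer share a results file.
first = result_filename("test_matmul_offline_tunableop")
second = result_filename("test_bmm_tunableop_rocm")
assert first != second
```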

There were two other minor improvements to the UTs:
- matmul_offline_mgpu could occasionally fail when run on 8 GPUs. The pass criterion was relaxed.
- bmm_tunableop_rocm checks that the rotating buffer is not zero. Otherwise, the test is not useful.

Additionally, several UTs took over 1 minute to run. Their duration was reduced by a combination of setting max tuning iterations to one, setting the rotating buffer size to zero, and/or reducing the matrix dimensions.

Pull Request resolved: pytorch#150463
Approved by: https://github.com/jeffdaily

(cherry picked from commit d0026fa)
This PR adds support for submatrices in offline tuning for:
- GEMM
- GEMM and bias
- ScaledGEMM
- Batch Strided GEMM

New UTs were added to cover submatrices. Submatrix support for the strided batch API is not part of this PR and will be done separately.
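As background on why submatrices need special handling, the sketch below shows how a block inside a larger matrix keeps the parent's leading dimension. It is a plain-Python illustration and not code from this PR: the helpers `submatrix_view` and `element` are hypothetical, and it assumes row-major storage for readability, whereas BLAS libraries are typically column-major.

```python
def submatrix_view(rows, cols, parent_cols, row0=0, col0=0):
    """Describe a rows x cols block inside a row-major parent matrix."""
    if col0 + cols > parent_cols:
        raise ValueError("block does not fit inside the parent row")
    return {
        "offset": row0 * parent_cols + col0,  # index of the block's first element
        "ld": parent_cols,                    # leading dimension comes from the parent
        "rows": rows,
        "cols": cols,
    }


def element(flat, view, i, j):
    """Read element (i, j) of the block from the parent's flat storage."""
    return flat[view["offset"] + i * view["ld"] + j]


# A 4 x 6 parent stored as a flat list; the block is 2 x 3 starting at (1, 2).
parent = list(range(24))
view = submatrix_view(rows=2, cols=3, parent_cols=6, row0=1, col0=2)

assert view["ld"] > view["cols"]         # the stride exceeds the block width
assert element(parent, view, 0, 0) == 8  # parent element (1, 2)
assert element(parent, view, 1, 2) == 16 # parent element (2, 4)
```

For a full matrix the leading dimension equals the matrix width, so a tuning path that assumes the two are equal would mishandle a block like this one.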

There is also a bug fix for offline tuning for full matrix for GEMM and bias in the `NT` case. Offline and online UTs were updated to cover this corner case.

To improve code readability, the definitions of transA and transB were swapped.

Pull Request resolved: pytorch#151138
Approved by: https://github.com/jeffdaily

(cherry picked from commit f6c1cf0)
Fixes a TunableOp ScaledGEMM regression for rowwise scaling caused by pytorch#147548.

Credit goes to @mawong-amd for the fix.

Pull Request resolved: pytorch#152403
Approved by: https://github.com/jeffdaily

(cherry picked from commit ece1658)

rocm-repo-management-api bot commented May 8, 2025

Jenkins build for 63bd33ca042db3cd6b471cd71c34f19107d693d4 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@naromero77amd
Author

TunableOp UTs were run locally on MI300 and there were no regressions.

@naromero77amd naromero77amd changed the title from "[release /2.7][ROCm][TunableOp] UTs, submatrix offline tuning, and ScaledGEMM rowwise fix" to "[release/2.7][ROCm][TunableOp] UTs, submatrix offline tuning, and ScaledGEMM rowwise fix" May 8, 2025
@pruthvistony pruthvistony merged commit 3f73e8a into release/2.7 May 8, 2025
0 of 2 checks passed
@pruthvistony pruthvistony deleted the release_/2.7_tunableop_up_cherrypicks branch May 8, 2025 19:23