[release/2.7][ROCm][TunableOp] UTs, submatrix offline tuning, and ScaledGEMM rowwise fix #2106
Merged
pruthvistony merged 6 commits into release/2.7 on May 8, 2025
Add unit test for new TunableOp BLAS logging feature. Requires this PR to be merged in first: pytorch#148979 Pull Request resolved: pytorch#148982 Approved by: https://github.com/jeffdaily
…49930) This PR is cleanup only. There are no feature changes or bug fixes. We create a TunableOp context manager for setting up and cleanup. We re-write TunableOp unit tests in terms of this context manager. Ultimately reduces the amount of copy-paste code. Pull Request resolved: pytorch#149930 Approved by: https://github.com/jeffdaily (cherry picked from commit 45b1173)
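The setup-and-cleanup pattern described above can be sketched as follows. This is a minimal illustration, not the context manager from the PR: the name `tunableop_env` is hypothetical, and it drives TunableOp through its environment variables (`PYTORCH_TUNABLEOP_ENABLED`, `PYTORCH_TUNABLEOP_TUNING`, `PYTORCH_TUNABLEOP_FILENAME`) rather than through whatever API the real unit tests use.

```python
import contextlib
import os


@contextlib.contextmanager
def tunableop_env(results_file, tuning=True):
    """Hypothetical helper: enable TunableOp via env vars, then undo everything."""
    overrides = {
        "PYTORCH_TUNABLEOP_ENABLED": "1",
        "PYTORCH_TUNABLEOP_TUNING": "1" if tuning else "0",
        "PYTORCH_TUNABLEOP_FILENAME": results_file,
    }
    # Remember the previous values so the caller's environment is restored.
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield results_file
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
        # Remove the results file if the body created one.
        with contextlib.suppress(FileNotFoundError):
            os.remove(results_file)
```

Putting the teardown in `finally` means a failing test body still restores the environment and deletes its file, which is the copy-paste code each test would otherwise repeat.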
…ytorch#150142) Improvements to unit tests and warnings for unsupported cases in offline tuning. Here are more details:
- Previously we only compared the OpSig for the untuned vs. tuned entries. This was not strict enough, so we now compare OpSig+ParamSig.
- The main offline and online UTs are now stricter to make sure we exercise the code paths for the four combinations of transA and transB.
- Offline tuning does not support some tensor shapes. For those shapes, a warning is emitted and tuning is skipped.

Pull Request resolved: pytorch#150142 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com> (cherry picked from commit ca2ffc2)
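The difference between comparing OpSig alone and comparing OpSig+ParamSig can be sketched as below. This is an illustration only: the helper `signature_keys` is hypothetical, the sample rows are made up, and the parameter signatures are simplified stand-ins for the real file contents. It assumes each entry is a CSV row whose first two fields are the operator signature and the parameter signature.

```python
import csv
import io


def signature_keys(csv_text, strict=True):
    """Collect (OpSig, ParamSig) pairs, or just (OpSig,) when strict is False."""
    keys = set()
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank rows and header rows that are not tuning entries.
        if len(row) < 2 or row[0].startswith("Validator"):
            continue
        op_sig, param_sig = row[0].strip(), row[1].strip()
        keys.add((op_sig, param_sig) if strict else (op_sig,))
    return keys


# Made-up sample data: two untuned shapes, only one of which was tuned.
untuned = "GemmTunableOp_float_NT,nt_128_64_32\nGemmTunableOp_float_NT,nt_256_64_32\n"
tuned = "Validator,PT_VERSION,2.7.0\nGemmTunableOp_float_NT,nt_128_64_32,Gemm_Default,0.01\n"

loose_match = signature_keys(untuned, strict=False) == signature_keys(tuned, strict=False)
missing = signature_keys(untuned) - signature_keys(tuned)
```

With the OpSig-only comparison the two files look identical, while the stricter key reveals the shape that never received a tuned entry.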
…rch#150463) This PR fixes two race conditions that occur when UTs are run:
- In a particular order within a single shard.
- Concurrently in multiple shards.

Each test now gets a unique filename that depends on the test name. There were two other minor improvements to the UTs:
- matmul_offline_mgpu could occasionally fail if run on 8 GPUs. The pass criteria were relaxed.
- bmm_tunableop_rocm checks that the rotating buffer is not zero. Otherwise, the test is not useful.

Additionally, several UTs took over 1 minute to run. Their duration was reduced by a combination of setting max tuning iterations to one, setting the rotating buffer size to zero, and/or reducing the matrix dimensions.

Pull Request resolved: pytorch#150463 Approved by: https://github.com/jeffdaily (cherry picked from commit d0026fa)
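Deriving a per-test filename from the test name might look like the sketch below. The helper name `results_filename_for`, the filename prefix, and the example test ids are assumptions for illustration; the PR's actual naming scheme is not shown in this conversation.

```python
import re


def results_filename_for(test_id):
    """Hypothetical helper: build a results filename unique to one test."""
    # Keep only the method name from a dotted id such as "module.Class.test_x".
    test_name = test_id.rsplit(".", 1)[-1]
    # Replace anything that is awkward in a filename with an underscore.
    safe = re.sub(r"[^A-Za-z0-9_]+", "_", test_name)
    return f"tunableop_results_{safe}.csv"


name_a = results_filename_for("test_linalg.TestLinalgCUDA.test_bmm_tunableop_rocm")
name_b = results_filename_for("test_linalg.TestLinalgCUDA.test_matmul_offline_mgpu")
```

Because no two tests share a file, neither the execution order within a shard nor concurrent shards on the same filesystem can cause one test to read or delete another test's results.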
This PR adds support for submatrices in offline tuning for:
- GEMM
- GEMM and bias
- ScaledGEMM
- Batch Strided GEMM

New UTs cover submatrices. Submatrices for the strided batch API are not part of this PR and will be done separately. There is also a bug fix for offline tuning of the full matrix for GEMM and bias in the `NT` case. Offline and online UTs were updated to cover this corner case. To improve code readability, the definitions of transA and transB were swapped.

Pull Request resolved: pytorch#151138 Approved by: https://github.com/jeffdaily (cherry picked from commit f6c1cf0)
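For readers unfamiliar with the term, the sketch below shows what distinguishes a submatrix from a full matrix in memory. It assumes "submatrix" means a contiguous-step slice of a larger row-major matrix, which the commit message does not spell out; the helper `submatrix_layout` is hypothetical and is unrelated to the PR's code.

```python
def submatrix_layout(parent_shape, rows, cols):
    """Describe parent[rows, cols] of a row-major matrix, in units of elements.

    Returns (shape, strides, offset). Only step-1 slices are handled.
    """
    parent_rows, parent_cols = parent_shape
    r0, r1, _ = rows.indices(parent_rows)
    c0, c1, _ = cols.indices(parent_cols)
    shape = (r1 - r0, c1 - c0)
    # The row stride is inherited from the parent, not from the view's width.
    strides = (parent_cols, 1)
    offset = r0 * parent_cols + c0
    return shape, strides, offset


shape, strides, offset = submatrix_layout((8, 16), slice(2, 6), slice(4, 12))
leading_dimension = strides[0]
```

Here the view is 4 x 8 but its leading dimension is still 16, so a tuner that infers the leading dimension from the logical shape alone would describe a different GEMM problem than the one actually executed.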
Fixes the TunableOp ScaledGEMM regression for rowwise scaling caused by pytorch#147548. Credit goes to @mawong-amd for the fix. Pull Request resolved: pytorch#152403 Approved by: https://github.com/jeffdaily (cherry picked from commit ece1658)
Jenkins build for 63bd33ca042db3cd6b471cd71c34f19107d693d4 commit finished as FAILURE
Author
TunableOp UTs were run locally on MI300 and there were no regressions.
pruthvistony
approved these changes
May 8, 2025
Align TunableOp UTs, features, and bug fixes with upstream PyTorch main
UTs:
pytorch#148982
pytorch#149930
pytorch#150142
pytorch#150463
Feature: offline tuning for submatrices:
pytorch#151138
Bug Fix: ScaledGEMM rowwise
pytorch#152403