[release/2.5][ROCm][TunableOp] Improve identification of fastest solution (#144942)#2018
…#144942)

This PR addresses some stability issues with identifying the fastest solution on AMD GPUs, particularly the MI300.

Changes include:
- An improved timer, StreamTimerNoSync
- More aggressive skipping of slow solutions
- Additional statistics that can be used for diagnostics with PYTORCH_TUNABLEOP_VERBOSE=3

Pull Request resolved: pytorch#144942
Approved by: https://github.com/jeffdaily
(cherry picked from commit fd0cd6a)
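The idea behind "more aggressive skipping of slow solutions" can be illustrated with a generic early-exit tuning loop. This is a minimal sketch under stated assumptions, not TunableOp's actual implementation: the function name `pick_fastest`, the single-probe rule and the `skip_factor` threshold are all hypothetical, and the host clock stands in for the device-side timing a real GPU tuner would use.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def pick_fastest(
    candidates: Dict[str, Callable[[], None]],
    iters: int = 10,
    skip_factor: float = 2.0,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[str], Dict[str, float]]:
    """Hypothetical sketch: time each candidate, return (fastest, {name: mean}).

    A candidate whose single probe call already takes more than
    `skip_factor` times the best mean seen so far is skipped without
    running the full timing loop, so clearly slow solutions cost little.
    """
    best_name: Optional[str] = None
    best_time = float("inf")
    timings: Dict[str, float] = {}
    for name, fn in candidates.items():
        fn()  # warm-up call, not timed
        start = clock()
        fn()  # single probe call
        probe = clock() - start
        if probe > skip_factor * best_time:
            continue  # early exit: far slower than the current best
        start = clock()
        for _ in range(iters):
            fn()
        mean = (clock() - start) / iters
        timings[name] = mean
        if mean < best_time:
            best_name, best_time = name, mean
    return best_name, timings
```

The `clock` parameter is injectable so the selection logic can be checked with a fake clock; on a GPU, host-side timing without synchronization would mostly measure launch overhead, so a real tuner would time on the device (for example with stream events) instead.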
This is a performance improvement from upstream. So far, there have been no negative reports with respect to performance, so I think it's worth backporting. I will also add it to ROCm release/2.6. It cannot be trivially backported to release/2.4.
Jenkins build for acd66a22a6f79aa784015121cc22fa653ac1e9bb commit finished as FAILURE
Jenkins build for acd66a22a6f79aa784015121cc22fa653ac1e9bb commit is in progress
!cherry-pick --onto release/2.6
…tion (pytorch#144942) (#2018)

This PR addresses some stability issues with identifying the fastest solution on AMD GPUs, particularly the MI300.

Changes include:
- An improved timer, StreamTimerNoSync
- More aggressive skipping of slow solutions
- Additional statistics that can be used for diagnostics with PYTORCH_TUNABLEOP_VERBOSE=3

Pull Request resolved: pytorch#144942
Approved by: https://github.com/jeffdaily
(cherry picked from commit fd0cd6a)
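The extra diagnostic statistics are tied to PYTORCH_TUNABLEOP_VERBOSE=3. A minimal sketch of preparing an environment for a tuning run follows; it assumes the variables are set before the PyTorch process starts (for example by a launcher), and the helper name `tunableop_env` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def tunableop_env(
    base: Optional[Mapping[str, str]] = None, verbose: int = 3
) -> Dict[str, str]:
    """Hypothetical helper: return a copy of `base` (default: the current
    environment) with TunableOp enabled and its verbosity raised."""
    env = dict(os.environ if base is None else base)  # copy; never mutate base
    env["PYTORCH_TUNABLEOP_ENABLED"] = "1"
    env["PYTORCH_TUNABLEOP_VERBOSE"] = str(verbose)
    return env
```

The result can be passed as `env=` to `subprocess.run` when launching a workload, e.g. `subprocess.run([sys.executable, "my_workload.py"], env=tunableop_env(), check=True)`, where `my_workload.py` is a placeholder for your own script.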
Created branch autogenerated/release/2.6_cherry-pick_pr-2018 and #2041
… of fastest solution (pytorch#144942) (#2041)

Cherry-pick of #2018

Co-authored-by: Nichols A. Romero <165712832+naromero77amd@users.noreply.github.com>
This PR addresses some stability issues with identifying the fastest solution on AMD GPUs, particularly the MI300.
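Changes include:
- An improved timer, StreamTimerNoSync
- More aggressive skipping of slow solutions
- Additional statistics that can be used for diagnostics with PYTORCH_TUNABLEOP_VERBOSE=3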
Pull Request resolved: pytorch#144942
Approved by: https://github.com/jeffdaily
(cherry picked from commit fd0cd6a)