The test test_mm_plus_mm3_gpu_wrapper in inductor/test_gpu_cpp_wrapper.py fails with [XPASS(strict)] on CUDA: an xfail on the base test marks it as expected to fail, but it passes.
Test Class: inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_plus_mm3_gpu_wrapper
Platform: CUDA
Error: XPASS(strict) - test was expected to fail but passes
Steps to Reproduce
- Build PyTorch with CUDA support.
- Run:
python test/run_test.py -i inductor/test_gpu_cpp_wrapper -k test_mm_plus_mm3
- Observe TestGpuWrapper.test_mm_plus_mm3_gpu_wrapper failing with [XPASS(strict)]: the test passes but is marked as expected to fail.
Root Cause
PR #172780 added an xfail on test_mm_plus_mm3 in test_select_algorithm.py with the reason "C++ wrapper dynamic shapes fails on CUDA, fixed on ROCm."
On CUDA, TestGpuWrapper.test_mm_plus_mm3_gpu_wrapper actually passes, so the strict xfail reports it as [XPASS(strict)]. Only the dynamic shapes variant, DynamicShapesGpuWrapperGpuTests.test_mm_plus_mm3_dynamic_shapes_gpu_wrapper, fails on CUDA.
So the xfail is too broad: it marks both tests as expected to fail when only the dynamic shapes run fails.
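The effect of an overly broad expected-failure marker can be reproduced outside PyTorch. The following is a minimal sketch using only the standard library's unittest.expectedFailure, which treats an unexpected pass as a failure much like a strict xfail. The helper mm_plus_mm3_body, the make_variant factory, and the simulated failure are all hypothetical stand-ins, not the real test body or the decorator used in test_select_algorithm.py.

```python
import unittest


def mm_plus_mm3_body(dynamic_shapes: bool) -> None:
    # Hypothetical stand-in for the real test body: only the dynamic
    # shapes configuration is assumed to fail.
    if dynamic_shapes:
        raise AssertionError("simulated C++ wrapper dynamic shapes failure")


def make_variant(dynamic_shapes: bool) -> type:
    # Both variants inherit the same expected-failure marker, mirroring
    # an xfail placed on the shared base test.
    class Variant(unittest.TestCase):
        @unittest.expectedFailure
        def test_mm_plus_mm3(self):
            mm_plus_mm3_body(dynamic_shapes)

    return Variant


def run(case: type) -> unittest.TestResult:
    # Run a single TestCase class and return the collected result.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result


static_result = run(make_variant(dynamic_shapes=False))
dynamic_result = run(make_variant(dynamic_shapes=True))

# The static variant passes, so the marker turns it into an unexpected success.
print(len(static_result.unexpectedSuccesses), static_result.wasSuccessful())
# The dynamic variant fails as expected, so the run counts as successful.
print(len(dynamic_result.expectedFailures), dynamic_result.wasSuccessful())
```

In this sketch the static variant is the one that breaks the run, which is the same shape as the [XPASS(strict)] failure described above.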
Proposed fix:
Remove the xfail from test_mm_plus_mm3 and skip the dynamic shapes variant on CUDA in test_failures_gpu_wrapper (similar to test_mm_plus_mm2_dynamic_shapes). With this change:
- TestGpuWrapper.test_mm_plus_mm3_gpu_wrapper runs and passes on CUDA.
- The dynamic shapes test is skipped on CUDA (where it fails) and runs on ROCm (where it passes).
This resolves the XPASS failure while still accounting for the known CUDA failure of the dynamic shapes variant.
I have a fix for this and can go ahead with it if this approach looks good.
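The general shape of the proposed change could look like the sketch below. It is self-contained and does not import PyTorch: the TestFailure tuple, the build_test_failures and should_skip helpers, the is_rocm flag, and the registry key test_mm_plus_mm3_dynamic_shapes (chosen by analogy with test_mm_plus_mm2_dynamic_shapes) are all assumptions made for illustration, and the real test_failures_gpu_wrapper entries may be structured differently.

```python
from collections import namedtuple

# Hypothetical stand-in for the failure record used by the wrapper tests.
TestFailure = namedtuple("TestFailure", ["suffixes", "is_skip"])


def build_test_failures(is_rocm: bool) -> dict:
    # Register the skip only where the dynamic shapes variant is known
    # to fail (CUDA); on ROCm the test is left to run normally.
    failures = {}
    if not is_rocm:
        failures["test_mm_plus_mm3_dynamic_shapes"] = TestFailure(
            ("gpu_wrapper",), is_skip=True
        )
    return failures


def should_skip(name: str, suffix: str, failures: dict) -> bool:
    # A test is skipped only if it has an entry for this suffix marked as a skip.
    entry = failures.get(name)
    return entry is not None and suffix in entry.suffixes and entry.is_skip


cuda_failures = build_test_failures(is_rocm=False)
rocm_failures = build_test_failures(is_rocm=True)

# The dynamic shapes variant is skipped on CUDA only.
print(should_skip("test_mm_plus_mm3_dynamic_shapes", "gpu_wrapper", cuda_failures))
print(should_skip("test_mm_plus_mm3_dynamic_shapes", "gpu_wrapper", rocm_failures))
# The static variant has no entry, so it runs (and is expected to pass) everywhere.
print(should_skip("test_mm_plus_mm3", "gpu_wrapper", cuda_failures))
```

Scoping the skip to the dynamic shapes entry keeps the base test free of any expected-failure marker, so the static variant can no longer report an unexpected pass.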
Code References
Base test: test/inductor/test_select_algorithm.py:299 (test_mm_plus_mm3)
GPU wrapper config: test/inductor/test_gpu_cpp_wrapper.py:107-125 (test_failures_gpu_wrapper)
CC: @eellison @Chillee @shunting314 @jansel @jgong5 @ngimel @ezyang @ptrblck @csarofeen @ajtulloch @zheng-xq @morrison-turnansky @stmcgovern @cleonard530 @jewelkm89 @adabeyta @groenenboomj
cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @chauhang @penguinwu @ezyang @bobrenjc93 @aditvenk @laithsakka @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo