[Reland] ROCm CI (Infra + Skips) #1581
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1581
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures
As of commit 900cf5b with merge base f6f3322:
NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
Warning: Unknown label
Please add the new label to .github/pytorch-probot.yml |
|
Thanks @andrewor14 I will work with AMD team on that |
|
@amdfaa I just cherry-picked your infra changes into this PR so we can get a clearer CI signal. Please help review the changes. Thanks |
amdfaa
left a comment
lgtm from the infra side
|
The breakage on CUDA doesn't seem related to your changes. It seems to come from this test: FAILED test/quantization/test_quant_api.py::TestQuantFlow::test_quantized_tensor_subclass_int8_dyn_quant - torch._inductor.exc.CppCompileError: C++ compile error. In that case @jerryzh168 might need to take a look |
|
@petrex Looks like more fixes/skips are needed: https://github.com/pytorch/ao/actions/runs/12934492712/job/36084046951?pr=1581 |
|
I see the ROCm tests are clean, but the job later fails with
|
|
Just to update on the latest status of this PR: we are almost done enabling or skipping the unit tests on ROCm. We are now finalizing the changes so that ROCm CI runs only on pushes to the main branch for now, given our limited CI capacity. In a follow-up PR, we intend to expand torchao ROCm CI testing to torchao PRs as well. |
|
@jithunnair-amd maybe lint the changes? |
2da6cd8 to 61e86c2
80711c5 to da9f271
da9f271 to a6958d7
Add a skip decorator for ROCm to prevent test failures during ongoing ROCm enablement
Add ROCm skip decorator to prevent test failures during ongoing ROCm enablement
|
@msaroufim @supriyar Can we please get an approval from a torchao maintainer so we can merge this PR when we have a clean signal on ROCm CI (just adding more skips at this point)? |
|
@jithunnair-amd, for sure. @jcaip will be the one reviewing and approving from torchao side. |
|
@jcaip ROCm nightly CI passed: https://github.com/pytorch/ao/actions/runs/13464088074/job/37625918055?pr=1581 Please approve and merge, before any more unit test failures creep in :) |
This PR skips the unit tests that currently fail on ROCm and adds the infra changes needed to enable ROCm CI.

**NOTE:** This PR aims to enable the ROCm CI testing for torchao _only for pushes to main branch_. The ROCm tests should start showing up here once this PR is merged: https://hud.pytorch.org/hud/pytorch/ao/main/1?per_page=50&name_filter=regression

Torchao PRs can also trigger the ROCm CI runs using the `ciflow/rocm` PR label (#1749). Enabling ROCm CI testing on *all* torchao PRs will be done in a follow-up PR.

This pull request introduces the `skip_if_rocm` decorator across various test files to skip tests that are not yet supported on ROCm. The changes ensure that tests are conditionally skipped if ROCm is detected, improving the test suite's compatibility with different environments.

# Key changes include:

### Cherry-pick ROCm CI infra changes from #999

### Configure workflow to trigger ROCm CI only for pushes to main branch, OR on PRs with the `ciflow/rocm` label

### Introduction of `skip_if_rocm` decorator:
* Added `skip_if_rocm` import in multiple test files to conditionally skip tests not supported on ROCm. (`test/dtypes/test_affine_quantized.py`, `test/dtypes/test_floatx.py`, `test/float8/test_base.py`, `test/hqq/test_hqq_affine.py`, `test/integration/test_integration.py`, `test/kernel/test_galore_downproj.py`, `test/prototype/test_awq.py`, `test/prototype/test_low_bit_optim.py`, `test/prototype/test_splitk.py`, `test/quantization/test_galore_quant.py`, `test/quantization/test_marlin_qqq.py`, `test/sparsity/test_marlin.py`, `test/test_ops.py`, `test/test_s8s4_linear_cutlass.py`, `torchao/utils.py`) [[1]](diffhunk://#diff-31b1ffcd78674b79cc65749176354ea4743683070120034709c1da7a3eac31f6R24) [[2]](diffhunk://#diff-0e811fa3416cd87d9a25b4fb680890098c69aa33ca4db4d347d4a10cc41e0eb3L30-R30) [[3]](diffhunk://#diff-05925b4469eb63ab854cc9891f088f570fa3822cdaeb4de109e0b1b9ab5038a7R21) [[4]](diffhunk://#diff-a9708dc28f15bb9cf665417e6c66601f9e8e2f1f672d1858603b74fa879a3357R13) [[5]](diffhunk://#diff-a977c33299f20a626cf650b2b6f0a49ef8fad7c97be21a5618e600b588b14b15R83) [[6]](diffhunk://#diff-4b0ddf8d1e85f4b4f1067f8d1d3e6b4d48785b3675c7202bf49bfbb1079d682fR14) [[7]](diffhunk://#diff-66249d5a8ed995b0a8e22c6354d6b270c5feeb982cb79a28f7c1b929700e89f4L8-R12) [[8]](diffhunk://#diff-244d33d1e8c30e765556011a4d3b76509f61433a346ba12ffc3115144e895aedR33) [[9]](diffhunk://#diff-2bcf3336ff64bfef786e6126813db46040b93628cab5faff3f0f5ed2cb077bf2L16-R24) [[10]](diffhunk://#diff-51ddab022797064be44ca38c87a56c6e87cd69444f4c6151a11b7f0141aef2b9R21) [[11]](diffhunk://#diff-133d8c7492ee2e7536328c8391545610750774e43d128d258380cb6787bb9e93L22-R22) [[12]](diffhunk://#diff-a58427e02fb5b05d26e03e8c2d216e5ae379d82084fd14bf77ea127b5505a43cL18-R18) [[13]](diffhunk://#diff-d183f2afc51d6a59bc70094e8f476d2468c45e415500f6eb60abad955e065156R22-R24) [[14]](diffhunk://#diff-85cc98d31eb8056e082ebdfbf2979aaa046ffc08bbacd4a65a31795b51998645R10-R12) [[15]](diffhunk://#diff-d2a11602a79e83305208472f1abe6a4106f02ce62a7f9524007181813863fcf6R10)

### Application of `skip_if_rocm` decorator:
* Applied `@skip_if_rocm("ROCm development in progress")` to multiple test functions to skip them when running on ROCm. (`test/dtypes/test_affine_quantized.py`, `test/dtypes/test_floatx.py`, `test/float8/test_base.py`, `test/hqq/test_hqq_affine.py`, `test/integration/test_integration.py`, `test/kernel/test_galore_downproj.py`, `test/prototype/test_awq.py`, `test/prototype/test_low_bit_optim.py`, `test/prototype/test_splitk.py`, `test/quantization/test_galore_quant.py`, `test/quantization/test_marlin_qqq.py`, `test/sparsity/test_marlin.py`) [[1]](diffhunk://#diff-31b1ffcd78674b79cc65749176354ea4743683070120034709c1da7a3eac31f6R93) [[2]](diffhunk://#diff-31b1ffcd78674b79cc65749176354ea4743683070120034709c1da7a3eac31f6R173) [[3]](diffhunk://#diff-31b1ffcd78674b79cc65749176354ea4743683070120034709c1da7a3eac31f6R186) [[4]](diffhunk://#diff-0e811fa3416cd87d9a25b4fb680890098c69aa33ca4db4d347d4a10cc41e0eb3R111) [[5]](diffhunk://#diff-05925b4469eb63ab854cc9891f088f570fa3822cdaeb4de109e0b1b9ab5038a7R427) [[6]](diffhunk://#diff-a9708dc28f15bb9cf665417e6c66601f9e8e2f1f672d1858603b74fa879a3357R114) [[7]](diffhunk://#diff-a977c33299f20a626cf650b2b6f0a49ef8fad7c97be21a5618e600b588b14b15R571) [[8]](diffhunk://#diff-a977c33299f20a626cf650b2b6f0a49ef8fad7c97be21a5618e600b588b14b15R690) [[9]](diffhunk://#diff-a977c33299f20a626cf650b2b6f0a49ef8fad7c97be21a5618e600b588b14b15R710) [[10]](diffhunk://#diff-a977c33299f20a626cf650b2b6f0a49ef8fad7c97be21a5618e600b588b14b15R904) [[11]](diffhunk://#diff-a977c33299f20a626cf650b2b6f0a49ef8fad7c97be21a5618e600b588b14b15R924) [[12]](diffhunk://#diff-4b0ddf8d1e85f4b4f1067f8d1d3e6b4d48785b3675c7202bf49bfbb1079d682fR33) [[13]](diffhunk://#diff-66249d5a8ed995b0a8e22c6354d6b270c5feeb982cb79a28f7c1b929700e89f4R120) [[14]](diffhunk://#diff-244d33d1e8c30e765556011a4d3b76509f61433a346ba12ffc3115144e895aedR116) [[15]](diffhunk://#diff-2bcf3336ff64bfef786e6126813db46040b93628cab5faff3f0f5ed2cb077bf2L16-R24) [[16]](diffhunk://#diff-51ddab022797064be44ca38c87a56c6e87cd69444f4c6151a11b7f0141aef2b9R86) [[17]](diffhunk://#diff-133d8c7492ee2e7536328c8391545610750774e43d128d258380cb6787bb9e93R48) [[18]](diffhunk://#diff-133d8c7492ee2e7536328c8391545610750774e43d128d258380cb6787bb9e93R70) [[19]](diffhunk://#diff-a58427e02fb5b05d26e03e8c2d216e5ae379d82084fd14bf77ea127b5505a43cR40) [[20]](diffhunk://#diff-a58427e02fb5b05d26e03e8c2d216e5ae379d82084fd14bf77ea127b5505a43cL51-R58)

### Module-level skips for ROCm:
* Added module-level skips for ROCm in specific test files to skip all tests within the module if ROCm is detected. (`test/test_ops.py`, `test/test_s8s4_linear_cutlass.py`) [[1]](diffhunk://#diff-d183f2afc51d6a59bc70094e8f476d2468c45e415500f6eb60abad955e065156R22-R24) [[2]](diffhunk://#diff-85cc98d31eb8056e082ebdfbf2979aaa046ffc08bbacd4a65a31795b51998645R10-R12)
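For readers unfamiliar with the pattern, here is a minimal sketch of how a decorator like `skip_if_rocm` could work. It assumes ROCm is detected through `torch.version.hip`, which is `None` on non-ROCm builds of PyTorch. The `is_rocm` helper and the `detect` keyword are hypothetical names added for illustration, and the sketch raises `unittest.SkipTest` to stay dependency-free; the actual implementation in `torchao/utils.py` may differ.

```python
import functools
import unittest


def is_rocm() -> bool:
    """Hypothetical helper: report whether PyTorch was built for ROCm."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no ROCm build to detect.
        return False
    # torch.version.hip is a version string on ROCm builds, None otherwise.
    return getattr(torch.version, "hip", None) is not None


def skip_if_rocm(message=None, *, detect=is_rocm):
    """Skip the decorated test when `detect()` reports a ROCm environment.

    `detect` is an illustrative hook (not part of the PR) that makes the
    decorator easy to exercise without ROCm hardware.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if detect():
                reason = "Skipping the test on ROCm"
                if message:
                    reason += f": {message}"
                # unittest-style skip signal, raised at call time.
                raise unittest.SkipTest(reason)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@skip_if_rocm("ROCm development in progress")
def test_example():
    # Placeholder body standing in for a real torchao test.
    assert 1 + 1 == 2
```

The check runs when the test is called rather than at import time, so the same test file can be collected on both CUDA and ROCm runners. For the module-level case, pytest's `pytest.skip(reason, allow_module_level=True)` is the usual mechanism; whether the PR uses exactly that call is an assumption here.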