[CI] Add inductor workflow for rocm #110544
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110544
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (5 Unrelated Failures)
As of commit 1db9e19 with merge base 0249c4a:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and has been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@pytorchbot label "ciflow/inductor" |
b60aef6 to
e8aaa65
Compare
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
e8aaa65 to
e26d2f7
Compare
…107760) This PR adds a skip decorator which will disable tests in CI for ROCm inductor workflow. This new workflow will be coming in via #110544 Pull Request resolved: #107760 Approved by: https://github.com/jataylo, https://github.com/pruthvistony, https://github.com/atalman
721b16e to
909146e
Compare
|
Waiting on #110511 to be merged before enabling this workflow so as to not overburden ROCm CI. |
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Rebase failed due to Command Raised by https://github.com/pytorch/pytorch/actions/runs/6894323604 |
|
Saving the passing CI link before rebasing: https://hud.pytorch.org/pr/110544 |
Be aware that the link is dynamic; it will also update when you rebase :) Your best bet is to save the links to the specific jobs that ran, e.g. https://github.com/pytorch/pytorch/actions/runs/6541944809/job/17764595548. |
909146e to
b69a69c
Compare
|
Please also double-check the duration of the new test job. I'm waiting for the current run (https://github.com/pytorch/pytorch/actions/runs/6894570049/job/18757398102) to finish to see how long it takes. More shards might be needed if it takes more than 2 hours. |
# Set default values for these variables in case they are not set
SHARD_NUMBER="${SHARD_NUMBER:=1}"
NUM_TEST_SHARDS="${NUM_TEST_SHARDS:=1}"
Why is this needed? I would prefer it to error out if SHARD_NUMBER is not defined.
In order to run inductor config, it seems that SHARD_NUMBER is required now. (https://github.com/pytorch/pytorch/pull/110544/files#diff-9709e5db13aeac90c0312b2b8da34b37cf51242ec789ca959fcf1ba295a8da7aR1101)
Although this variable is always defined in all the CI yaml files that trigger these tests, it's nice to be able to run this file locally outside of CI context.
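For reference, the two behaviours discussed in this thread can be compared side by side. This is only a sketch: `shard_or_default` and `shard_or_fail` are hypothetical helper names that do not appear in the PR, and the `echo` output format is made up for illustration. The first helper mirrors the `:=` defaults from the diff; the second shows the fail-fast alternative using the standard `${VAR:?message}` expansion.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the PR): default-on-unset, as in the diff.
# The body is a subshell "( ... )", so the caller's variables are untouched.
shard_or_default() (
  # ${VAR:=1} assigns 1 when VAR is unset or empty, then expands to its value.
  SHARD_NUMBER="${SHARD_NUMBER:=1}"
  NUM_TEST_SHARDS="${NUM_TEST_SHARDS:=1}"
  echo "shard ${SHARD_NUMBER} of ${NUM_TEST_SHARDS}"
)

# Hypothetical helper (not in the PR): fail-fast alternative.
shard_or_fail() (
  # ${VAR:?message} writes the message to stderr and, in a non-interactive
  # shell, exits with a non-zero status when VAR is unset or empty.
  : "${SHARD_NUMBER:?SHARD_NUMBER must be set}"
  : "${NUM_TEST_SHARDS:?NUM_TEST_SHARDS must be set}"
  echo "shard ${SHARD_NUMBER} of ${NUM_TEST_SHARDS}"
)

# Simulate a local run outside CI, where neither variable is defined.
unset SHARD_NUMBER NUM_TEST_SHARDS

shard_or_default                                   # prints "shard 1 of 1"
(SHARD_NUMBER=2; NUM_TEST_SHARDS=4; shard_or_default)  # prints "shard 2 of 4"
shard_or_fail 2>/dev/null || echo "fail-fast variant rejected the unset variable"
```

With the defaults, a local run behaves like a single-shard job; with the fail-fast form, a missing variable in a CI yaml file would surface immediately instead of silently running as shard 1 of 1.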
@huydhn -- I have noticed that it takes around 1.5 hours on average for this job to run. Is that sufficient for one shard? |
9e076a3 to
044ecdf
Compare
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
4931a1f to
1db9e19
Compare
|
@pytorchbot merge |
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
This PR creates a separate CI job for inductor unit tests on ROCm. You will need to add the ciflow/inductor tag to a PR to trigger this job; the job also runs automatically on every commit merged into main. It takes around 1.5 hours and runs in parallel with the other ROCm jobs. It runs only on the MI210 CI runners to ensure maximum inductor functionality is tested.
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler