[ROCm] upgrade ROCm CI builds to py3.10 #134108

Closed
jataylo wants to merge 1 commit into pytorch:main from ROCm:rocm-py310-upgrade

@jataylo
Collaborator

@jataylo jataylo commented Aug 21, 2024

@jataylo jataylo added the rocm, ciflow/rocm, and ciflow/inductor-rocm labels Aug 21, 2024
@pytorch-bot

pytorch-bot bot commented Aug 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/134108

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 3 Unrelated Failures

As of commit 9e85c89 with merge base c64ae60:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jataylo jataylo requested a review from malfet August 22, 2024 09:10
@jataylo jataylo marked this pull request as ready for review August 22, 2024 09:10
@jataylo jataylo requested a review from a team as a code owner August 22, 2024 09:10
@jataylo
Collaborator Author

jataylo commented Aug 22, 2024

I don't believe any of these failures are new and the b2b-gemm tests are passing:

2024-08-21T21:58:17.8609540Z Running inductor/test_b2b_gemm 1/1 ... [2024-08-21 21:58:17.860331]
2024-08-21T21:58:17.8610134Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2024-08-21T21:58:17.8615244Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_b2b_gemm.py', '-m', 'serial', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2024-08-21 21:58:17.860949]
2024-08-21T21:58:21.6348271Z 
2024-08-21T21:58:21.6350603Z inductor/test_b2b_gemm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_b2b_gemm_1.1_56d822e46ac72b2e_.log
2024-08-21T21:58:21.6352427Z Running 0 items in this shard:

@jataylo jataylo added the rocm priority and keep-going labels Aug 22, 2024
@jataylo
Collaborator Author

jataylo commented Aug 22, 2024

@pytorchbot rebase

@jataylo jataylo added the ciflow/periodic label Aug 22, 2024
@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rocm-py310-upgrade onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rocm-py310-upgrade && git pull --rebase)

@jataylo
Collaborator Author

jataylo commented Aug 23, 2024

@malfet could you take a look at this? It should resolve the b2b-gemm issues we pointed out related to py3.8.

ROCm failures are expected, and the timeout is due to keep-going being enabled alongside the current test_transformers failures.

@pruthvistony
Collaborator

Waiting for PR #132895 to merge before this one.

@pruthvistony pruthvistony marked this pull request as draft August 26, 2024 20:50
@jithunnair-amd
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rocm-py310-upgrade onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rocm-py310-upgrade && git pull --rebase)

@jataylo
Collaborator Author

jataylo commented Sep 12, 2024

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/134108/head returned non-zero exit code 1

Rebasing (1/3)
Auto-merging .ci/docker/build.sh
Auto-merging .github/workflows/inductor-rocm.yml
CONFLICT (content): Merge conflict in .github/workflows/inductor-rocm.yml
Auto-merging .github/workflows/periodic.yml
CONFLICT (content): Merge conflict in .github/workflows/periodic.yml
Auto-merging .github/workflows/pull.yml
CONFLICT (content): Merge conflict in .github/workflows/pull.yml
Auto-merging .github/workflows/rocm.yml
Auto-merging .github/workflows/slow.yml
CONFLICT (content): Merge conflict in .github/workflows/slow.yml
Auto-merging .github/workflows/trunk.yml
CONFLICT (content): Merge conflict in .github/workflows/trunk.yml
error: could not apply 6a0bdb730d... [ROCm] upgrade ROCm CI builds to py3.10
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 6a0bdb730d... [ROCm] upgrade ROCm CI builds to py3.10

Raised by https://github.com/pytorch/pytorch/actions/runs/10833692309
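
For anyone triaging a failed rebase like the one above, the conflicted workflow files can be pulled out of the log mechanically. The following is only a minimal sketch: the helper name `conflicted_paths` and the `REBASE_LOG` constant are hypothetical, and the pattern assumes git's "CONFLICT (...): Merge conflict in <path>" wording exactly as it appears in this log. Other conflict kinds (for example modify/delete) are worded differently and would not match.

```python
import re

# Assumption: conflict lines look like the ones quoted in the log above, e.g.
#   CONFLICT (content): Merge conflict in .github/workflows/pull.yml
CONFLICT_RE = re.compile(r"^CONFLICT \([^)]*\): Merge conflict in (?P<path>.+)$")


def conflicted_paths(git_output: str) -> list[str]:
    """Return the conflicted file paths in the order git reported them."""
    paths = []
    for line in git_output.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match:
            paths.append(match.group("path"))
    return paths


# Rebase output copied from the failed rebase comment above.
REBASE_LOG = """\
Rebasing (1/3)
Auto-merging .ci/docker/build.sh
Auto-merging .github/workflows/inductor-rocm.yml
CONFLICT (content): Merge conflict in .github/workflows/inductor-rocm.yml
Auto-merging .github/workflows/periodic.yml
CONFLICT (content): Merge conflict in .github/workflows/periodic.yml
Auto-merging .github/workflows/pull.yml
CONFLICT (content): Merge conflict in .github/workflows/pull.yml
Auto-merging .github/workflows/rocm.yml
Auto-merging .github/workflows/slow.yml
CONFLICT (content): Merge conflict in .github/workflows/slow.yml
Auto-merging .github/workflows/trunk.yml
CONFLICT (content): Merge conflict in .github/workflows/trunk.yml
"""

if __name__ == "__main__":
    # Print one conflicted path per line.
    for path in conflicted_paths(REBASE_LOG):
        print(path)
```

Run against this log, the sketch lists five workflow files. `.ci/docker/build.sh` and `.github/workflows/rocm.yml` were auto-merged without conflict, so they are not reported.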

@jataylo
Collaborator Author

jataylo commented Sep 16, 2024

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rocm-py310-upgrade onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rocm-py310-upgrade && git pull --rebase)

@jataylo jataylo marked this pull request as ready for review September 16, 2024 16:26
@jataylo
Collaborator Author

jataylo commented Sep 16, 2024

Failures seem unrelated. Opening for review.

@jataylo
Collaborator Author

jataylo commented Sep 16, 2024

cc: @malfet

Contributor

@atalman atalman left a comment

@jithunnair-amd
Collaborator

lgtm. Please follow up on periodic / linux-focal-rocm6.1-py3.10 / test (distributed, 3, 3, linux.rocm.gpu)

@atalman Yes, we're looking into the timeouts seen in the distributed job. However, the issue is not related to the Python version, so it shouldn't block this PR.

@jithunnair-amd
Collaborator

@pytorchbot merge -f "CI failures are unrelated to py3.10 upgrade"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
@kit1980
Contributor

kit1980 commented Sep 25, 2024

@pytorchmergebot cherry-pick --onto release/2.5 -c critical

@pytorchbot
Collaborator

Cherry picking #134108

Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x bad69044d87b67a8b64691e4e5a7a68147185aa0 returned non-zero exit code 1

Auto-merging .ci/docker/build.sh
Auto-merging .github/workflows/inductor-rocm.yml
CONFLICT (content): Merge conflict in .github/workflows/inductor-rocm.yml
Auto-merging .github/workflows/periodic.yml
CONFLICT (content): Merge conflict in .github/workflows/periodic.yml
Auto-merging .github/workflows/pull.yml
CONFLICT (content): Merge conflict in .github/workflows/pull.yml
Auto-merging .github/workflows/rocm.yml
CONFLICT (content): Merge conflict in .github/workflows/rocm.yml
Auto-merging .github/workflows/slow.yml
CONFLICT (content): Merge conflict in .github/workflows/slow.yml
Auto-merging .github/workflows/trunk.yml
CONFLICT (content): Merge conflict in .github/workflows/trunk.yml
error: could not apply bad69044d8... [ROCm] upgrade ROCm CI builds to py3.10 (#134108)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Details for Dev Infra team Raised by workflow job

kit1980 added a commit that referenced this pull request Sep 26, 2024
Upgrade ROCm CI builds to py3.10

Pull Request resolved: #134108
Approved by: https://github.com/jeffdaily, https://github.com/jithunnair-amd, https://github.com/atalman

Co-authored-by: Jack Taylor <jack.taylor@amd.com>

Labels

ciflow/inductor-rocm (Trigger "inductor" config CI on ROCm)
ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR)
ciflow/rocm (Trigger "default" config CI on ROCm)
keep-going (Don't stop on first failure, keep running tests until the end)
Merged
module: rocm (AMD GPU support for Pytorch)
open source
rocm priority (high priority ROCm PRs from performance or other aspects)
rocm (This tag is for PRs from ROCm team)
topic: not user facing (topic category)

9 participants