[ROCm] upgrade CI to 6.0#116270

Closed
jeffdaily wants to merge 533 commits into pytorch:main from ROCm:rocm60_ci

@jeffdaily
Collaborator

@jeffdaily jeffdaily commented Dec 21, 2023

@jeffdaily jeffdaily requested a review from a team as a code owner December 21, 2023 16:42
@pytorch-bot pytorch-bot bot added the ciflow/rocm (Trigger "default" config CI on ROCm), module: rocm (AMD GPU support for Pytorch), and topic: not user facing (topic category) labels Dec 21, 2023
@pytorch-bot

pytorch-bot bot commented Dec 21, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/116270

Note: Links to docs will display an error until the docs builds have been completed.

❌ 11 New Failures, 7 Unrelated Failures

As of commit b9f8059 with merge base f1aef2c (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@janeyx99 janeyx99 added the triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module) label Dec 21, 2023
@johnnynunez
Contributor

when?

@pruthvistony
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rocm60_ci onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rocm60_ci && git pull --rebase)

@malfet
Contributor

malfet commented Dec 30, 2023

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rocm60_ci onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rocm60_ci && git pull --rebase)

@jithunnair-amd jithunnair-amd added the keep-going (Don't stop on first failure, keep running tests until the end) label Jan 2, 2024
@jithunnair-amd
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rocm60_ci onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rocm60_ci && git pull --rebase)

Contributor

@malfet malfet left a comment

LGTM, though the fact that lazy_init no longer works might negatively affect distributed workflows on the ROCm platform.
cc: @wconstab

torch.addmm(s, m1, m2)

@unittest.skipIf(TEST_MULTIGPU, "Testing on one GPU is sufficient")
@skipIfRocm(msg="Skipped!!, since it is failing on ROCm 6.0")
Contributor

This sounds like a regression which will affect distributed workflows pretty badly, but up to you guys
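
For context on the decorator in the diff above, here is a minimal sketch of how a platform-conditional skip can be built on `unittest.skipIf`. This is not PyTorch's actual `skipIfRocm` implementation: `skip_if_rocm`, `detect_rocm`, the `is_rocm` override, and the `LazyInitExample` class are hypothetical names used only for illustration. The detection step assumes that ROCm builds of PyTorch report a non-None `torch.version.hip`, and it falls back to "not ROCm" when torch is not installed.

```python
import unittest


def detect_rocm() -> bool:
    """Best-effort check for a ROCm (HIP) build of torch.

    Assumption: ROCm builds set torch.version.hip to a version string,
    while other builds leave it as None. Returns False without torch.
    """
    try:
        import torch
    except ImportError:
        return False
    return getattr(torch.version, "hip", None) is not None


def skip_if_rocm(msg="test skipped on ROCm", is_rocm=None):
    """Hypothetical stand-in for a skipIfRocm-style decorator.

    `is_rocm` lets a caller force the platform flag, which keeps this
    sketch runnable on machines without a GPU or without torch.
    """
    def decorator(fn):
        flag = detect_rocm() if is_rocm is None else is_rocm
        # unittest.skipIf marks the test as skipped when flag is true.
        return unittest.skipIf(flag, msg)(fn)
    return decorator


class LazyInitExample(unittest.TestCase):
    @skip_if_rocm(msg="failing on ROCm 6.0", is_rocm=True)
    def test_skipped_on_rocm(self):
        self.fail("body never runs when the skip applies")

    @skip_if_rocm(is_rocm=False)
    def test_runs_elsewhere(self):
        self.assertEqual(1 + 1, 2)
```

A skip of this kind only hides the failure from CI; the underlying behavior on the platform is unchanged, which is the concern raised in the comment above.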

@jithunnair-amd jithunnair-amd added the ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR) label Jan 3, 2024
@jithunnair-amd
Collaborator

@jeffdaily We will also want to include the inductor.yml changes for ROCm 6.0 in this PR to keep everything together. There's a PR that adds that workflow here: #110544

@jithunnair-amd jithunnair-amd added the ciflow/trunk (Trigger trunk jobs on your pull request) and ciflow/inductor labels Jan 9, 2024
Labels

ciflow/rocm (Trigger "default" config CI on ROCm)
ciflow/trunk (Trigger trunk jobs on your pull request)
module: rocm (AMD GPU support for Pytorch)
topic: not user facing (topic category)