
Enable target-determination (TD) for ROCm CI #156545

Closed
jithunnair-amd wants to merge 7 commits into main from enable_td_rocm


jithunnair-amd (Collaborator) commented Jun 21, 2025

Target determination (TD) sorts the tests in a PR CI run using heuristics that estimate which tests are most relevant to the PR's changes. Because the most relevant tests run first, failures should be caught earlier, which gives faster CI signal and should help alleviate capacity concerns as job durations decrease.
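
To make the idea of heuristic test ordering concrete, here is a minimal sketch. It is not PyTorch's actual TD implementation: the function names `relevance_score` and `order_tests`, the three heuristics, and their weights are all hypothetical, chosen only to illustrate sorting tests by estimated relevance. Note that the tests are only reordered, never dropped.

```python
from pathlib import PurePosixPath


def relevance_score(test_file, changed_files, recently_failed):
    """Hypothetical score: a higher value means the test should run earlier."""
    score = 0.0
    # "test/test_nn.py" -> stem "test_nn" -> topic "nn" (requires Python 3.9+)
    topic = PurePosixPath(test_file).stem.removeprefix("test_")
    for changed in changed_files:
        if changed == test_file:
            # Assumed heuristic 1: the PR edits the test file itself.
            score += 10.0
        elif topic and topic in PurePosixPath(changed).parts:
            # Assumed heuristic 2: a changed path contains the test's topic
            # as a directory or file component.
            score += 3.0
    if test_file in recently_failed:
        # Assumed heuristic 3: the test failed recently.
        score += 5.0
    return score


def order_tests(test_files, changed_files, recently_failed=frozenset()):
    """Return every test, most relevant first; ties are broken by name."""
    return sorted(
        test_files,
        key=lambda t: (-relevance_score(t, changed_files, recently_failed), t),
    )


tests = ["test/test_autograd.py", "test/test_nn.py", "test/test_ops.py"]
changed = ["torch/nn/functional.py"]
failed = {"test/test_ops.py"}
print(order_tests(tests, changed, failed))
# ['test/test_ops.py', 'test/test_nn.py', 'test/test_autograd.py']
```

With this ordering, a job that stops or reports at the first failure would tend to do so sooner, which is the mechanism the description relies on for shorter job durations.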

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

pytorch-bot bot commented Jun 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156545

Note: Links to docs will display an error until the docs builds have been completed.

❌ 13 New Failures, 23 Unrelated Failures

As of commit e1ed5e2 with merge base 1cfdcb9:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the following labels on Jun 21, 2025: ciflow/rocm (Trigger "default" config CI on ROCm); module: rocm (AMD GPU support for Pytorch); topic: not user facing (topic category)
jithunnair-amd added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Jun 23, 2025
jithunnair-amd added the following labels on Jul 3, 2025: ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR); ciflow/rocm-mi300 (Trigger "default" config CI on ROCm MI300); ciflow/periodic-rocm-mi300 (Trigger "distributed" config CI on ROCm MI300/MI325)
jithunnair-amd (Collaborator, Author) commented Jul 5, 2025

Avg duration of each rocm-mi300 shard went down from 2.1h to 1.5h
Avg duration of each rocm shard went down from 2h to 1.2h
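
For reference, the relative reduction implied by these figures can be worked out as below. This is a small arithmetic sketch that uses only the four averages quoted above; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before_hours, after_hours):
    """Fraction by which the average shard duration dropped."""
    return (before_hours - after_hours) / before_hours


# Average shard durations in hours (before TD, after TD), as quoted above.
shards = {"rocm-mi300": (2.1, 1.5), "rocm": (2.0, 1.2)}

for name, (before, after) in shards.items():
    print(f"{name}: {before}h -> {after}h "
          f"({relative_reduction(before, after):.1%} shorter)")
# rocm-mi300: 2.1h -> 1.5h (28.6% shorter)
# rocm: 2.0h -> 1.2h (40.0% shorter)
```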

jithunnair-amd marked this pull request as ready for review July 8, 2025 06:25
jithunnair-amd requested a review from a team as a code owner July 8, 2025 06:25
jithunnair-amd (Collaborator, Author) commented

@pytorchbot merge -f "ROCm jobs ran with TD enabled. Other failures not related to this PR"

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

github-actions bot deleted the enable_td_rocm branch August 8, 2025 02:20

Labels

ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR)
ciflow/periodic-rocm-mi300 (Trigger "distributed" config CI on ROCm MI300/MI325)
ciflow/rocm (Trigger "default" config CI on ROCm)
ciflow/rocm-mi300 (Trigger "default" config CI on ROCm MI300)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
module: rocm (AMD GPU support for Pytorch)
open source
topic: not user facing (topic category)


5 participants