Mitigate some flaky tests in trunk #157756

Closed
huydhn wants to merge 2 commits into pytorch:main from huydhn:disable-some-flaky-tests

@huydhn huydhn commented Jul 8, 2025

(This does not really fix these issues, but we should be able to close them. It also lets CI on this PR run the affected tests.)

Fixes #156579
Fixes #156580
Fixes #126867

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Signed-off-by: Huy Do <huydhn@gmail.com>
@huydhn huydhn requested a review from clee2000 July 8, 2025 00:42

pytorch-bot bot commented Jul 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157756

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Cancelled Jobs, 2 Pending

As of commit 5be36ec with merge base ae1094b:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

torch._dynamo.utils.clear_compilation_metrics()

# https://github.com/pytorch/pytorch/issues/156580
@serialTest()

Could you explain why this fixes the test? Nothing immediately stands out to me.

Oh, I tried different ways to reproduce the failure locally, but I couldn't, even with pytest -v test/dynamo/test_repros.py -k test_dont_dce_rand --flake-finder. My theory is that running this test in parallel is to blame, so this change is actually an experiment. I will need to wait until CI finishes to confirm whether it fixes the flakiness.
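
To make the theory above concrete, here is a minimal sketch of how a "serial" marker can keep one test out of the parallel pool. It is an illustration only and is not PyTorch's serialTest implementation: the names serial_test, partition, and the _run_serially attribute are hypothetical, and the test bodies are placeholders.

```python
from typing import Callable, Iterable, List, Tuple


def serial_test(condition: bool = True) -> Callable:
    """Hypothetical decorator: tag a test so a runner keeps it out of parallel shards."""
    def decorator(fn: Callable) -> Callable:
        if condition:
            # Hypothetical attribute name, chosen for this sketch only.
            fn._run_serially = True
        return fn
    return decorator


def partition(tests: Iterable[Callable]) -> Tuple[List[Callable], List[Callable]]:
    """Split tests into (parallel-safe, serial-only) groups based on the tag."""
    parallel: List[Callable] = []
    serial: List[Callable] = []
    for test in tests:
        if getattr(test, "_run_serially", False):
            serial.append(test)
        else:
            parallel.append(test)
    return parallel, serial


@serial_test()
def test_dont_dce_rand():
    pass  # placeholder body; the real test lives in test/dynamo/test_repros.py


def test_something_else():
    pass  # placeholder for any test that is safe to run in parallel


parallel, serial = partition([test_dont_dce_rand, test_something_else])
print("parallel:", [t.__name__ for t in parallel])
print("serial:", [t.__name__ for t in serial])
```

A runner built along these lines would execute the serial group on its own after the parallel shards finish, so a test that is sensitive to concurrent activity never overlaps with other tests. Whether that removes the flakiness here still has to be confirmed by CI.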

huydhn commented Jul 8, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 8, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed, first few of them are: inductor-rocm / rocm-py3.10-inductor / test (inductor, 2, 2, linux.rocm.gpu.2), inductor-rocm / rocm-py3.10-inductor / test (inductor, 1, 2, linux.rocm.gpu.2)

Details for Dev Infra team Raised by workflow job

huydhn commented Jul 8, 2025

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 2 checks: inductor-rocm / rocm-py3.10-inductor / test (inductor, 2, 2, linux.rocm.gpu.2), inductor-rocm / rocm-py3.10-inductor / test (inductor, 1, 2, linux.rocm.gpu.2)

huydhn commented Jul 8, 2025

@pytorchbot merge -f 'Some remaining ROCm jobs, should be fine'

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

@huydhn huydhn deleted the disable-some-flaky-tests branch July 8, 2025 07:08

3 participants