[FIX] RayJob doc test flaky by machichima · Pull Request #52170 · ray-project/ray

machichima · 2025-04-09T11:17:29Z

Why are these changes needed?

As mentioned in #51756 (comment), the RayJob doc test is flaky.

Sort the `kubectl get pods` output by creation time instead of by name to prevent the flakiness. I verified that the doc test passed locally twice.

Doc link: https://anyscale-ray--52170.com.readthedocs.build/en/52170/cluster/kubernetes/getting-started/rayjob-quick-start.html

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: machichima <nary12321@gmail.com>

kevin85421 · 2025-04-09T15:39:59Z

Maybe next time, @MortalHappiness, you could ask contributors to run the tests ten times before approving the PR, in case some of us are not familiar with the doc tests and might miss something during review.

MortalHappiness · 2025-04-09T15:44:22Z

@kevin85421 Okay, no problem. @machichima Could you post a screenshot of running it ten times? Thanks.

kevin85421 · 2025-04-09T15:44:45Z

doc/source/cluster/kubernetes/getting-started/rayjob-quick-start.ipynb

    "# Step 4.3: List all Pods in the `default` namespace.\n",
    "# The Pod created by the Kubernetes Job will be terminated after the Kubernetes Job finishes.\n",
-    "kubectl get pods --sort-by=.metadata.name"
+    "kubectl get pods --sort-by='.metadata.creationTimestamp'"


@machichima, can you help me understand where the logic for validating this cell's output is located? Thanks!

I think it's because the last 5 characters are randomized, so if we sort by name, sometimes rayjob-sample-xxxxx comes before rayjob-sample-raycluster-yyyyy-head-zzzzz.

Oh, makes sense.

Thank you for the explanation!
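The name-ordering problem described in this exchange can be reproduced without a cluster. This is a minimal sketch: the Pod names, suffixes, and timestamps are all hypothetical, plain Python string comparison stands in for kubectl's own sorting (which may differ in details), and it assumes, for illustration, that the head Pod is created before the submitter Job's Pod.

```python
from datetime import datetime, timezone

# Hypothetical head Pod: (name, creationTimestamp). The random suffixes are made up.
HEAD = ("rayjob-sample-raycluster-abcde-head-xyz12",
        datetime(2025, 4, 9, 11, 0, 0, tzinfo=timezone.utc))


def job_pod(suffix: str) -> tuple[str, datetime]:
    """Return a hypothetical submitter-Job Pod with the given random suffix.

    Assumed to be created 30 seconds after the head Pod.
    """
    return (f"rayjob-sample-{suffix}",
            datetime(2025, 4, 9, 11, 0, 30, tzinfo=timezone.utc))


def order_by_name(pods):
    # Rough stand-in for `kubectl get pods --sort-by=.metadata.name`.
    return [name for name, _ in sorted(pods, key=lambda p: p[0])]


def order_by_creation(pods):
    # Rough stand-in for `kubectl get pods --sort-by='.metadata.creationTimestamp'`.
    return [name for name, _ in sorted(pods, key=lambda p: p[1])]


for suffix in ("bcdfg", "tvwxz"):
    pods = [HEAD, job_pod(suffix)]
    print(suffix, "by name:    ", order_by_name(pods))
    print(suffix, "by creation:", order_by_creation(pods))
```

With suffix `bcdfg` the Job Pod sorts first by name ('b' < 'r'), with `tvwxz` the head Pod does ('t' > 'r'), while the creation-time order is the same in both cases. One limitation is also worth keeping in mind: `creationTimestamp` has one-second resolution, so Pods created within the same second compare equal under this key.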

kevin85421 · 2025-04-09T16:50:16Z

> @kevin85421 Okay, no problem. @machichima Could you post a screenshot of running it ten times? Thanks.

Not necessarily for this PR. The test is flaky in CI and blocks multiple PRs. My point is that we can improve the review process by requesting contributors to run the tests multiple times locally to avoid this situation in the future.

Thanks @MortalHappiness for the review and @machichima for the quick fix!
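A minimal sketch of the kind of local check suggested here: run a test command several times and report which runs fail. The helper name `run_repeatedly` is hypothetical, and the command is whatever the contributor normally uses to run the doc test; nothing in it is specific to Ray's CI.

```python
import subprocess
import sys


def run_repeatedly(cmd: list[str], runs: int = 10) -> list[int]:
    """Run `cmd` `runs` times and return the 1-based indices of failed runs.

    A run counts as failed when the command exits with a non-zero code.
    """
    failures = []
    for i in range(1, runs + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(i)
            # Surface the tail of stderr so a flaky failure can be inspected.
            print(f"run {i}: FAILED (exit {result.returncode})", file=sys.stderr)
            print(result.stderr[-2000:], file=sys.stderr)
        else:
            print(f"run {i}: passed")
    return failures
```

For example, `run_repeatedly(["pytest", "<path to the doc test>"], runs=10)`, where the path is a placeholder. An empty result means every run passed; a non-empty one lists which runs failed, which is more informative for a flaky test than a single pass/fail.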

MortalHappiness · 2025-04-10T10:34:06Z

@machichima Seems like it's still flaky.

https://buildkite.com/ray-project/premerge/builds/38127#01961f05-1374-4806-b1b8-1f2392d8fb77
https://buildkite.com/ray-project/premerge/builds/38127#01961f10-580e-40be-893a-108a631097e4

machichima · 2025-04-10T11:04:41Z

> @machichima Seems like it's still flaky.

I will run it more times locally and see if I can spot the reason for this flaky behaviour.

machichima · 2025-04-10T12:38:16Z

Hi @MortalHappiness

I ran it multiple times (7 runs) and all of them passed...

I'm not sure what the cause might be, as the CI log seems to contain more information than mine. I wonder whether it might be caused by a different version of some tool?

My run log is attached here: logging.log

UPDATE: I have kubectl v1.32.3, while CI uses v1.28.4. Let me downgrade and try again.

UPDATE: I'm now using the same versions as CI:

- kubectl v1.28.4
- Python 3.10.16
- pytest 7.4.4
- pluggy 1.3.0

I ran the test 7 times locally and all runs passed, so I couldn't reproduce the flaky behaviour locally. The new log is here: logging-new.log
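When comparing a local environment against CI, as in the update above, the kubectl client version can be read programmatically. This is a sketch under stated assumptions: it relies on the `clientVersion.gitVersion` field of `kubectl version --client -o json`, uses the CI version reported in this thread as the expected value, and the helper names are hypothetical.

```python
import json
import subprocess

# CI kubectl version as reported in this thread.
CI_KUBECTL_VERSION = "v1.28.4"


def client_git_version(version_json: str) -> str:
    """Extract the client gitVersion from `kubectl version --client -o json` output."""
    return json.loads(version_json)["clientVersion"]["gitVersion"]


def local_kubectl_version() -> str:
    """Return the local kubectl client version (requires kubectl on PATH)."""
    out = subprocess.run(
        ["kubectl", "version", "--client", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return client_git_version(out)


def matches_ci(version: str, expected: str = CI_KUBECTL_VERSION) -> bool:
    """True when the local version string equals the expected CI version."""
    return version == expected
```

Calling `matches_ci(local_kubectl_version())` before a batch of local runs makes a version mismatch such as v1.32.3 vs. v1.28.4 visible up front.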


fix: sort get pods with create time to prevent flaky

45e1bef

Signed-off-by: machichima <nary12321@gmail.com>

machichima requested review from a team, kevin85421 and pcmoritz as code owners April 9, 2025 11:17

MortalHappiness approved these changes Apr 9, 2025


MortalHappiness added the go add ONLY when ready to merge, run all tests label Apr 9, 2025

kevin85421 reviewed Apr 9, 2025


kevin85421 approved these changes Apr 9, 2025


jjyao merged commit 78b5967 into ray-project:master Apr 9, 2025
5 checks passed

han-steve pushed a commit to han-steve/ray that referenced this pull request Apr 11, 2025

[FIX] RayJob doc test flaky (ray-project#52170)

841e2d4

Signed-off-by: Steve Han <stevehan2001@gmail.com>

hainesmichaelc added the community-backlog label May 22, 2025

jjyao merged 1 commit into ray-project:master from machichima:fix-rayjob-flaky-test


5 participants
