CI job running tests with queuing on#6989
Conversation
we have to make the tasks to be stolen not look like root tasks.
There are a lot more race conditions with queuing
these should not be failing and we need to evaluate why they are
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ±0 15 suites ±0 5h 53m 29s ⏱️ - 16m 4s For more details on these failures, see this check. Results for commit 33e4995. ± Comparison against base commit 1818788. ♻️ This comment has been updated with latest results. |
crusaderky
left a comment
There was a problem hiding this comment.
Instead of this new oversaturate_only mark, couldn't you simply have
OVERSATURATION = math.isinf(dask.config.get("distributed.scheduler.worker-saturation"))
@pytest.mark.flaky(not OVERSATURATION, reason="flaky on MacOS")Tests that don't make sense with queueing should instead force
config={"distributed.scheduler.worker-saturation": math.inf}(some already do)
.github/workflows/tests.yaml
Outdated
| env: | ||
| TEST_ID: ${{ matrix.os }}-${{ matrix.python-version }}-${{ matrix.partition-label }} | ||
| # TEST_ID: ${{ matrix.os }}-${{ matrix.python-version }}-${{ matrix.partition-label }}-${{ matrix.run }} | ||
| TEST_ID: ${{ matrix.os }}-${{ matrix.python-version }}-${{ matrix.partition-label }}-${{ matrix.queuing }} |
There was a problem hiding this comment.
I think this renaming will break some logic in our test report generation.
distributed/continuous_integration/scripts/test_report.py
Lines 146 to 152 in acf6078
distributed/continuous_integration/scripts/test_report.py
Lines 353 to 357 in acf6078
I introduced this in #6837
Worst case, we remove the more specific job URL again (we knew from the start this is very brittle)
There was a problem hiding this comment.
There was a problem hiding this comment.
I don't think this should block this PR
Pulled out from dask#6989. This minor refactor makes it easier to add other config options in the future. It also ensure that the `ws` marker is added even when `--runslow` is given.
This reverts commit 0f9034a.
GitHub didn't like setting `partition-label` in two different ways; it showed up in two different places in the job name and breaks the test report script. Instead, we can set the `TEST_ID` variable as a step in the job, which is cleaner anyway IMO.
Ready for re-review and merge. |
|
Thank you! |
|
Just to confirm, is the new CI job added here for testing purposes and is planned to be removed in the future, or is this intended as a long-term change? |
|
Long term, we'll want to enable scheduler-side queueing by default. |

Adds a CI job, only for Ubuntu and py3.9, running the test suite with
worker-saturation: 1.0(instead ofinf).Also adds a
@pytest.mark.oversaturate_onlymarker, which automatically skips marked tests if queuing is enabled.A surprisingly small number of tests actually failed when queuing was turned on. I've updated the ones that did as needed. I've tried as much as possible to just make the tests agnostic to queuing. In a few cases though, I've needed to add explicit conditional logic. And in a couple cases, I've marked tests to be skipped when queuing is enabled, since they may not make sense.
Unfortunately there are three tests that are flaky under queuing, which could indicate an actual bug. They're skipped for now, but should be investigated.
Closes #6631
pre-commit run --all-files