Dump cluster state on all test failures#5674
Conversation
gjoseph92
left a comment
There was a problem hiding this comment.
LGTM. I could see not wanting dumps from xfails in the future, but I assume you're thinking that at the moment, there could be useful information in there?
Not really - it's just not straightforward to detect xfails from where I'm dumping and there's very little if any harm done by dumping a few extra kilobytes |
Co-authored-by: Gabe Joseph <gjoseph92@gmail.com>
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks @crusaderky! Will merge after CI finishes up
|
@jrbourbeau I'd like to verify if the trick to avoid dumping xfailed tests works first |
|
Sure, sounds good. FWIW I ran distributed/distributed/tests/test_scheduler.py Lines 1321 to 1335 in 7ebeda4 locally a |
|
Trying again |
|
I've updated a wealth of tests that require a parameterized gen_cluster. One of them was xfailed and was simply outputting "test.yaml" as a dump. Please review again. Known issues
|
| test() | ||
| await wait(fut) | ||
| assert s.tasks | ||
| assert all(ts.annotations == {"foo": "bar"} for ts in s.tasks.values()) |
There was a problem hiding this comment.
Fixed race condition where the test would potentially find an empty s.tasks dict
| config={"distributed.scheduler.work-stealing": False}, | ||
| ) | ||
| async def test(c, s, *workers): | ||
| async def test_decide_worker_coschedule_order_neighbors_(c, s, *workers): |
There was a problem hiding this comment.
Ensure unique dump file name
| ) | ||
| def test_balance(inp, expected): | ||
| async def test(*args, **kwargs): | ||
| async def test_balance_(*args, **kwargs): |
There was a problem hiding this comment.
Ensure unique dump file name
| worker["address"] = worker["queue"].get(timeout=5) | ||
| except queue.Empty: | ||
| raise pytest.xfail.Exception("Worker failed to start in test") | ||
| pytest.xfail("Worker failed to start in test") |
There was a problem hiding this comment.
they're the same - this looks cleaner imho
|
Test output and artifacts inspected - ready for final review and merge |
| while "x" in a.tasks: | ||
| await asyncio.sleep(0.01) |
There was a problem hiding this comment.
I'm curious why this addition was needed
There was a problem hiding this comment.
Because nothing tests for the existence of the release_key method in the plugin until a key is released
commit 14bddba1482de035ca416368f918d94b115d3660 Merge: df079fc 9a66b71 Author: crusaderky <crusaderky@gmail.com> Date: Tue Jan 25 17:16:58 2022 +0000 Merge branch 'main' into AMM/RetireWorker commit df079fc Merge: 9676934 5835ce3 Author: crusaderky <crusaderky@gmail.com> Date: Mon Jan 24 15:08:51 2022 +0000 Merge branch 'AMM/staging' into AMM/RetireWorker commit 5835ce3 Merge: ee68b02 b0b8e95 Author: crusaderky <crusaderky@gmail.com> Date: Mon Jan 24 15:07:27 2022 +0000 Merge branch 'AMM/test_close_gracefully' into AMM/staging commit ee68b02 Merge: 7d4e6ee ccad288 Author: crusaderky <crusaderky@gmail.com> Date: Mon Jan 24 15:06:50 2022 +0000 Merge branch 'main' into AMM/staging commit b0b8e95 Author: crusaderky <crusaderky@gmail.com> Date: Mon Jan 24 14:32:29 2022 +0000 Code review commit 9676934 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 14:50:53 2022 +0000 AMM to manage retire_workers() commit 7d4e6ee Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 14:39:41 2022 +0000 Fix flaky test_close_gracefully and test_lifetime (dask#5677) commit 7faab51 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 14:39:41 2022 +0000 harden test commit aef3b71 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 14:29:10 2022 +0000 Increase resilience on slow CI commit af84e40 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:39:18 2022 +0000 Dump cluster state on all test failures (dask#5674) commit 5054c19 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:38:38 2022 +0000 Paused workers shouldn't steal tasks (dask#5665) commit eadb35f Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:37:48 2022 +0000 transition flight to missing if no who_has (dask#5653) commit 581aee8 Merge: 940bb45 60c0d60 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:36:09 2022 +0000 Merge branch 'main' into AMM/test_close_gracefully commit 940bb45 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:20:10 2022 +0000 tweak comment commit 731d132 Author: crusaderky <crusaderky@gmail.com> Date: Fri Jan 21 12:12:03 2022 +0000 Fix flaky test_close_gracefully and test_lifetime
Change
@gen_clusterto perform a state dump on all test failures (notably, including xfails - this should be negligible in terms of extra cost) instead of just timeouts.Shorten the stack trace in case of
@gen_clustertimeout.