Failure to spill breaks available resources #6703
        ExecuteSuccessEvent.dummy("x", None, stimulus_id="s1")
    )
    assert instructions == [TaskErredMsg.match(key="x", stimulus_id="s1")]
    assert ws.tasks["x"].state == "error"
Without the change to the WorkerState in this PR, this test was failing on teardown with available_resources={R: 2}.
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
15 files ±0, 15 suites ±0, 6h 21m 43s ⏱️ +1m 21s
For more details on these failures, see this check.
Results for commit e76dfb4. ± Comparison against base commit 6765e6e.
e76dfb4 to 1bf1407
hendrikmakait left a comment
LGTM. I would like to see the nit regarding the test docs addressed, but feel free to skip it if this adds too much overhead to getting this change in.
@pytest.mark.xfail(reason="https://github.com/dask/distributed/issues/6705")
def test_workerstate_fail_to_pickle_flight(ws):
    """Same as test_workerstate_fail_to_pickle_execute_1, but the task was
nit: I'm personally not a fan of these one-directional references to other tests. I'd suggest either making the reference bi-directional or copying the description of the performed test over to this docstring. In my experience, these one-directional references tend to get out of sync as tests change over time.
A task finishes its computation successfully, returning an output that is individually larger than 60% of memory_limit.
The task is spilled immediately to disk; however, it fails to pickle (this is not the same as OSError, which is handled transparently by the SpillBuffer).
The task is marked as having status=error.
This PR fixes a bug where, if the task was using resources, the resources were returned twice, causing available_resources to become higher than total_resources.
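To illustrate the class of bug described above, here is a minimal, self-contained sketch of how returning a task's resources twice pushes available_resources above total_resources. It is not the actual WorkerState code: `ToyWorker`, `ToyTask`, `holds_resources`, `release_unguarded`, and `release_guarded` are hypothetical names invented for this example, and the guard shown is just one possible way to make the release idempotent, not necessarily the approach taken in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task that requires named resources."""

    key: str
    resource_restrictions: dict[str, float]
    holds_resources: bool = False  # assumption: tracks whether resources are held


@dataclass
class ToyWorker:
    """Hypothetical stand-in for the worker's resource bookkeeping."""

    total_resources: dict[str, float]
    available_resources: dict[str, float] = field(init=False)

    def __post_init__(self) -> None:
        # Initially everything is available.
        self.available_resources = dict(self.total_resources)

    def acquire(self, task: ToyTask) -> None:
        for name, amount in task.resource_restrictions.items():
            self.available_resources[name] -= amount
        task.holds_resources = True

    def release_unguarded(self, task: ToyTask) -> None:
        # Buggy pattern: nothing prevents this from running twice for one task.
        for name, amount in task.resource_restrictions.items():
            self.available_resources[name] += amount

    def release_guarded(self, task: ToyTask) -> None:
        # Idempotent pattern: a second call is a no-op.
        if not task.holds_resources:
            return
        task.holds_resources = False
        for name, amount in task.resource_restrictions.items():
            self.available_resources[name] += amount

    def is_consistent(self) -> bool:
        # Invariant: 0 <= available <= total for every resource.
        return all(
            0 <= self.available_resources[name] <= total
            for name, total in self.total_resources.items()
        )


# Two code paths (e.g. "execution finished" and "task erred") both release.
buggy = ToyWorker(total_resources={"R": 1})
x = ToyTask(key="x", resource_restrictions={"R": 1})
buggy.acquire(x)
buggy.release_unguarded(x)
buggy.release_unguarded(x)

fixed = ToyWorker(total_resources={"R": 1})
y = ToyTask(key="y", resource_restrictions={"R": 1})
fixed.acquire(y)
fixed.release_guarded(y)
fixed.release_guarded(y)
```

In this toy model, the unguarded worker ends up with more of `R` available than it has in total, so an invariant check such as `is_consistent()` fails, while the guarded worker stays consistent.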