remove server close background task grace period#6633
Conversation
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 6h 22m 32s ⏱️ - 3h 16m 31s For more details on these failures, see this check. Results for commit 4a05368. ± Comparison against base commit 1bfd99c. ♻️ This comment has been updated with latest results. |
| async def background_stops(): | ||
| await asyncio.gather(*_stops) | ||
|
|
||
| self._ongoing_background_tasks.call_soon(background_stops) |
There was a problem hiding this comment.
Should we add a comment/warning here to highlight that these are likely to get cancelled due to the lack of a grace period? Alternatively, we could add a small grace period back in that would allow fast tasks to finish but have less of a performance impact? For example, 100 ms should result in ~5 % performance degradation, 20 ms in 1 %.
There was a problem hiding this comment.
I added a PendingDeprecationWarning to https://github.com/dask/distributed/pull/6633/files#r907345989
I have no idea who would implement this asynchronously. I think the deprecation warning there should be sufficient.
There was a problem hiding this comment.
I think a regular DeprecationWarning would be better, otherwise you have to go via Pend -> Deprecate -> Remove
| if inspect.isawaitable(future): | ||
| _stops.add(future) |
There was a problem hiding this comment.
IIUC you introduce this merely to ensure that there are no users outside that provide a listener with an async stop?
This smells like we should introduce a deprecation warning and get rid of this
There was a problem hiding this comment.
yep, it's tricky to work out where to put the deprecation, as right at server stop is a bit late.
There was a problem hiding this comment.
yep, it's tricky to work out where to put the deprecation, as right at server stop is a bit late.
Well, better than nothing. If anybody is using this, they'll see it in their CI then. We're also not in a rush to remove this so we can let the warning sit for a while
|
There appears to be a worker state related test failure with |
tasks were never started here
with no grace period sync stop methods on listeners will never be started.
d93c7b2 to
b9767ca
Compare
| await s.remove_worker(w1.address, stimulus_id="stim-id") | ||
|
|
||
| await wait_for_state(f3.key, "resumed", w2) | ||
| await asyncio.gather( |
There was a problem hiding this comment.
still failing intermittently https://github.com/dask/distributed/runs/7090644567?check_suite_focus=true#step:11:1844
| f"Expected all ongoing tasks to be cancelled and removed, found {self._ongoing_tasks}." | ||
| ) | ||
| if err is not None: | ||
| raise err |
There was a problem hiding this comment.
What's the reasoning behind calling task.cancel() multiple times?
There was a problem hiding this comment.
If the current task is cancelled and any child task suppresses the cancelation then the child tasks will leak from the task group
crusaderky
left a comment
There was a problem hiding this comment.
Besides one test that leaves me skeptical, LGTM
hendrikmakait
left a comment
There was a problem hiding this comment.
LGTM apart from one list without a clear purpose.
cd8ac6e to
4a05368
Compare
Closes #6632
a draft CI run to see what happens
pre-commit run --all-files