
Flaky test_adapt_then_manual: race condition in SpecCluster #7079

@crusaderky


test_adapt_then_manual is mildly flaky. It looks like a race condition in the tested code rather than in the test itself.
https://github.com/dask/distributed/actions/runs/3143480989/jobs/5108282616

There are two separate tracebacks in the failed test.

2022-09-28 13:18:40,941 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x00000284829BC0A0>>, <Task finished name='Task-38921' coro=<SpecCluster._correct_state_internal() done, defined at d:\a\distributed\distributed\distributed\deploy\spec.py:330> exception=KeyError(2)>)
Traceback (most recent call last):
  File "C:\Miniconda3\envs\dask-distributed\lib\site-packages\tornado\ioloop.py", line 741, in _run_callback
    ret = callback()
  File "C:\Miniconda3\envs\dask-distributed\lib\site-packages\tornado\ioloop.py", line 765, in _discard_future_result
    future.result()
  File "d:\a\distributed\distributed\distributed\deploy\spec.py", line 351, in _correct_state_internal
    d = self.worker_spec[name]
KeyError: 2

distributed\deploy\spec.py:437: AssertionError

    async def _close(self):
        [...]
            for w in self._created:
>               assert w.status in {
                    Status.closing,
                    Status.closed,
                    Status.failed,
                }, w.status
E               AssertionError: Status.init
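
To illustrate the kind of race the first traceback suggests, here is a minimal, self-contained sketch. It does not use distributed at all: `ToyCluster`, `correct_state`, `scale`, and `demo` are hypothetical names made up for this example, and the interleaving shown is an assumption about the failure mode, not a confirmed diagnosis. The idea is that a coroutine snapshots the names it wants to start, yields to the event loop, and then looks those names up in a dict that another caller has shrunk in the meantime.

```python
import asyncio


class ToyCluster:
    """Hypothetical stand-in for illustration; not the real SpecCluster."""

    def __init__(self):
        self.worker_spec = {}  # name -> spec, the desired state
        self.workers = {}      # name -> spec, the state already started

    def scale(self, n):
        # Synchronous resize of the desired state (assumed to model a manual
        # scale call arriving while a correction is still in flight).
        self.worker_spec = {i: {"name": i} for i in range(n)}

    async def correct_state(self):
        # Snapshot the names that still need a worker.
        to_open = set(self.worker_spec) - set(self.workers)
        # Yield to the event loop; worker_spec may change while suspended.
        await asyncio.sleep(0)
        for name in to_open:
            # Raises KeyError if `name` was removed during the await above.
            self.workers[name] = self.worker_spec[name]


async def demo():
    cluster = ToyCluster()
    cluster.scale(3)  # desired names: 0, 1, 2
    task = asyncio.ensure_future(cluster.correct_state())
    await asyncio.sleep(0)  # let correct_state take its snapshot and suspend
    cluster.scale(2)        # name 2 disappears while the task is suspended
    try:
        await task
    except KeyError as exc:
        return exc.args[0]  # the stale name that was looked up
    return None


if __name__ == "__main__":
    print("stale name:", asyncio.run(demo()))
```

In this toy, `demo()` returns the stale name `2`, mirroring the `KeyError: 2` above. Re-reading the spec after the await, or skipping names that are no longer present, would avoid the error in the sketch; whether that is the right fix for `_correct_state_internal` is not established here. The sketch does not model the second traceback (`Status.init` in `_close`).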

Metadata

Labels: flaky test (Intermittent failures on CI.)
