
InvalidTransition: Impossible transition from memory to missing #6125

@mrocklin

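import coiled

# Build a Coiled software environment from coiled-runtime, with distributed
# installed from the "chaos" branch of mrocklin's fork (provides distributed.chaos)
coiled.create_software_environment(
    name="coiled-runtime-chaos",
    conda={"channels": ["coiled", "conda-forge"], "dependencies": ["coiled-runtime", "coiled=0.0.73"]},
    pip=["git+https://github.com/mrocklin/distributed@chaos"],
)

from coiled._beta import ClusterBeta as Cluster
import dask
from dask.distributed import Client

# 10 workers on m5.large; shutdown_on_close=False keeps the cluster running
# after the client disconnects
cluster = Cluster(
    software="coiled-runtime-chaos",
    n_workers=10,
    worker_vm_types=["m5.large"],
    scheduler_vm_types=["m5.large"],
    shutdown_on_close=False,
    name="play",
)
client = Client(cluster)

# Chaos plugin: makes workers exit via sys.exit after a delay set by delay="10 s".
# register_worker_plugin installs it on current and future workers.
from distributed.chaos import KillWorker
plugin = KillWorker(delay="10 s", mode="sys.exit")
client.register_worker_plugin(plugin, name="kill")

# Rechunk from tall, thin chunks to wide, flat chunks: every output chunk needs
# pieces of many input chunks, so workers exchange lots of data while being killed
import dask.array as da
x = da.random.random((50000, 50000))
x.rechunk((50000, 20)).rechunk((20, 50000)).sum().compute()

Traceback from a worker's log: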
Apr 14 01:42:23 ip-10-4-8-2 cloud-init[989]: Traceback (most recent call last):
Apr 14 01:42:23 ip-10-4-8-2 cloud-init[989]:   File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/utils.py", line 693, in log_errors
Apr 14 01:42:23 ip-10-4-8-2 cloud-init[989]:     yield
Apr 14 01:42:23 ip-10-4-8-2 cloud-init[989]:   File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 3094, in gather_dep
Apr 14 01:42:23 ip-10-4-8-2 cloud-init[989]:     self.transitions(recommendations, stimulus_id=stimulus_id)
Apr 14 01:42:23 ip-10-4-8-2 cloud-init[989]:   File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 2607, in transitions
Apr 14 01:42:23 ip-10-4-8-2 cloud-init[989]:     a_recs, a_instructions = self._transition(
Apr 14 01:42:23 ip-10-4-8-2 cloud-init[989]:   File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 2543, in _transition
Apr 14 01:42:23 ip-10-4-8-2 cloud-init[989]:     raise InvalidTransition(
Apr 14 01:42:23 ip-10-4-8-2 cloud-init[989]: distributed.worker_state_machine.InvalidTransition: Impossible transition from memory to missing for ('rechunk-split-a73e77c2dac2f625e22767d7c04cbe17', 1754)
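
The traceback ends in `_transition`, which refuses to move a task from the `memory` state to the `missing` state. As a rough illustration of how such a check can work, here is a minimal, hypothetical table-driven state machine: each allowed (start, finish) pair is listed explicitly, and any other pair raises an error. The class names, states, and allowed pairs are assumptions chosen to mirror the log; this is not distributed's actual worker state machine.

```python
# Hypothetical toy model; NOT distributed.worker_state_machine.


class InvalidTransition(Exception):
    """Raised when a (start, finish) state pair is not allowed."""


class ToyWorkerState:
    # Illustrative allowed transitions (assumed, not distributed's real table).
    ALLOWED = {
        ("released", "fetch"),
        ("fetch", "flight"),
        ("flight", "memory"),
        ("flight", "fetch"),
        ("memory", "released"),
    }

    def __init__(self):
        # Maps task key -> current state; unknown keys count as "released".
        self.tasks = {}

    def transition(self, key, finish):
        start = self.tasks.get(key, "released")
        if (start, finish) not in self.ALLOWED:
            # Reject any pair missing from the table; state is left unchanged.
            raise InvalidTransition(
                f"Impossible transition from {start} to {finish} for {key!r}"
            )
        self.tasks[key] = finish


ws = ToyWorkerState()
key = ("rechunk-split-abc", 0)  # hypothetical key, shaped like the one in the log
for state in ["fetch", "flight", "memory"]:
    ws.transition(key, state)

try:
    ws.transition(key, "missing")
except InvalidTransition as exc:
    print(exc)  # Impossible transition from memory to missing for ('rechunk-split-abc', 0)
```

Here the error is raised simply because the table has no ("memory", "missing") entry; the toy says nothing about why the real worker requested that transition.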

cc @fjetter @gjoseph92
