KeyError in gather_dep #6194

@mrocklin

This was found when running distributed/tests/test_stress.py::test_chaos_rechunk

Traceback (most recent call last):
  File "/home/mrocklin/workspace/distributed/distributed/utils.py", line 759, in wrapper
    return await func(*args, **kwargs)
  File "/home/mrocklin/workspace/distributed/distributed/worker.py", line 3090, in gather_dep
    ts = self.tasks[d]
KeyError: "('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)"

Story

("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'ensure-task-exists', 'released', 'compute-task-1650891186.0628088', 1650891186.080167)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'released', 'fetch', 'fetch', {}, 'compute-task-1650891186.0628088', 1650891186.0802405)
('gather-dependencies', 'tcp://127.0.0.1:44059', {"('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", "('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1018)", "('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1014)"}, 'ensure-communicating-1650891186.0806584', 1650891186.0826457)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'fetch', 'flight', 'flight', {}, 'ensure-communicating-1650891186.0806584', 1650891186.0826836)
('request-dep', 'tcp://127.0.0.1:44059', {"('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", "('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1018)", "('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1014)"}, 'ensure-communicating-1650891186.0806584', 1650891186.0840328)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'flight', 'released', 'cancelled', {}, 'processing-released-1650891186.0754924', 1650891186.16295)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'compute-task', 'compute-task-1650891186.392905', 1650891186.4060445)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'cancelled', 'waiting', 'cancelled', {"('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)": ('resumed', 'waiting')}, 'compute-task-1650891186.392905', 1650891186.40608)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'cancelled', 'resumed', 'resumed', {}, 'compute-task-1650891186.392905', 1650891186.406102)
('free-keys', ("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)",), 'processing-released-1650891188.465502', 1650891188.5917683)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'release-key', 'processing-released-1650891188.465502', 1650891188.5917764)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'resumed', 'released', 'released', {"('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)": 'forgotten'}, 'processing-released-1650891188.465502', 1650891188.5917976)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'released', 'forgotten', 'forgotten', {}, 'processing-released-1650891188.465502', 1650891188.591806)
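
One possible reading of the story above (an assumption, not something confirmed in this issue) is that the key was forgotten while a fetch that included it was still waiting on the peer, so the later `self.tasks[d]` lookup found nothing. The sketch below reproduces only that general shape with plain `asyncio`. The names `gather_dep`, `forget_key`, and the `tasks` dict are simplified, hypothetical stand-ins and not the real worker code.

```python
import asyncio


async def gather_dep(tasks, keys, response_delay):
    """Hypothetical, simplified fetch: wait for a peer, then look up each key."""
    await asyncio.sleep(response_delay)  # stands in for the network round trip
    return [tasks[k] for k in keys]  # raises KeyError if a key was forgotten


async def forget_key(tasks, key, delay):
    """Stands in for a free-keys message that arrives during the fetch."""
    await asyncio.sleep(delay)
    del tasks[key]


async def main():
    tasks = {"x": "flight", "y": "flight"}
    # Start the fetch, then forget one of its keys before the response lands.
    fetch = asyncio.create_task(gather_dep(tasks, ["x", "y"], 0.05))
    await forget_key(tasks, "x", 0.01)
    try:
        await fetch
    except KeyError as exc:
        return exc.args[0]  # the key that was missing at lookup time
    return None


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The lookup after the `await` is the fragile point: any state captured before the wait can be stale by the time the coroutine resumes.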

I'm not surprised that a key would be missing here; however, I am surprised that we're not handling this more gracefully.
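
As a minimal sketch of what "more graceful" could look like, assuming the task table is a plain dict keyed by task key: use `dict.get` and skip keys that disappeared in the meantime instead of indexing directly. `TaskState`, `handle_gathered_keys`, and the `"memory"` state string here are hypothetical illustrations, not the actual `distributed` worker API.

```python
import logging

logger = logging.getLogger("gather_dep_sketch")


class TaskState:
    """Hypothetical stand-in for a worker's per-task record."""

    def __init__(self, key, state="flight"):
        self.key = key
        self.state = state


def handle_gathered_keys(tasks, requested_keys, received_data):
    """Process a gather response, skipping keys forgotten in the meantime.

    `tasks` maps key -> TaskState. Returns (stored, skipped) key lists.
    """
    stored, skipped = [], []
    for key in requested_keys:
        ts = tasks.get(key)  # unlike tasks[key], this never raises KeyError
        if ts is None:
            logger.debug("Key %s was forgotten while in flight; skipping", key)
            skipped.append(key)
            continue
        if key in received_data:
            ts.state = "memory"
            stored.append(key)
    return stored, skipped


if __name__ == "__main__":
    tasks = {"a": TaskState("a"), "b": TaskState("b")}
    del tasks["b"]  # simulate free-keys arriving while the fetch is in flight
    print(handle_gathered_keys(tasks, ["a", "b"], {"a": 1, "b": 2}))
```

Whether silently skipping is the right behaviour, or whether the missing key should instead be logged in the story and surfaced to the scheduler, is a design question this sketch does not settle.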

cc @fjetter

Labels

deadlock: The cluster appears to not make any progress
