Closed
Labels
deadlock: The cluster appears to not make any progress
Description
This was found when running distributed/tests/test_stress.py::test_chaos_rechunk
Traceback (most recent call last):
  File "/home/mrocklin/workspace/distributed/distributed/utils.py", line 759, in wrapper
    return await func(*args, **kwargs)
  File "/home/mrocklin/workspace/distributed/distributed/worker.py", line 3090, in gather_dep
    ts = self.tasks[d]
KeyError: "('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)"
Story
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'ensure-task-exists', 'released', 'compute-task-1650891186.0628088', 1650891186.080167)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'released', 'fetch', 'fetch', {}, 'compute-task-1650891186.0628088', 1650891186.0802405)
('gather-dependencies', 'tcp://127.0.0.1:44059', {"('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", "('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1018)", "('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1014)"}, 'ensure-communicating-1650891186.0806584', 1650891186.0826457)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'fetch', 'flight', 'flight', {}, 'ensure-communicating-1650891186.0806584', 1650891186.0826836)
('request-dep', 'tcp://127.0.0.1:44059', {"('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", "('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1018)", "('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1014)"}, 'ensure-communicating-1650891186.0806584', 1650891186.0840328)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'flight', 'released', 'cancelled', {}, 'processing-released-1650891186.0754924', 1650891186.16295)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'compute-task', 'compute-task-1650891186.392905', 1650891186.4060445)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'cancelled', 'waiting', 'cancelled', {"('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)": ('resumed', 'waiting')}, 'compute-task-1650891186.392905', 1650891186.40608)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'cancelled', 'resumed', 'resumed', {}, 'compute-task-1650891186.392905', 1650891186.406102)
('free-keys', ("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)",), 'processing-released-1650891188.465502', 1650891188.5917683)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'release-key', 'processing-released-1650891188.465502', 1650891188.5917764)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'resumed', 'released', 'released', {"('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)": 'forgotten'}, 'processing-released-1650891188.465502', 1650891188.5917976)
("('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)", 'released', 'forgotten', 'forgotten', {}, 'processing-released-1650891188.465502', 1650891188.591806)
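To make the story easier to read, here is a minimal sketch that pulls out only the state transitions for the failing key. It assumes, based purely on the shape of the entries above, that a transition entry is a 7-tuple of (key, start state, recommended state, actual state, recommendations, stimulus id, timestamp). The helper name `state_transitions` is hypothetical and not part of distributed; the list is a subset of the story entries above.

```python
KEY = "('rechunk-split-82117560f6f829a7fa07bfef62cff7d5', 1006)"

# Subset of the story above (entries that mention only this key).
story = [
    (KEY, 'ensure-task-exists', 'released', 'compute-task-1650891186.0628088', 1650891186.080167),
    (KEY, 'released', 'fetch', 'fetch', {}, 'compute-task-1650891186.0628088', 1650891186.0802405),
    (KEY, 'fetch', 'flight', 'flight', {}, 'ensure-communicating-1650891186.0806584', 1650891186.0826836),
    (KEY, 'flight', 'released', 'cancelled', {}, 'processing-released-1650891186.0754924', 1650891186.16295),
    (KEY, 'compute-task', 'compute-task-1650891186.392905', 1650891186.4060445),
    (KEY, 'cancelled', 'waiting', 'cancelled', {KEY: ('resumed', 'waiting')}, 'compute-task-1650891186.392905', 1650891186.40608),
    (KEY, 'cancelled', 'resumed', 'resumed', {}, 'compute-task-1650891186.392905', 1650891186.406102),
    ('free-keys', (KEY,), 'processing-released-1650891188.465502', 1650891188.5917683),
    (KEY, 'release-key', 'processing-released-1650891188.465502', 1650891188.5917764),
    (KEY, 'resumed', 'released', 'released', {KEY: 'forgotten'}, 'processing-released-1650891188.465502', 1650891188.5917976),
    (KEY, 'released', 'forgotten', 'forgotten', {}, 'processing-released-1650891188.465502', 1650891188.591806),
]


def state_transitions(story, key):
    """Return (start, actual_finish) pairs for `key`.

    Assumption: transition entries are 7-tuples whose fifth element is the
    recommendations dict; every other entry shape is skipped.
    """
    pairs = []
    for entry in story:
        if len(entry) == 7 and entry[0] == key and isinstance(entry[4], dict):
            pairs.append((entry[1], entry[3]))
    return pairs


for start, finish in state_transitions(story, KEY):
    print(f"{start} -> {finish}")
```

Read this way, the key goes fetch, flight, cancelled, resumed, released, and finally forgotten, which would be consistent with a later `self.tasks[d]` lookup for that key raising the KeyError above.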
I'm not surprised that a key could be missing here; however, I am surprised that we're not handling it more gracefully.
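As an illustration of what handling this more gracefully could look like, here is a minimal, standalone sketch. It is not the actual fix in distributed: `split_known_and_forgotten` is a hypothetical helper, and `tasks` is a plain dict standing in for the worker's `self.tasks` mapping. The idea is to look keys up with `.get` and set aside the ones that were forgotten while the request was in flight, instead of indexing directly and raising.

```python
import logging

logger = logging.getLogger("gather_dep_sketch")  # hypothetical logger name


def split_known_and_forgotten(tasks, keys):
    """Partition `keys` into tasks still tracked and keys already forgotten.

    `tasks` is assumed to be a mapping of key -> task state, standing in
    for the worker's task table.
    """
    known = {}
    forgotten = []
    for key in keys:
        ts = tasks.get(key)  # no KeyError if the task was forgotten meanwhile
        if ts is None:
            logger.debug("Key %s was forgotten while being gathered", key)
            forgotten.append(key)
        else:
            known[key] = ts
    return known, forgotten


tasks = {"x": "flight"}
known, forgotten = split_known_and_forgotten(tasks, ["x", "y"])
print(known, forgotten)
```

Whether forgotten keys should be silently skipped or surfaced some other way is a design question this sketch does not settle.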
cc @fjetter