Skip to content

Fix a deadlock when queued tasks are resubmitted quickly in succession#7348

Merged
fjetter merged 4 commits intodask:mainfrom
fjetter:deadlock_queued
Nov 25, 2022
Merged

Fix a deadlock when queued tasks are resubmitted quickly in succession#7348
fjetter merged 4 commits intodask:mainfrom
fjetter:deadlock_queued

Conversation

@fjetter
Copy link
Member

@fjetter fjetter commented Nov 24, 2022

Closes #7200

There is a race condition if task-finishes and free-keys are submitted concurrently.

Queued tasks could end up being transitioned to memory which is wrong because shortly after this the worker will have forgotten the data already.

Sending another free-keys in this situation is not absolutely necessary but safe since the scheduler guarantees ordering of messages to a worker, i.e. if the task is in queued there is no other worker supposed to have or compute this task until it is transitioned out of this state. The free-keys is just there for good measure and will be handled on worker side gracefully.

I added a test for processing as well to be on the safe side. This isn't asserted but the worker just computes the task twice, as it should from our "release and compute task" semantics.

The test is a bit involved due to how we define rootishness but I added hopefully sufficient commentary.

@fjetter fjetter requested review from crusaderky and hendrikmakait and removed request for crusaderky November 24, 2022 17:34
@fjetter fjetter self-assigned this Nov 24, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Nov 24, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±  0         15 suites  ±0   6h 29m 15s ⏱️ + 14m 34s
  3 228 tests +  2    3 140 ✔️ ±  0    83 💤 ±0  5 +2 
23 868 runs  +16  22 954 ✔️ +14  909 💤 +1  5 +1 

For more details on these failures, see this check.

Results for commit 9026d47. ± Comparison against base commit cff33d5.

♻️ This comment has been updated with latest results.

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great job on the description of what the test is doing. Thanks, @fjetter!

@hendrikmakait hendrikmakait self-requested a review November 25, 2022 10:41
Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test fails with queuing turned off.

@fjetter
Copy link
Member Author

fjetter commented Nov 25, 2022

The new test fails with queuing turned off.

ahhh.. thanks, that makes sense. I forgot about the additional CI config and was already concerned

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Transition queued->memory causes AssertionError

2 participants