Fix a deadlock when queued tasks are resubmitted quickly in succession#7348
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
15 files ±0, 15 suites ±0, 6h 29m 15s ⏱️ +14m 34s
For more details on these failures, see this check.
Results for commit 9026d47. ± Comparison against base commit cff33d5.
hendrikmakait
left a comment
LGTM, great job on the description of what the test is doing. Thanks, @fjetter!
hendrikmakait
left a comment
The new test fails with queuing turned off.
Ahh, thanks, that makes sense. I forgot about the additional CI config and was already concerned.
Closes #7200
There is a race condition if `task-finished` and `free-keys` are submitted concurrently. Queued tasks could end up being transitioned to memory, which is wrong because shortly afterwards the worker will already have forgotten the data.
Sending another `free-keys` in this situation is not strictly necessary, but it is safe because the scheduler guarantees the ordering of messages to a worker: if the task is in `queued`, no other worker is supposed to hold or compute it until it is transitioned out of this state. The `free-keys` is just there for good measure and will be handled gracefully on the worker side.
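For readers unfamiliar with the scheduler internals, the handling described above can be sketched with a toy model. This is not the actual `distributed` code: `ToyScheduler`, `states`, `outbox` and the handler signature are hypothetical names chosen for illustration, and the real transition logic is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical, heavily simplified stand-in for the scheduler state."""

    # key -> scheduler-side state ("queued", "processing", "memory")
    states: dict[str, str] = field(default_factory=dict)
    # messages the scheduler would send, as (worker, op, key)
    outbox: list[tuple[str, str, str]] = field(default_factory=list)

    def handle_task_finished(self, key: str, worker: str) -> None:
        if self.states.get(key) == "processing":
            # Normal case: the scheduler expected this result.
            self.states[key] = "memory"
        else:
            # Stale message, e.g. the task was released and re-queued while
            # the worker's task-finished message was still in flight.
            # Do NOT transition to memory; tell the worker to drop the key.
            self.outbox.append((worker, "free-keys", key))


# The race: "x" was resubmitted and is queued again when the old
# task-finished message from worker "w1" arrives.
sched = ToyScheduler(states={"x": "queued"})
sched.handle_task_finished("x", worker="w1")
```

In this sketch the queued task stays queued, and the only side effect is an extra `free-keys` message, which mirrors the "for good measure" behaviour described above.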
I added a test for `processing` as well, to be on the safe side. This isn't asserted, but the worker just computes the task twice, as it should under our "release and compute task" semantics.
The test is a bit involved due to how we define rootishness but I added hopefully sufficient commentary.