Set runspec on all new tasks to avoid deadlocks#4432
Conversation
|
I notice that this is still in WIP status. Are the changes here an improvement over what is in master? If so, I might press the "Ready for review" button |
|
I'll note that the failures in |
Also, check that we don't release keys with dependents
|
I'm currently testing the latest commit on a few clusters in parallel. So far, no funny logs, no deadlocks, no problems at all. |
|
The only thing I could observe, so far is that the waiting_for_data_count is off but I checked the workers, both data and tasks are empty and I suspected a bug in this counter already. I believe this is not critical, though, since the counter is not used internally, merely for instrumentation (The following graph shows states as exposed by prometheus. Not identical to internal states) |
|
That's great news
…On Mon, Jan 18, 2021 at 8:49 AM Florian Jetter ***@***.***> wrote:
The only thing I could observe, so far is that the waiting_for_data_count
is off but I checked the workers, both data and tasks are empty and I
suspected a bug in this counter already. I believe this is not critical,
though, since the counter is not used internally, merely for instrumentation
[image: image]
<https://user-images.githubusercontent.com/8629629/104942900-48302700-59b5-11eb-9c62-2d5690ee3537.png>
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4432 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTDWI65NH6CXLLSSXCTS2RRAFANCNFSM4WEK5ONQ>
.
|
975caac to
1fc1b46
Compare
|
Needed to push another time. There was actually a typo on my side causing a test to fail. |
|
Unfortunately, I needed to push another small change but an expected one where we didn't clear I'm dropping out now but if there is no big CI thing popping up again, I feel confident merging this now. I'm not sure if it fixes all issues but it should provide a significant improvement to master from what I've seen. |
|
@fjetter are there other issues that you know exist? If so I'm happy to take a look at them today. |
|
On the other PR I originally wanted to fix an issue with erroneous tasks which led me to discover this runspec thing. I added a test case with an object with broken deserialization. That should hit the worker internal suspicious counter (but it doesn't/didn't) something else might reveal itself there. Otherwise I don't have anything on my mind right now |
|
OK, once tests pass here I'm going to merge. If things are ok for a bit I suggest that we release. cc @jrbourbeau |
|
I briefly tried things with the exercise in #4430 and things didn't complete. I need to double check that work though. |
|
Ah, indeed, things with that workflow are good for me too as well now. Good for a release whenever makes sense @jrbourbeau . I suspect that you're aiming for this Friday. |
|
xref dask/community#121 |

This is a more minimalistic version of #4360 with fixes I am pretty confident with. I still can see local issue, though and this is incomplete.