-
-
Notifications
You must be signed in to change notification settings - Fork 757
Description
When worker A calls gather_dep on an Actor task, it gets sent an Actor handle by worker B where the Actor is running. When that handle is deserialized on worker A, it gets a Client and creates a Future reference holding onto that Actor's key. The scheduler now notes that worker A's Client desires that key.
When the actual user's Client tries to release the Actor, the scheduler notes that worker A's Client still holds a reference to it, so it is not released.
More complex case:
A user submits a task where one of the dependencies is marked as an Actor, like:
with dask.annotate(workers=workers[0]):
counter = dask.delayed(Counter)()
with dask.annotate(workers=workers[1]):
intermediate = dask.delayed(lambda c: None)(counter)
with dask.annotate(workers=workers[0]):
final = dask.delayed(lambda x, c: x)(intermediate, counter)
final.compute(actors=counter, optimize_graph=False)In this case, the user doesn't even hold a reference to the Actor. But when the final task completes and the scheduler runs _propagate_forgotten to release its dependencies (including Counter), it sees that some Client holds a reference to the Counter, so it doesn't release it—when in fact the client holding the reference is workers[1]'s Actor handle.
This is what's causing test failures in #4925, now that we're more likely to schedule tasks on workers that don't hold any dependencies.