Skip to content

Cancelled tasks release dependencies #6893

@crusaderky

Description

@crusaderky

When a task execution is cancelled, the scheduler releases all of the task's dependencies, unless they have other dependents or waiters on the scheduler side.
On the worker, this causes the task to transition from executing to cancelled, and its dependencies to immediately transition from memory to released.

There are problems with this:

  • those dependencies will remain in memory until execute() complete, but won't show up anywhere - in other words, they'll become unmanaged memory
  • the dependencies will remain released if the task transitions back to resumed or to executing

This use case is virtually untested. We had for a long time these lines in validate_task_executing:

for dep in ts.dependencies:
assert dep.state == "memory", self.story(dep)
assert dep.key in self.data or dep.key in self.actors

This assertion will fail in case of a executing -> cancelled -> executing cycle, but no unit tests ever tripped it.
The assertion was commented out in #6699. The PR originally tried extending it to the cancelled state, with the result that one (1!) test became flaky, test_cancel_fire_and_forget.

This also causes a race condition in the preamble of Worker.execute, where a task had the time to go through a executing -> cancelled -> executing/resumed cycle by the time it tries to load up the dependencies; the task will fail with KeyError.
The easiest way to get into this use case in real life is when the task has a sizeable run_spec, which is deserialized in a thread, and the threadpool for whatever reason is somewhat slow to respond.

CC @fjetter

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions