Investigate and remove unusual scheduler transitions to memory #7210

@gjoseph92

There are a few strange ways the scheduler lets tasks end up in memory:

  • transition_processing_memory (with unexpected worker)
  • transition_waiting_memory
  • transition_no_worker_memory

These transitions have little test coverage, and it's hard to think of scenarios where they would legitimately occur.

They all revolve around the idea of a task completing on multiple workers at once. This is of course possible (anything is possible in a distributed system), but since worker reconnect was removed (#6361), it shouldn't be possible for a task to complete on multiple connected workers at once. That is, by the time the scheduler would receive the task-finished message, the BatchedStream carrying that message should already be disconnected, so the message would never actually be processed.
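To make the argument above concrete, here is a toy model of the reasoning. It is only a sketch, not distributed's actual `Scheduler`: the class `ToyScheduler`, its methods, and the returned strings are all hypothetical names invented for illustration. It assumes that once a worker is removed, any task-finished message it sent is dropped because its stream is gone.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical toy model; NOT the real distributed.Scheduler."""

    connected: set = field(default_factory=set)
    processing_on: dict = field(default_factory=dict)  # task key -> worker
    in_memory: dict = field(default_factory=dict)  # task key -> set of workers

    def add_worker(self, worker):
        self.connected.add(worker)

    def assign(self, key, worker):
        self.processing_on[key] = worker

    def remove_worker(self, worker):
        # Assumption: removing a worker closes its stream, so messages it
        # sent that have not been handled yet are never handled.
        self.connected.discard(worker)
        for key, w in list(self.processing_on.items()):
            if w == worker:
                del self.processing_on[key]  # task must be assigned again

    def handle_task_finished(self, key, worker):
        if worker not in self.connected:
            return "dropped"  # stream is gone; message is not processed
        if self.processing_on.get(key) == worker:
            del self.processing_on[key]
            self.in_memory.setdefault(key, set()).add(worker)
            return "processing->memory"
        # A connected worker finished a task it was not expected to run:
        # this is the "strange" case the issue is asking about.
        return "unexpected"


s = ToyScheduler()
s.add_worker("a")
s.add_worker("b")
s.assign("x", "a")
s.remove_worker("a")  # "a" disconnects while still running "x"
s.assign("x", "b")  # "x" is handed to "b" instead
late = s.handle_task_finished("x", "a")  # stale message from "a"
normal = s.handle_task_finished("x", "b")
```

In this model the stale message from `"a"` is dropped, so the `"unexpected"` branch is never reached without something like a manual reschedule.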

So far, the only "valid" way we've come up with to trigger these strange transitions is Scheduler.reschedule, which shouldn't be used anyway (#7209).

See discussions for background in:

We should investigate whether these transitions are actually still valid, and if not, remove them.
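One way to start such an investigation might be to scan a recorded transition log for the suspicious start/finish pairs. The sketch below is an assumption-laden illustration: `find_suspect_transitions` is a hypothetical helper, it assumes each log entry is a tuple beginning with `(key, start, finish, ...)`, and it assumes the state names are the strings `"waiting"`, `"no-worker"`, and `"memory"`. It cannot detect the processing-to-memory case with an unexpected worker, since that looks like a normal transition by state names alone.

```python
# Hypothetical helper; entry layout and state names are assumptions.
SUSPECT_PAIRS = {("waiting", "memory"), ("no-worker", "memory")}


def find_suspect_transitions(log):
    """Return (key, start, finish) for entries whose states look unusual."""
    hits = []
    for entry in log:
        key, start, finish = entry[0], entry[1], entry[2]
        if (start, finish) in SUSPECT_PAIRS:
            hits.append((key, start, finish))
    return hits


# Made-up log entries for illustration only.
example_log = [
    ("x", "waiting", "processing", {}, "stim-1"),
    ("x", "processing", "memory", {}, "stim-2"),
    ("y", "waiting", "memory", {}, "stim-3"),
    ("z", "no-worker", "memory", {}, "stim-4"),
]
suspects = find_suspect_transitions(example_log)
```

Run against logs from a test suite, an empty result would be weak evidence that these transitions are no longer exercised; it would not prove they are unreachable.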

cc @crusaderky @fjetter
