There is an issue with how the scheduler assigns tasks from the unrunnable queue to workers that meet the resource requirements as they join the scheduler.
The use case is a long-running, complex computation where some tasks require an expensive resource, say GPUs, but those resources are only provisioned (through Adaptive) once the tasks requiring them are ready to run. Say we reach a point in the computation where 5 tasks could run, if GPUs were available. Managing the scale-up behaviour through Adaptive is fairly straightforward and allows adding new compute nodes (for instance on AWS) with the required resources. The problem appears when the first of the new workers connects.
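The matching between a task's resource restrictions and a worker's advertised resources can be pictured with a small sketch (hypothetical helper name, for illustration only; the scheduler's actual logic lives in `valid_workers`):

```python
def meets_requirements(worker_resources, task_resources):
    # A worker is eligible for a task only if it offers at least the
    # required amount of every named resource. Hypothetical helper,
    # not the scheduler's real code.
    return all(worker_resources.get(name, 0) >= amount
               for name, amount in task_resources.items())

# A GPU node qualifies for a GPU task; a plain node does not.
print(meets_requirements({"GPU": 1, "MEMORY": 8e9}, {"GPU": 1}))  # True
print(meets_requirements({"MEMORY": 8e9}, {"GPU": 1}))            # False
```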
Scheduler.add_worker goes through the list of unrunnable tasks and checks whether there are workers that meet their requirements. Since only the first worker has connected so far, only one worker meets the requirements (some possibly positive number of additional workers may be booting up and about to join, but that hasn't happened yet).
```python
for ts in list(self.unrunnable):
    valid = self.valid_workers(ts)
    if valid is True or ws in valid:
        recommendations[ts.key] = 'waiting'
```
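A toy model of this situation (plain Python, not the scheduler's real data structures) shows why every unrunnable task becomes a candidate the moment the first matching worker connects:

```python
# Only the first of the new GPU workers has connected so far.
connected = {"worker-1": {"GPU": 1}}
unrunnable = [f"gpu-task-{i}" for i in range(5)]   # each needs one GPU

def valid_workers(task, workers):
    # Simplified stand-in for Scheduler.valid_workers.
    return {w for w, res in workers.items() if res.get("GPU", 0) >= 1}

recommendations = {}
for ts in list(unrunnable):
    valid = valid_workers(ts, connected)
    if valid:                       # worker-1 is the only member
        recommendations[ts] = "waiting"

# All five tasks are recommended while worker-1 is the only candidate.
print(recommendations)
```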
The task goes through released -> waiting and then waiting -> processing; transition_waiting_processing again calls valid_workers to get the set of workers where the task can run (this set still contains just the single worker, because the others haven't connected yet).
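With only one member in the valid set, whatever tie-breaking the scheduler applies is moot. A simplified sketch of the selection step (assumed occupancy-based choice, not the exact implementation):

```python
def decide_worker(valid, occupancy):
    # Pick the least-occupied worker among the valid candidates
    # (simplified; the real scheduler weighs more factors).
    return min(valid, key=lambda w: occupancy[w])

occupancy = {"worker-1": 0.0}
valid = {"worker-1"}     # the other GPU workers haven't connected yet

# Every waiting -> processing transition lands on the same worker,
# no matter how loaded it already is.
for _ in range(5):
    chosen = decide_worker(valid, occupancy)
    occupancy[chosen] += 1.0
print(occupancy)
```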
The end result of all of this is that the worker that happens to connect first and has the resource required by the tasks gets all of them dumped onto it, while the other workers, which potentially connect just seconds later, get nothing and are shut down by the scheduler because they are idle.
In short, it appears that the purpose of the resource requirements is to act as a hint about the required peak capability (memory, GPU, whatever) of a worker, and not as a dynamically changing resource allocation. Is this the case, and is there any interest in changing it? The resources available and the resources consumed are taken into account in transition_waiting_processing, but only on the worker state, not for the scheduler as a whole.
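That per-worker bookkeeping can be sketched as follows (assumed attribute names modelled on the worker state; illustration only). Consumed resources are checked against a single worker's capacity at assignment time, so nothing reserves capacity across workers that haven't connected yet:

```python
class WorkerStateSketch:
    # Minimal model of per-worker resource accounting (assumed names).
    def __init__(self, resources):
        self.resources = dict(resources)               # total offered
        self.used_resources = {r: 0 for r in resources}

    def has_capacity(self, required):
        # Checked only against this one worker, not the cluster.
        return all(self.resources[r] - self.used_resources[r] >= amount
                   for r, amount in required.items())

ws = WorkerStateSketch({"GPU": 1})
print(ws.has_capacity({"GPU": 1}))   # True
ws.used_resources["GPU"] = 1
print(ws.has_capacity({"GPU": 1}))   # False: this worker is saturated,
                                     # but there is no cluster-wide view
```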
If this is not the intended behaviour and should be fixed, I'm more than happy to work on this.