Description
First of all, I apologize because this is more a question than an issue or feature request (though it might turn into one of those in the end).
I am having a really hard time figuring out how to influence task scheduling to get it the way I want.
Here is my setup. I have two sets of tasks, say A and B. Each set contains a few thousand tasks. A tasks depend on nothing. Each B task depends on the output of a few A tasks (5 on average).
In my code I delay all A tasks, then I delay the B tasks using the delayed A objects to model the dependencies.
Then I call client.compute(B_delayed) and retrieve the B results on the master node as they stream in, using as_completed (client is from dask.distributed).
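A minimal sketch of the setup described above; make_a and make_b are hypothetical placeholders for the real task functions, and the collection sizes are shrunk for illustration:

```python
import dask
from dask.distributed import Client, as_completed

# In-process cluster just for this sketch; the real setup uses remote workers.
client = Client(processes=False)

def make_a(i):
    # Placeholder for an A task (no dependencies).
    return i * 2

def make_b(deps):
    # Placeholder for a B task depending on a handful of A outputs.
    return sum(deps)

# A tasks depend on nothing.
A = [dask.delayed(make_a)(i) for i in range(20)]

# Each B task depends on 5 A tasks.
B = [dask.delayed(make_b)(A[i:i + 5]) for i in range(0, 20, 5)]

# Submit the B graph and stream results back as they complete.
futures = client.compute(B)
results = [f.result() for f in as_completed(futures)]
client.close()
```

as_completed yields futures in completion order, so `results` is unordered relative to `B`.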
I would expect the scheduler to notice that each B task depends on only a few A tasks, and therefore to try to complete B tasks as early as possible so it can free the resources held by A results that are no longer needed.
What happens is quite the opposite: the scheduler prioritizes A tasks, trying to complete almost all of them while only rarely computing B tasks (even when many of them become ready to run).
This is problematic for us:
- Streaming retrieval is less efficient if all the B results arrive at the end (retrieval involves I/O operations).
- In some cases, the workers' memory is insufficient to hold all the A results. That memory could easily be freed, if only the scheduler decided to complete the corresponding B tasks.
What I tried so far:
- Also submitting the A tasks with client.compute(), with a lower priority (did not change the behavior).
- Submitting the B tasks in batches with decreasing priority (it helps a bit, but the scheduler still wants to complete as many A tasks as possible).
- A mix of both: no noticeable improvement.
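For reference, this is roughly how I pass priorities; the shapes and placeholder functions (make_a, make_b) are illustrative, not my real workload. Client.compute accepts a priority keyword where larger values are scheduled earlier:

```python
import dask
from dask.distributed import Client

client = Client(processes=False)

def make_a(i):
    # Placeholder A task.
    return i

def make_b(deps):
    # Placeholder B task.
    return sum(deps)

A = [dask.delayed(make_a)(i) for i in range(10)]
B = [dask.delayed(make_b)(A[i:i + 5]) for i in range(0, 10, 5)]

# Submit both collections: A de-prioritized, B prioritized.
# (In my experiments this did not change the A-first behavior much.)
A_futures = client.compute(A, priority=-10)
B_futures = client.compute(B, priority=10)

results = client.gather(B_futures)
client.close()
```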