ENH: impure tasks? #6378

When creating a DAG, e.g. with `delayed`, it is possible to declare a task as "impure", meaning that a new unique key is generated even when the input arguments are identical.
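In Dask this choice is exposed through the `pure=` keyword of `dask.delayed`. The stdlib-only toy below does not use Dask. It only illustrates why identical inputs collapse to one key for a pure call but not for an impure one. The `task_key` helper and its hashing scheme are hypothetical and are not Dask's tokenization code.

```python
import hashlib
import pickle
import uuid


def task_key(func, *args, pure=True):
    """Toy key generation for a delayed call (hypothetical, not Dask's code).

    A pure call hashes the function name and arguments, so identical inputs
    always map to the same key. An impure call uses a random UUID instead,
    so every call gets a fresh key even when the inputs are identical.
    """
    name = func.__name__
    if pure:
        token = hashlib.sha256(pickle.dumps((name, args))).hexdigest()[:16]
    else:
        token = uuid.uuid4().hex[:16]
    return f"{name}-{token}"


def load(path):
    return path  # placeholder body; only the generated key matters here


print(task_key(load, "a.csv") == task_key(load, "a.csv"))  # True: same key
print(task_key(load, "a.csv", pure=False) == task_key(load, "a.csv", pure=False))  # False
```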

I am wondering if there is any appetite for a similar concept in the scheduler: a task key annotated as side-effect only, with no useful return value. Use cases might include IO on some external storage, where we want to ensure that an operation happened, but if the worker that executed it goes down, there is no need to repeat it. In other words, where the scheduler state for the task would normally transition to memory, it could instead become just "completed" (or released), and any task that depends on it could run without having to fetch any results.

A set of CSV write tasks with a finalize task depending on all of them would be a good example of this (and in this case the barrier doesn't actually need to execute anything; it is only a meta-task of dependencies).
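The proposed semantics can be sketched with a single-process toy scheduler. Everything here is hypothetical: `Task`, the `side_effect_only` flag, `run_graph`, and the `"completed"` state are illustrations of the idea, not part of the `distributed` scheduler API. Ordinary tasks end in `"memory"` with their result stored; side-effect-only tasks end in `"completed"` with nothing stored, and dependents only wait for them to finish.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Task:
    key: str
    func: Callable[..., Any]
    deps: List[str] = field(default_factory=list)
    # Hypothetical annotation: the task is run only for its side effect.
    side_effect_only: bool = False


def run_graph(tasks: List[Task]) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Run tasks in dependency order and return (states, stored results)."""
    by_key = {t.key: t for t in tasks}
    states: Dict[str, str] = {}
    results: Dict[str, Any] = {}

    def run(key: str) -> None:
        if key in states:
            return
        task = by_key[key]
        for dep in task.deps:
            run(dep)
        # Only dependencies held in memory are passed as arguments;
        # "completed" dependencies act purely as ordering constraints.
        args = [results[d] for d in task.deps if states[d] == "memory"]
        value = task.func(*args)
        if task.side_effect_only:
            states[key] = "completed"  # nothing to keep or fetch later
        else:
            states[key] = "memory"
            results[key] = value

    for t in tasks:
        run(t.key)
    return states, results


# Usage: several CSV writes followed by a finalize barrier.
written: List[str] = []


def make_writer(name: str) -> Callable[[], None]:
    def write() -> None:
        written.append(name)  # stand-in for writing a CSV to external storage
    return write


parts = [Task(f"write-{i}", make_writer(f"part-{i}.csv"), side_effect_only=True)
         for i in range(3)]
finalize = Task("finalize", lambda: "done", deps=[p.key for p in parts])
states, results = run_graph(parts + [finalize])
print(states)   # the writes are "completed", finalize is "memory"
print(results)  # {'finalize': 'done'}
```

The finalize task receives no arguments because none of its dependencies hold a result, which mirrors the barrier described above.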

This pattern would move weakly towards tasks that are executed exactly once, where the side effect is the mutation of some resource. It would *not* guard against a task being run simultaneously on two workers; such a guard would be the opposite of [speculative execution](https://dask.discourse.group/t/speculative-execution/350). Making a strict guarantee of that is probably not feasible without a lot of work.
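The "weakly exactly once" point is about worker loss. The toy below assumes, hypothetically, that a lost in-memory result is always still needed and is therefore recomputed on a replacement worker, which re-runs its side effect; a task marked side-effect only is already "completed" and is left alone. `ToyCluster` and its methods are illustrative names, not the `distributed` API, and nothing here models two workers running the same task concurrently.

```python
from typing import Callable, Dict, List


class ToyCluster:
    """Toy model of what happens to finished tasks when a worker dies."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}     # key -> "memory" | "completed" | "released"
        self.location: Dict[str, str] = {}  # key -> worker holding the result
        self.funcs: Dict[str, Callable[[], object]] = {}
        self.side_effect_only: Dict[str, bool] = {}
        self.executions: Dict[str, int] = {}

    def submit(self, key: str, func: Callable[[], object], worker: str,
               side_effect_only: bool = False) -> None:
        self.funcs[key] = func
        self.side_effect_only[key] = side_effect_only
        self._execute(key, worker)

    def _execute(self, key: str, worker: str) -> None:
        self.funcs[key]()
        self.executions[key] = self.executions.get(key, 0) + 1
        if self.side_effect_only[key]:
            self.state[key] = "completed"  # side effect done, no result to hold
        else:
            self.state[key] = "memory"
            self.location[key] = worker

    def lose_worker(self, worker: str, replacement: str) -> None:
        """Recompute every in-memory result that lived on the lost worker."""
        for key, where in list(self.location.items()):
            if where == worker and self.state[key] == "memory":
                del self.location[key]
                self.state[key] = "released"
                self._execute(key, replacement)  # re-runs the side effect


writes: List[str] = []
cluster = ToyCluster()
cluster.submit("write-plain", lambda: writes.append("plain"), worker="w1")
cluster.submit("write-once", lambda: writes.append("once"), worker="w1",
               side_effect_only=True)
cluster.lose_worker("w1", replacement="w2")
print(cluster.executions)  # {'write-plain': 2, 'write-once': 1}
print(writes)              # ['plain', 'once', 'plain']
```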

Feel free to say that this is unnecessary complexity. I am thinking of it mainly in terms of shared mutable memory between processes on a single node, but the big-IO case is also interesting.
