ENH: Delayed variant of persist (pin?) #2156
Description
It seems that storing an intermediate result currently requires a call to persist or cache, both of which trigger computation in the background. Something that would be really nice is delayed caching (pinning?).
To elaborate a bit, pinning would mean that when computation is triggered by other means, the pinned object would be computed once and then persisted on the workers. However, marking the object as pinned would not trigger any computation on its own. Here's a simple example to demonstrate what this might mean.
In [1]: import dask.array as da
In [2]: import numpy as np
In [3]: a = da.from_array(np.arange(6), chunks=(6,))
In [4]: b = a * 2
In [5]: b = b.pin() # persist this later
In [6]: c = b + 3
In [7]: c.compute() # compute `c` and persist `b` in the process
Out[7]: array([ 3, 5, 7, 9, 11, 13])
In [8]: b.compute() # fetch `b`, it was already computed when `c` was
Out[8]: array([ 0, 2, 4, 6, 8, 10])
In [9]: del a, b, c # free memory; both `b` and `c` must be released to free `b`'s result
Not to walk through all of this, but calling b.pin() notes that the value should be persisted on the workers once computation is triggered; otherwise it is just an annotation in the Dask graph. When we call c.compute(), computation is triggered, which computes b as well. Once b is computed, its result is kept on the workers, just as b.persist() does. So when we later call b.compute(), it doesn't really compute anything; it just returns the result that was persisted in memory on the workers. To free the memory attached to b, we release all references to it, just as we would with persist.
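To make the intended semantics concrete, here is a minimal, purely illustrative sketch of the compute-once-then-cache behavior described above. It does not use Dask at all; the `Lazy` class and its `pin`/`compute` methods are invented names for this example, standing in for what a real `pin()` on a Dask collection might do.

```python
# Hypothetical sketch of "pin" semantics: pin() is only an annotation,
# and the pinned value is cached on the first compute triggered downstream.

class Lazy:
    """A lazy computation over other Lazy dependencies."""

    def __init__(self, func, *deps):
        self._func = func
        self._deps = deps
        self._pinned = False
        self._cache = None
        self._has_cache = False

    def pin(self):
        # Only an annotation: no computation happens here.
        self._pinned = True
        return self

    def compute(self):
        if self._has_cache:  # already "persisted": just return it
            return self._cache
        args = [d.compute() for d in self._deps]
        result = self._func(*args)
        if self._pinned:  # persist the result on first computation
            self._cache = result
            self._has_cache = True
        return result


calls = []  # track how many times b's body actually runs

a = Lazy(lambda: list(range(6)))
b = Lazy(lambda xs: (calls.append("b"), [x * 2 for x in xs])[1], a).pin()
c = Lazy(lambda xs: [x + 3 for x in xs], b)

print(c.compute())  # computing c also computes and caches b
print(b.compute())  # served from the cache; b's body is not re-run
print(calls)        # b's function ran exactly once
```

Releasing the references (`del a, b, c`) would then drop the cached result, mirroring how memory is freed for a persisted Dask collection.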