Distributed memory management plugin #1002

It would be useful to have a plugin that tracked and managed distributed memory.

Currently, results are copied to whichever workers need them for computation. Our policy is to avoid movement when possible and to keep duplicated data around until it is no longer needed, on the reasoning that data which ends up duplicated organically is likely in high demand. Users can currently control this manually with two synchronous operations:

1.  `rebalance(futures)`: takes a set of futures and tries to balance them across a set of workers so that every worker stores roughly the same number of bytes.
2.  `replicate(futures)`: changes the number of replicas of each result. This is commonly used to spread frequently needed intermediate results efficiently (it performs a tree scatter) or, with `n=1`, to clean up replicated data and free memory.
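
To make the byte-balancing objective of `rebalance` concrete, here is a minimal pure-Python sketch of a greedy planner. It is not how `distributed` implements `rebalance`: the layout dictionary and `plan_rebalance` are hypothetical, and the sketch assumes each key lives on exactly one worker.

```python
from typing import Dict, List, Tuple

# Hypothetical cluster view: worker address -> {key: nbytes}
Layout = Dict[str, Dict[str, int]]
# A planned transfer: (key, source worker, destination worker)
Move = Tuple[str, str, str]


def plan_rebalance(layout: Layout, tolerance: float = 0.1) -> List[Move]:
    """Plan moves until the fullest worker is within ``tolerance`` of the mean.

    Simplification: each key is assumed to live on exactly one worker.
    """
    if not layout:
        return []
    # Work on a copy so the caller's layout is left untouched
    state = {w: dict(keys) for w, keys in layout.items()}
    mean = sum(sum(keys.values()) for keys in state.values()) / len(state)
    moves: List[Move] = []

    while True:
        totals = {w: sum(keys.values()) for w, keys in state.items()}
        src = max(totals, key=totals.get)
        dst = min(totals, key=totals.get)
        if totals[src] - mean <= tolerance * mean:
            break  # fullest worker is already close enough to the mean
        gap = totals[src] - totals[dst]
        # Only moves with 0 < nbytes < gap strictly narrow the gap between the pair
        candidates = [(n, k) for k, n in state[src].items() if 0 < n < gap]
        if not candidates:
            break  # no single move helps; stop rather than thrash
        # Prefer the key whose size best splits the gap between the two workers
        nbytes, key = min(candidates, key=lambda c: abs(c[0] - gap / 2))
        del state[src][key]
        state[dst][key] = nbytes
        moves.append((key, src, dst))
    return moves
```

For example, `plan_rebalance({"a": {"x": 40, "y": 40, "z": 20}, "b": {}})` plans a single move of `x` to `b`, leaving 60 and 40 bytes; no remaining key fits inside the 20-byte gap, so the planner stops instead of shuffling data back and forth.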

Today people do this by hand, which is error-prone for a number of reasons and rare among non-expert users. It would be interesting to consider a scheduler plugin that watches current memory usage and responds dynamically by moving data around in the background. Operations like `rebalance` and `replicate` might then set objectives for this dynamic system to pursue rather than performing the operation explicitly.
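One way to picture "setting objectives" is a declarative store that a background loop reconciles against. The sketch below is hypothetical (`ReplicaObjectives`, `reconcile`, and the action tuples are illustrative names, not part of `distributed`): its `replicate` only records a target replica count, and each periodic `reconcile` call returns a bounded batch of actions, so the plugin can do less work on ticks when the scheduler is busy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# A pending action for the background loop, e.g. ("add_replica", "x")
Action = Tuple[str, str]


@dataclass
class ReplicaObjectives:
    """Hypothetical objective store for a memory-management plugin.

    ``replicate`` only records a desired replica count; ``reconcile`` is
    meant to be called periodically and returns a bounded batch of actions
    that move the cluster toward those targets.
    """

    targets: Dict[str, int] = field(default_factory=dict)

    def replicate(self, keys: List[str], n: int) -> None:
        # Record the objective; no data moves here
        for key in keys:
            self.targets[key] = max(1, n)  # never aim below one copy

    def reconcile(self, replicas: Dict[str, Set[str]], budget: int = 10) -> List[Action]:
        """Diff current replicas (key -> workers) against the targets."""
        actions: List[Action] = []
        for key, target in self.targets.items():
            have = len(replicas.get(key, set()))
            if have == 0:
                continue  # no surviving copy to replicate from
            op = "add_replica" if have < target else "drop_replica"
            for _ in range(abs(target - have)):
                if len(actions) >= budget:
                    return actions  # leave the rest for a later tick
                actions.append((op, key))
        return actions
```

The `budget` parameter is the design choice that matters here: the plugin never has to finish reconciling in one pass, which keeps each tick cheap and lets it back off under load.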

There are a few objectives when moving memory around:

1.  Respect replication constraints set by users (this came up in https://github.com/dask/distributed/issues/3184)
2.  Redistribute data evenly across workers
3.  Free up memory, particularly when we have highly replicated data
4.  But still preserve replication where it pays off, particularly for data that is cheap to keep or frequently used
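
Objectives 1, 3, and 4 pull against each other, so a plugin needs some ordering among them. The following is a hypothetical policy sketch, not an existing `distributed` function: it assumes the plugin knows each key's replica set, size, a use count, and any user-set minimum replica count, and it frees memory only from surplus replicas, dropping rarely used, highly replicated, large data first.

```python
from typing import Dict, List, Set, Tuple


def choose_replicas_to_drop(
    replicas: Dict[str, Set[str]],
    nbytes: Dict[str, int],
    uses: Dict[str, int],
    min_replicas: Dict[str, int],
    bytes_needed: int,
) -> List[Tuple[str, str]]:
    """Pick (key, worker) replicas to delete until ``bytes_needed`` is freed.

    Hypothetical policy mapping onto the objectives above:
    * never go below a user's replication constraint, nor below one copy (1);
    * only surplus replicas are eligible, so memory comes from replicated data (3);
    * rarely used, highly replicated, large keys go first, so cheap or
      frequently used replicas tend to survive (4).
    """
    candidates: List[Tuple[str, str]] = []
    for key, workers in replicas.items():
        floor = max(1, min_replicas.get(key, 1))
        surplus = max(len(workers) - floor, 0)
        # Arbitrary but deterministic worker choice; a real plugin would
        # prefer dropping from the most memory-pressured workers
        for worker in sorted(workers)[:surplus]:
            candidates.append((key, worker))

    # Lowest usage first, then most replicas, then largest size
    candidates.sort(key=lambda kw: (uses.get(kw[0], 0), -len(replicas[kw[0]]), -nbytes[kw[0]]))

    dropped: List[Tuple[str, str]] = []
    freed = 0
    for key, worker in candidates:
        if freed >= bytes_needed:
            break
        dropped.append((key, worker))
        freed += nbytes[key]
    return dropped
```

A key pinned by the user at two replicas never appears in the result while it has only two copies, even if freeing it would be the cheapest way to reach `bytes_needed`.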

I think this would improve memory usage when workers are memory-constrained. It shouldn't affect task scheduling much as long as we act safely, for example by verifying that copied data has arrived at its new location before deleting it from its origin. It may be challenging to do this efficiently in linear time, but we could also punt when we notice that the scheduler is otherwise busy.
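The safety rule (never delete the origin copy until the new copy is confirmed) and the option to punt under load can be sketched as follows. `safe_move`, `copy`, and `scheduler_busy` are hypothetical: the two callbacks stand in for the real transfer and load check, and in a real scheduler these steps would be asynchronous rather than a single synchronous call.

```python
from typing import Callable, Dict, Set


def safe_move(
    replicas: Dict[str, Set[str]],
    key: str,
    src: str,
    dst: str,
    copy: Callable[[str, str, str], bool],
    scheduler_busy: Callable[[], bool],
) -> bool:
    """Move ``key`` from ``src`` to ``dst`` without ever risking its only copy.

    ``replicas`` maps key -> workers holding it and is updated in place.
    Returns True only if the move completed.
    """
    if scheduler_busy():
        return False  # punt: try again on a later, quieter tick
    if src == dst or src not in replicas.get(key, set()):
        return False  # no-op or stale plan; the source no longer holds the data
    # Step 1: copy, and trust the destination only once it confirms receipt
    if not copy(key, src, dst):
        return False  # transfer failed; the original copy is untouched
    replicas[key].add(dst)
    # Step 2: only now is it safe to delete the data from its origin
    replicas[key].discard(src)
    return True
```

Because a failed or skipped move leaves `replicas` unchanged, the plugin can simply retry the same plan later; at no point does the key have zero recorded holders.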
