It would be useful to have a plugin that tracked and managed distributed memory.
Currently results are copied around to various workers as necessary for computation. Our current policy is to avoid movement when possible and keep duplicated data around until it is no longer necessary with the thought that data that ends up being duplicated organically is likely in high demand. This can currently be manually controlled by two synchronous operations:
rebalance(futures): takes a set of futures and tries to balance them around a set of workers so that all workers store roughly the same number of bytes
replicate(futures): changes the number of replicas for results. This is commonly used to spread commonly needed intermediate results around efficiently (this performs a tree-scatter) or to clean up replicated data (n=1) and increase free space
Currently people do this by hand. This is error-prone for a number of reasons and generally rare among non-expert users. It would be interesting to consider a scheduler plugin that watched current memory usage and responded dynamically by moving data around in the background. Operations like rebalance and replicate might then set objectives for this dynamic system to accomplish rather than perform the operation explicitly.
There are a few objectives when moving memory around:
Free up memory, particularly when we have highly replicated data
But still respect replication, particularly when it is cheap or frequently used
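To make these two objectives concrete, here is a rough sketch of how a background policy might pick replicas to drop. It is not existing scheduler code: `plan_replica_drops`, `memory_target`, and `min_replicas` are hypothetical names, and the greedy ordering (most-replicated keys first, fullest workers first) is just one plausible choice.

```python
from typing import Dict, List, Set, Tuple


def plan_replica_drops(
    who_has: Dict[str, Set[str]],    # key -> workers currently holding a copy
    nbytes: Dict[str, int],          # key -> size in bytes
    worker_memory: Dict[str, int],   # worker -> bytes currently stored
    min_replicas: Dict[str, int],    # key -> replicas we want to keep (default 1)
    memory_target: int,              # hypothetical per-worker byte budget
) -> List[Tuple[str, str]]:
    """Return (key, worker) pairs naming replicas that could be dropped."""
    # Work on copies so the caller's bookkeeping is not mutated.
    holders = {key: set(workers) for key, workers in who_has.items()}
    memory = dict(worker_memory)
    drops: List[Tuple[str, str]] = []

    # Visit the most-replicated keys first, larger keys first among ties.
    for key in sorted(holders, key=lambda k: (-len(holders[k]), -nbytes[k], k)):
        keep = max(1, min_replicas.get(key, 1))  # never drop the last copy
        # Prefer dropping from the fullest workers.
        for worker in sorted(holders[key], key=lambda w: (-memory[w], w)):
            if len(holders[key]) <= keep:
                break  # respect the requested replication
            if memory[worker] <= memory_target:
                continue  # this worker is not under memory pressure
            holders[key].discard(worker)
            memory[worker] -= nbytes[key]
            drops.append((key, worker))
    return drops
```

Under this sketch, replicate(futures, n=...) would only update `min_replicas` and leave the actual data movement to the background policy.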
I think that this would improve memory usage when we're constrained. I think that it shouldn't impact task scheduling much if we act in a safe manner, for example by verifying that copied data has arrived at its new location before deleting it from its origin. It may be challenging to do this efficiently in linear time, but we could also punt if we notice that the scheduler is otherwise busy.
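A minimal sketch of the "copy, confirm, then delete" ordering, with the punt-when-busy escape hatch. Everything here is hypothetical: `move_key`, the `copy` callback (assumed to return True only once the destination has confirmed it holds the data), and `scheduler_busy` are illustration names, not existing APIs.

```python
from typing import Callable, Dict, Set


def move_key(
    key: str,
    src: str,
    dst: str,
    who_has: Dict[str, Set[str]],
    copy: Callable[[str, str, str], bool],
    scheduler_busy: Callable[[], bool] = lambda: False,
) -> bool:
    """Move one key from src to dst; return True only if the move completed."""
    if scheduler_busy():
        return False  # punt: try again when the scheduler has spare time
    if src not in who_has.get(key, set()):
        return False  # nothing to move from this worker
    if dst not in who_has[key]:
        # Copy first; the origin keeps its data until the copy is confirmed.
        if not copy(key, src, dst):
            return False
        who_has[key].add(dst)
    # Only now is it safe to release the original replica.
    who_has[key].discard(src)
    return True
```

If the copy fails or the scheduler is busy, the bookkeeping is left untouched, so a failed move never reduces the number of replicas.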