Description
Credits go to @jcrist for discovering this during a recent sprint. Merely opening the issue so we can track it. Placing it on the Dask issue tracker to start, but we can move as needed.
Currently, with map_overlap operations, the scheduler seems to prefer an ordering in which all intermediate results are held in memory before any computation proceeds far enough to release chunks of memory. It may be better to prioritize short chains of computations so that memory is released sooner. At the moment it is unclear how easy this would be to accomplish or what tradeoffs it would entail, so further investigation is required.
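To make the memory concern concrete, here is a toy model in plain Python. It is only a sketch and does not use or reproduce Dask's scheduler: the helper names (`overlap_graph`, `peak_live_inputs`, `load_all_first`, `chain_first`) are hypothetical, and it assumes each output chunk reads its own input chunk plus its immediate neighbours, with an input chunk freed once its last consumer has run. Output chunks are not counted.

```python
# Toy model of an overlap-style graph. NOT Dask's scheduler; all names
# here are hypothetical and exist only for illustration.

def overlap_graph(n):
    """Map each task name to the list of tasks it reads."""
    deps = {f"load-{i}": [] for i in range(n)}
    for i in range(n):
        # Assumption: output i needs input chunks i-1, i and i+1.
        neighbours = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
        deps[f"out-{i}"] = [f"load-{j}" for j in neighbours]
    return deps


def peak_live_inputs(order, deps):
    """Peak number of input chunks held at once for an execution order."""
    # Count how many consumers each input chunk still has to serve.
    remaining = {}
    for inputs in deps.values():
        for inp in inputs:
            remaining[inp] = remaining.get(inp, 0) + 1

    live, peak = set(), 0
    for task in order:
        if not deps[task]:
            live.add(task)  # an input chunk is now in memory
        peak = max(peak, len(live))
        for inp in deps[task]:
            remaining[inp] -= 1
            if remaining[inp] == 0:
                live.discard(inp)  # last consumer ran, free the chunk
    return peak


def load_all_first(n):
    """Run every load before any output (all intermediates held)."""
    return [f"load-{i}" for i in range(n)] + [f"out-{i}" for i in range(n)]


def chain_first(n):
    """Finish each output as soon as its inputs are available."""
    order = []
    for i in range(n):
        order.append(f"load-{i}")
        if i >= 1:
            order.append(f"out-{i - 1}")
    order.append(f"out-{n - 1}")
    return order


n = 8
deps = overlap_graph(n)
print(peak_live_inputs(load_all_first(n), deps))  # 8
print(peak_live_inputs(chain_first(n), deps))     # 3
```

In this model the first ordering holds every input chunk at its peak, while the second never holds more than three regardless of `n`. Whether a comparable ordering is achievable in the real scheduler, and at what cost, is exactly the open question.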
Below is a trivial MRE that demonstrates the issue with the corresponding task graph generated by this operation.
Code:
In [1]: import numpy as np
In [2]: import dask.array as da
In [3]: a = da.random.random((12,), chunks=(4,))
In [4]: b = a.map_overlap(lambda e: 2 * e, depth=1)
In [5]: b.visualize(color="order")
Out[5]: <IPython.core.display.Image object>
Environment:
name: test
channels:
- conda-forge
- defaults
dependencies:
- appnope=0.1.0=py36_0
- backcall=0.1.0=py_0
- blas=1.1=openblas
- bokeh=0.12.16=py36_0
- ca-certificates=2018.4.16=0
- certifi=2018.4.16=py36_0
- click=6.7=py_1
- cloudpickle=0.5.3=py_0
- cycler=0.10.0=py36_0
- cytoolz=0.9.0.1=py36_0
- dask=0.17.5=py_0
- dask-core=0.17.5=py_0
- decorator=4.3.0=py_0
- distributed=1.21.8=py36_0
- freetype=2.8.1=0
- graphviz=2.38.0=7
- heapdict=1.0.0=py36_0
- ipython=6.4.0=py36_0
- ipython_genutils=0.2.0=py36_0
- jedi=0.12.0=py36_0
- jinja2=2.10=py36_0
- jpeg=9b=2
- kiwisolver=1.0.1=py36_1
- libgfortran=3.0.0=0
- libpng=1.6.34=0
- libtiff=4.0.9=0
- locket=0.2.0=py36_1
- markupsafe=1.0=py36_0
- matplotlib=2.2.2=py36_1
- msgpack-python=0.5.6=py36h2d50403_2
- ncurses=5.9=10
- numpy=1.14.3=py36_blas_openblas_200
- openblas=0.2.20=8
- openssl=1.0.2o=0
- packaging=17.1=py_0
- pandas=0.23.0=py36_1
- parso=0.2.1=py_0
- partd=0.3.8=py36_0
- pexpect=4.6.0=py36_0
- pickleshare=0.7.4=py36_0
- prompt_toolkit=1.0.15=py36_0
- psutil=5.4.5=py36_0
- ptyprocess=0.5.2=py36_0
- pygments=2.2.0=py36_0
- pyparsing=2.2.0=py36_0
- python=3.6.5=1
- python-dateutil=2.7.3=py_0
- python-graphviz=0.8.3=py36_0
- pytz=2018.4=py_0
- pyyaml=3.12=py36_1
- readline=7.0=0
- setuptools=39.2.0=py36_0
- simplegeneric=0.8.1=py36_0
- six=1.11.0=py36_1
- sortedcontainers=2.0.2=py36_0
- sqlite=3.20.1=2
- tblib=1.3.2=py36_0
- tk=8.6.7=0
- toolz=0.9.0=py_0
- tornado=5.0.2=py36_0
- traitlets=4.3.2=py36_0
- wcwidth=0.1.7=py36_0
- xz=5.2.3=0
- yaml=0.1.7=0
- zict=0.1.3=py_0
- zlib=1.2.11=h470a237_3

cc @mrocklin
