
Poor ordering in map_overlap #3554

@jakirkham


Credit goes to @jcrist for discovering this during a recent sprint. I'm merely opening the issue so we can track it. I'm placing it on the Dask issue tracker to start, but we can move it as needed.

Currently, with map_overlap operations, the scheduler appears to prefer an ordering in which all intermediate results are held in memory before any computation proceeds far enough to release chunks of memory. It may be better to prioritize short chains of computations so that memory is released sooner. At the moment it is unclear how easy this would be to accomplish or what trade-offs it would entail; in other words, further investigation is required.
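To make the memory concern concrete, here is a toy model. It is not Dask's scheduler or its ordering logic. It assumes a simplified 1-D map_overlap graph with hypothetical task names (`x-i` input chunk, `ov-i` chunk extended with its neighbours' edges, `map-i` function applied, `out-i` trimmed result), counts every task result as one unit of memory, and assumes final outputs are consumed as soon as they are produced (for example, written to storage). It compares the peak number of results held at once under two hand-written orderings: stage by stage, and short chains first.

```python
def overlap_graph(n):
    """Toy dependency graph shaped like a 1-D map_overlap over n chunks.

    Task names are hypothetical: x-i (input chunk), ov-i (chunk plus neighbour
    edges), map-i (function applied), out-i (trimmed output).
    """
    deps = {f"x-{i}": [] for i in range(n)}
    for i in range(n):
        neighbours = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
        deps[f"ov-{i}"] = [f"x-{j}" for j in neighbours]
        deps[f"map-{i}"] = [f"ov-{i}"]
        deps[f"out-{i}"] = [f"map-{i}"]
    return deps


def stage_by_stage(n):
    """Run every task of one stage before starting the next stage."""
    return [f"{stage}-{i}" for stage in ("x", "ov", "map", "out") for i in range(n)]


def chain_first(n):
    """Finish each chunk's short chain as soon as its inputs exist."""
    order = ["x-0"]
    for i in range(n):
        if i + 1 < n:
            order.append(f"x-{i + 1}")  # load the right neighbour just in time
        order += [f"ov-{i}", f"map-{i}", f"out-{i}"]
    return order


def peak_memory(deps, order):
    """Peak number of task results held at once (each result = 1 unit).

    A result is freed once all of its dependents have run; results with no
    dependents (final outputs) are assumed to be consumed immediately.
    """
    waiting = {key: 0 for key in deps}  # dependents that have not run yet
    for key in deps:
        for dep in deps[key]:
            waiting[dep] += 1
    in_memory, peak = set(), 0
    for key in order:
        if any(dep not in in_memory for dep in deps[key]):
            raise ValueError(f"{key} scheduled before its inputs")
        in_memory.add(key)
        peak = max(peak, len(in_memory))
        for dep in deps[key]:
            waiting[dep] -= 1
            if waiting[dep] == 0:
                in_memory.discard(dep)
        if waiting[key] == 0:
            in_memory.discard(key)  # final output, e.g. written to storage
    return peak


for n in (4, 8):
    deps = overlap_graph(n)
    print(n, peak_memory(deps, stage_by_stage(n)), peak_memory(deps, chain_first(n)))
```

In this model the stage-by-stage order peaks at `n + 2` results for `n` chunks, so it grows with the array, while the chain-first order peaks at 4 however many chunks there are. The model ignores chunk sizes, parallelism, and Dask's real ordering heuristics, so it only illustrates the direction of the trade-off.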

Below is a trivial MRE (minimal reproducible example) that demonstrates the issue, along with the task graph this operation generates.

Code:
In [1]: import numpy as np

In [2]: import dask.array as da

In [3]: a = da.random.random((12,), chunks=(4,))

In [4]: b = a.map_overlap(lambda e: 2 * e, depth=1)

In [5]: b.visualize(color="order")
Out[5]: <IPython.core.display.Image object>
Dask Graph:

[Image: task graph of b with nodes colored by execution order, as produced by b.visualize(color="order")]
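For readers unfamiliar with what map_overlap does, the following is a simplified, in-memory NumPy model of the overlap, map, trim pattern for a 1-D array; it is not Dask's implementation. Each chunk is extended by `depth` elements borrowed from its neighbours, the function is applied, and the borrowed elements are trimmed off again. The function name `map_overlap_1d` is hypothetical, and the outer edges of the array are simply not padded. Because every extended chunk reads from its neighbours, each output chunk in the real task graph depends on up to three input chunks, which is the structure produced by `b.visualize` in the MRE.

```python
import numpy as np


def map_overlap_1d(x, chunk_size, depth, func):
    """Hypothetical in-memory model of overlap -> map -> trim on a 1-D array.

    Each chunk is extended by up to `depth` elements from its neighbours,
    `func` is applied to the extended block, and the borrowed elements are
    trimmed off. The outer edges of `x` are not padded.
    """
    n = len(x)
    pieces = []
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        lo = max(start - depth, 0)   # borrow from the left neighbour
        hi = min(stop + depth, n)    # borrow from the right neighbour
        block = func(x[lo:hi])
        offset = start - lo          # number of borrowed elements on the left
        pieces.append(block[offset:offset + (stop - start)])
    return np.concatenate(pieces)


x = np.arange(12.0)  # 12 elements in chunks of 4, as in the MRE
doubled = map_overlap_1d(x, chunk_size=4, depth=1, func=lambda e: 2 * e)
moving_sum = map_overlap_1d(
    x, chunk_size=4, depth=1,
    func=lambda e: np.convolve(e, [1.0, 1.0, 1.0], mode="same"),
)
print(doubled)
print(moving_sum)
```

With `lambda e: 2 * e` from the MRE the overlap has no effect on the result, since the function is elementwise; a neighbour-dependent function such as a 3-point moving sum is where the borrowed elements matter.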

Environment:
name: test
channels:
  - conda-forge
  - defaults
dependencies:
  - appnope=0.1.0=py36_0
  - backcall=0.1.0=py_0
  - blas=1.1=openblas
  - bokeh=0.12.16=py36_0
  - ca-certificates=2018.4.16=0
  - certifi=2018.4.16=py36_0
  - click=6.7=py_1
  - cloudpickle=0.5.3=py_0
  - cycler=0.10.0=py36_0
  - cytoolz=0.9.0.1=py36_0
  - dask=0.17.5=py_0
  - dask-core=0.17.5=py_0
  - decorator=4.3.0=py_0
  - distributed=1.21.8=py36_0
  - freetype=2.8.1=0
  - graphviz=2.38.0=7
  - heapdict=1.0.0=py36_0
  - ipython=6.4.0=py36_0
  - ipython_genutils=0.2.0=py36_0
  - jedi=0.12.0=py36_0
  - jinja2=2.10=py36_0
  - jpeg=9b=2
  - kiwisolver=1.0.1=py36_1
  - libgfortran=3.0.0=0
  - libpng=1.6.34=0
  - libtiff=4.0.9=0
  - locket=0.2.0=py36_1
  - markupsafe=1.0=py36_0
  - matplotlib=2.2.2=py36_1
  - msgpack-python=0.5.6=py36h2d50403_2
  - ncurses=5.9=10
  - numpy=1.14.3=py36_blas_openblas_200
  - openblas=0.2.20=8
  - openssl=1.0.2o=0
  - packaging=17.1=py_0
  - pandas=0.23.0=py36_1
  - parso=0.2.1=py_0
  - partd=0.3.8=py36_0
  - pexpect=4.6.0=py36_0
  - pickleshare=0.7.4=py36_0
  - prompt_toolkit=1.0.15=py36_0
  - psutil=5.4.5=py36_0
  - ptyprocess=0.5.2=py36_0
  - pygments=2.2.0=py36_0
  - pyparsing=2.2.0=py36_0
  - python=3.6.5=1
  - python-dateutil=2.7.3=py_0
  - python-graphviz=0.8.3=py36_0
  - pytz=2018.4=py_0
  - pyyaml=3.12=py36_1
  - readline=7.0=0
  - setuptools=39.2.0=py36_0
  - simplegeneric=0.8.1=py36_0
  - six=1.11.0=py36_1
  - sortedcontainers=2.0.2=py36_0
  - sqlite=3.20.1=2
  - tblib=1.3.2=py36_0
  - tk=8.6.7=0
  - toolz=0.9.0=py_0
  - tornado=5.0.2=py36_0
  - traitlets=4.3.2=py36_0
  - wcwidth=0.1.7=py36_0
  - xz=5.2.3=0
  - yaml=0.1.7=0
  - zict=0.1.3=py_0
  - zlib=1.2.11=h470a237_3

cc @mrocklin
