Cull in array slicing hurts performance for highly nested computations #1732

@mrocklin

Description

Recently we started culling dask.array graphs when slicing, if the output array has many fewer chunks than the input array. There is a tradeoff here between copying the full .dask attribute and culling it: copying is cheaper per node but operates on all of the nodes, while culling is much more expensive per node but operates only on the nodes that survive the process. Culling is also better downstream, because future computations have a smaller graph to deal with.
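For context, culling in this sense keeps only the tasks reachable from the requested output keys. A minimal sketch of the idea, not dask's actual implementation (`get_deps` here is a simplified stand-in for the real `get_dependencies`):

```python
from operator import add


def get_deps(dsk, task):
    # Simplified stand-in for dask's get_dependencies: any element of a
    # task (recursing through tuples and lists) that is also a key in
    # the graph counts as a dependency.
    if isinstance(task, (tuple, list)):
        deps = set()
        for part in task:
            deps |= get_deps(dsk, part)
        return deps
    try:
        in_graph = task in dsk
    except TypeError:  # unhashable values can never be keys
        return set()
    return {task} if in_graph else set()


def cull(dsk, keys):
    # Walk dependencies from the requested output keys; only tasks
    # reachable from them survive in the culled graph.
    out, stack = {}, list(keys)
    while stack:
        key = stack.pop()
        if key in out:
            continue
        out[key] = dsk[key]
        stack.extend(get_deps(dsk, dsk[key]) - out.keys())
    return out


dsk = {'a': 1, 'b': (add, 'a', 10), 'c': (add, 'b', 5), 'd': 100}
culled = cull(dsk, ['c'])  # keeps 'a', 'b', 'c'; unreachable 'd' is dropped
```

The cost profile falls out of this shape: the walk calls `get_deps` once per surviving task, which is exactly the expensive path that shows up in the profile below.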

However, there are some failure cases. Here is a good one:

from dask import array as da

z = da.ones((100, 100), chunks=(100, 1))
# Repeatedly rechunking between column and row chunks builds a deep,
# densely connected graph
z = z.rechunk((1, 100))
z = z.rechunk((100, 1))
z = z.rechunk((1, 100))
z = z.rechunk((100, 1))
z = z.rechunk((1, 100))
z = z.rechunk((100, 1))
z = z.rechunk((1, 100))
%prun z[0]
         818565 function calls (576944 primitive calls) in 0.680 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
272107/60802    0.281    0.000    0.489    0.000 core.py:159(_deps)
30903/602    0.173    0.000    0.284    0.000 core.py:186(<listcomp>)
        1    0.113    0.113    0.651    0.651 optimize.py:14(cull)
    60801    0.030    0.000    0.519    0.000 core.py:197(get_dependencies)
        1    0.027    0.027    0.680    0.680 <string>:1(<module>)
    30905    0.022    0.000    0.022    0.000 {built-in method builtins.sum}
   241204    0.013    0.000    0.013    0.000 {built-in method builtins.callable}
    60801    0.008    0.000    0.008    0.000 {method 'pop' of 'list' objects}
    60810    0.008    0.000    0.008    0.000 {method 'add' of 'set' objects}
    60800    0.003    0.000    0.003    0.000 {method 'append' of 'list' objects}
        1    0.001    0.001    0.653    0.653 core.py:1104(__getitem__)
      5/1    0.000    0.000    0.000    0.000 abc.py:194(__subclasscheck__)
        1    0.000    0.000    0.000    0.000 slicing.py:240(slice_slices_and_integers)

This was run on @pitrou's branch #1731.

Some options:

  1. Create a better fast check that approximates the size of the output graph without relying on get_dependencies (which is slow), and use that to decide whether to cull or copy
  2. Provide a global setting to decide whether or not to cull (though this feels like bloat to me)
  3. ??
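For option 1, one crude possibility is to skip dependency walking entirely and compare the number of requested output keys against the total graph size. A hypothetical sketch (the `should_cull` name and `threshold` knob are illustrative, not from dask):

```python
def should_cull(dsk, out_keys, threshold=0.5):
    # Hypothetical fast check: cull only when the requested output keys
    # are a small fraction of the whole graph, on the rough assumption
    # that few outputs imply a small surviving subgraph.  It costs O(1)
    # and never touches get_dependencies, but the assumption is exactly
    # wrong for graphs like the rechunk example above, where a single
    # output key still depends on nearly every task.
    return len(out_keys) < threshold * len(dsk)


dsk = {i: i for i in range(100)}
should_cull(dsk, [0, 1, 2])        # few outputs -> cull
should_cull(dsk, list(range(80)))  # most of the graph requested -> copy
```

The rechunk example shows why this particular proxy is too weak on its own: a check would also need some signal about connectivity, not just output count.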
