Cull in array slicing hurts performance for highly nested computations #1732

@mrocklin

Description

Recently we started culling dask.array graphs when slicing, if the output array has many fewer chunks than the input array. There is a tradeoff here between copying the full .dask attribute and culling it: copying is cheaper per node but operates on all of the nodes, while culling is much more expensive per node but operates only on the nodes that survive the process. Culling is also better downstream, because future computations have a smaller graph to deal with.
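For context, culling in this sense keeps only the tasks reachable from the requested output keys. A minimal sketch of the idea, not dask's actual implementation (`get_deps` here is a simplified stand-in for the real `get_dependencies`):

```python
from operator import add


def get_deps(dsk, task):
    # Simplified stand-in for dask's get_dependencies: any element of a
    # task (recursing through tuples and lists) that is also a key in
    # the graph counts as a dependency.
    if isinstance(task, (tuple, list)):
        deps = set()
        for part in task:
            deps |= get_deps(dsk, part)
        return deps
    try:
        in_graph = task in dsk
    except TypeError:  # unhashable values can never be keys
        return set()
    return {task} if in_graph else set()


def cull(dsk, keys):
    # Walk dependencies from the requested output keys; only tasks
    # reachable from them survive in the culled graph.
    out, stack = {}, list(keys)
    while stack:
        key = stack.pop()
        if key in out:
            continue
        out[key] = dsk[key]
        stack.extend(get_deps(dsk, dsk[key]) - out.keys())
    return out


dsk = {'a': 1, 'b': (add, 'a', 10), 'c': (add, 'b', 5), 'd': 100}
culled = cull(dsk, ['c'])  # keeps 'a', 'b', 'c'; unreachable 'd' is dropped
```

The cost profile falls out of this shape: the walk calls `get_deps` once per surviving task, which is exactly the expensive path that shows up in the profile below.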

However, there are some failure cases. Here is a good one:

from dask import array as da

z = da.ones((100, 100), chunks=(100, 1))
# Repeatedly rechunking between column and row chunks builds a deep,
# densely connected graph
z = z.rechunk((1, 100))
z = z.rechunk((100, 1))
z = z.rechunk((1, 100))
z = z.rechunk((100, 1))
z = z.rechunk((1, 100))
z = z.rechunk((100, 1))
z = z.rechunk((1, 100))
%prun z[0]
         818565 function calls (576944 primitive calls) in 0.680 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
272107/60802    0.281    0.000    0.489    0.000 core.py:159(_deps)
30903/602    0.173    0.000    0.284    0.000 core.py:186(<listcomp>)
        1    0.113    0.113    0.651    0.651 optimize.py:14(cull)
    60801    0.030    0.000    0.519    0.000 core.py:197(get_dependencies)
        1    0.027    0.027    0.680    0.680 <string>:1(<module>)
    30905    0.022    0.000    0.022    0.000 {built-in method builtins.sum}
   241204    0.013    0.000    0.013    0.000 {built-in method builtins.callable}
    60801    0.008    0.000    0.008    0.000 {method 'pop' of 'list' objects}
    60810    0.008    0.000    0.008    0.000 {method 'add' of 'set' objects}
    60800    0.003    0.000    0.003    0.000 {method 'append' of 'list' objects}
        1    0.001    0.001    0.653    0.653 core.py:1104(__getitem__)
      5/1    0.000    0.000    0.000    0.000 abc.py:194(__subclasscheck__)
        1    0.000    0.000    0.000    0.000 slicing.py:240(slice_slices_and_integers)

This was run on @pitrou's branch #1731.

Some options:

  1. Create a better fast check that approximates the size of the output graph without relying on get_dependencies (which is slow), and use that to decide whether to cull or copy
  2. Provide a global setting to decide whether or not to cull (though this feels like bloat to me)
  3. ??
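For option 1, one crude possibility is to skip dependency walking entirely and compare the number of requested output keys against the total graph size. A hypothetical sketch (the `should_cull` name and `threshold` knob are illustrative, not from dask):

```python
def should_cull(dsk, out_keys, threshold=0.5):
    # Hypothetical fast check: cull only when the requested output keys
    # are a small fraction of the whole graph, on the rough assumption
    # that few outputs imply a small surviving subgraph.  It costs O(1)
    # and never touches get_dependencies, but the assumption is exactly
    # wrong for graphs like the rechunk example above, where a single
    # output key still depends on nearly every task.
    return len(out_keys) < threshold * len(dsk)


dsk = {i: i for i in range(100)}
should_cull(dsk, [0, 1, 2])        # few outputs -> cull
should_cull(dsk, list(range(80)))  # most of the graph requested -> copy
```

The rechunk example shows why this particular proxy is too weak on its own: a check would also need some signal about connectivity, not just output count.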
