-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
We want to implement slicing on chunked dask graphs representing arrays. E.g. if I have a 100 by 100 array, split into 10 by 10 blocks and I compute
>>> x[:15, :15]Then I want the upper left block completely, and half of the block above and below, and a quarter of the block diagonally lower-right. Note that doing this logic correctly depends not only on the dask, but also in the shape-metadata in the Array object.
Slicing might become complex (there is a lot of potential book-keeping here) but a partial solution is probably good enough for now.
Example
Given a dask Array object like the following
>>> x = np.ones((20, 20))
>>> dsk = {'x': x}
>>> a = into(Array, x, blockshape=(5, 5), name='y')
>>> a.dask
{'y': array([[ 1., 1., 1., 1., 1 ...
('y', 0, 0): (<function dask.array.ndslice>, 'y', (5, 5), 0, 0),
('y', 0, 1): (<function dask.array.ndslice>, 'y', (5, 5), 0, 1),
...
('y', 3, 2): (<function dask.array.ndslice>, 'y', (5, 5), 3, 2),
('y', 3, 3): (<function dask.array.ndslice>, 'y', (5, 5), 3, 3)}
}and a blaze expression like the following
>>> from blaze import symbol, compute
>>> s = symbol('s', '20 * 20 * int')
>>> expr = s[:8, :8]We'd like to be able to compute a new dask.obj.Array object with the following dask
def sliceit(x, *inds):
return x[*inds]
{('y_1', 0, 0): ('y', 0, 0),
('y_1', 0, 1): (sliceit, ('y', 0, 1), slice(None, None), slice(None, 3)),
('y_1', 1, 0): (sliceit, ('y', 1, 0), slice(None, 3), slice(None, None)),
('y_1', 1, 1): (sliceit, ('y', 1, 1), slice(None, 3), slice(None, 3)),
...Note on code complexity
Dask is likely to be refactored. Things like the Array class are experimental and likely to change shape in the future. We want to avoid work when that refactor occurs. Because of this it's nice to keep as much gritty detail work like this independent from the current conventions for as long as possible. E.g. there is likely a std-lib only function that creates a dictionary from pure-python terms (tuples, dicts, names, slices) and then a dask.obj.Array-aware function. This is the approach behind array.top and obj.atop