Skip to content

Blaze slicing #1

@mrocklin

Description

@mrocklin

We want to implement slicing on chunked dask graphs representing arrays. E.g. if I have a 100 by 100 array, split into 10 by 10 blocks and I compute

>>> x[:15, :15]

Then I want the upper left block completely, and half of the block above and below, and a quarter of the block diagonally lower-right. Note that doing this logic correctly depends not only on the dask, but also in the shape-metadata in the Array object.

Slicing might become complex (there is a lot of potential book-keeping here) but a partial solution is probably good enough for now.

Example

Given a dask Array object like the following

>>> x = np.ones((20, 20))
>>> dsk = {'x': x}
>>> a = into(Array, x, blockshape=(5, 5), name='y')
>>> a.dask
{'y': array([[ 1.,  1.,  1.,  1.,  1 ...
 ('y', 0, 0): (<function dask.array.ndslice>, 'y', (5, 5), 0, 0),
 ('y', 0, 1): (<function dask.array.ndslice>, 'y', (5, 5), 0, 1),
...
 ('y', 3, 2): (<function dask.array.ndslice>, 'y', (5, 5), 3, 2),
 ('y', 3, 3): (<function dask.array.ndslice>, 'y', (5, 5), 3, 3)}
}

and a blaze expression like the following

>>> from blaze import symbol, compute
>>> s = symbol('s', '20 * 20 * int')
>>> expr = s[:8, :8]

We'd like to be able to compute a new dask.obj.Array object with the following dask

def sliceit(x, *inds):
    return x[*inds]

{('y_1', 0, 0): ('y', 0, 0),
 ('y_1', 0, 1): (sliceit, ('y', 0, 1), slice(None, None), slice(None, 3)),
 ('y_1', 1, 0): (sliceit, ('y', 1, 0), slice(None, 3), slice(None, None)),
 ('y_1', 1, 1): (sliceit, ('y', 1, 1), slice(None, 3), slice(None, 3)),
 ...

Note on code complexity

Dask is likely to be refactored. Things like the Array class are experimental and likely to change shape in the future. We want to avoid work when that refactor occurs. Because of this it's nice to keep as much gritty detail work like this independent from the current conventions for as long as possible. E.g. there is likely a std-lib only function that creates a dictionary from pure-python terms (tuples, dicts, names, slices) and then a dask.obj.Array-aware function. This is the approach behind array.top and obj.atop

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions