Indexing a dask array with another dask array #3096

@mraspaud

So, following the instructions in the documentation, I'm raising an issue :) The documentation says:

It does not currently support the following:

  • Slicing one dask.array with another x[x > 0]
  • Slicing with lists in multiple axes x[[1, 2, 3], [3, 2, 1]]

Both of these are straightforward to add though. If you have a use case then raise an issue.

Our use case is a quite common one in our domain (weather satellite data processing), that is using lookup tables like this:

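import numpy as np
import dask.array as da

# Small lookup table (LUT), wrapped in a single-chunk dask array
lut = np.random.rand(1024) * 100
dlut = da.from_array(lut, chunks=1024)

# Large 2D array of indices into the LUT
idx = np.random.randint(1024, size=(1000, 1000))

# Indexing the dask LUT with a NumPy index array works
print(dlut[idx.ravel()])
# -> dask.array<getitem, shape=(1000000,), dtype=float64, chunksize=(1000000,)>

# Indexing the dask LUT with a dask index array fails
didx = da.from_array(idx, chunks=1000)
print(dlut[didx.ravel()])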

In most cases, the lookup table is quite small (256 to 4096 values), but the index (the actual data to be converted) is large.

The previous script crashes on the last line with this traceback:

  File "test_dask_idx.py", line 13, in <module>
    print(dlut[didx.ravel()])
  File "/home/a001673/.local/lib/python2.7/site-packages/dask/array/core.py", line 1257, in __getitem__
    self, index2 = slice_with_dask_array(self, index2)
  File "/home/a001673/.local/lib/python2.7/site-packages/dask/array/slicing.py", line 844, in slice_with_dask_array
    y = elemwise(getitem, x, *index, dtype=x.dtype)
  File "/home/a001673/.local/lib/python2.7/site-packages/dask/array/core.py", line 2751, in elemwise
    out_ndim = len(broadcast_shapes(*shapes))   # Raises ValueError if dimensions mismatch
  File "/home/a001673/.local/lib/python2.7/site-packages/dask/array/core.py", line 2723, in broadcast_shapes
    "shapes {0}".format(' '.join(map(str, shapes))))
ValueError: operands could not be broadcast together with shapes (1024,) (1000000,)
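
From the traceback, it looks like the index is passed through elemwise, which needs the shapes of the array and the index to broadcast against each other, as they would for a boolean mask such as x[x > 0]. That reading is a guess based on the traceback only, not something checked against the dask source. The NumPy-only sketch below just confirms that the two shapes reported in the error cannot be broadcast together:

```python
import numpy as np

# Stand-ins with the same shapes as in the error message
lut = np.zeros(1024)        # shape of the lookup table
index = np.zeros(1000000)   # shape of the flattened index

try:
    # np.broadcast applies the usual NumPy broadcasting rules
    np.broadcast(lut, index)
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("cannot broadcast:", err)
```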

The expected behaviour would be of course to have a dask Array returned.
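
In the meantime, one possible workaround is to keep the small lookup table as a plain NumPy array and apply it to each chunk of the dask index with map_blocks. This is only a sketch under that assumption (the table must fit comfortably in memory); apply_lut is a hypothetical helper name and the chunk size is arbitrary:

```python
import numpy as np
import dask.array as da

# The lookup table stays a NumPy array (assumed small enough to hold in memory)
lut = np.random.rand(1024) * 100

# The large index is the dask array
idx = np.random.randint(1024, size=(1000, 1000))
didx = da.from_array(idx, chunks=(250, 250))

def apply_lut(block):
    # Hypothetical helper: NumPy fancy indexing on one chunk of the index
    return lut[block]

# Lazily apply the lookup chunk by chunk; the result is a dask Array
result = didx.map_blocks(apply_lut, dtype=lut.dtype)
print(result)
```

Because the lookup is elementwise with respect to the index, the result keeps the shape and chunking of didx, so no ravel or reshape is needed with this approach.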

As a bonus item, having to call .ravel() is a bit annoying, since all we do in the end is ravel and reshape (the data is most often 2D).
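
For reference, this is the plain NumPy behaviour we would like to mirror: indexing a 1D table with a 2D integer array gives a result with the shape of the index, which is the same as the ravel and reshape round trip. A small sketch with the same sizes as above:

```python
import numpy as np

lut = np.random.rand(1024) * 100
idx = np.random.randint(1024, size=(1000, 1000))

# Direct 2D indexing: the result takes the shape of the index
direct = lut[idx]

# What we currently do: flatten, look up, restore the shape
roundtrip = lut[idx.ravel()].reshape(idx.shape)

same = np.array_equal(direct, roundtrip)
print(direct.shape, same)
```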

dask version is 0.16.0+37.g1fef002
numpy version is 1.13.3

Thanks for a great package!
