-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Closed
Labels
Description
Reported in pydata/xarray#4112, where xarray calling Array.__getitem__ resulted in an unbalanced Array.
import dask.array
arr = dask.array.from_array([0, 1, 2, 3], chunks=(1,))
print(arr.chunks) # ((1, 1, 1, 1),)
# align calls reindex which indexes with something like this
indexer = [0, 1, 2, 3, ] + [-1,] * 111
print(arr[indexer].chunks) # ((1, 1, 1, 112),)That last chunk is large, since all of the -1s are in the same input chunk.
To be explicit, the user could chunk the indexer:
# maybe something like this is a solution
lazy_indexer = dask.array.from_array(indexer, chunks=arr.chunks[0][0], name="idx")
print(arr[lazy_indexer].chunks) # ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),)
If the indexer isn't chunked, perhaps dask could examine the input array and indexer and chunk the indexer if we detect that the output chunksize would exceed dask.config.get('array.chunk-size')?
Reactions are currently unavailable