Description
What happened:
While playing around with rechunking and map_blocks I started getting results which I couldn't readily explain. Specifically, when a rechunk introduces empty (zero-length) chunks, a subsequent map_blocks call (and probably blockwise) returns an array in which the real values are mixed with what look like task keys and chunk indices, rendered as strings.
What you expected to happen:
I would expect empty chunks to be passed in as an empty array or None.
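To make the expectation concrete, here is a minimal NumPy-only sketch of what I would expect each block to look like for chunk sizes (0, 5, 0). It does not use Dask at all; `split_by_chunks` is a hypothetical helper written just for this illustration, not part of any library.

```python
import numpy as np

def split_by_chunks(arr, chunks):
    """Split a 1-D array into blocks whose lengths are given by `chunks`.

    Hypothetical helper for illustration only; not a Dask or NumPy API.
    """
    assert sum(chunks) == arr.shape[0], "chunk sizes must cover the array"
    # Interior split points are the cumulative chunk sizes, minus the last one.
    offsets = np.cumsum(chunks)[:-1]
    return np.split(arr, offsets)

blocks = split_by_chunks(np.ones(5), (0, 5, 0))
shapes = [b.shape for b in blocks]
print(shapes)  # [(0,), (5,), (0,)]
```

Under this reading, the first and last blocks are empty float arrays and the middle block holds all five values, so a function applied per block would only ever see arrays.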
Minimal Complete Verifiable Example:
import dask.array as da
a = da.ones(5)
b = a.rechunk({0: (0, 5, 0)})
c = b.map_blocks(lambda x: x, dtype=int)
print(c.compute())
# ['1.0' '1.0' '1.0' '1.0' '1.0'
# 'rechunk-merge-6fad39db5dc73de6e4bc404c4ae4dbbb' '1'
# 'rechunk-merge-6fad39db5dc73de6e4bc404c4ae4dbbb' '2']
Anything else we need to know?:
The above would not be all that surprising if the numerical values were between the task names - that would at least suggest that the chunks were correctly aligned with the task, e.g.:
# ['rechunk-merge-6fad39db5dc73de6e4bc404c4ae4dbbb' '0'
# '1.0' '1.0' '1.0' '1.0' '1.0'
# 'rechunk-merge-6fad39db5dc73de6e4bc404c4ae4dbbb' '2']
I was attempting to use this behaviour to manufacture chunks in cases where the input is sparse and you don't necessarily have data in every chunk. I also appreciate that this might not be the best way of doing things, but I found this result interesting and confusing enough to maybe flag as a bug.
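For context, the kind of per-block function I had in mind for "manufacturing" chunks looks roughly like the sketch below. It is plain NumPy and assumes empty chunks arrive as zero-length arrays, which is exactly the assumption this issue calls into question. `fill_if_empty`, `size` and `fill_value` are hypothetical names chosen for this illustration.

```python
import numpy as np

def fill_if_empty(block, size=1, fill_value=0.0):
    """Return `block` unchanged, or a filled placeholder if it is empty.

    Hypothetical per-block function; assumes empty chunks are passed in
    as zero-length NumPy arrays.
    """
    if block.size == 0:
        # Manufacture a placeholder block with the same dtype as the input.
        return np.full((size,), fill_value, dtype=block.dtype)
    return block

empty_result = fill_if_empty(np.empty(0))
full_result = fill_if_empty(np.ones(5))
print(empty_result.shape, full_result.shape)  # (1,) (5,)
```

Since the output chunk sizes would differ from the input, passing something like this to map_blocks would presumably also need the `chunks=` argument; I have not confirmed that this works on the version listed below.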
Environment:
- Dask version: 2.26.0
- Python version: 3.6.9
- Operating System: Pop!_OS 18.04 LTS
- Install method (conda, pip, source): pip