
Strange behaviour in map_blocks when using empty chunks #6649

@JSKenyon

What happened:
While experimenting with rechunking and map_blocks, I started getting results I couldn't readily explain. Specifically, empty chunks produced by a rechunk seem to be handled very strangely by a subsequent map_blocks call (and probably by blockwise as well).

What you expected to happen:
I would expect empty chunks to be passed to the mapped function as empty (zero-length) arrays, or possibly as None.
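For reference, the "empty array" expectation can be sketched in plain NumPy. This is not how dask implements map_blocks; it only illustrates the semantics described above. The helper name `apply_per_chunk` is hypothetical, and the sketch assumes a 1-D array whose chunk sizes (zeros allowed) sum to its length.

```python
import numpy as np


def apply_per_chunk(arr, chunk_sizes, func):
    """Hypothetical reference: split a 1-D array into consecutive chunks of
    the given sizes (zeros allowed), apply func to each chunk, and
    concatenate the results. Also returns the shape of each chunk."""
    if sum(chunk_sizes) != arr.shape[0]:
        raise ValueError("chunk sizes must sum to the array length")
    # Split points are the running totals of all but the last chunk size.
    split_points = np.cumsum(chunk_sizes)[:-1]
    blocks = np.split(arr, split_points)
    # Zero-sized chunks arrive as arrays of shape (0,), not as task names.
    results = [func(block) for block in blocks]
    return np.concatenate(results), [block.shape for block in blocks]


out, shapes = apply_per_chunk(np.ones(5), (0, 5, 0), lambda x: x)
print(shapes)  # [(0,), (5,), (0,)]
print(out)     # [1. 1. 1. 1. 1.]
```

Under this expectation the identity function would simply return the five ones, with the two empty chunks contributing nothing to the result.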

Minimal Complete Verifiable Example:

import dask.array as da

a = da.ones(5)
b = a.rechunk({0: (0, 5, 0)})
c = b.map_blocks(lambda x: x, dtype=int)

print(c.compute())
# Output with Dask 2.26.0:
# ['1.0' '1.0' '1.0' '1.0' '1.0'
#  'rechunk-merge-6fad39db5dc73de6e4bc404c4ae4dbbb' '1'
#  'rechunk-merge-6fad39db5dc73de6e4bc404c4ae4dbbb' '2']

Anything else we need to know?:
The above would not be all that surprising if the numerical values appeared between the task names; that would at least suggest that each chunk was correctly aligned with its task, e.g.:

# ['rechunk-merge-6fad39db5dc73de6e4bc404c4ae4dbbb' '0'
#  '1.0' '1.0' '1.0' '1.0' '1.0'
#  'rechunk-merge-6fad39db5dc73de6e4bc404c4ae4dbbb' '2']

I was attempting to use this behaviour to manufacture chunks in cases where the input is sparse and not every chunk necessarily contains data. I appreciate that this might not be the best way of doing things, but I found this result interesting and confusing enough to flag as a possible bug.
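One alternative for the sparse case is to build every chunk explicitly, including the ones with no data, before involving dask at all. The NumPy-only sketch below is an assumption about the use case, not something from the report: the helper name `densify_chunks`, the `fill` value, and the input layout (sorted, zero-based row indices that fall inside the total length, each paired with a value) are all hypothetical.

```python
import numpy as np


def densify_chunks(rows, values, chunk_sizes, fill=0.0):
    """Hypothetical helper: scatter sparse (row, value) samples into dense
    per-chunk arrays. Chunks that receive no samples are still produced,
    filled with `fill` (or zero-length if their size is 0).
    Assumes `rows` is sorted, zero-based, and within the total length."""
    rows = np.asarray(rows)
    values = np.asarray(values, dtype=float)
    # Chunk boundaries: [0, c0, c0 + c1, ...]
    bounds = np.concatenate(([0], np.cumsum(chunk_sizes)))
    # Positions in `rows` where each chunk's samples start and end.
    starts = np.searchsorted(rows, bounds[:-1], side="left")
    ends = np.searchsorted(rows, bounds[1:], side="left")
    blocks = []
    for lo, size, s, e in zip(bounds[:-1], chunk_sizes, starts, ends):
        block = np.full(size, fill)
        # Offsets are relative to the start of this chunk.
        block[rows[s:e] - lo] = values[s:e]
        blocks.append(block)
    return blocks


blocks = densify_chunks([1, 2, 3, 9], [10, 20, 30, 90], (4, 4, 4))
print([b.tolist() for b in blocks])
# [[0.0, 10.0, 20.0, 30.0], [0.0, 0.0, 0.0, 0.0], [0.0, 90.0, 0.0, 0.0]]
```

The middle chunk has no samples but still comes back as a dense block, so downstream code never has to handle a missing chunk.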

Environment:

  • Dask version: 2.26.0
  • Python version: 3.6.9
  • Operating System: Pop!_OS 18.04 LTS
  • Install method (conda, pip, source): pip
