Skip to content

dask.array.core.map_blocks mis-handles kwargs in task name #8632

@graingert

Description

@graingert

Minimal Complete Verifiable Example:

import dask
import dask.array as da
import distributed
import numpy as np

def mul(x, y=None, *, ham=None):
    if y is None:
        return x * 3
    return x * 2


def main():
    with distributed.Client() as c:
        x = da.arange(6, chunks=3)
        d1 = x.map_blocks(mul, ham=1, dtype=np.int_)
        d2 = x.map_blocks(mul, {"ham": 1}, dtype=np.int_)
        print(dask.compute(d1, d2))  # (array([ 0,  3,  6,  9, 12, 15]), array([ 0,  3,  6,  9, 12, 15]))
        print(dask.compute(d2, d1))  # (array([ 0,  2,  4,  6,  8, 10]), array([ 0,  2,  4,  6,  8, 10]))

if __name__ == "__main__":
    main()

this prints:

(array([ 0,  3,  6,  9, 12, 15]), array([ 0,  3,  6,  9, 12, 15]))
(array([ 0,  2,  4,  6,  8, 10]), array([ 0,  2,  4,  6,  8, 10]))

but I expected

(array([ 0,  3,  6,  9, 12, 15]), array([ 0,  2,  4,  6,  8, 10]))
(array([ 0,  2,  4,  6,  8, 10]), array([ 0,  3,  6,  9, 12, 15]))

Anything else we need to know?:

name is calculated with

name = f"{name or funcname(func)}-{tokenize(func, *args, **kwargs)}"

however kwargs are treated the same as a dict in the last arg

dask/dask/base.py

Lines 857 to 858 in 7446308

if kwargs:
args = args + (kwargs,)

eg:

>>> import dask.base
>>> dask.base.tokenize("hello", ham="eggs")
'bdafbbae310e8f3f583f8002342c8ebb'
>>> dask.base.tokenize("hello", {"ham":"eggs"})
'bdafbbae310e8f3f583f8002342c8ebb' 

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions