Should dask.array.name be a settable property? #7218

@jsignell

I am wondering whether dask.array.Array.name should be a settable property. Given that the keys of the task graph are derived from name, it feels like name is a special property that should be protected.

Consider the following case:

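import numpy as np
import dask.array as da

darr = da.from_array(np.arange(1, 10))
print(darr.name)   # the auto-generated name that the graph keys are built from
darr.name = "foo"  # rename without touching the graph
darr.compute()     # raises KeyError: ('foo', 0)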
Output
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-0163ab538b86> in <module>
      5 print(darr.name)
      6 darr.name = "foo"
----> 7 darr.compute()

~/dask/dask/base.py in compute(self, **kwargs)
    280         dask.base.compute
    281         """
--> 282         (result,) = compute(self, traverse=False, **kwargs)
    283         return result
    284 

~/dask/dask/base.py in compute(*args, **kwargs)
    562         postcomputes.append(x.__dask_postcompute__())
    563 
--> 564     results = schedule(dsk, keys, **kwargs)
    565     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    566 

~/dask/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
     74                 pools[thread][num_workers] = pool
     75 
---> 76     results = get_async(
     77         pool.apply_async,
     78         len(pool._pool),

~/dask/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
    502                     finish(dsk, state, not succeeded)
    503 
--> 504     return nested_get(result, state["cache"])
    505 
    506 

~/dask/dask/local.py in nested_get(ind, coll)
    298     """
    299     if isinstance(ind, list):
--> 300         return tuple([nested_get(i, coll) for i in ind])
    301     else:
    302         return coll[ind]

~/dask/dask/local.py in <listcomp>(.0)
    298     """
    299     if isinstance(ind, list):
--> 300         return tuple([nested_get(i, coll) for i in ind])
    301     else:
    302         return coll[ind]

~/dask/dask/local.py in nested_get(ind, coll)
    298     """
    299     if isinstance(ind, list):
--> 300         return tuple([nested_get(i, coll) for i in ind])
    301     else:
    302         return coll[ind]

~/dask/dask/local.py in <listcomp>(.0)
    298     """
    299     if isinstance(ind, list):
--> 300         return tuple([nested_get(i, coll) for i in ind])
    301     else:
    302         return coll[ind]

~/dask/dask/local.py in nested_get(ind, coll)
    300         return tuple([nested_get(i, coll) for i in ind])
    301     else:
--> 302         return coll[ind]
    303 
    304 

KeyError: ('foo', 0)
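
A rough way to picture the failure, without depending on dask internals: the sketch below uses a hypothetical stand-in class (TinyCollection is not part of dask, and its methods are assumptions for illustration only). Its graph keys are built from the name once, while the keys requested at compute time are derived from the current name, so renaming leaves the two out of sync.

```python
# Hypothetical stand-in for a dask collection; NOT dask's real implementation.
class TinyCollection:
    def __init__(self, name, values):
        self.name = name  # plain, settable attribute
        # Graph keys are built from the name once, at construction time.
        self.graph = {(name, i): v for i, v in enumerate(values)}

    def keys(self):
        # Requested keys are derived from the *current* name.
        return [(self.name, i) for i in range(len(self.graph))]

    def compute(self):
        return [self.graph[key] for key in self.keys()]


coll = TinyCollection("original", [1, 2, 3])
first_result = coll.compute()  # works: names and graph keys agree

coll.name = "foo"  # the graph still only holds ('original', i) keys
try:
    coll.compute()
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]  # the key that could not be found
```

Here the lookup fails on the first renamed key, which mirrors the shape of the KeyError in the traceback above.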

This behavior is especially confusing because other libraries (like xarray and pandas) use name in a much more user-facing way (xref #7209).

Proposal
If I am right that name is intentionally tied to the task graph, then I think the setter for name should raise an error, and downstream libraries should use self._name instead.
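
A minimal sketch of what that could look like, again on a hypothetical stand-in class rather than dask's actual Array (ProtectedCollection, the choice of TypeError, and the error message are all assumptions):

```python
# Hypothetical stand-in; the class name, exception type, and message are
# illustrative assumptions, not dask's actual behavior.
class ProtectedCollection:
    def __init__(self, name, values):
        self._name = name
        self._graph = {(name, i): v for i, v in enumerate(values)}

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Refuse direct assignment so the name cannot drift from the graph keys.
        raise TypeError(
            "name is tied to the task graph keys and cannot be set directly"
        )

    def compute(self):
        return [self._graph[(self._name, i)] for i in range(len(self._graph))]


coll = ProtectedCollection("original", [1, 2, 3])
try:
    coll.name = "foo"
    setter_error = None
except TypeError as err:
    setter_error = str(err)

result = coll.compute()  # still works because the name was never changed
```

With this shape the mistake surfaces at the assignment, where the cause is obvious, instead of later inside the scheduler. Code that really needs to rename would have to go through self._name and take responsibility for keeping the graph consistent.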

Labels: array, needs attention
