Open
Labels: array, needs attention
Description
I am wondering whether dask.array.Array.name should be a settable property. The keys in the task graph are built from name, so it feels like name is a special property that should be protected.
Consider the following case:
import numpy as np
import dask.array as da
darr = da.from_array(np.arange(1, 10))
darr.name = "foo"
darr.compute()
Output
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-2-0163ab538b86> in <module>
5 print(darr.name)
6 darr.name = "foo"
----> 7 darr.compute()
~/dask/dask/base.py in compute(self, **kwargs)
280 dask.base.compute
281 """
--> 282 (result,) = compute(self, traverse=False, **kwargs)
283 return result
284
~/dask/dask/base.py in compute(*args, **kwargs)
562 postcomputes.append(x.__dask_postcompute__())
563
--> 564 results = schedule(dsk, keys, **kwargs)
565 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
566
~/dask/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
74 pools[thread][num_workers] = pool
75
---> 76 results = get_async(
77 pool.apply_async,
78 len(pool._pool),
~/dask/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
502 finish(dsk, state, not succeeded)
503
--> 504 return nested_get(result, state["cache"])
505
506
~/dask/dask/local.py in nested_get(ind, coll)
298 """
299 if isinstance(ind, list):
--> 300 return tuple([nested_get(i, coll) for i in ind])
301 else:
302 return coll[ind]
~/dask/dask/local.py in <listcomp>(.0)
298 """
299 if isinstance(ind, list):
--> 300 return tuple([nested_get(i, coll) for i in ind])
301 else:
302 return coll[ind]
~/dask/dask/local.py in nested_get(ind, coll)
298 """
299 if isinstance(ind, list):
--> 300 return tuple([nested_get(i, coll) for i in ind])
301 else:
302 return coll[ind]
~/dask/dask/local.py in <listcomp>(.0)
298 """
299 if isinstance(ind, list):
--> 300 return tuple([nested_get(i, coll) for i in ind])
301 else:
302 return coll[ind]
~/dask/dask/local.py in nested_get(ind, coll)
300 return tuple([nested_get(i, coll) for i in ind])
301 else:
--> 302 return coll[ind]
303
304
KeyError: ('foo', 0)
This is especially confusing behavior, since other libraries (like xarray and pandas) use name in a much more user-facing way (xref #7209).
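To make the failure mode concrete, here is a minimal sketch of the mechanism as I understand it. ToyArray is a hypothetical stand-in, not dask's actual class: it only assumes that the graph is keyed by (name, chunk index) at construction time, while the keys to fetch are rebuilt from the current name.

```python
class ToyArray:
    """Hypothetical stand-in for a graph-backed array; not dask's real class."""

    def __init__(self, name, chunks):
        self.name = name
        self.nchunks = len(chunks)
        # The graph is keyed once, using the name given at construction time.
        self.graph = {(name, i): chunk for i, chunk in enumerate(chunks)}

    def keys(self):
        # The keys to fetch are rebuilt from the *current* name on every call.
        return [(self.name, i) for i in range(self.nchunks)]

    def compute(self):
        # Look up each requested key in the graph.
        return [self.graph[key] for key in self.keys()]


arr = ToyArray("arange-abc123", [[1, 2, 3], [4, 5, 6]])
first_result = arr.compute()  # succeeds: requested keys exist in the graph

arr.name = "foo"  # the graph still holds the old keys
try:
    arr.compute()
except KeyError as err:
    missing_key = err.args[0]  # the first key that could not be found
```

After the rename, the requested keys no longer exist in the graph, which matches the shape of the KeyError above.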
Proposal
If I am right in my understanding that name is intentionally tied to the task graph, then I think the setter for name should be made to raise an error, and downstream libraries should use self._name instead.
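A rough sketch of what such a guard could look like, using a toy class rather than dask's real Array. The class name and the error message are made up for illustration; the only assumption carried over from the proposal is that the value lives in _name and the public setter refuses writes.

```python
class ProtectedToyArray:
    """Hypothetical illustration of a read-only name; not dask's implementation."""

    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Refuse public writes so the name cannot drift away from the graph keys.
        raise AttributeError(
            "name is tied to the task graph and cannot be set directly"
        )


arr = ProtectedToyArray("arange-abc123")

try:
    arr.name = "foo"
    setter_raised = False
except AttributeError:
    setter_raised = True

# Code that knows what it is doing can still write the private attribute.
arr._name = "renamed"
```

Whether an error or a deprecation warning is the right first step is a separate question; the sketch only shows that the public setter can be blocked while _name stays writable.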