Conversation
|
This seems like a reasonable extension to the protocol, since it's good practice to use a hash/salt when generating keys anyway, which ensures literals won't be clobbered. It also lets us apply keyword arguments via `apply`:

```python
In [1]: import dask
   ...: import distributed
   ...: from dask.utils import apply  # apply helper; its location has varied across dask versions
   ...:
   ...: def f(x, y):
   ...:     return x + y
   ...:
   ...: d = {
   ...:     'x': 1,
   ...:     'y': 2,
   ...:     'f': (apply, f, (), {'x': 'x', 'y': 'y'}),
   ...: }
   ...:

In [2]: c = distributed.Client()
   ...: c.get(d, 'f')  # new spec: the kwargs dict is traversed, so 'x' and 'y' resolve to 1 and 2
   ...:
Out[2]: 3

In [3]: dask.get(d, 'f')  # old spec: the kwargs dict is a literal, so f receives the strings 'x' and 'y'
Out[3]: 'xy'
```
|
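For readers following along, the difference between the two results comes down to whether dict arguments are treated as structures to traverse or as opaque literals. A minimal sketch of the two behaviors (hypothetical `resolve_new`/`resolve_old` helpers and a local stand-in for `apply`; this is not dask's actual resolver, just the idea):

```python
def apply(func, args, kwargs):
    # Stand-in for dask's `apply` helper.
    return func(*args, **kwargs)

def resolve_new(dsk, arg):
    """New spec: dicts are traversed, so their values get resolved too."""
    if isinstance(arg, str) and arg in dsk:
        return resolve_new(dsk, dsk[arg])
    if isinstance(arg, dict):
        return {k: resolve_new(dsk, v) for k, v in arg.items()}
    if isinstance(arg, tuple) and arg and callable(arg[0]):
        func, *rest = arg
        return func(*(resolve_new(dsk, a) for a in rest))
    return arg

def resolve_old(dsk, arg):
    """Old local-scheduler spec: dicts are opaque literals, never traversed."""
    if isinstance(arg, str) and arg in dsk:
        return resolve_old(dsk, dsk[arg])
    if isinstance(arg, tuple) and arg and callable(arg[0]):
        func, *rest = arg
        return func(*(resolve_old(dsk, a) for a in rest))
    return arg

def f(x, y):
    return x + y

d = {'x': 1, 'y': 2, 'f': (apply, f, (), {'x': 'x', 'y': 'y'})}
print(resolve_new(d, 'f'))  # 3    -- kwargs dict traversed, keys substituted
print(resolve_old(d, 'f'))  # 'xy' -- kwargs dict passed through literally
```

This also shows why salting keys matters under the new spec: any string value that happens to collide with a graph key would get substituted during traversal.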
|
I plan to merge this soon if there are no further comments. |
|
My gripe in #1731 (comment) still holds. This means there is no longer a zero-processing-overhead way to pass a dictionary, where before there was: every dictionary in a task now needs to be scanned for any task-like thing. This might be an imagined fear, though, and the overhead for real-world problems may be negligible. This is also only implemented in distributed currently and some parts of dask (e.g. |
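To make the overhead concern concrete: under the new spec a scheduler can no longer hand a dict argument through untouched; it has to walk the whole thing looking for task-like values even when the dict is pure data. A rough sketch of that extra work (hypothetical `contains_tasklike` helper, not dask's actual code):

```python
def contains_tasklike(obj):
    """Hypothetical check: does `obj` contain anything task-like?

    Under the new spec every dict argument must be walked like this,
    even a large literal dict that contains no tasks at all.
    """
    if isinstance(obj, tuple) and obj and callable(obj[0]):
        return True
    if isinstance(obj, dict):
        return any(contains_tasklike(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return any(contains_tasklike(v) for v in obj)
    return False

big_literal = {i: i for i in range(100_000)}
# Old spec: passed through with zero processing.
# New spec: the whole dict is scanned even though nothing resolves.
print(contains_tasklike(big_literal))        # False
print(contains_tasklike({'a': (len, 'x')}))  # True
```

The scan is O(n) in the size of the literal, which is the "no zero-processing-overhead way to pass a dictionary" point above.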
|
I think that is a valid concern. We've had similar issues with lists. For what it's worth, I have not experienced performance pain due to traversing dicts when using the distributed scheduler, where this is already enacted. |
|
This issue came up. @jcrist, have your thoughts on this changed at all? |
|
I still think it overcomplicates the spec, but it's worse that the local and distributed schedulers don't have matching behavior (and some users have come to rely on it). Since we can't remove the behavior from distributed, we should update the local schedulers to match. My (possibly not valid) worries are:
|
cc @jcrist @eriknw: thought we might want to continue the discussion outside of #1731