Skip to content

Extend spec to include dictionaries#1748

Closed
mrocklin wants to merge 1 commit intodask:masterfrom
mrocklin:spec-dicts
Closed

Extend spec to include dictionaries#1748
mrocklin wants to merge 1 commit intodask:masterfrom
mrocklin:spec-dicts

Conversation

@mrocklin
Copy link
Member

@mrocklin mrocklin commented Nov 2, 2016

cc @jcrist @eriknw thought we might want to consider the discussion outside of #1731

@mrocklin mrocklin mentioned this pull request Nov 2, 2016
@eriknw
Copy link
Member

eriknw commented Nov 14, 2016

This seems like a reasonable extension to the protocol since it's good practice to use a hash/salt when generating keys anyway, which ensures literals won't be clobbered.

It also lets us apply keyword arguments via apply as shown here:

In [1]: import dask
   ...: import distributed
   ...: 
   ...: def f(x, y):
   ...:     return x + y
   ...: 
   ...: d = {
   ...:     'x': 1,  
   ...:     'y': 2,  
   ...:     'f': (apply, f, (), {'x': 'x', 'y': 'y'}),
   ...: }
   ...: 

In [2]: c = distributed.Client()
   ...: c.get(d, 'f')  # new spec
   ...: 
Out[2]: 3

In [3]: dask.get(d, 'f')  # old spec
Out[3]: 'xy'

@sinhrks sinhrks added the documentation Improve or add to documentation label Dec 6, 2016
@mrocklin
Copy link
Member Author

I plan to merge this soon if there are no further comments

@jcrist
Copy link
Member

jcrist commented Apr 26, 2017

My gripe in #1731 (comment) still holds. This means that there is no zero-processing overhead way to pass a dictionary, where before there was. Now every dictionary in a task needs to be scanned for any task-like-thing. This might be an imagined fear though, and the overhead for real-world problems may be negligible.

This is also only implemented in distributed currently and some parts of dask (e.g. get_dependencies).

@mrocklin
Copy link
Member Author

I think that that is a valid concern. We've had similar issues with lists. For what it's worth I have not experienced performance pain due to traversing dicts when using the distributed scheduler, where this is already enacted.

@mrocklin
Copy link
Member Author

This issue came up. @jcrist have your thoughts on this changed at all?

@jcrist
Copy link
Member

jcrist commented May 30, 2018

I still think it overcomplicates the spec, but it's worse that the local and distributed schedulers don't have matching behavior (and some users have come to rely on it). Since we can't remove the behavior from distributed, we should update the local schedulers to match. My (possibly not valid) worries are:

  • This opens the spec up to include other python builtin collections (which I think would be a bad idea)
  • Since only tasks as dictionary values are supported (which I think is good), users might be confused by the lack of support for tasks as dictionary keys. This is probably a documentation issue.
  • quote should probably be documented somewhere to allow preventing this behavior, but I don't think should be in the top-level namespace. Its current location in dask.core seems fine to me.

@mrocklin mrocklin closed this Jan 3, 2019
@mrocklin mrocklin deleted the spec-dicts branch January 3, 2019 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improve or add to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants