Key and Task classes #2299

@shoyer

I think it would be worth considering adding optional light-weight classes to represent keys and tasks in a dask graph. These would complement the existing dask.core.quote for literals.

This would make intent much clearer when creating dask graphs, and would allow better error messages when things go wrong (e.g., for #2298), because dask would know unambiguously what each object is meant to represent instead of guessing. For example, if a key is not found, dask could raise an error instead of using it as a literal.

These could be simple tuple subclasses, e.g.,

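class Key(tuple):
  __slots__ = ()

  def __new__(cls, *args):
    # Use cls rather than Key so that subclasses construct correctly
    return tuple.__new__(cls, args)

  def __repr__(self):
    contents = repr(tuple(self))
    if len(self) == 1:
      # Drop the trailing comma of a 1-tuple: "('x',)" -> "('x')"
      contents = contents[:-len(',)')] + ')'
    return 'Key{}'.format(contents)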

The Task class could automatically handle **kwargs in the proper fashion, e.g., Task(pd.read_csv, filename, sep='\t').
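As a rough illustration of the kwargs handling, here is a minimal sketch of what such a Task class might look like. Nothing here exists in dask: the `(func, args, kwargs)` layout, the `func`/`args`/`kwargs` properties, and the `read_table` stand-in for `pd.read_csv` are all assumptions made for this example.

```python
class Task(tuple):
    """Hypothetical sketch: a task stored as (func, args, kwargs)."""
    __slots__ = ()

    def __new__(cls, func, *args, **kwargs):
        if not callable(func):
            raise TypeError('Task expects a callable, got {!r}'.format(func))
        # Keep positional and keyword arguments separate so that nothing
        # has to be guessed when the task is later executed.
        return tuple.__new__(cls, (func, args, kwargs))

    @property
    def func(self):
        return self[0]

    @property
    def args(self):
        return self[1]

    @property
    def kwargs(self):
        return self[2]

    def __repr__(self):
        parts = [getattr(self.func, '__name__', repr(self.func))]
        parts.extend(repr(a) for a in self.args)
        parts.extend('{}={!r}'.format(k, v)
                     for k, v in sorted(self.kwargs.items()))
        return 'Task({})'.format(', '.join(parts))


def read_table(path, sep=','):
    # Stand-in for pd.read_csv so the example has no dependencies.
    return (path, sep)


task = Task(read_table, 'data.tsv', sep='\t')
# Executing the task is a plain call with the stored arguments.
result = task.func(*task.args, **task.kwargs)
```

Storing kwargs as their own element avoids wrapping the call in a helper such as `apply` just to pass keyword arguments. One cost of this layout is that a Task holding a dict is not hashable.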

This is more verbose than using Python builtins, but not onerously so. E.g., adapting the "Custom Graphs" example from the docs:

from dask import Task, Key

...
dsk = {'load-1': Task(load, 'myfile.a.data'),
       'load-2': Task(load, 'myfile.b.data'),
       'load-3': Task(load, 'myfile.c.data'),
       'clean-1': Task(clean, Key('load-1')),
       'clean-2': Task(clean, Key('load-2')),
       'clean-3': Task(clean, Key('load-3')),
       'analyze': Task(analyze, [Key('clean-%d' % i) for i in [1, 2, 3]]),
       'store': Task(store, Key('analyze'))}

Possibly, we would want a "strict evaluation" mode that requires all tasks and keys to be wrapped in the appropriate classes, and switches the default interpretation for everything else to be a literal. Think of this as "strong typing" for dask.
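To make the "strict evaluation" idea concrete, here is a small sketch of an evaluator that only resolves objects wrapped in Key or Task and treats everything else as a literal. It is not dask's scheduler: `strict_get`, the helper names, and the rule that a one-element Key refers to the bare name (so `Key('x')` looks up `'x'`) are assumptions for this example, and there is no cycle detection.

```python
class Key(tuple):
    __slots__ = ()

    def __new__(cls, *args):
        return tuple.__new__(cls, args)


class Task(tuple):
    __slots__ = ()

    def __new__(cls, func, *args, **kwargs):
        return tuple.__new__(cls, (func, args, kwargs))


def _name(key):
    # Assumption: Key('x') refers to 'x'; Key('x', 0) refers to ('x', 0).
    return key[0] if len(key) == 1 else tuple(key)


def _strict_eval(dsk, expr, cache):
    if isinstance(expr, Key):
        name = _name(expr)
        if name not in dsk:
            # A wrapped key that is missing is an error, never a literal.
            raise KeyError('key not found in graph: {!r}'.format(name))
        if name not in cache:
            cache[name] = _strict_eval(dsk, dsk[name], cache)
        return cache[name]
    if isinstance(expr, Task):
        func, args, kwargs = expr
        args = [_strict_eval(dsk, a, cache) for a in args]
        kwargs = {k: _strict_eval(dsk, v, cache) for k, v in kwargs.items()}
        return func(*args, **kwargs)
    if type(expr) is list:
        # Lists are traversed so they can hold keys, as in the example above.
        return [_strict_eval(dsk, e, cache) for e in expr]
    # Everything else, including plain strings and tuples, is a literal.
    return expr


def strict_get(dsk, key):
    return _strict_eval(dsk, key, {})


def add(a, b):
    return a + b


dsk = {'x': 1,
       'y': Task(add, Key('x'), 10),
       'z': Task(sum, [Key('x'), Key('y')]),
       'label': Task(str.upper, 'x'),   # 'x' is a literal string here
       'pair': ('x', 'y')}              # a plain tuple is a literal too

total = strict_get(dsk, Key('z'))
label = strict_get(dsk, Key('label'))
pair = strict_get(dsk, Key('pair'))
```

Under these rules the string `'x'` passed to `str.upper` is never mistaken for the graph key `'x'`, and a misspelled `Key('clean-4')` fails with a `KeyError` instead of being passed along silently.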

I think this would be really valuable for library code, such as the existing dask collections.
