Skip to content

eriknw/dask

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

85 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Dask

A minimal task scheduling abstraction.

See Dask documentation at http://dask.readthedocs.org

LICENSE

New BSD. See License File.

Install

dask is not yet on any package index. It is still experimental.

python setup.py install

Example

Consider the following simple program

def inc(i):
    return i + 1

def add(a, b):
    return a + b

x = 1
y = inc(x)
z = add(y, 10)

We encode this as a dictionary in the following way

d = {'x': 1,
     'y': (inc, 'x'),
     'z': (add, 'y', 10)}

While less aesthetically pleasing this dictionary may now be analyzed, optimized, and computed on by other Python code, not just the Python interpreter.

A simple dask dictionary

Dependencies

dask.core supports Python 2.6+ and Python 3.2+ with a common codebase. It is pure Python and requires no dependencies beyond the standard library.

It is, in short, a light weight dependency.

The threaded implementation depends on networkx. The Array dataset depends on numpy and the blaze family of projects.

Related Work

One might ask why we didn't use one of these other fine libraries:

The answer is because we wanted all of the following:

  • Fine-ish grained parallelism (latencies around 1ms)
  • In-memory communication of intermediate results
  • Dependency structures more complex than map
  • Good support for numeric data
  • First class Python support
  • Trivial installation

Most task schedulers in the Python ecosystem target long-running batch jobs, often for processing large amounts of text and aren't appropriate for executing multi-core numerics.

About

Minimal task scheduling abstraction

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 86.0%
  • Makefile 7.3%
  • Shell 6.7%