
Add set of tasks that should not be added to stack in order #3303

Merged
mrocklin merged 1 commit into dask:master from mrocklin:order-sen on Mar 21, 2018

@mrocklin
Member

Previously we would re-add a task to the stack many times if it had many dependencies. We now maintain a set of tasks that should not be re-added to the stack and check it before pushing. This significantly reduces the cost of dask's order function in cases where a single output has many input dependencies.
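To make the effect concrete, here is a minimal toy sketch. It is hypothetical and is not the actual dask.order code: tasks are ordered with an explicit stack, a task that finishes pushes its dependents, and a task that is not ready yet is pushed back together with the dependencies it is still waiting on. The names toy_order, parked, guard and fan_in are invented for illustration. With guard=False, a task with n dependencies is re-pushed after every one of them finishes and then re-pushes everything it still waits on, so the number of pushes grows quadratically; with guard=True, a task that is already sitting on the stack is not pushed again.

```python
from collections import defaultdict


def toy_order(dependencies, guard=True):
    """Hypothetical toy ordering with an explicit stack (not dask's code).

    `dependencies` maps each task key to the set of keys it depends on.
    Returns (order, pushes): order maps key -> position, pushes counts
    how many times anything was appended to the stack.
    """
    dependents = defaultdict(set)
    for key, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(key)
    waiting = {key: set(deps) for key, deps in dependencies.items()}

    result = {}     # task key -> position in the final order
    parked = set()  # pushed at least once; if not yet ordered, still on the stack
    stack = []
    pushes = 0

    def push(key):
        nonlocal pushes
        stack.append(key)
        parked.add(key)
        pushes += 1

    # Start from the outputs: tasks that nothing else depends on.
    for key in sorted(k for k in dependencies if not dependents[k]):
        push(key)

    while stack:
        item = stack.pop()
        if item in result:
            continue
        if waiting[item]:
            # Not ready: put it back, then descend into what it still needs.
            push(item)
            for dep in sorted(waiting[item]):
                push(dep)
            continue
        result[item] = len(result)
        for dep in sorted(dependents[item]):
            waiting[dep].discard(item)
            if dep in result:
                continue
            if guard and dep in parked:
                # Already on the stack; pushing it again only adds work.
                continue
            push(dep)
    return result, pushes


def fan_in(n):
    """Hypothetical graph: one output 'out' that depends on n inputs."""
    deps = {f"in-{i}": set() for i in range(n)}
    deps["out"] = set(deps)  # evaluated before 'out' is added
    return deps


if __name__ == "__main__":
    for guard in (True, False):
        order, pushes = toy_order(fan_in(1000), guard=guard)
        print(f"guard={guard}: {pushes} pushes, out at position {order['out']}")
```

In this toy, a fan-in of n inputs costs n + 2 pushes with the guard and 1 + 2n + n(n+1)/2 without it (1,002 versus 502,501 for n = 1000), while both variants produce the same order. The real dask graphs and bookkeeping are more involved, so this only illustrates why checking a set before re-pushing helps.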

The real-world case that motivated this is here: pangeo-data/pangeo#150 (comment)

I don't know of a nice way to test this. This is one of those situations where having benchmarks directly within the repository would be convenient.

  • Tests added / passed
  • Passes flake8 dask
  • Fully documented, including docs/source/changelog.rst for all changes
    and one of the docs/source/*-api.rst files for new API

@rabernat
Contributor

That looks like a pretty simple fix! I will try to find time to test this branch on my real-world problem within the next few days.

@mrocklin
Member Author

Merging this tomorrow if there are no further comments.



2 participants