
Config array optimize to skip fusion and return a HLG #6751

Merged
jrbourbeau merged 3 commits into dask:master from madsbk:config_no_fusion_of_arrays on Oct 21, 2020


madsbk (Contributor) commented Oct 20, 2020

This PR makes the array optimize function exit early and return a high-level graph (HLG) when optimization.fuse.active=False.
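The early-exit behavior described above is sketched below. It uses a toy "high-level graph" class with named layers and a plain dict as the configuration mapping. ToyHighLevelGraph, toy_low_level_passes and optimize_sketch are all hypothetical names, so this is not dask's actual code. The sketch also assumes that only an explicit False disables fusion. In dask itself the flag is set through dask.config.set({"optimization.fuse.active": False}).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, Mapping, Union

Task = Any
LowLevelGraph = Dict[Hashable, Task]


@dataclass
class ToyHighLevelGraph:
    """Hypothetical stand-in for a high-level graph: named layers of tasks."""

    layers: Dict[str, LowLevelGraph] = field(default_factory=dict)

    def to_dict(self) -> LowLevelGraph:
        # Materialize every layer into one flat, low-level task dict.
        flat: LowLevelGraph = {}
        for layer in self.layers.values():
            flat.update(layer)
        return flat


def toy_low_level_passes(dsk: LowLevelGraph) -> LowLevelGraph:
    # Placeholder for low-level work such as task fusion; here it only copies.
    return dict(dsk)


def optimize_sketch(
    hlg: ToyHighLevelGraph, config: Mapping[str, Any]
) -> Union[ToyHighLevelGraph, LowLevelGraph]:
    # Assumption: only an explicit False disables fusion; an unset key keeps the default.
    if config.get("optimization.fuse.active") is False:
        # Early exit: hand back the high-level graph untouched, layers intact.
        return hlg
    # Otherwise materialize the graph and run the (placeholder) low-level passes.
    return toy_low_level_passes(hlg.to_dict())


if __name__ == "__main__":
    graph = ToyHighLevelGraph(
        {"a": {("a", 0): 1}, "b": {("b", 0): (sum, [("a", 0)])}}
    )
    print(type(optimize_sketch(graph, {"optimization.fuse.active": False})).__name__)
    print(type(optimize_sketch(graph, {})).__name__)
```

Returning the high-level graph object itself, instead of a materialized dict, keeps the layer structure available to whatever consumes the graph next. That is the point of skipping the low-level step.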

  • Tests added / passed
  • Passes black dask / flake8 dask

madsbk force-pushed the config_no_fusion_of_arrays branch from a9c472b to 42bc8d3 October 20, 2020 10:41

TomAugspurger (Member) commented

Is this supposed to be user-facing and last long term, or is it just a temporary debugging thing?

If it's user-facing, then we should add it to https://github.com/dask/dask/blob/master/dask/dask.yaml and https://github.com/dask/dask/blob/master/dask/dask-schema.yaml, and our doc build will pick it up.

madsbk (Contributor, Author) commented Oct 20, 2020

Is this supposed to be user-facing and last long term, or is it just a temporary debugging thing?

I think it should be long-term. The option is already in dask.yaml and dask-schema.yaml. Basically, this PR makes Blockwise behave like DataFrame when fusion is disabled: https://github.com/dask/dask/blob/master/dask/dataframe/optimize.py#L25

madsbk marked this pull request as ready for review October 20, 2020 11:49

jrbourbeau (Member) left a comment


This seems sensible, as we already have similar early-exit behavior for DataFrames and in the fuse function in dask/optimization.py. Thanks for the PR @madsbk! I pushed a small commit to move the new test alongside the existing array optimization tests.

One difference between the array and DataFrame case is that setting optimization.fuse.active to False for DataFrames just excludes low-level task fusion, while in the array case optimization.fuse.active = False excludes both low-level task fusion and an inline_functions optimization. This is probably okay, but something we should be aware of. #6083 seems like a good next step if we want to implement finer disabling options for different types of optimizations.
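The asymmetry described above is easier to see as a lookup table. The sketch below records which passes the fuse flag turns off for each collection type, as stated in this comment, and filters a pass list with that table. The helper passes_to_run, the table name, and the candidate pass list are hypothetical. The comment does not say which other passes DataFrame optimization runs, so treat the DataFrame output as illustrative only.

```python
from typing import Dict, List, Mapping, Set

# Which low-level passes the fuse flag switches off, per collection type,
# as described in the review comment. Pass names are illustrative strings.
SKIPPED_WHEN_FUSE_OFF: Dict[str, Set[str]] = {
    "dataframe": {"fuse"},
    "array": {"fuse", "inline_functions"},
}


def passes_to_run(
    collection: str, candidate_passes: List[str], config: Mapping[str, object]
) -> List[str]:
    """Hypothetical helper: filter a pass list according to the fuse flag."""
    if config.get("optimization.fuse.active") is not False:
        # Flag not explicitly disabled: run every candidate pass.
        return list(candidate_passes)
    skipped = SKIPPED_WHEN_FUSE_OFF[collection]
    return [name for name in candidate_passes if name not in skipped]


if __name__ == "__main__":
    off = {"optimization.fuse.active": False}
    candidates = ["fuse", "inline_functions"]  # hypothetical pass list
    print(passes_to_run("array", candidates, off))      # []
    print(passes_to_run("dataframe", candidates, off))  # ['inline_functions']
```

A single flag mapping to different sets of passes is exactly what a finer-grained scheme, such as the one discussed in #6083, would untangle.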

Additionally, as we build out our high-level graph optimizations (#5644), we might consider adding a config option to turn off all low-level graph optimizations, which I think is what @madsbk is after here.

jrbourbeau (Member) commented

Also, all the test failures here appear to be related to #6754 rather than to the changes in this PR.

madsbk (Contributor, Author) commented Oct 21, 2020

One difference between the array and DataFrame case is that setting optimization.fuse.active to False for DataFrames just excludes low-level task fusion, while in the array case optimization.fuse.active = False excludes both low-level task fusion and an inline_functions optimization. This is probably okay, but something we should be aware of. #6083 seems like a good next step if we want to implement finer disabling options for different types of optimizations.

Good point and I agree #6083 would be great!

jrbourbeau merged commit 46ef300 into dask:master Oct 21, 2020
kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020
madsbk deleted the config_no_fusion_of_arrays branch February 16, 2021 08:58


3 participants