Array optimization: cull before `fuse_roots` by gjoseph92 · Pull Request #8560 · dask/dask

gjoseph92 · 2022-01-12T01:39:11Z

Because fuse_roots materializes Blockwise layers, if you have a situation where you're sub-selecting a few items out of a large initial Blockwise array, culling the graph (cheap) before materializing it (slow) can be massively more performant. As more becomes Blockwise (#7417), this will become more relevant. For example:

a = da.from_zarr("s3://10-trillion-chunk-dataset/a")
b = da.from_zarr("s3://10-trillion-chunk-dataset/b")
metric = a / b
tiny_part = metric[:10]
# Without this PR, optimizing `tiny_part` could require materializing the 10-trillion-chunk
# `from_array` graphs, just to cull them down to nothing a moment later

I doubt this is exactly correct—maybe we need to cull again after fuse_roots? (Culling is cheap but not that cheap; I'd like to not do it twice if possible.)

Because it materializes, I also think fuse_roots should perhaps go after the if config.get("optimization.fuse.active") is False: return dsk bailout.

Frankly, I don't understand fuse_roots. The situation described in #5451 (comment), and the test for it, are already handled by plain Blockwise fusion. In fact, if I remove fuse_roots entirely, test_fuse_roots still passes!

cc @rjzamora @ian-r-rose

Tests added / passed
Passes pre-commit run --all-files

Because `fuse_roots` materializes Blockwise layers, if you have a situation where you're sub-selecting a few items out of a large initial Blockwise array, culling the graph (cheap) before materializing it (slow) can be massively more performant.

rjzamora · 2022-07-14T03:31:35Z

Although I’d like to remove HighLevelGraph.cull completely in the medium-to-long term (along the lines of #9216), I do think it makes sense to address the outstanding issue that: (1) fuse_roots materializes Blockwise layers, and (2) the specific cases that benefit from fuse_roots are poorly understood and not even captured by CI.

I spent some time exploring fuse_roots myself, and I think I agree that the optimization should probably come after the "optimization.fuse.active” bailout, because it is indeed materializing the graph for the purpose of low-level fusion. Also, most of the cases it is designed to target should be covered by Blockwise IO or general low-level fusion.

One case where fuse_roots is currently important is dask.dataframe’s from_delayed. With that said, I don’t see any reason why we couldn’t implement a simpler delayed-specific optimization for this (one that doesn’t materialize the target Blockwise Layer).

gjoseph92 · 2022-07-14T16:14:23Z

I don’t see any reason why we couldn’t implement a simpler delayed-specific optimization for this (one that doesn’t materialize the target Blockwise Layer)

I'm all in favor of removing fuse_roots entirely and doing this instead!

rjzamora · 2022-07-14T20:21:55Z

I'm all in favor of removing fuse_roots entirely and doing this instead!

I took a rough pass at something like this in #9273

github-actions bot added the array label Jan 12, 2022

gjoseph92 mentioned this pull request Jan 14, 2022

Culling massive Blockwise graphs is very slow, not constant-time #8570

Open

ian-r-rose mentioned this pull request Apr 21, 2022

[Experiment][WIP] Perform culling optimization before fuse_roots #8965

Draft

rjzamora mentioned this pull request Jul 14, 2022

Revise fuse_roots optimization to avoid Blockwise materialization #9273

Draft

gjoseph92 mentioned this pull request Aug 12, 2022

Blockwise da.store #9382

Draft

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Array optimization: cull before `fuse_roots`#8560

Array optimization: cull before `fuse_roots`#8560
gjoseph92 wants to merge 1 commit intodask:mainfrom
gjoseph92:array-optimize-cull-before-fuse

gjoseph92 commented Jan 12, 2022

Uh oh!

rjzamora commented Jul 14, 2022 •

edited

Loading

Uh oh!

gjoseph92 commented Jul 14, 2022

Uh oh!

rjzamora commented Jul 14, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

gjoseph92 commented Jan 12, 2022

Uh oh!

rjzamora commented Jul 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gjoseph92 commented Jul 14, 2022

Uh oh!

rjzamora commented Jul 14, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rjzamora commented Jul 14, 2022 •

edited

Loading