Skip to content

Array optimization: cull before fuse_roots#8560

Draft
gjoseph92 wants to merge 1 commit intodask:mainfrom
gjoseph92:array-optimize-cull-before-fuse
Draft

Array optimization: cull before fuse_roots#8560
gjoseph92 wants to merge 1 commit intodask:mainfrom
gjoseph92:array-optimize-cull-before-fuse

Conversation

@gjoseph92
Copy link
Copy Markdown
Collaborator

Because fuse_roots materializes Blockwise layers, if you have a situation where you're sub-selecting a few items out of a large initial Blockwise array, culling the graph (cheap) before materializing it (slow) can be massively more performant. As more becomes Blockwise (#7417), this will become more relevant. For example:

a = da.from_zarr("s3://10-trillion-chunk-dataset/a")
b = da.from_zarr("s3://10-trillion-chunk-dataset/b")
metric = a / b
tiny_part = metric[:10]
# Without this PR, optimizing `tiny_part` could require materializing the 10-trillion-chunk
# `from_array` graphs, just to cull them down to nothing a moment later

I doubt this is exactly correct—maybe we need to cull again after fuse_roots? (Culling is cheap but not that cheap; I'd like to not do it twice if possible.)

Because it materializes, I also think fuse_roots should perhaps go after the if config.get("optimization.fuse.active") is False: return dsk bailout.

Frankly, I don't understand fuse_roots. The situation described in #5451 (comment), and the test for it, are already handled by plain Blockwise fusion. In fact, if I remove fuse_roots entirely, test_fuse_roots still passes!

cc @rjzamora @ian-r-rose

  • Tests added / passed
  • Passes pre-commit run --all-files

Because `fuse_roots` materializes Blockwise layers, if you have a situation where you're sub-selecting a few items out of a large initial Blockwise array, culling the graph (cheap) before materializing it (slow) can be massively more performant.
@rjzamora
Copy link
Copy Markdown
Member

rjzamora commented Jul 14, 2022

Although I’d like to remove HighLevelGraph.cull completely in the medium-to-long term (along the lines of #9216), I do think it makes sense to address the outstanding issue that: (1) fuse_roots materializes Blockwise layers, and (2) the specific cases that benefit from fuse_roots are poorly understood and not even captured by CI.

I spent some time exploring fuse_roots myself, and I think I agree that the optimization should probably come after the "optimization.fuse.active” bailout, because it is indeed materializing the graph for the purpose of low-level fusion. Also, most of the cases it is designed to target should be covered by Blockwise IO or general low-level fusion.

One case where fuse_roots is currently important is dask.dataframe’s from_delayed. With that said, I don’t see any reason why we couldn’t implement a simpler delayed-specific optimization for this (one that doesn’t materialize the target Blockwise Layer).

@gjoseph92
Copy link
Copy Markdown
Collaborator Author

I don’t see any reason why we couldn’t implement a simpler delayed-specific optimization for this (one that doesn’t materialize the target Blockwise Layer)

I'm all in favor of removing fuse_roots entirely and doing this instead!

@rjzamora
Copy link
Copy Markdown
Member

I'm all in favor of removing fuse_roots entirely and doing this instead!

I took a rough pass at something like this in #9273

@gjoseph92 gjoseph92 mentioned this pull request Aug 12, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants