Change the active-fusion default to False for Dask-Dataframe #7620
jrbourbeau merged 9 commits into dask:main
jrbourbeau left a comment
Thanks @rjzamora! FWIW I'm running the distributed test suite against this PR over in https://github.com/jrbourbeau/distributed/tree/test-dask-7620. Are there any other tests we should run before merging this?
      )

-     with pytest.raises(ValueError):
+     with pytest.raises((ValueError, RuntimeWarning)):
Why did we start raising a RuntimeWarning?
Right - Good question. We seem to be taking a different code path with fusion disabled, so we are running into a true-division RuntimeWarning rather than a "No non-trivial arrays found" ValueError. I'll poke around a bit before taking a stance on whether or not this is a problem :)
I think this is the only lingering thing to address. Any insight into why we started raising a RuntimeWarning? I'm also happy to look into it
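For anyone following the thread, here is a minimal stdlib-only sketch of why widening the expected exception to a tuple lets the test pass on either code path. It assumes warnings are promoted to errors (as a test configuration might do); `risky` and `outcome` are hypothetical names, and the warning message is a stand-in rather than the exact text from the failing test.

```python
import warnings


def risky(kind):
    # Hypothetical stand-in for the two code paths discussed above.
    if kind == "empty":
        # Path taken with fusion enabled: an explicit error.
        raise ValueError("No non-trivial arrays found")
    # Path taken with fusion disabled: only a warning is emitted.
    warnings.warn("invalid value encountered in true_divide", RuntimeWarning)
    return float("nan")


def outcome(kind):
    with warnings.catch_warnings():
        # Assumption: warnings are treated as errors, so the
        # RuntimeWarning surfaces as a raised exception.
        warnings.simplefilter("error")
        try:
            risky(kind)
        except (ValueError, RuntimeWarning) as exc:
            # A tuple matches whichever of the two is raised.
            return type(exc).__name__
    return "no error"
```

The tuple form accepts either exception, which is why it hides the change in code path rather than explaining it.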
Just a heads up that the
Thanks James! Looking into the EDIT: I don't have a great understanding of the root cause yet, but the failure is definitely set off by a lack of low-level fusion in the
@jrbourbeau - I seem to have all distributed tests passing with the latest commit. However, I am still not sure why/how the
Update: It seems that the root cause was that the output
    else:
        # Perform Blockwise optimizations for HLG input
        dsk = optimize_dataframe_getitem(dsk, keys=keys)
        dsk = optimize_blockwise(dsk, keys=keys)
        dsk = fuse_roots(dsk, keys=keys)
Just for my own understanding, we're adding the else block here because none of these optimizations will apply to a HLG which is only MaterializedLayers?
Yes - Thanks for calling this out. My understanding is that these optimizations only make sense for Blockwise Layers.
Now that #7415 is merged, low-level fusion is unnecessary for most Dask-Dataframe workflows (parquet, csv, and orc leverage Blockwise fusion, and other IO mechanisms are on their way to Blockwise). This PR changes the configuration default to be None for "optimization.fuse.active". In dask.dataframe.optimize, a None configuration setting will be treated as False, while everywhere else the None setting will be treated as True.

To clarify: This PR should only change fusion behavior for Dask-Dataframe.
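The tri-state default described above can be sketched roughly as follows. This is an illustrative model only, not the code in this PR: `fusion_enabled` and its `dataframe` flag are hypothetical names, and the real logic reads the value from the Dask configuration.

```python
from typing import Optional


def fusion_enabled(setting: Optional[bool], *, dataframe: bool) -> bool:
    """Resolve a tri-state "optimization.fuse.active" value (hypothetical helper).

    None means "not set by the user": it is treated as False when
    optimizing a Dask-Dataframe graph and as True everywhere else.
    An explicit True or False always wins.
    """
    if setting is None:
        return not dataframe
    return bool(setting)
```

Under this reading, a user who still wants low-level fusion for DataFrame graphs would set the key explicitly, for example with dask.config.set({"optimization.fuse.active": True}).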