Open
Labels
dataframe, needs attention (It's been a while since this was pushed on. Needs attention from the owner or a maintainer.)
Description
Since #7620, we've seen a few instances where users have gotten burned by root-task overproduction (see dask/distributed#5555, dask/distributed#5223 for background) because certain DataFrame optimizations still use low-level graphs, and therefore aren't getting fused anymore. Examples:
- Suboptimal graph structure when read-writing a parquet #8445
- `partition_info` in `map_partitions` materializes the graph unnecessarily #8309
- Use Blockwise/`map_partitions` in various DataFrame join methods #8306
We do want to get everything to Blockwise eventually, but our bandwidth to track these down and fix them is limited. In the interim, I propose that by default, we still do low-level fusion when any of the layers in the graph are materialized.
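To make the proposed default concrete, here is a minimal sketch of the decision rule. It is not dask's actual implementation. It assumes each layer exposes an `is_materialized()` method and that an explicit user setting, if given, takes precedence over the default. The names `FakeLayer` and `should_fuse_low_level` are hypothetical and exist only for illustration.

```python
from typing import Iterable, Optional, Protocol


class LayerLike(Protocol):
    """Anything that can report whether it has been materialized."""

    def is_materialized(self) -> bool: ...


class FakeLayer:
    """Hypothetical stand-in for a graph layer (not a dask class)."""

    def __init__(self, materialized: bool) -> None:
        self._materialized = materialized

    def is_materialized(self) -> bool:
        return self._materialized


def should_fuse_low_level(
    layers: Iterable[LayerLike], fuse_active: Optional[bool] = None
) -> bool:
    """Decide whether to run low-level fusion (hypothetical helper).

    An explicit True/False setting wins (an assumption of this sketch).
    When the setting is None, fall back to the proposed default: fuse
    if any layer in the graph is materialized.
    """
    if fuse_active is not None:
        return fuse_active
    return any(layer.is_materialized() for layer in layers)


# A graph made only of non-materialized layers would skip low-level fusion,
# while a single materialized layer would turn it back on by default.
all_blockwise = [FakeLayer(False), FakeLayer(False)]
mixed = [FakeLayer(False), FakeLayer(True)]
```

Under this rule, graphs built entirely from non-materialized layers keep the post-#7620 behavior, and only graphs containing at least one materialized layer fall back to low-level fusion.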