
partition_info in map_partitions materializes the graph unnecessarily #8309

@gjoseph92


Using a partition_info kwarg in your map_partitions function forces the entire blockwise graph to be materialized. This can be slow, can lead to slower scheduling, and eliminates optimization opportunities.

The way the injection of partition_info is currently implemented is a bit convoluted, involving manipulation of individual tasks within the materialized graph.

Since we now have the BlockwiseDep and BlockwiseDepDict interface for just this sort of purpose (including auxiliary information into a blockwise graph; concept described in #7513), this could be done more simply. The main challenge is just that it needs to be given as a kwarg, and we don't currently have the infrastructure in map_partitions to pass blockwise-y things as kwargs (xref #8308), so it may require an odd dance with a wrapper function.
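To make the "wrapper function" idea concrete, here is a minimal sketch in plain Python, not dask's actual implementation. It assumes the per-partition info can be delivered as an ordinary positional argument (the way a blockwise dependency would be) and that a small wrapper then forwards it to the user function as the partition_info kwarg. The names apply_with_partition_info, label_rows, partitions, and partition_infos are hypothetical, and the plain dicts only stand in for a BlockwiseDepDict-style mapping from block index to value. The "number" and "division" keys follow what the dask documentation describes for partition_info.

```python
from typing import Any, Callable, Dict, List, Tuple

BlockIndex = Tuple[int, ...]


def apply_with_partition_info(
    func: Callable[..., Any],
    partition_info: Dict[str, Any],
    *args: Any,
    **kwargs: Any,
) -> Any:
    """Hypothetical wrapper: take partition_info positionally, pass it on as a kwarg."""
    return func(*args, partition_info=partition_info, **kwargs)


def label_rows(part: List[int], partition_info=None) -> List[str]:
    """Example user function that reads its partition number from the kwarg."""
    number = partition_info["number"]
    return [f"p{number}:{value}" for value in part]


# Stand-ins for two partitions and their per-block auxiliary info,
# both keyed by block index (an assumption made for illustration).
partitions: Dict[BlockIndex, List[int]] = {(0,): [10, 11], (1,): [20]}
partition_infos: Dict[BlockIndex, Dict[str, Any]] = {
    (0,): {"number": 0, "division": 0},
    (1,): {"number": 1, "division": 2},
}

# Each block's task is the same wrapper call; only the per-block inputs differ,
# so no individual task needs to be edited after the fact.
results = {
    index: apply_with_partition_info(label_rows, partition_infos[index], part)
    for index, part in partitions.items()
}
print(results)
```

The point of the wrapper is that every block runs the same callable with the same argument layout, which is the shape a blockwise layer can describe without expanding into per-task entries.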

cc @rjzamora @ian-r-rose


Labels: dataframe, needs attention
