dumps_task in SimpleShuffleLayer and BroadcastJoinLayer unpack #7650

@gjoseph92

Just a TODO that SimpleShuffleLayer and BroadcastJoinLayer both use dumps_task in their __dask_distributed_unpack__ methods, which leads to pickling on the scheduler, which we want to avoid.

Additionally, could this manual use of pickle.dumps in BroadcastJoinLayer.__dask_distributed_pack__ (quoted below) lead to a double-pickle scenario, in which the kwarg values are still bytestrings, rather than the actual unpickled values, when the task finally gets executed?

dask/dask/layers.py

Lines 663 to 671 in b3a8646

    def __dask_distributed_pack__(self, *args, **kwargs):
        import pickle

        # Pickle complex merge_kwargs elements. Also
        # tuples, which may be confused with keys.
        _merge_kwargs = {}
        for k, v in self.merge_kwargs.items():
            if not isinstance(v, (str, list, bool)):
                _merge_kwargs[k] = pickle.dumps(v)
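
To make the concern concrete, here is a minimal, standalone sketch of how the double-pickle could surface. It uses only the standard library and does not touch dask itself. `pack_merge_kwargs` is a hypothetical stand-in for the loop quoted above (the `else` branch is an assumption, since the quoted snippet is truncated), and the outer `pickle.dumps`/`pickle.loads` round trip is an assumed stand-in for whatever serialization the task goes through on its way to a worker.

```python
import pickle


def pack_merge_kwargs(merge_kwargs):
    """Hypothetical stand-in for the packing loop quoted above."""
    packed = {}
    for k, v in merge_kwargs.items():
        if not isinstance(v, (str, list, bool)):
            # First pickle: the value becomes a bytestring.
            packed[k] = pickle.dumps(v)
        else:
            # Assumption: simple values are passed through unchanged.
            packed[k] = v
    return packed


merge_kwargs = {"how": "inner", "suffixes": ("_x", "_y"), "left_index": True}
packed = pack_merge_kwargs(merge_kwargs)

# Second pickle: assumed stand-in for serializing the whole task.
wire = pickle.dumps(packed)

# A worker that unpickles only once still sees bytes for "suffixes".
on_worker = pickle.loads(wire)
leaked_type = type(on_worker["suffixes"])

# Reversing the first pickle explicitly restores the original values.
restored = {
    k: pickle.loads(v) if isinstance(v, bytes) else v
    for k, v in on_worker.items()
}
```

If nothing on the unpack path reverses the first `pickle.dumps`, the merge function would receive `bytes` where it expects a tuple. Whether dask's actual unpack path does reverse it is exactly the open question here.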

Related to dask/distributed#4699.

cc @rjzamora @madsbk

Labels: highlevelgraph, needs attention
