[DISCUSSION] Can the scheduler use pickle.dumps? #4890

@rjzamora

Description
This question/discussion is related to the ongoing HighLevelGraph (HLG) work in dask/dask and dask/distributed (and the clear need for documentation).

In order to establish a formal convention for implementing Layer objects, it is essential that we also establish a clear specification for task serialization in Layer.__dask_distributed_pack__ and __dask_distributed_unpack__. That is, we need a clear guideline on what must be serialized on the client, and what may be serialized on the scheduler (when the full graph is being materialized). As pointed out in dask/dask#7650, the existing Layer implementations approach task serialization in a variety of ways, which may be confusing to would-be HLG-Layer contributors.

In order to design and apply a consistent HLG-serialization approach, we need to decisively answer the question: can the scheduler use any serialization logic that leverages pickle? (Note that it is already clear that the scheduler may not un-pickle anything, but it is not clear whether the scheduler can rely on pickle at all.)
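To make the asymmetry in the question concrete, here is a minimal sketch (not actual Dask internals; `add` and the tuple task form are illustrative) of why pickling and unpickling are treated differently: `pickle.loads` can execute arbitrary code via an object's `__reduce__` hook, which is why the scheduler must never unpickle untrusted payloads, while `pickle.dumps` only produces bytes — though serializing objects with custom `__reduce__` methods still runs user code, which is part of what this discussion needs to settle.

```python
import pickle

def add(x, y):
    return x + y

# A Dask-style task: a tuple of (callable, *args).
task = (add, 1, 2)

# Packing: pickle.dumps serializes the task to bytes. It does not
# call the task's function, which is the basis for the argument that
# the scheduler *might* be allowed to do this step.
packed = pickle.dumps(task)

# Unpacking: pickle.loads can trigger arbitrary code execution for
# crafted payloads, so it must only happen on a trusted process
# (here, the same process, purely for illustration).
func, *args = pickle.loads(packed)
result = func(*args)
```

Whether the `dumps` side is acceptable on the scheduler is exactly the open question, since graph values with custom reducers blur the "serialize only" guarantee.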

cc @madsbk @mrocklin @gjoseph92
