-
-
Notifications
You must be signed in to change notification settings - Fork 757
Description
This question/discussion is related to the ongoing HighLevelGraph (HLG) work in dask/dask and dask/distributed (and the clear need for documentation).
In order to establish a formal convention for the implementation of Layer objects, it is essential that we also establish a clear specification for task-serialization in Layer.__dask_distributed_pack__ and __dask_distributed_unpack__. That is, we need a clear guideline on what needs to be serialized on the client, and what can be serialized on the scheduler (when the full graph is being materialized). As pointed out in dask/dask#7650 , the existing Layer implementations approach task serialization in a variety of ways, which may be confusing to would-be HLG-Layer contributors.
In order to design and apply a dogmatic HLG-serialization approach, we need to decisively answer the question: Can the scheduler use any serialization logic that leverages pickle? (Note that it is already clear that the scheduler may not un-pickle anything, but it is not clear if the scheduler can rely on pickle at all)