pickle5 support #2495

@tjb900

Hi! Not really an issue, just a query aimed at avoiding duplicated work. In our use of distributed, the values behind keys are often arbitrarily nested lists or dicts of numpy arrays. Unfortunately, the current serialization scheme, while fantastic for numpy arrays that are not embedded in other objects, does not handle these nested structures particularly well.

Now that numpy 1.16 is out, pickle protocol 5 and its backport (https://github.com/pitrou/pickle5-backport) would seem to provide a very elegant solution for communicating these kinds of structures efficiently: the large data arrays are passed as out-of-band data that does not need to be embedded in the pickle bytestream.
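To make the idea concrete, here is a minimal sketch of out-of-band pickling for a nested structure. It assumes Python 3.8+, where protocol 5 is part of the standard `pickle` module (on older versions the backport would be imported as `pickle5` instead), and numpy >= 1.16. The helper names `dumps_oob` and `loads_oob` are made up for this example and are not part of distributed.

```python
import pickle

import numpy as np


def dumps_oob(obj):
    """Pickle obj with protocol 5, collecting large buffers out-of-band.

    Hypothetical helper for illustration only.
    """
    buffers = []
    # buffer_callback receives a PickleBuffer for each array's data;
    # list.append returns None, which tells pickle to keep it out-of-band.
    payload = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    return payload, buffers


def loads_oob(payload, buffers):
    """Rebuild the object from the pickle stream plus its buffers."""
    return pickle.loads(payload, buffers=buffers)


# A nested structure of the kind described above.
nested = {
    "a": [np.arange(1_000_000, dtype="f8"), np.ones((100, 100))],
    "b": {"c": np.zeros(10)},
}

payload, buffers = dumps_oob(nested)
restored = loads_oob(payload, buffers)

print(len(payload), "bytes in the pickle stream")
print(len(buffers), "out-of-band buffers")
```

The pickle stream then only carries the container structure and array metadata, while the array data stays in separate buffers that a transport could send as their own frames without copying.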

Is work already underway on some remote branch to integrate this into distributed? If not, would such a PR be welcome, or should we wait for more experienced hands to tackle it?

Thanks in advance!
