Use torch.save in _StorageBase.__reduce__#9184

Closed
mrocklin wants to merge 1 commit into pytorch:master from mrocklin:storage-serialization

Conversation

@mrocklin
Contributor

@mrocklin mrocklin commented Jul 5, 2018

Previously this used the .tolist method, which converted the
storage object into a list of Python objects, and then sent those to
pickle. For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the torch.save function to efficiently
turn the Storage object into bytes, and send those instead. This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with the copy module.

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
from_buffer method.

See #9168 for context
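The idea in the description can be sketched in plain Python (illustrative names only, not the actual PR code): a container whose `__reduce__` ships the underlying buffer as raw bytes instead of a Python list of boxed objects.

```python
import array
import pickle

# Illustration of the approach described above (not PyTorch's actual
# code): a container whose __reduce__ ships raw bytes instead of a
# Python list of boxed floats.
class Storage:
    def __init__(self, values=()):
        self._data = array.array("f", values)

    def tolist(self):
        return list(self._data)

    # Old, slow approach: return (type(self), (self.tolist(),)),
    # which pickles every element as a separate Python object.
    # New approach: pickle the underlying buffer directly.
    def __reduce__(self):
        return (_storage_from_bytes, (self._data.tobytes(),))

def _storage_from_bytes(b):
    # Rebuild the storage from the raw serialized bytes.
    s = Storage()
    s._data = array.array("f")
    s._data.frombytes(b)
    return s

s = Storage([1.0, 2.0, 3.0])
restored = pickle.loads(pickle.dumps(s))
print(restored.tolist())  # [1.0, 2.0, 3.0]
```

Because the element data travels as one contiguous `bytes` object, pickle copies it in a single pass rather than boxing and encoding each value individually, which is where the speedup comes from.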

@mrocklin mrocklin force-pushed the storage-serialization branch from 68f0c5f to b2165f7 Compare July 5, 2018 17:22
@mrocklin
Contributor Author

mrocklin commented Jul 5, 2018

FWIW, tests pass except for the MyPy check, which seems to be a stalled build

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

@mrocklin
Contributor Author

mrocklin commented Jul 5, 2018

The performance difference is nice. This reduces serialization time by a factor of ~38 and the serialized size by a factor of ~2.2.

This PR

In [1]: from torchvision.models.resnet import resnet18
   ...: import pickle
   ...: model = resnet18(pretrained=True)
   ...: 
   ...: 

In [2]: %time len(pickle.dumps(model))
CPU times: user 35.7 ms, sys: 16.3 ms, total: 52 ms
Wall time: 51.1 ms
Out[2]: 46838320

In [3]: 46838320 / 0.051 / 1e6  # MB/s
Out[3]: 918.3984313725491

0.4.0

In [1]: from torchvision.models.resnet import resnet18
   ...: import pickle
   ...: model = resnet18(pretrained=True)
   ...: 
   ...: 

In [2]: %time len(pickle.dumps(model))
CPU times: user 1.74 s, sys: 180 ms, total: 1.92 s
Wall time: 1.92 s
Out[2]: 105331304

In [3]: 105331304 / 46838320
Out[3]: 2.248827541209847

In [4]: 1.92 / 0.0511
Out[4]: 37.573385518590996

@soumith
Collaborator

soumith commented Jul 6, 2018

this is great. I just want some eyes wrt backward-compatibility (if needed at all), so cc: @apaszke

@mrocklin
Contributor Author

mrocklin commented Jul 6, 2018

This definitely isn't backwards compatible with data serialized with pickle from previous versions. This is treating pickle more as a wire protocol than as an archival storage format (which is a general recommendation as well).

@apaszke
Contributor

apaszke commented Jul 6, 2018

It changes the format, but it won't break loading of older checkpoints (the constructor will still be able to take in lists so that's fine). Thanks a lot for the PR!
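The compatibility point above can be illustrated generically (this is not PyTorch code): pickles written by older versions reduced a storage to its class plus a list of elements, so as long as the constructor keeps accepting a list, old checkpoints still unpickle.

```python
import pickle

class Storage:
    """Generic stand-in for a storage class. The constructor keeps
    accepting a list, so old-format pickles continue to load."""
    def __init__(self, values=()):
        self.values = list(values)

    def __reduce__(self):
        # Old-style reduce used by released versions: class + element list.
        return (Storage, (self.values,))

# A pickle written with the old-style __reduce__ above.
old_pickle = pickle.dumps(Storage([1.0, 2.0]))

# A newer version that changes __reduce__ can still load this pickle,
# because unpickling only ends up calling Storage([1.0, 2.0]).
restored = pickle.loads(old_pickle)
print(restored.values)  # [1.0, 2.0]
```

In other words, changing how *new* pickles are written is safe here; only the reading path (the constructor) has to keep honoring the old argument shape.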

@mrocklin
Contributor Author

mrocklin commented Jul 6, 2018 via email

Contributor

@facebook-github-bot facebook-github-bot left a comment


@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mrocklin mrocklin deleted the storage-serialization branch July 8, 2018 20:28
@immerrr

immerrr commented Jul 25, 2018

Got here from the blog post, great work. Just a bit of pain I'd like to share regarding my experience with backward-compatible pickles:

  • The function that reads the pickled data should be public: if it is renamed, moved, or changed in a backward-incompatible way, there is no way to restore compatibility short of patching the pickled stream. It therefore carries the same obligations as a public API from the moment a released version of the package writes a pickle referencing it, so "hiding" it behind an underscore prefix gives a false sense of freedom to change it.
  • The serialization format should be extensible: returning a (FORMAT, VALUE) tuple lets the deserializer decide how to interpret VALUE based on FORMAT in O(1). Otherwise a backward-compatible deserializer must fall back on isinstance checks, or try one format after another and wait for exceptions to be raised.
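The (FORMAT, VALUE) suggestion can be sketched as follows (hypothetical names; this is a pattern illustration, not proposed PyTorch code):

```python
import pickle

FORMAT_V1 = 1  # raw-bytes payload; bump this when the layout changes

class Payload:
    def __init__(self, data):
        self.data = data

    def __reduce__(self):
        # Tag the payload with a format version so future readers can
        # dispatch in O(1) instead of guessing via isinstance checks.
        return (_load_payload, ((FORMAT_V1, self.data),))

def _load_payload(tagged):
    fmt, value = tagged
    if fmt == FORMAT_V1:
        return Payload(value)
    # A newer writer produced a format this reader does not know.
    raise ValueError(f"unknown serialization format: {fmt}")

p = pickle.loads(pickle.dumps(Payload(b"abc")))
print(p.data)  # b'abc'
```

Adding a second format later only requires a new branch in `_load_payload`; old pickles keep working because their tag still matches.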

@mrocklin
Contributor Author

One relatively low-impact way to achieve this would be to have torch.load optionally accept bytestrings.

import io

def load(x, ...):
    if isinstance(x, bytes):
        return load(io.BytesIO(x))
    # normal code for handling files
    ...

goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Closes pytorch#9184

Differential Revision: D8747794

Pulled By: soumith

fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79