Use torch.save in _StorageBase.__reduce__#9184

Closed
mrocklin wants to merge 1 commit into pytorch:master from mrocklin:storage-serialization

Conversation

@mrocklin
Contributor

@mrocklin mrocklin commented Jul 5, 2018

Previously this used the .tolist method, which converted the
storage object into a list of Python objects, and then sent those to
pickle. For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the torch.save function to efficiently
turn the Storage object into bytes, and send those instead. This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with the copy module.

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
from_buffer method.

See #9168 for context
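The idea in the description can be sketched in plain Python (illustrative names only, not the actual PR code): a container whose `__reduce__` ships the underlying buffer as raw bytes instead of a Python list of boxed objects.

```python
import array
import pickle

# Illustration of the approach described above (not PyTorch's actual
# code): a container whose __reduce__ ships raw bytes instead of a
# Python list of boxed floats.
class Storage:
    def __init__(self, values=()):
        self._data = array.array("f", values)

    def tolist(self):
        return list(self._data)

    # Old, slow approach: return (type(self), (self.tolist(),)),
    # which pickles every element as a separate Python object.
    # New approach: pickle the underlying buffer directly.
    def __reduce__(self):
        return (_storage_from_bytes, (self._data.tobytes(),))

def _storage_from_bytes(b):
    # Rebuild the storage from the raw serialized bytes.
    s = Storage()
    s._data = array.array("f")
    s._data.frombytes(b)
    return s

s = Storage([1.0, 2.0, 3.0])
restored = pickle.loads(pickle.dumps(s))
print(restored.tolist())  # [1.0, 2.0, 3.0]
```

Because the element data travels as one contiguous `bytes` object, pickle copies it in a single pass rather than boxing and encoding each value individually, which is where the speedup comes from.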

@mrocklin mrocklin force-pushed the storage-serialization branch from 68f0c5f to b2165f7 Compare July 5, 2018 17:22
@mrocklin
Contributor Author

mrocklin commented Jul 5, 2018

FWIW, tests pass except for the MyPy check, which seems to be a stalled build

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

@mrocklin
Contributor Author

mrocklin commented Jul 5, 2018

The performance difference is nice. This reduces serialization time by a factor of ~38 and the serialized size by a factor of ~2.2.

This PR

In [1]: from torchvision.models.resnet import resnet18
   ...: import pickle
   ...: model = resnet18(pretrained=True)
   ...: 
   ...: 

In [2]: %time len(pickle.dumps(model))
CPU times: user 35.7 ms, sys: 16.3 ms, total: 52 ms
Wall time: 51.1 ms
Out[2]: 46838320

In [3]: 46838320 / 0.051 / 1e6  # MB/s
Out[3]: 918.3984313725491

0.4.0

In [1]: from torchvision.models.resnet import resnet18
   ...: import pickle
   ...: model = resnet18(pretrained=True)
   ...: 
   ...: 

In [2]: %time len(pickle.dumps(model))
CPU times: user 1.74 s, sys: 180 ms, total: 1.92 s
Wall time: 1.92 s
Out[2]: 105331304

In [3]: 105331304 / 46838320
Out[3]: 2.248827541209847

In [4]: 1.92 / 0.0511
Out[4]: 37.573385518590996

@soumith
Collaborator

soumith commented Jul 6, 2018

this is great. I just want some eyes wrt backward-compatibility (if needed at all), so cc: @apaszke

@mrocklin
Contributor Author

mrocklin commented Jul 6, 2018

This definitely isn't backwards compatible with data serialized with pickle from previous versions. This is treating pickle more as a wire protocol than as an archival storage format (which is a general recommendation as well).

@apaszke
Contributor

apaszke commented Jul 6, 2018

It changes the format, but it won't break loading of older checkpoints (the constructor will still be able to take in lists so that's fine). Thanks a lot for the PR!
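The compatibility point above can be illustrated generically (this is not PyTorch code): pickles written by older versions reduced a storage to its class plus a list of elements, so as long as the constructor keeps accepting a list, old checkpoints still unpickle.

```python
import pickle

class Storage:
    """Generic stand-in for a storage class. The constructor keeps
    accepting a list, so old-format pickles continue to load."""
    def __init__(self, values=()):
        self.values = list(values)

    def __reduce__(self):
        # Old-style reduce used by released versions: class + element list.
        return (Storage, (self.values,))

# A pickle written with the old-style __reduce__ above.
old_pickle = pickle.dumps(Storage([1.0, 2.0]))

# A newer version that changes __reduce__ can still load this pickle,
# because unpickling only ends up calling Storage([1.0, 2.0]).
restored = pickle.loads(old_pickle)
print(restored.values)  # [1.0, 2.0]
```

In other words, changing how *new* pickles are written is safe here; only the reading path (the constructor) has to keep honoring the old argument shape.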

@mrocklin
Contributor Author

mrocklin commented Jul 6, 2018 via email

Contributor

@facebook-github-bot facebook-github-bot left a comment


@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mrocklin mrocklin deleted the storage-serialization branch July 8, 2018 20:28
@immerrr

immerrr commented Jul 25, 2018

Got here from the blog post, great work. Just a bit of pain I'd like to share regarding my experience with backward-compatible pickles:

  • The function that reads the pickled data should be public: if it is renamed, moved, or changed in a backward-incompatible way, there is no way to restore compatibility short of patching the pickled stream. It therefore carries the same obligations as a public API from the moment a released version of the package writes a pickle referencing it, so "hiding" it behind an underscore prefix gives a false sense of freedom to change it.
  • The serialization format should be extensible: returning a (FORMAT, VALUE) tuple lets the deserializer decide how to interpret VALUE based on FORMAT in O(1). Otherwise a backward-compatible deserializer must fall back on isinstance checks, or try one format after another and wait for exceptions to be raised.
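The (FORMAT, VALUE) suggestion can be sketched as follows (hypothetical names; this is a pattern illustration, not proposed PyTorch code):

```python
import pickle

FORMAT_V1 = 1  # raw-bytes payload; bump this when the layout changes

class Payload:
    def __init__(self, data):
        self.data = data

    def __reduce__(self):
        # Tag the payload with a format version so future readers can
        # dispatch in O(1) instead of guessing via isinstance checks.
        return (_load_payload, ((FORMAT_V1, self.data),))

def _load_payload(tagged):
    fmt, value = tagged
    if fmt == FORMAT_V1:
        return Payload(value)
    # A newer writer produced a format this reader does not know.
    raise ValueError(f"unknown serialization format: {fmt}")

p = pickle.loads(pickle.dumps(Payload(b"abc")))
print(p.data)  # b'abc'
```

Adding a second format later only requires a new branch in `_load_payload`; old pickles keep working because their tag still matches.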

@mrocklin
Contributor Author

One relatively low-impact way to achieve this would be to have torch.load optionally accept bytestrings.

import io

def load(x, ...):
    if isinstance(x, bytes):
        return load(io.BytesIO(x))
    # normal code for handling files
    ...

goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Closes pytorch#9184

Differential Revision: D8747794

Pulled By: soumith

fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79