Use torch.save in _StorageBase.__reduce__ #9184

mrocklin wants to merge 1 commit into pytorch:master
Conversation
Previously this used the ``.tolist`` method, which converted the storage object into a list of Python objects, and then sent those to pickle. For storage objects of non-trivial size, this was very slow. Now we reuse the logic of the ``torch.save`` function to efficiently turn the Storage object into bytes, and send those instead. This reduces the semantic information (it's harder to interpret the bytes) but should be orders of magnitude more efficient when serializing data with the pickle protocol.

For future work it would be nice to develop a mechanism to get a buffer of bytes out of a Storage object, and use that alongside the current ``from_buffer`` method. See pytorch#9168 for context.
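The idea behind the change can be sketched with a toy class. ``FakeStorage`` below is a hypothetical stand-in for a torch Storage, not the actual implementation: its ``__reduce__`` hands pickle a compact bytes payload plus a reconstruction function, rather than a Python list of elements.

```python
import pickle

class FakeStorage:
    """Toy stand-in for a torch Storage: a flat byte buffer.
    Hypothetical class for illustration only."""
    def __init__(self, data):
        self.data = bytearray(data)

    def __reduce__(self):
        # Old approach: rebuild from a Python list of elements (slow).
        # New approach: give pickle one compact bytes object and a
        # function that reconstructs the storage from it.
        return (_storage_from_bytes, (bytes(self.data),))

def _storage_from_bytes(b):
    return FakeStorage(b)

s = FakeStorage(range(10))
restored = pickle.loads(pickle.dumps(s))
assert restored.data == s.data
```

The same round-trip works transparently through ``copy.deepcopy`` and any other consumer of the pickle protocol, since they all go through ``__reduce__``.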
FWIW, tests pass except for the MyPy check, which seems to be a stalled build.
The performance difference is nice. This reduces the serialization time by a factor of 38 and the size by a factor of 2.

This PR:

    In [1]: from torchvision.models.resnet import resnet18
       ...: import pickle
       ...: model = resnet18(pretrained=True)

    In [2]: %time len(pickle.dumps(model))
    CPU times: user 35.7 ms, sys: 16.3 ms, total: 52 ms
    Wall time: 51.1 ms
    Out[2]: 46838320

    In [3]: 46838320 / 0.051 / 1e6  # MB/s
    Out[3]: 918.3984313725491

0.4.0:

    In [1]: from torchvision.models.resnet import resnet18
       ...: import pickle
       ...: model = resnet18(pretrained=True)

    In [2]: %time len(pickle.dumps(model))
    CPU times: user 1.74 s, sys: 180 ms, total: 1.92 s
    Wall time: 1.92 s
    Out[2]: 105331304

    In [3]: 105331304 / 46838320
    Out[3]: 2.248827541209847

    In [4]: 1.92 / 0.0511
    Out[4]: 37.573385518590996
This is great. I just want some eyes w.r.t. backward compatibility (if needed at all), so cc: @apaszke
This definitely isn't backwards compatible with data serialized with pickle from previous versions. This is treating pickle more as a wire protocol than as an archival storage format (which is a general recommendation as well).
It changes the format, but it won't break loading of older checkpoints (the constructor will still be able to take in lists, so that's fine). Thanks a lot for the PR!
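This backward-compatibility point can be sketched with the same kind of toy class, assuming (as the comment says) a constructor that accepts the old pickle payload, a plain list of elements, as well as the new bytes payload. ``FakeStorage`` here is hypothetical, not torch's code:

```python
class FakeStorage:
    """Hypothetical illustration: one constructor handles both the
    old pickle format (a list of elements) and the new one (bytes)."""
    def __init__(self, data):
        if isinstance(data, list):
            data = bytes(data)  # old checkpoints reconstruct via lists
        self.data = bytes(data)

old_style = FakeStorage([1, 2, 3])        # pre-change pickle payload
new_style = FakeStorage(b"\x01\x02\x03")  # post-change payload
assert old_style.data == new_style.data
```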
|
Ah, good point. My pleasure.
facebook-github-bot left a comment
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Got here from the blog post; great work. Just a bit of pain I'd like to share regarding my experience with backward-compatible pickles:
One relatively low-impact way to achieve this would be to have:

    def load(x, ...):
        if isinstance(x, bytes):
            return load(io.BytesIO(x))
        # normal code for handling files
        ...
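Filled out with stdlib pieces, the wrapper above could look like the sketch below; using ``pickle.load`` as a stand-in for the real file-handling path is an assumption for illustration only.

```python
import io
import pickle

def load(x):
    # Hypothetical wrapper: accept raw bytes as well as file-like objects.
    if isinstance(x, bytes):
        return load(io.BytesIO(x))
    return pickle.load(x)  # stand-in for the normal file-handling code

payload = pickle.dumps({"weights": [1, 2, 3]})
assert load(payload) == {"weights": [1, 2, 3]}            # bytes input
assert load(io.BytesIO(payload)) == {"weights": [1, 2, 3]}  # file-like input
```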
Summary: Previously this used the ``.tolist`` method, which converted the storage object into a list of Python objects, and then sent those to pickle. For storage objects of non-trivial size, this was very slow. Now we reuse the logic of the ``torch.save`` function to efficiently turn the Storage object into bytes, and send those instead. This reduces the semantic information (it's harder to interpret the bytes) but should be orders of magnitude more efficient when serializing data with the pickle protocol or with copy. For future work it would be nice to develop a mechanism to get a buffer of bytes out of a Storage object, and use that alongside the current ``from_buffer`` method. See pytorch#9168 for context.

Closes pytorch#9184

Differential Revision: D8747794

Pulled By: soumith

fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79