Contributor

@driazati driazati commented Aug 27, 2019

The default implementation (`zipfile.is_zipfile`) is lenient: it recognizes a file as a zipfile if the magic number appears anywhere in the archive. So if a tensor happens to contain the bytes `PK\x03\x04`, the saved file gets recognized as a zipfile. See https://bugs.python.org/issue28494

This implementation only checks the first 4 bytes of the file for the zip magic number. We could also copy the fix from python/cpython#5053, but that seems like overkill.

Fixes #25214

Differential Revision: D17102516
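
For illustration, a minimal sketch of the "first 4 bytes only" check described above. This is not the code from the PR: the helper name `starts_with_zip_magic` and the sample payloads are assumptions made up for the example, and it uses only the Python 3 standard library.

```python
import io
import zipfile

# Local file header signature that begins a non-empty zip archive.
ZIP_LOCAL_HEADER_MAGIC = b"PK\x03\x04"


def starts_with_zip_magic(f):
    """Hypothetical helper: True only if `f` begins, at its current
    position, with the zip local file header magic number."""
    start = f.tell()
    try:
        head = f.read(len(ZIP_LOCAL_HEADER_MAGIC))
    finally:
        f.seek(start)  # leave the stream where the caller had it
    return head == ZIP_LOCAL_HEADER_MAGIC


# A real one-member archive starts with the magic number.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("data.txt", "hello")
archive.seek(0)

# A non-zip payload that merely contains the same four bytes later on.
payload = io.BytesIO(b"\x00\x01" + ZIP_LOCAL_HEADER_MAGIC + b"\x02\x03")

print(starts_with_zip_magic(archive))
print(starts_with_zip_magic(payload))
```

Restoring the stream position matters here because the caller still has to read the file from the beginning once it has decided which format it is looking at.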

@pytorchbot pytorchbot added the module: serialization label Aug 27, 2019
@driazati driazati requested review from apaszke and ezyang August 28, 2019 01:05
Contributor

@zou3519 zou3519 left a comment

Needs test unless it's hard to repro the bug. However, it sounds like one just needs to serialize a tensor with the magic numbers?
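
A regression test along those lines could look roughly like the sketch below. It is an assumption-laden stand-in rather than the test added in the PR: plain `pickle` takes the place of `torch.save`/`torch.load`, a `bytes` object takes the place of tensor storage, and `starts_with_zip_magic` is a hypothetical name for the strict check.

```python
import io
import pickle

ZIP_MAGIC = b"PK\x03\x04"


def starts_with_zip_magic(f):
    # Hypothetical strict check: look only at the first four bytes.
    start = f.tell()
    head = f.read(4)
    f.seek(start)
    return head == ZIP_MAGIC


def test_embedded_magic_is_not_a_zipfile():
    # Stand-in for tensor storage whose raw bytes contain the magic number.
    raw = b"\x00" * 8 + ZIP_MAGIC + b"\xff" * 8
    buf = io.BytesIO()
    pickle.dump(raw, buf, protocol=4)
    buf.seek(0)

    # The magic number is in the stream, just not at the start.
    assert ZIP_MAGIC in buf.getvalue()
    assert not starts_with_zip_magic(buf)

    # The stream is therefore still handled as a plain pickle and round-trips.
    assert pickle.load(buf) == raw


test_embedded_magic_is_not_a_zipfile()
```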

Contributor

@ezyang ezyang left a comment

As Richard said, needs test. Lint failures are real too. Also, shouldn't we be using b'' strings?

@pytorchbot pytorchbot added the oncall: jit label Aug 28, 2019
@driazati
Contributor Author

`b''` doesn't do anything in Python 2; `ord()` takes care of turning each character read from the file into its byte value.
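
To make the `ord()` point concrete, here is a small sketch, assuming the check compares integer byte values; the names `MAGIC_ORDS` and `first_four_ords` are invented for the example and are not taken from the PR. Slicing one element at a time yields a length-1 `bytes` in Python 3 (and a length-1 `str` in Python 2), and `ord()` accepts either.

```python
import io

# Integer values of the zip local file header magic number.
MAGIC_ORDS = [ord("P"), ord("K"), 3, 4]


def first_four_ords(f):
    """Hypothetical helper: integer values of up to the first 4 bytes of `f`."""
    start = f.tell()
    chunk = f.read(4)
    f.seek(start)  # do not consume the bytes we peeked at
    # chunk[i:i + 1] is a length-1 slice, which ord() converts to an int.
    return [ord(chunk[i:i + 1]) for i in range(len(chunk))]


zip_like = io.BytesIO(b"PK\x03\x04rest-of-file")
pickle_like = io.BytesIO(b"\x80\x02PK\x03\x04")

print(first_four_ords(zip_like) == MAGIC_ORDS)
print(first_four_ords(pickle_like) == MAGIC_ORDS)
```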

@driazati driazati requested review from ezyang and zou3519 August 28, 2019 23:45
@facebook-github-bot
Contributor

@driazati merged this pull request in 7a921ba.

Labels

Merged, module: serialization, oncall: jit

Development

Successfully merging this pull request may close these issues.

torch.load issue on loading file created by torch.save

7 participants