API: Raise EOFError when trying to load past the end of a .npy file#23105
API: Raise EOFError when trying to load past the end of a .npy file#23105seberg merged 4 commits intonumpy:mainfrom Phlogistique:patch-1
.npy file#23105Conversation
Currently, the following code:
```
import numpy as np
with open('foo.npy', 'wb') as f:
for i in range(np.random.randint(10)):
np.save(f, 1)
with open('foo.npy', 'rb') as f:
while True:
np.load(f)
```
Will raise:
```
ValueError: Cannot load file containing pickled data when allow_pickle=False
```
While there is no pickled data in the file.
.npy file.npy file
numpy/lib/npyio.py
Outdated
| EOFError | ||
| If the input file is empty or, when calling `np.load` multiple times | ||
| on the same file, if all data has already been read |
There was a problem hiding this comment.
From a bit of local testing, it seems you can only hit this when the input is a file handle instead of a string or path to the file (in which case numpy handles the file opening itself).
In the case of trying to np.load(f) where f is a path to an empty file, you already get an informative ValueError.
IME, np.load is intended to read from a file once - I've never seen np.load used in this way before, so I'm not sure what makes sense to support.
There was a problem hiding this comment.
People do concatenate NPY blocks in this way to make custom file formats. I'm fine with it.
There was a problem hiding this comment.
Yeah, I have seen it used (and maybe even used it myself a long time ago). The addition looks good to me, EOFError also seems right in this context.
Should the error have a message or is there a reason for it being better to forgo that?
Otherwise, this really should have a basic test to ensure continued support. Only in exceptions do we merge even bug-fixes without tests.
There was a problem hiding this comment.
Hmmm, one thing to consider: If we think it is likely that users are using try: np.load(f); except ValueError: pass to read everything currently, it might make sense to create our own EOFError subclass that also inherits from ValueError.
Maybe that is over the top, but we should probably also have a release note (and note that we are happy to add it in case someone notices).
There was a problem hiding this comment.
I would prefer a standard EOFError with a helpful message. That it fell into this codepath was a logic bug.
There was a problem hiding this comment.
@rossbar Using a file handle to concatenate multiple arrays is mentioned in an example in the docstring for np.save:
>>> with open('test.npy', 'wb') as f:
... np.save(f, np.array([1, 2]))
... np.save(f, np.array([1, 3]))
>>> with open('test.npy', 'rb') as f:
... a = np.load(f)
... b = np.load(f)
>>> print(a, b)
# [1 2] [1 3]
There was a problem hiding this comment.
https://stackoverflow.com/questions/51093723/numpy-load-result-file-saved-in-append-mode Here is evidence of people out there using try/except to detect np.load reaching EOF. People in this thread use bare except, but it is logical to think that some people out there are filtering on ValueError instead. That's what one developer in my organization did.
Deferring to you guys as to whether it would be better to create an error that inherits from EOFError and ValueError for bug-to-bug retrocompatibility.
There was a problem hiding this comment.
I don't have a strong opinion. Unless you/we expect libraries with more than a handful of users do this, I don't think it matters. It is easy to update in a backwards compatible way. Overall, I am not usually overly worried abot error type changes.
That said, if anyone actually complained about the change, I wouldn't hesitate to add the compatibility or just use ValueError.
There was a problem hiding this comment.
When allow_pickle=True, the (also unintended) error changes to OSError. There is no consistent behavior that we can really replicate. The behavior was just undefined. Let's take the opportunity to make things consistent and use the correct EOFError.
|
Release note and test added. |
seberg
left a comment
There was a problem hiding this comment.
LGTM, thanks. Just nitpicking to make sure docs build nicely. May just apply+merge soon.
Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
|
Thanks @Phlogistique for the release notes and followups. Somehow CircleCI is unhappy and not building the docs, Ross remembered issues around it being activated on the fork. Changes look good so, if they start failing I will follow up. |
.npy file.npy file
Currently, the following code:
Will raise:
While there is no pickled data in the file.