Skip to content

API: Raise EOFError when trying to load past the end of a .npy file#23105

Merged
seberg merged 4 commits intonumpy:mainfrom
Phlogistique:patch-1
Jan 27, 2023
Merged

API: Raise EOFError when trying to load past the end of a .npy file#23105
seberg merged 4 commits intonumpy:mainfrom
Phlogistique:patch-1

Conversation

@Phlogistique
Copy link
Copy Markdown
Contributor

Currently, the following code:

import numpy as np
with open('foo.npy', 'wb') as f:
    for i in range(np.random.randint(10)):
        np.save(f, 1)

with open('foo.npy', 'rb') as f:
    while True:
        np.load(f)

Will raise:

ValueError: Cannot load file containing pickled data when allow_pickle=False

While there is no pickled data in the file.

Currently, the following code:
```
import numpy as np
with open('foo.npy', 'wb') as f:
    for i in range(np.random.randint(10)):
        np.save(f, 1)

with open('foo.npy', 'rb') as f:
    while True:
        np.load(f)
```
Will raise:
```
ValueError: Cannot load file containing pickled data when allow_pickle=False
```
While there is no pickled data in the file.
@Phlogistique Phlogistique changed the title Raise EOFError instead when trying to load past the end of a .npy file Raise EOFError when trying to load past the end of a .npy file Jan 26, 2023
Comment on lines +330 to +332
EOFError
If the input file is empty or, when calling `np.load` multiple times
on the same file, if all data has already been read
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a bit of local testing, it seems you can only hit this when the input is a file handle instead of a string or path to the file (in which case numpy handles the file opening itself).

In the case of trying to np.load(f) where f is a path to an empty file, you already get an informative ValueError.

IME, np.load is intended to read from a file once - I've never seen np.load used in this way before, so I'm not sure what makes sense to support.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People do concatenate NPY blocks in this way to make custom file formats. I'm fine with it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I have seen it used (and maybe even used it myself a long time ago). The addition looks good to me, EOFError also seems right in this context.

Should the error have a message or is there a reason for it being better to forgo that?

Otherwise, this really should have a basic test to ensure continued support. Only in exceptions do we merge even bug-fixes without tests.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, one thing to consider: If we think it is likely that users are using try: np.load(f); except ValueError: pass to read everything currently, it might make sense to create our own EOFError subclass that also inherits from ValueError.

Maybe that is over the top, but we should probably also have a release note (and note that we are happy to add it in case someone notices).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a standard EOFError with a helpful message. That it fell into this codepath was a logic bug.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rossbar Using a file handle to concatenate multiple arrays is mentioned in an example in the docstring for np.save:

    >>> with open('test.npy', 'wb') as f:
    ...     np.save(f, np.array([1, 2]))
    ...     np.save(f, np.array([1, 3]))
    >>> with open('test.npy', 'rb') as f:
    ...     a = np.load(f)
    ...     b = np.load(f)
    >>> print(a, b)
    # [1 2] [1 3] 

Copy link
Copy Markdown
Contributor Author

@Phlogistique Phlogistique Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://stackoverflow.com/questions/51093723/numpy-load-result-file-saved-in-append-mode Here is evidence of people out there using try/except to detect np.load reaching EOF. People in this thread use bare except, but it is logical to think that some people out there are filtering on ValueError instead. That's what one developer in my organization did.

Deferring to you guys as to whether it would be better to create an error that inherits from EOFError and ValueError for bug-to-bug retrocompatibility.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion. Unless you/we expect libraries with more than a handful of users do this, I don't think it matters. It is easy to update in a backwards compatible way. Overall, I am not usually overly worried abot error type changes.

That said, if anyone actually complained about the change, I wouldn't hesitate to add the compatibility or just use ValueError.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When allow_pickle=True, the (also unintended) error changes to OSError. There is no consistent behavior that we can really replicate. The behavior was just undefined. Let's take the opportunity to make things consistent and use the correct EOFError.

@Phlogistique
Copy link
Copy Markdown
Contributor Author

Release note and test added.

Copy link
Copy Markdown
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. Just nitpicking to make sure docs build nicely. May just apply+merge soon.

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
@seberg
Copy link
Copy Markdown
Member

seberg commented Jan 27, 2023

Thanks @Phlogistique for the release notes and followups. Somehow CircleCI is unhappy and not building the docs, Ross remembered issues around it being activated on the fork. Changes look good so, if they start failing I will follow up.

@seberg seberg changed the title Raise EOFError when trying to load past the end of a .npy file API: Raise EOFError when trying to load past the end of a .npy file Jan 27, 2023
@seberg seberg merged commit 4149999 into numpy:main Jan 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants