Skip to content

ROB: Ignore EOF in NumberObject.read_from_stream#1505

Closed
rraval wants to merge 2 commits intopy-pdf:mainfrom
rraval:number-object-ignore-eof
Closed

ROB: Ignore EOF in NumberObject.read_from_stream#1505
rraval wants to merge 2 commits intopy-pdf:mainfrom
rraval:number-object-ignore-eof

Conversation

@rraval
Copy link
Copy Markdown
Contributor

@rraval rraval commented Dec 16, 2022

Use ignore_eof=True just like NameObject does, which is the only other caller to read_until_regex in this module.

This helps prevent arcane exceptions when trying to parse a number:

PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly

The motivation is essentially identical to the change that introduced ignore_eof=True on NameObjects:
431ba70

@rraval
Copy link
Copy Markdown
Contributor Author

rraval commented Dec 16, 2022

Sadly I do not have a reproducing PDF that I can add to the testing corpus. This came up when exercising PdfMerger with a bunch of private files.

@codecov
Copy link
Copy Markdown

codecov bot commented Dec 16, 2022

Codecov Report

Base: 91.84% // Head: 92.03% // Increases project coverage by +0.19% 🎉

Coverage data is based on head (b1d7aa7) compared to base (e711846).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head b1d7aa7 differs from pull request most recent head ec8b349. Consider uploading reports for the commit ec8b349 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1505      +/-   ##
==========================================
+ Coverage   91.84%   92.03%   +0.19%     
==========================================
  Files          33       32       -1     
  Lines        6202     5976     -226     
  Branches     1228     1163      -65     
==========================================
- Hits         5696     5500     -196     
+ Misses        326      312      -14     
+ Partials      180      164      -16     
Impacted Files Coverage Δ
PyPDF2/generic/_base.py 99.64% <100.00%> (ø)
pypdf/_reader.py
pypdf/generic/_fit.py
pypdf/generic/_outline.py
pypdf/types.py
pypdf/_codecs/zapfding.py
pypdf/_version.py
pypdf/_codecs/std.py
pypdf/_codecs/pdfdoc.py
pypdf/_codecs/__init__.py
... and 56 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@rraval rraval force-pushed the number-object-ignore-eof branch from 7ec8040 to b4cb2fa Compare December 16, 2022 05:59
@rraval rraval changed the title Ignore EOF in NumberObject.read_from_stream ROB: Ignore EOF in NumberObject.read_from_stream Dec 16, 2022
Use `ignore_eof=True` just like `NameObject` does, which is the only other
caller to `read_until_regex` in this module.

This helps prevent arcane exceptions when trying to parse a number:

```
PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly
```

The motivation is essentially identical to the change that introduced
`ignore_eof=True` on `NameObject`s:
py-pdf@431ba70
@rraval rraval force-pushed the number-object-ignore-eof branch from b1d7aa7 to c93f755 Compare December 29, 2022 02:19
@rraval
Copy link
Copy Markdown
Contributor Author

rraval commented Dec 29, 2022

Hi there, I've rebased the branch off the latest main to permit clean fast-forward merging. Hopefully this can get upstreamed soon.

Copy link
Copy Markdown
Member

@MasterOdin MasterOdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now all usages of read_until_regex override the default ignore_eof value of False, right? If we're going this route, should probably then flip the default value to True and then remove using the third argument in the two places...

rraval added a commit to rraval/PyPDF2 that referenced this pull request Dec 29, 2022
This was initially motivated by `NumberObject.read_from_stream`, which
was calling `read_until_regex` with the default value of
`ignore_eof=False` and thus raising exceptions like:

```
PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly
```

py-pdf@431ba70
demonstrates a similar fix for `NameObject.read_from_stream`.

From discussion in py-pdf#1505, it was
realized that the change to `NumberObject.read_from_stream` had now made
ALL callers of `read_until_regex` pass `ignore_eof=True`. It's cleaner
to remove the parameter entirely and change the default behaviour.
@rraval
Copy link
Copy Markdown
Contributor Author

rraval commented Dec 29, 2022

So now all usages of read_until_regex override the default ignore_eof value of False, right? If we're going this route, should probably then flip the default value to True and then remove using the third argument in the two places...

I've opened a separate PR along these lines: #1521

Feel free to pick one and close out the other. I'm not familiar enough with this codebase to judge which is "better".

rraval added a commit to rraval/PyPDF2 that referenced this pull request Jan 9, 2023
This was initially motivated by `NumberObject.read_from_stream`, which
was calling `read_until_regex` with the default value of
`ignore_eof=False` and thus raising exceptions like:

```
PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly
```

py-pdf@431ba70
demonstrates a similar fix for `NameObject.read_from_stream`.

From discussion in py-pdf#1505, it was
realized that the change to `NumberObject.read_from_stream` had now made
ALL callers of `read_until_regex` pass `ignore_eof=True`. It's cleaner
to remove the parameter entirely and change the default behaviour.
@MartinThoma
Copy link
Copy Markdown
Member

MartinThoma commented Jan 21, 2023

I'm closing this one as the parameter was now removed with #1521 :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants