BUG: Fix Parsing of Inline Images#332

Closed
speedplane wants to merge 12 commits into py-pdf:main from speedplane:master

Conversation

@speedplane

The inline image parser does not check for whitespace before the EI keyword, as it should. As a result, a content stream like the following crashes the parser:

BI [inline image dictionary]
ID
asfASF213ad>]asf
213lkasdf9as12EI
QsdkfjasdfkjfdiI
EI
Q

Notice that an EI at the end of one line followed by a Q on the next line occurs in two places, and only the second one is the real end of the image. To detect it correctly, we need to make sure the EI is preceded by whitespace.
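To make the check concrete, here is a minimal sketch of the idea, not the code from this PR. The name `find_inline_image_end` and the constant `PDF_WHITESPACE` are made up for illustration, and the extra requirement that EI is also followed by whitespace (or the end of the data) is an assumption added here. Binary image data can still contain such a sequence by chance, so this remains a heuristic.

```python
# The six PDF whitespace bytes: NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_inline_image_end(stream: bytes, start: int = 0) -> int:
    """Return the index of the first EI delimited by whitespace, or -1.

    Hypothetical helper: an EI glued to the preceding data byte is skipped.
    """
    pos = stream.find(b"EI", start)
    while pos != -1:
        end = pos + 2
        preceded = pos > 0 and stream[pos - 1] in PDF_WHITESPACE
        followed = end == len(stream) or stream[end] in PDF_WHITESPACE
        if preceded and followed:
            return pos
        # Not a real terminator, keep searching after this candidate.
        pos = stream.find(b"EI", pos + 1)
    return -1


# The image data from the example above (everything after ID).
sample = (
    b"asfASF213ad>]asf\n"
    b"213lkasdf9as12EI\n"
    b"QsdkfjasdfkjfdiI\n"
    b"EI\n"
    b"Q\n"
)

naive = sample.find(b"EI")            # stops at the EI glued to "...as12"
checked = find_inline_image_end(sample)  # skips it and finds the EI on its own line
```

With the sample data, the naive search stops inside the image bytes, while the whitespace-aware search lands on the standalone EI line.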

This PR also adds protection against infinite loops in case the PDF is corrupt and the inline image never ends.
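The loop protection can be sketched like this, assuming the parser reads from a file-like object one byte at a time. `read_inline_image` and `UnterminatedInlineImageError` are hypothetical names, not PyPDF2 API, and this simplified version only checks the whitespace before EI.

```python
import io
from typing import BinaryIO

PDF_WHITESPACE = b"\x00\t\n\x0c\r "


class UnterminatedInlineImageError(ValueError):
    """Hypothetical error for an inline image that never reaches EI."""


def read_inline_image(stream: BinaryIO) -> bytes:
    """Read image data up to (not including) the whitespace before EI."""
    data = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            # read() returns b"" at end of stream. Without this check, a loop
            # that only waits for EI would spin forever on a corrupt PDF.
            raise UnterminatedInlineImageError("stream ended before EI")
        data += byte
        # Stop once the buffer ends with <whitespace>EI.
        if len(data) >= 3 and data[-2:] == b"EI" and data[-3] in PDF_WHITESPACE:
            return bytes(data[:-3])


image = read_inline_image(io.BytesIO(b"ab12EI\nQ\nEI\nQ"))

try:
    read_inline_image(io.BytesIO(b"corrupt data without terminator"))
    terminated = True
except UnterminatedInlineImageError:
    terminated = False
```

Raising a dedicated error on end of stream lets the caller decide whether to skip the image or abort, instead of hanging.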

speedplane added 2 commits February 28, 2017 00:25
@vstoykov
Contributor

vstoykov commented Jul 20, 2017

#331 also implements protection against incorrect images. It also makes parsing of inline images a lot faster.

@MartinThoma MartinThoma added is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF workflow-images From a users perspective, image handling is the affected feature/workflow labels Apr 6, 2022
Comment thread PyPDF2/filters.py Outdated
@MartinThoma
Member

The current solution is not compatible with the recent BytesIO implementation. Would you mind adjusting your PR?

@MartinThoma MartinThoma added the needs-change The PR/issue cannot be handled as issue and needs to be improved label Apr 16, 2022
@speedplane
Author

I fixed the merge conflict, I'm not sure what you're referring to re BytesIO.

@MartinThoma
Member

> I fixed the merge conflict, I'm not sure what you're referring to re BytesIO.

CI is failing:

[image]

@MartinThoma
Member

@speedplane We made some pretty heavy changes to PyPDF2 recently. If you search for `if tok2 == b"I":` in `generic.py`, you can see the section that you adjusted. Do you want to adjust the PR / open a new PR?

Do you have an example PDF where this adjustment is necessary? Does it close one of the open issues?

@MartinThoma MartinThoma changed the title Fix Parsing of Inline Images BUG: Fix Parsing of Inline Images Jun 25, 2022
@MartinThoma MartinThoma added needs-rebase This PR cannot be merged as the main branch is too different. You need to rebase or merge main. and removed needs-change The PR/issue cannot be handled as issue and needs to be improved labels Jun 25, 2022
@MartinThoma MartinThoma added the needs-test A test should be added before this PR is merged. label Jul 24, 2022
@MartinThoma
Member

It would help me a lot if we had an image that shows the described issue.

@speedplane
Author

Sorry, this is all I have. I can't remember what this fixed or how it fixes it.

@MartinThoma
Member

@speedplane The issue you addressed was fixed via #1327.

May I add you to https://pypdf2.readthedocs.io/en/latest/meta/CONTRIBUTORS.html ? Your PR was not merged, but you did make a valuable contribution with this PR. It was just me not being able to understand it at the time.

@MartinThoma MartinThoma closed this Sep 6, 2022