BUG: Non-deterministic accidental object reuse #1995
Closed
sjoerdjob wants to merge 3 commits into py-pdf:main
The following code could sometimes have unintended effects: pages from a PDF file that had been added earlier were used instead of the intended ones.

```python
from pypdf import PdfReader, PdfWriter

# some_file, other_file and third_file are placeholder paths/streams
writer = PdfWriter()

reader1 = PdfReader(some_file)
id_reader1 = id(reader1)
writer.add_page(reader1.pages[0])
del reader1

reader2 = PdfReader(other_file)
id_reader2 = id(reader2)  # may equal id_reader1, since reader1's memory was freed
writer.add_page(reader2.pages[0])
del reader2

writer.write(third_file)
```

Because `reader1` is no longer in memory when `reader2` is created, its memory is free to be reused, so `id_reader1` and `id_reader2` may end up with the same value. Since pypdf uses `id(reader)` internally as the key of an object cache, `writer.add_page(reader2.pages[0])` sometimes duplicated `reader1.pages[0]` instead.
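To illustrate the failure mode outside pypdf, here is a minimal sketch. The `Reader` class and both caches are hypothetical stand-ins, not pypdf's internals. A cache keyed by `id(obj)` keeps its entry after the object is deleted, so a later object that happens to receive the same id would hit the stale entry; a `weakref.WeakKeyDictionary` keyed by the object itself drops the entry once the object is garbage-collected.

```python
import gc
import weakref


class Reader:
    """Hypothetical stand-in for a PDF reader; not pypdf's class."""

    def __init__(self, name: str) -> None:
        self.name = name


# Cache keyed by id(): entries outlive the objects they describe.
id_cache: dict = {}
# Cache keyed by the object itself, held weakly: entries vanish with the object.
weak_cache: "weakref.WeakKeyDictionary[Reader, str]" = weakref.WeakKeyDictionary()

reader1 = Reader("first.pdf")
id_cache[id(reader1)] = "cloned pages of first.pdf"
weak_cache[reader1] = "cloned pages of first.pdf"

del reader1
gc.collect()  # CPython already freed reader1; collect() makes this explicit

# The id-keyed entry is now stale: a new object that reuses the old
# address would look up "cloned pages of first.pdf" by mistake.
print(len(id_cache))    # 1
print(len(weak_cache))  # 0

reader2 = Reader("second.pdf")
# Whether this is a (wrong) hit depends on memory reuse, so it is non-deterministic.
stale_hit = id(reader2) in id_cache
print("id-keyed cache hit for reader2:", stale_hit)
```

Keying on the object (weakly) rather than on its id ties the cache entry's lifetime to the reader, which removes the possibility of a stale hit entirely.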
Using a `WeakKeyDictionary` was an implementation detail that does not have to be reflected in the type annotation.
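One way to read this, as an assumption since the diff itself is not shown here: annotate the cache attribute with a general mapping type such as `typing.MutableMapping` and assign a `WeakKeyDictionary` as the concrete value, so callers only rely on mapping behaviour. The `Writer`, `Source`, `_cache` and `table_for` names below are hypothetical, not pypdf's API.

```python
import gc
import weakref
from typing import Any, Dict, MutableMapping


class Writer:
    """Hypothetical writer-like class; not pypdf's PdfWriter."""

    def __init__(self) -> None:
        # The annotation only promises "a mutable mapping from a source reader
        # to a {source object number: new object number} table"; that the
        # concrete object is a WeakKeyDictionary stays an implementation detail.
        self._cache: MutableMapping[Any, Dict[int, int]] = weakref.WeakKeyDictionary()

    def table_for(self, reader: Any) -> Dict[int, int]:
        """Return the translation table for `reader`, creating it on first use."""
        return self._cache.setdefault(reader, {})


class Source:
    """Hypothetical reader stand-in; plain classes support weak references."""


writer = Writer()
source = Source()
writer.table_for(source)[5] = 12
print(writer.table_for(source))  # {5: 12}

del source
gc.collect()
print(len(writer._cache))  # 0: the entry disappeared together with its key
```

Keys must be weak-referenceable (ordinary class instances are; e.g. `int` or `tuple` are not), which is one reason a weak-keyed cache suits reader objects rather than their ids.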
Member
@sjoerdjob Your PR cannot be merged like this because the CI fails. There are now also a couple of merge conflicts.
Collaborator
@MartinThoma this looks similar (though with a different approach) to a PR you've already opened about a random error. I'm not sure this is still required.
Member
We didn't have a test that could detect this type of issue, hence I added the test from this PR via #2244.

@sjoerdjob Sorry that it took so long. I'm closing this PR now, as the problem was fixed via #1841 and your test was added via #2244.
Member
Thank you for your support 🤗 If you want, I'll add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-)
Fixes #1788.