BUG: Non-deterministic accidental object reuse by sjoerdjob · Pull Request #1995 · py-pdf/pypdf

sjoerdjob · 2023-07-21T07:47:32Z

The following code could sometimes insert pages from a previously added PDF file instead of the intended ones:

from pypdf import PdfReader, PdfWriter

writer = PdfWriter()

reader1 = PdfReader(some_file)
id_reader1 = id(reader1)
writer.add_page(reader1.pages[0])
del reader1

reader2 = PdfReader(other_file)
id_reader2 = id(reader2)
writer.add_page(reader2.pages[0])
del reader2

writer.write(third_file)

This happens because reader1 is no longer in memory when reader2 is initialized: the memory is free for reuse, so id_reader1 and id_reader2 might end up with the same value.

pypdf uses id(reader) internally as the key of an object cache, so writer.add_page(reader2.pages[0]) would sometimes duplicate reader1.pages[0] instead.
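The failure mode can be reproduced without pypdf. The sketch below is only an illustration: FakeReader, id_cache and weak_cache are hypothetical names, not pypdf internals. It compares a cache keyed by id() with one keyed weakly by the object itself (the thread mentions a WeakKeyDictionary). Whether the two ids actually collide depends on the interpreter and the allocator, so the code records that outcome rather than assuming it.

```python
import gc
import weakref


class FakeReader:
    """Hypothetical stand-in for a PDF reader; not a pypdf class."""

    def __init__(self, name):
        self.name = name


# Cache keyed by id(): an entry outlives the object it describes.
id_cache = {}
# Cache keyed weakly by the object: an entry disappears with the object.
weak_cache = weakref.WeakKeyDictionary()

reader1 = FakeReader("first.pdf")
stale_id = id(reader1)
id_cache[stale_id] = "pages of first.pdf"
weak_cache[reader1] = "pages of first.pdf"

del reader1
gc.collect()  # CPython frees the object on 'del'; collect() is a safety net

stale_entry_survives = stale_id in id_cache   # the id-keyed entry is still there
weak_cache_is_empty = len(weak_cache) == 0    # the weak entry is gone

reader2 = FakeReader("second.pdf")
collision = id(reader2) == stale_id           # may or may not happen
wrong_hit = id_cache.get(id(reader2))         # stale pages if the ids collide
right_miss = weak_cache.get(reader2)          # never returns stale pages
```

On a collision, wrong_hit holds the entry that belonged to the first reader, which corresponds to the duplicated page described above. The weakly keyed cache cannot return it, because its entry was dropped together with the first reader.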

Fixes #1788.

MartinThoma · 2023-08-13T07:19:34Z

@sjoerdjob Your PR cannot be merged like this as the CI fails:

pypdf/_protocols.py:68: error: Name "WeakKeyDictionary" is not defined

There are now also a couple of merge conflicts.
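A "Name ... is not defined" error from the type checker usually means that an annotation refers to a class the module never imports. The thread does not show line 68 of pypdf/_protocols.py, so the sketch below is an assumption about the kind of fix involved, not the actual change. HasObjectCache, Writer, Source and object_cache are hypothetical names.

```python
from typing import Any, Dict, Protocol
from weakref import WeakKeyDictionary  # without this import the annotation below is undefined


class Source:
    """Hypothetical weak-referenceable object used as a cache key."""


class HasObjectCache(Protocol):
    # Quoted so the subscripted type is never evaluated at runtime.
    object_cache: "WeakKeyDictionary[Any, Dict[int, int]]"


class Writer:
    """Hypothetical class that satisfies the protocol structurally."""

    def __init__(self) -> None:
        self.object_cache: "WeakKeyDictionary[Any, Dict[int, int]]" = WeakKeyDictionary()


def cache_size(holder: HasObjectCache) -> int:
    """Return the number of live entries in the holder's cache."""
    return len(holder.object_cache)


source = Source()
writer = Writer()
writer.object_cache[source] = {1: 10}
size_with_source = cache_size(writer)
```

Another option, in the spirit of the "Change type to more plain type" commit, would be to annotate the attribute with a general mapping type so that the protocol does not name WeakKeyDictionary at all.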

pubpub-zz · 2023-08-13T07:38:34Z

@MartinThoma This looks similar (with a different approach) to a PR you already have about the random error. I'm not sure this one is still required.

MartinThoma · 2023-10-08T09:48:09Z

#1788 was fixed via #1841. test_merging_many_temporary_files succeeds as well.

MartinThoma · 2023-10-08T09:54:22Z

We didn't have a test that could detect this type of issue, so I added the test from this PR via #2244.

@sjoerdjob Sorry that it took so long. I'm closing this PR now as the problem was fixed via #1841 and your test was added via #2244.

MartinThoma · 2023-10-08T09:54:47Z

Thank you for your support 🤗 If you want, I'll add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-)

sjoerdjob added 3 commits July 21, 2023 09:36

Change type to more plain type.

ad9ad60

Using a WeakKeyDictionary was an implementation detail that does not have to be matched in the type.

Switch to stringly-typed type.

5e21882

MartinThoma added a commit that referenced this pull request Oct 8, 2023

TST: Regression test against non-deterministic accidental object reuse

6641bda

Full credit to sjoerdjob for this contribution via #1995 See #1788 Co-authored-by: Sjoerd Job Postmus <sjoerdjob@sjec.nl>

MartinThoma mentioned this pull request Oct 8, 2023

TST: Regression test against non-deterministic accidental object reuse #2244

Merged

MartinThoma added a commit that referenced this pull request Oct 8, 2023

TST: Regression test against non-deterministic accidental object reuse (#2244)

126f6be

Full credit to sjoerdjob for this contribution via #1995 See #1788 Co-authored-by: Sjoerd Job Postmus <sjoerdjob@sjec.nl>

MartinThoma closed this Oct 8, 2023

biredel mentioned this pull request Mar 29, 2024

BUG: Ambiguous translated references #2558

Draft

sjoerdjob wants to merge 3 commits into py-pdf:main from sjoerdjob:issue-1788