ENH: Rename resources deterministically in merge_page #1543
Merged
MartinThoma merged 5 commits into py-pdf:main on Jan 22, 2023
2108a7d to 97515fd
Codecov Report

Base: 91.81% // Head: 91.86%

Coverage Diff:

| | main | #1543 | +/- |
|---|---|---|---|
| Coverage | 91.81% | 91.86% | +0.05% |
| Files | 33 | 33 | |
| Lines | 6204 | 6207 | +3 |
| Branches | 1229 | 1229 | |
| Hits | 5696 | 5702 | +6 |
| Misses | 326 | 326 | |
| Partials | 182 | 179 | -3 |
MasterOdin reviewed Jan 9, 2023
MartinThoma added a commit that referenced this pull request Jan 21, 2023
MartinThoma added a commit that referenced this pull request Jan 22, 2023
Member

Thank you @huonw for this PR and for adding good tests 🙏 It's now in If you want, I can add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html
MartinThoma added a commit that referenced this pull request Jan 22, 2023
New Features (ENH):
- Add page label support to PdfWriter (#1558)
- Accept inline images with space before EI (#1552)
- Add circle annotation support (#1556)
- Add polygon annotation support (#1557)
- Make merging pages produce a deterministic PDF (#1542, #1543)

Bug Fixes (BUG):
- Fix error in cmap extraction (#1544)
- Remove erroneous assertion check (#1564)
- Fix dictionary access of optional page label keys (#1562)

Robustness (ROB):
- Set ignore_eof=True for read_until_regex (#1521)

Documentation (DOC):
- Paper size (#1550)

Developer Experience (DEV):
- Fix broken combination of dependencies of docs.txt
- Annotate tests appropriately (#1551)

[Full Changelog](3.2.1...3.3.0)
Contributor, Author

Feel free to add me, thanks 😄 I've upgraded to 3.3.0 in our code, it works well. Thanks for merging and releasing!
This fixes #1532 by adjusting the procedure used for renaming resources in `merge_page`, so that resources that have the same name (but different contents) are renamed in a deterministic/reproducible way.

When merging pages that both have a resource of the same name (say, `/Example`) but different values, the resource from the second page will now be renamed to `/Example-0`. Previously, it would be renamed with a random UUIDv4 (e.g. `/Exampledbbbe7cb-5f34-4061-b863-41919b326b49`), which would be different on every run, even if the inputs were identical.

The new name may already exist if a PDF is carefully/maliciously crafted, in which case `/Example-1`, `/Example-2`, etc. are tried until an appropriate name is found. If any of these candidate names already holds the same value, that name/value is reused.

The `/Example-0` pattern is short and sweet, but can easily be changed, e.g. some options off the top of my head:

- `/Example-pypdf-merged-0`, for explanation about where it comes from
- `/Example-9c4beb01-0`, where the `9c4beb01` is some marker hardcoded into pypdf's code, to reduce the chance of a "normal" PDF requiring `idx > 0` (only truly maliciously crafted ones)
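
For readers who want the renaming rule spelled out, here is a minimal sketch of the procedure described above. It is not the code from this PR: `pick_resource_name` is a hypothetical helper, resources are modelled as a plain `dict`, and values are compared with `==` as a stand-in for however pypdf compares resource objects.

```python
from typing import Any, Dict, Tuple


def pick_resource_name(
    existing: Dict[str, Any], name: str, value: Any
) -> Tuple[str, bool]:
    """Choose the name under which `value` should be merged into `existing`.

    Hypothetical illustration only, not pypdf's API.
    Returns (name_to_use, needs_insert). `needs_insert` is False when an
    identical value is already stored under the returned name.
    """
    # No clash: keep the original name.
    if name not in existing:
        return name, True
    # Same name and same value: reuse the existing entry.
    if existing[name] == value:
        return name, False
    # Same name, different value: try name-0, name-1, ... in order,
    # so identical inputs always produce identical output names.
    idx = 0
    while True:
        candidate = f"{name}-{idx}"
        if candidate not in existing:
            return candidate, True
        if existing[candidate] == value:
            return candidate, False
        idx += 1


# Usage: the second page's /Example clashes with the first page's.
page1_resources = {"/Example": "font A"}
new_name, needs_insert = pick_resource_name(page1_resources, "/Example", "font B")
if needs_insert:
    page1_resources[new_name] = "font B"
```

Because the counter always starts at 0 and advances in order, running the merge twice on the same inputs yields the same names, which is the property the random UUID suffix lacked.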