MAINT: Simplify file identifiers generation#2003
Open
exiledkingcc wants to merge 9 commits intopy-pdf:mainfrom
Open
MAINT: Simplify file identifiers generation#2003exiledkingcc wants to merge 9 commits intopy-pdf:mainfrom
exiledkingcc wants to merge 9 commits intopy-pdf:mainfrom
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2003 +/- ##
==========================================
- Coverage 95.14% 95.13% -0.02%
==========================================
Files 51 51
Lines 8551 8551
Branches 1706 1707 +1
==========================================
- Hits 8136 8135 -1
Misses 261 261
- Partials 154 155 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
MartinThoma
reviewed
Jul 29, 2023
| return ByteStringObject(_rolling_checksum(stream).encode("utf8")) | ||
| def _compute_document_identifier(self) -> ByteStringObject: | ||
| md5 = hashlib.md5() | ||
| md5.update(str(time.time()).encode("utf-8")) |
Member
There was a problem hiding this comment.
This makes document-generation non-deterministic, right?
9092a14 to
5fd1e91
Compare
MartinThoma
added a commit
that referenced
this pull request
Dec 23, 2023
See #2003 Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma
added a commit
that referenced
this pull request
Dec 23, 2023
See #2003 Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma
reviewed
Dec 23, 2023
MartinThoma
added a commit
that referenced
this pull request
Dec 23, 2023
#2003 Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma
added a commit
that referenced
this pull request
Dec 23, 2023
#2003 Co-authored-by: exiledkingcc <exiledkingcc@gmail.com>
MartinThoma
reviewed
Dec 23, 2023
| else: | ||
| id1 = self._compute_document_identifier() | ||
| id2 = id1 | ||
| id2 = ByteStringObject(id1.original_bytes) |
Member
There was a problem hiding this comment.
id1 is a ByteStringObject already. So .original_bytes just returns id1. Then wrapping it in ByteStringObject doesn't do anything, right?
MartinThoma
reviewed
Dec 23, 2023
| return ByteStringObject(_rolling_checksum(stream).encode("utf8")) | ||
| md5 = hashlib.md5() | ||
| md5.update(str(time.time()).encode("utf-8")) | ||
| md5.update(str(self.fileobj).encode("utf-8")) |
Member
There was a problem hiding this comment.
Is self.fileobj equivalent to self._write_pdf_structure(stream)?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The PDF standard says:
the identifiers are also be used for encryption.
Why this PR improves pypdf:
PdfWriter.encryptis called, the identifiers are genearated by uncrypted pdf stream,then
PdfWriter.writecalled, the content of pdf file is encrypted, so the hash changed.for encrypted pdf, identifiers must be generated before write to stream, since the identifier will be used to calculate the key,
so the identifiers cannot be the hash of pdf stream content.