PI: Fix O(n²) performance in NameObject read/write#3679
Merged
stefan6419846 merged 4 commits into py-pdf:main on Mar 12, 2026
b68a96a to 2b7d583
2b7d583 to f629056
Codecov Report

✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | main | #3679 | +/- |
|---|---|---|---|
| Coverage | 97.39% | 97.39% | |
| Files | 55 | 55 | |
| Lines | 9964 | 9977 | +13 |
| Branches | 1829 | 1830 | +1 |
| Hits | 9704 | 9717 | +13 |
| Misses | 151 | 151 | |
| Partials | 109 | 109 | |
Also added exponential chunk growth in … and added more tests for …
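To illustrate what exponential chunk growth means for a chunked reader, here is a minimal sketch. It is not pypdf's code: `read_until_match`, `initial_chunk`, and `max_chunk` are hypothetical names, and the sketch assumes the pattern matches a single delimiter byte, so that a match can never straddle a chunk boundary.

```python
import io
import re


def read_until_match(stream, pattern, initial_chunk=16, max_chunk=8192):
    """Hypothetical helper: return the bytes before the first match.

    Assumes `pattern` matches exactly one byte, so scanning each new
    chunk in isolation cannot miss a match.
    """
    parts = []  # accumulate chunks in a list, join once at the end
    chunk_size = initial_chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream without finding a delimiter
        match = pattern.search(chunk)  # scan only the new bytes
        if match is not None:
            parts.append(chunk[: match.start()])
            # Rewind so the stream is positioned at the delimiter.
            stream.seek(match.start() - len(chunk), io.SEEK_CUR)
            break
        parts.append(chunk)
        # Double the chunk size so long tokens need few read calls.
        chunk_size = min(chunk_size * 2, max_chunk)
    return b"".join(parts)


stream = io.BytesIO(b"A" * 100 + b"/rest")
name = read_until_match(stream, re.compile(rb"[/ ]"))
remainder = stream.read()
```

Short tokens still cost only one small read, while a very long token is consumed in a number of reads that grows logarithmically until the cap is reached.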
…ames

Three functions had quadratic behavior that caused hangs on PDFs with extremely long Name objects (e.g. repeatedly mis-encoded UTF-8 names):

- read_until_regex: searched the entire accumulated buffer on each 16-byte chunk instead of only the new chunk, and used bytes concatenation
- NameObject.unnumber: rebuilt the entire bytes object on each # replacement
- NameObject.renumber: used out += concatenation in a loop
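The concatenation problem described in this commit message can be sketched as follows. This is an illustration, not the pypdf source: the function names and the simplified escaping rule (escape `#`, PDF delimiters, and bytes outside 33..126) are assumptions made for the example.

```python
# Assumed, simplified set of bytes that must be written as #xx.
_DELIMITERS = b"()<>[]{}/%#"


def _needs_escape(byte: int) -> bool:
    return byte < 33 or byte > 126 or byte in _DELIMITERS


def escape_name_quadratic(raw: bytes) -> bytes:
    """Builds the result with +=, which can copy `out` on every step."""
    out = b""
    for byte in raw:
        if _needs_escape(byte):
            out += b"#%02X" % byte
        else:
            out += bytes((byte,))
    return out


def escape_name_linear(raw: bytes) -> bytes:
    """Collects pieces in a list and joins them once at the end."""
    parts = []
    for byte in raw:
        if _needs_escape(byte):
            parts.append(b"#%02X" % byte)
        else:
            parts.append(bytes((byte,)))
    return b"".join(parts)


sample = b"A B#"
escaped = escape_name_linear(sample)
```

Both functions return the same bytes. Because `bytes` is immutable, each `+=` may allocate a new object and copy everything written so far, so the total work can grow with the square of the name length, whereas the list-and-join version does one final copy.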
TST: Add read_until_regex coverage tests
885b246 to 4e74f71
stefan6419846 approved these changes Mar 12, 2026
stefan6419846 added a commit that referenced this pull request Mar 15, 2026
## What's new

### New Features (ENH)
- Expose /Perms verification result on Encryption object (#3672) by @costajohnt

### Performance Improvements (PI)
- Fix O(n²) performance in NameObject read/write (#3679) by @dmitry-kostin
- Batch-parse all objects in ObjStm on first access (#3677) by @dmitry-kostin

### Bug Fixes (BUG)
- Avoid sharing array-based content streams between pages (#3681) by @stefan6419846
- Avoid accessing invalid page when inserting blank page under some conditions (#3529) by @j-t-1

[Full Changelog](6.8.0...6.9.0)
Fixes #3678
- `read_until_regex`: search only the new chunk instead of rescanning the entire buffer; use list accumulation instead of bytes concatenation
- `NameObject.unnumber`: use bytearray instead of rebuilding bytes on each `#xx` replacement
- `NameObject.renumber`: use `parts.append()` + `join()` instead of `out +=`

A real-world PDF with a ~786KB pathologically encoded name (262,144 hex escapes from repeated UTF-8 mis-encoding) went from hanging indefinitely to completing in ~3 seconds.
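The bytearray approach to decoding `#xx` escapes can be sketched as a single left-to-right pass. This is a rough illustration under assumptions, not the merged implementation: `decode_hex_escapes` is a hypothetical name, and leaving a malformed escape unchanged is a choice made for the example.

```python
_HEX_DIGITS = b"0123456789abcdefABCDEF"


def decode_hex_escapes(raw: bytes) -> bytes:
    """Hypothetical helper: replace each #xx with the byte it encodes.

    Appends to a bytearray instead of rebuilding an immutable bytes
    object for every escape, so the work is linear in len(raw).
    """
    out = bytearray()
    i = 0
    n = len(raw)
    while i < n:
        byte = raw[i]
        if (
            byte == 0x23  # '#'
            and i + 2 < n
            and raw[i + 1] in _HEX_DIGITS
            and raw[i + 2] in _HEX_DIGITS
        ):
            out.append(int(raw[i + 1 : i + 3], 16))
            i += 3
        else:
            out.append(byte)  # ordinary byte or malformed escape
            i += 1
    return bytes(out)


decoded = decode_hex_escapes(b"A#20B")
```

Each input byte is visited once and decoded output is never rescanned, so a name with hundreds of thousands of escapes costs time proportional to its length.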