PI: Fix O(n²) performance in NameObject read/write by dmitry-kostin · Pull Request #3679 · py-pdf/pypdf

dmitry-kostin · 2026-03-10T18:22:07Z

- read_until_regex: search only the new chunk instead of rescanning the entire buffer; use list accumulation instead of bytes concatenation
- NameObject.unnumber: use a bytearray instead of rebuilding the bytes object on each #xx replacement
- NameObject.renumber: use parts.append() + join() instead of out +=
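For readers who want to see the shape of the two NameObject changes, here is a minimal sketch of the same technique. It is not the code from this PR: the function names `unnumber_sketch` and `renumber_sketch`, the bytes-only interface, and the exact set of escaped characters are assumptions made for illustration, and the sketch works on the name body without the leading `/`.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"
# Hypothetical escape set for this sketch: '#', PDF delimiters, and
# anything outside the printable ASCII range 0x21..0x7E.
SPECIAL = b"#()<>[]{}/%"


def unnumber_sketch(data: bytes) -> bytes:
    """Decode #xx hex escapes in one pass, appending to a bytearray."""
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        if (
            byte == 0x23  # '#'
            and i + 2 < n
            and data[i + 1] in HEX_DIGITS
            and data[i + 2] in HEX_DIGITS
        ):
            # Amortized O(1) append instead of rebuilding the whole bytes object.
            out.append(int(data[i + 1 : i + 3], 16))
            i += 3
        else:
            out.append(byte)
            i += 1
    return bytes(out)


def renumber_sketch(data: bytes) -> bytes:
    """Encode special bytes as #XX, collecting pieces in a list and joining once."""
    parts = []
    for byte in data:
        if 0x21 <= byte <= 0x7E and byte not in SPECIAL:
            parts.append(bytes([byte]))
        else:
            parts.append(b"#%02X" % byte)
    return b"".join(parts)
```

Both functions touch each input byte once, so the total work grows linearly with the name length. Slicing and re-concatenating an immutable bytes object on every escape copies the whole buffer each time, which is where the quadratic cost comes from.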

A real-world PDF with a ~786KB pathologically encoded name (262,144 hex escapes from repeated UTF-8 mis-encoding) went from hanging indefinitely to completing in ~3 seconds.

codecov · 2026-03-10T19:03:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.39%. Comparing base (cf2e518) to head (32d9771).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3679   +/-   ##
=======================================
  Coverage   97.39%   97.39%           
=======================================
  Files          55       55           
  Lines        9964     9977   +13     
  Branches     1829     1830    +1     
=======================================
+ Hits         9704     9717   +13     
  Misses        151      151           
  Partials      109      109


dmitry-kostin · 2026-03-10T19:41:53Z

Also added exponential chunk growth in read_until_regex (16 → 32 → 64 → ... → 8192). Saves ~25% on top of the existing fix when benchmarked with add_page() on a large scanned PDF.
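A rough sketch of how chunked reading with a new-chunk-only search, a fixed tail overlap, and exponential chunk growth can fit together is shown below. It is an illustration of the idea, not the function merged in this PR: the name `read_until_regex_sketch` and the parameters `min_chunk`, `max_chunk`, and `overlap` are hypothetical, and the overlap size of 16 bytes is an assumption (it only needs to be at least as long as the longest possible match minus one).

```python
import io
import re


def read_until_regex_sketch(stream, regex, min_chunk=16, max_chunk=8192, overlap=16):
    """Read from stream until regex matches; leave the stream at the match start."""
    parts = []   # confirmed bytes, joined once at the end
    tail = b""   # last few bytes, kept back in case a match spans two chunks
    chunk_size = min_chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of stream without a match: everything read is the result.
            parts.append(tail)
            return b"".join(parts)
        # Search only the small tail plus the new chunk, never the whole buffer.
        window = tail + chunk
        match = regex.search(window)
        if match is not None:
            parts.append(window[: match.start()])
            # Rewind so the next read starts at the matched delimiter.
            stream.seek(match.start() - len(window), io.SEEK_CUR)
            return b"".join(parts)
        keep = window[-overlap:] if overlap > 0 else b""
        parts.append(window[: len(window) - len(keep)])
        tail = keep
        # Exponential growth: 16 -> 32 -> 64 -> ... capped at max_chunk.
        chunk_size = min(chunk_size * 2, max_chunk)
```

A short usage example under the same assumptions:

```python
stream = io.BytesIO(b"A" * 100 + b" rest")
name = read_until_regex_sketch(stream, re.compile(rb"\s"))
remaining = stream.read()  # starts at the whitespace that ended the name
```

Small initial chunks keep short tokens cheap, while doubling means a very long token needs only a logarithmic number of reads before the cap is reached.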

Also added more tests for read_until_regex, since coverage there was pretty thin.

pypdf/_utils.py

tests/test_utils.py

…ames

Three functions had quadratic behavior that caused hangs on PDFs with extremely long Name objects (e.g. repeatedly mis-encoded UTF-8 names):
- read_until_regex: searched entire accumulated buffer on each 16-byte chunk instead of only the new chunk, and used bytes concatenation
- NameObject.unnumber: rebuilt entire bytes object on each # replacement
- NameObject.renumber: used out += concatenation in a loop

TST: Add read_until_regex coverage tests

@costajohnt

## What's new

### New Features (ENH)
- Expose /Perms verification result on Encryption object (#3672) by @costajohnt

### Performance Improvements (PI)
- Fix O(n²) performance in NameObject read/write (#3679) by @dmitry-kostin
- Batch-parse all objects in ObjStm on first access (#3677) by @dmitry-kostin

### Bug Fixes (BUG)
- Avoid sharing array-based content streams between pages (#3681) by @stefan6419846
- Avoid accessing invalid page when inserting blank page under some conditions (#3529) by @j-t-1

[Full Changelog](6.8.0...6.9.0)

dmitry-kostin force-pushed the fix-name-object-on2-perf branch from b68a96a to 2b7d583 Compare March 10, 2026 18:36

dmitry-kostin changed the title ~~Fix O(n²) hangs in NameObject read/write~~ PI: Fix O(n²) performance in NameObject read/write Mar 10, 2026

dmitry-kostin force-pushed the fix-name-object-on2-perf branch from 2b7d583 to f629056 Compare March 10, 2026 18:38

stefan6419846 reviewed Mar 11, 2026


pypdf/_utils.py (resolved)

stefan6419846 reviewed Mar 11, 2026


tests/test_utils.py (outdated, resolved)

dmitry-kostin added 3 commits March 11, 2026 14:12

PI: Exponential chunk growth in read_until_regex

645cb75

TST: Add read_until_regex coverage tests

Address review comments

4e74f71

dmitry-kostin force-pushed the fix-name-object-on2-perf branch from 885b246 to 4e74f71 Compare March 11, 2026 13:13

Add test for fixed tail overlap in read_until_regex

32d9771

dmitry-kostin requested a review from stefan6419846 March 11, 2026 13:22

stefan6419846 approved these changes Mar 12, 2026


stefan6419846 merged commit 3a4e913 into py-pdf:main Mar 12, 2026
18 checks passed

stefan6419846 merged 4 commits into py-pdf:main from dmitry-kostin:fix-name-object-on2-perf

2 participants
