PdfWriter.add_page() hangs on PDFs with pathologically encoded Name objects #3678

PDFs produced by some Apple software can contain Name objects whose UTF-8 characters have been mis-encoded repeatedly. For example, the German word "Hauptbeschäftigung" gets re-encoded multiple times, turning a 20-byte name into a ~786 KB name containing 262,144 hex escape sequences (`#xx`, 3 bytes each).
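The size figures above can be reproduced with a small sketch. It assumes the corruption is a repeated "decode as Latin-1, re-encode as UTF-8" round trip, which the report does not actually state, and it uses 17 rounds only because that count yields the quoted number of escapes. `mis_encode` and `to_pdf_name` are hypothetical helper names, not pypdf functions.

```python
def mis_encode(text: str, rounds: int) -> bytes:
    """Assumed corruption: treat UTF-8 bytes as Latin-1 and re-encode, repeatedly."""
    data = text.encode("utf-8")
    for _ in range(rounds):
        # Every byte >= 0x80 becomes two bytes, so the non-ASCII part doubles.
        data = data.decode("latin-1").encode("utf-8")
    return data


def to_pdf_name(raw: bytes) -> bytes:
    """Write raw bytes as a PDF name, escaping non-regular bytes as #XX."""
    parts = [b"/"]
    for byte in raw:
        if 0x21 <= byte <= 0x7E and byte not in b"#/()<>[]{}%":
            parts.append(bytes([byte]))
        else:
            parts.append(b"#%02X" % byte)
    return b"".join(parts)


clean = to_pdf_name(mis_encode("Hauptbesch\u00e4ftigung", 0))
bloated = to_pdf_name(mis_encode("Hauptbesch\u00e4ftigung", 17))
print(len(clean), clean.count(b"#"))      # 24 bytes escaped, 2 escapes
print(len(bloated), bloated.count(b"#"))  # 786450 bytes, 262144 escapes
```

A name built this way may be handy as a regression-test input, with the caveat that real files could be corrupted differently.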

When calling `PdfWriter.add_page()` on such a PDF, three functions exhibit O(n²) behavior and effectively hang:

1. **`read_until_regex`** — re-scans the entire accumulated buffer (`name + tok`) on every 16-byte read
2. **`NameObject.unnumber`** — rebuilds the full bytes object on each `#xx` replacement
3. **`NameObject.renumber`** — uses `out +=` concatenation in a loop
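To make the O(n²) claim concrete, here is a rough cost model for the first item. It only counts how many bytes get scanned when the whole buffer is re-examined after every 16-byte read versus when only the new chunk is examined. `bytes_scanned` is a hypothetical helper for illustration and does not reflect pypdf's actual code.

```python
def bytes_scanned(total_len: int, chunk: int = 16, rescan: bool = True) -> int:
    """Count bytes examined while reading `total_len` bytes in `chunk`-sized reads."""
    scanned = 0
    buffered = 0
    while buffered < total_len:
        step = min(chunk, total_len - buffered)
        buffered += step
        # Rescanning looks at everything buffered so far; otherwise only the new bytes.
        scanned += buffered if rescan else step
    return scanned


name_len = 262_144 * 3  # escape bytes of the pathological name
print(bytes_scanned(name_len, rescan=True))   # 19327746048
print(bytes_scanned(name_len, rescan=False))  # 786432
```

Under this model the rescanning strategy touches about 19 billion bytes for a single name, which is consistent with the observed hang.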

pikepdf/QPDF handles the same file in milliseconds because its C++ parser reads names in a single O(n) pass.

The fix is straightforward: in all three functions, accumulate the pieces in a list and call `join()` once instead of concatenating repeatedly.
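A minimal sketch of what linear-time versions could look like follows. The function names, the delimiter set, and the assumption that `unnumber` decodes `#xx` escapes while `renumber` produces them are mine, not taken from pypdf's source. The reader also assumes the end-of-name pattern matches a single byte, so a match can never straddle two chunks and only the new chunk needs scanning.

```python
import io
import re

# Hypothetical delimiter set; every alternative matches exactly one byte.
NAME_END = re.compile(rb"[\s/<>\[\](){}%]")
HEX_DIGITS = b"0123456789abcdefABCDEF"


def read_name_body(stream, chunk_size: int = 16) -> bytes:
    """Read up to the next delimiter, scanning each chunk only once."""
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        match = NAME_END.search(chunk)
        if match:
            parts.append(chunk[: match.start()])
            # Rewind so the delimiter is the next byte the caller reads.
            stream.seek(match.start() - len(chunk), io.SEEK_CUR)
            break
        parts.append(chunk)
    return b"".join(parts)


def unescape_name(name: bytes) -> bytes:
    """Decode #xx escapes in one pass, collecting slices in a list."""
    parts = []
    start = i = 0
    n = len(name)
    while i < n:
        if (name[i] == 0x23 and i + 3 <= n
                and name[i + 1] in HEX_DIGITS and name[i + 2] in HEX_DIGITS):
            parts.append(name[start:i])                        # literal run
            parts.append(bytes([int(name[i + 1:i + 3], 16)]))  # decoded byte
            i += 3
            start = i
        else:
            i += 1
    parts.append(name[start:])
    return b"".join(parts)


def escape_name(raw: bytes) -> bytes:
    """Encode non-regular bytes as #XX, joining once at the end."""
    parts = []
    for byte in raw:
        if 0x21 <= byte <= 0x7E and byte not in b"#/()<>[]{}%":
            parts.append(bytes([byte]))
        else:
            parts.append(b"#%02X" % byte)
    return b"".join(parts)


stream = io.BytesIO(b"Hauptbesch#C3#A4ftigung /Next")
body = read_name_body(stream)
print(body)                               # b'Hauptbesch#C3#A4ftigung'
print(unescape_name(body).decode("utf-8"))
print(escape_name(unescape_name(body)) == body)
```

Each function touches every input byte a constant number of times, so the total work grows linearly with the name length.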
