Description
PdfReader() enters an infinite loop when opening a PDF with circular /Prev references in the cross-reference (xref) chain. The process hangs at 100% CPU, spamming "Overwriting cache for 0 129" warnings indefinitely.
Environment
- pypdf version: 6.7.1 (latest)
- Python: 3.11
- OS: Linux
Reproducer
The attached PDF (FPC-05F-22PH20.pdf, 300KB, an LCSC component datasheet) triggers the issue:
from pypdf import PdfReader

# This hangs forever, printing "Overwriting cache for 0 129" in a loop
reader = PdfReader("FPC-05F-22PH20.pdf")

The process never completes and must be killed.
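To observe the hang without killing the process by hand, the reproducer can be run in a child interpreter with a timeout. The following is only a sketch: the helper name finishes_within, the CHILD_SCRIPT string, and the 10-second limit are illustrative assumptions and are not part of pypdf.

```python
import subprocess
import sys

# Child script (assumption: pypdf is installed in the same interpreter).
# It opens the PDF passed as argv[1], which is where the report says the hang occurs.
CHILD_SCRIPT = (
    "import sys\n"
    "from pypdf import PdfReader\n"
    "PdfReader(sys.argv[1])\n"
)


def finishes_within(script: str, args: list[str], timeout_s: float) -> bool:
    """Run `script` in a fresh interpreter.

    Returns False if the child is still running after timeout_s seconds
    (subprocess.run kills it in that case), True otherwise. A child that
    exits with an error also counts as "finished"; only hangs are detected.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", script, *args],
            stdout=subprocess.DEVNULL,  # discard the repeated warnings
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return True
```

Calling finishes_within(CHILD_SCRIPT, ["FPC-05F-22PH20.pdf"], 10.0) should return False if the behaviour described in this report is present, since the child would never exit on its own.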
Root Cause
_read_xref_tables_and_trailers() in _reader.py follows /Prev pointers through the xref chain in a loop that runs while startxref is not None. If a malformed PDF has circular /Prev references (xref A → xref B → xref A), startxref never becomes None and the loop runs forever. Each iteration re-parses and re-caches the same objects, triggering the "Overwriting cache" warning.
There is no visited-set guard to detect that an xref offset has already been processed.
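The failure mode can be illustrated with a toy model that leaves out pypdf entirely. This is a sketch under stated assumptions: the PREV_OF table, the offsets 1000 and 400, and the walk_unguarded helper are all hypothetical, and the max_steps cap exists only so the demonstration terminates.

```python
from typing import Optional

# Hypothetical toy model: maps an xref section's byte offset to the offset
# stored in its trailer's /Prev entry (None would mean the chain ends there).
PREV_OF: dict[int, Optional[int]] = {
    1000: 400,   # xref A -> xref B
    400: 1000,   # xref B -> xref A (the malformed back-reference)
}


def walk_unguarded(start: int, max_steps: int) -> list[int]:
    """Follow /Prev pointers with no cycle check.

    max_steps is a demo-only cap; the loop described in the report has none,
    which is why it never exits on a circular chain.
    """
    seen_order: list[int] = []
    startxref: Optional[int] = start
    while startxref is not None and len(seen_order) < max_steps:
        seen_order.append(startxref)
        startxref = PREV_OF.get(startxref)
    return seen_order


trace = walk_unguarded(1000, max_steps=6)
print(trace)  # [1000, 400, 1000, 400, 1000, 400]
```

The same two offsets repeat for as long as the cap allows, which matches the repeated re-parsing and "Overwriting cache" warnings seen with the real file.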
Proposed Fix
Track visited xref offsets in a set. If startxref has already been seen, log a warning and break the loop:
visited_xref_offsets: set[int] = set()
while startxref is not None:
    if startxref in visited_xref_offsets:
        logger_warning(
            f"Circular xref chain detected at offset {startxref}, stopping",
            __name__,
        )
        break
    visited_xref_offsets.add(startxref)
    # ... rest of loop

This is a minimal, targeted fix that uses the same pattern found elsewhere in pypdf for cycle detection (e.g., _known_objects in get_object()).
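The effect of the guard can be checked in isolation with a small standalone model. This is a sketch, not pypdf code: walk_guarded, the prev_of mapping, and the example offsets are assumptions made for illustration, and the real loop would parse an xref section where this version only records the offset.

```python
from typing import Optional


def walk_guarded(
    prev_of: dict[int, Optional[int]], start: int
) -> tuple[list[int], bool]:
    """Follow /Prev pointers, stopping at the first repeated offset.

    Returns (offsets processed in order, whether a cycle was detected).
    """
    visited: set[int] = set()
    order: list[int] = []
    startxref: Optional[int] = start
    while startxref is not None:
        if startxref in visited:
            # Same condition as the proposed fix: offset already processed.
            return order, True
        visited.add(startxref)
        order.append(startxref)  # stand-in for parsing the xref section
        startxref = prev_of.get(startxref)
    return order, False


circular = {1000: 400, 400: 1000}  # xref A -> xref B -> xref A
healthy = {1000: 400, 400: None}   # chain that ends normally

print(walk_guarded(circular, 1000))  # ([1000, 400], True)
print(walk_guarded(healthy, 1000))   # ([1000, 400], False)
```

Each offset is processed at most once, so the walk is bounded by the number of distinct xref offsets, and a well-formed chain is traversed exactly as before.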
I have a PR ready: https://github.com/rampageservices/pypdf/tree/fix/circular-xref-infinite-loop
Impact
- Denial of service: Any application using pypdf to process untrusted PDFs (web services, document processors) can be hung indefinitely by a crafted PDF
- Similar to: CVE-2026-24688 (infinite loop in outlines) and GHSA-hm9v-vj3r-r55m (infinite loop in read_object). This is the same class of vulnerability in a different code path.