
BUG: Infinite loop from circular /Prev xref references #3654

@rampageservices

Description

PdfReader() enters an infinite loop when it opens a PDF whose cross-reference (xref) chain contains circular /Prev references. The process hangs at 100% CPU and logs the warning "Overwriting cache for 0 129" over and over, indefinitely.

Environment

  • pypdf version: 6.7.1 (latest)
  • Python: 3.11
  • OS: Linux

Reproducer

The attached PDF (FPC-05F-22PH20.pdf, 300KB, an LCSC component datasheet) triggers the issue:

from pypdf import PdfReader

# This hangs forever, printing "Overwriting cache for 0 129" in a loop
reader = PdfReader("FPC-05F-22PH20.pdf")

The process never completes and must be killed.
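
The failure is a hang, not an exception, so a reproducer or regression check is easier to automate if the parse runs in a child process with a deadline. The following is a minimal sketch that uses only the standard library's subprocess.run timeout. The helper name hangs is hypothetical and is not part of pypdf.

```python
import subprocess
import sys


def hangs(code: str, timeout: float) -> bool:
    """Run `code` in a fresh Python interpreter and report whether it exceeded `timeout` seconds.

    Any exit before the deadline, successful or not, counts as "did not hang".
    """
    try:
        subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout,
            capture_output=True,  # swallow the flood of "Overwriting cache" warnings
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising TimeoutExpired
        return True
    return False
```

For this report, hangs('from pypdf import PdfReader; PdfReader("FPC-05F-22PH20.pdf")', timeout=30) should return True before a fix is applied and False after it. The 30-second budget is an arbitrary choice; a ~300 KB file should parse far faster than that.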

Root Cause

_read_xref_tables_and_trailers() in _reader.py follows the /Prev pointers of the xref chain in a while startxref is not None loop. If a malformed PDF's /Prev pointers form a cycle (xref A → xref B → xref A), startxref never becomes None and the loop runs forever. Each iteration re-parses and re-caches the same objects, which triggers the "Overwriting cache" warning.

There is no visited-set guard to detect that an xref offset has already been processed.
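
To see why the loop never exits, it helps to model the xref chain as a mapping from each section's byte offset to its /Prev offset, with None marking the oldest section. The offsets below are made up for illustration. follow_prev is a simplified, hypothetical stand-in for the real loop, not pypdf code. Its step cap only exists so the demo terminates, standing in for the process being killed.

```python
from typing import Optional


def follow_prev(prev: dict[int, Optional[int]], startxref: int, max_steps: int) -> list[int]:
    """Follow /Prev pointers the way an unguarded `while startxref is not None` loop does.

    `max_steps` only makes the demo terminate; the unguarded loop has no such cap.
    """
    visited_order: list[int] = []
    current: Optional[int] = startxref
    while current is not None and len(visited_order) < max_steps:
        visited_order.append(current)  # "parse" the xref section at this offset
        current = prev[current]        # then jump to its /Prev
    return visited_order


# Well-formed chain: newest section at 9000 -> 5000 -> 1000, which has no /Prev.
linear = {9000: 5000, 5000: 1000, 1000: None}
# Malformed chain: xref A (9000) -> xref B (5000) -> xref A again.
circular = {9000: 5000, 5000: 9000}

print(follow_prev(linear, 9000, max_steps=10))   # [9000, 5000, 1000]
print(follow_prev(circular, 9000, max_steps=6))  # [9000, 5000, 9000, 5000, 9000, 5000]
```

The linear chain ends because its last entry has no /Prev. The circular chain alternates between the same two offsets until the artificial cap stops it.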

Proposed Fix

Track visited xref offsets in a set. If startxref has already been seen, log a warning and break the loop:

visited_xref_offsets: set[int] = set()
while startxref is not None:
    if startxref in visited_xref_offsets:
        logger_warning(
            f"Circular xref chain detected at offset {startxref}, stopping",
            __name__,
        )
        break
    visited_xref_offsets.add(startxref)
    # ... rest of loop

This is a minimal, targeted fix that follows the same cycle-detection pattern pypdf already uses elsewhere (e.g., _known_objects in get_object()).
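
The visited-set guard can be tried on a toy model of the xref chain outside pypdf. In this sketch, walk_xref_chain is a hypothetical standalone function, the offset-to-/Prev mapping is an assumed simplification of the real xref sections, and the standard logging module stands in for pypdf's internal logger_warning helper. It returns the offsets that would be parsed, in order, each at most once.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def walk_xref_chain(prev: dict[int, Optional[int]], startxref: Optional[int]) -> list[int]:
    """Follow /Prev pointers from `startxref`, stopping with a warning if an offset repeats."""
    visited_xref_offsets: set[int] = set()
    order: list[int] = []
    while startxref is not None:
        if startxref in visited_xref_offsets:
            logger.warning("Circular xref chain detected at offset %d, stopping", startxref)
            break
        visited_xref_offsets.add(startxref)
        order.append(startxref)          # stand-in for parsing the xref section here
        startxref = prev.get(startxref)  # a missing /Prev (None) ends the chain
    return order
```

For example, walk_xref_chain({9000: 5000, 5000: 9000}, 9000) returns [9000, 5000] and logs one warning, while a well-formed chain is walked exactly as before. This design breaks out and keeps the sections already read rather than raising, which matches the "log a warning and break" behaviour proposed above. The set lookup is O(1) on average, so the guard adds negligible cost per iteration.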

I have a PR ready: https://github.com/rampageservices/pypdf/tree/fix/circular-xref-infinite-loop

Impact

  • Denial of service: Any application using pypdf to process untrusted PDFs (web services, document processors) can be hung indefinitely by a crafted PDF
  • Similar to: CVE-2026-24688 (infinite loop in outlines) and GHSA-hm9v-vj3r-r55m (infinite loop in read_object). These are the same class of vulnerability in a different code path.
