RecursionError when adding a page to the writer (Length references own object) #3112

@BLeQuerrec

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.6.72-1-lts-x86_64-with-glibc2.40

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.3.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none

Tested with Python 3.11 and 3.13.

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader, PdfWriter

reader = PdfReader('pdf.pdf')
writer = PdfWriter()

# Copying the pages into the writer is what raises the RecursionError.
for page in reader.pages:
    writer.add_page(page)

PDF file that crashes: https://www.orne.gouv.fr/contenu/telechargement/19233/154986/file/Sp%C3%A9cial%20n%C2%B0%2015%20du%20jeudi%2022%20f%C3%A9vrier%202024.pdf

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 583, in read_from_stream
    key = read_object(stream, pdf)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 1456, in read_object
    return NameObject.read_from_stream(stream, pdf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 858, in read_from_stream
    name += read_until_regex(stream, NameObject.delimiter_pattern)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/_utils.py", line 240, in read_until_regex
    m = regex.search(name + tok)
        ^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded while calling a Python object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/tmp.pbkfI82lWz/test.py", line 11, in <module>
    writer.add_page(page)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/_writer.py", line 574, in add_page
    return self._add_page(page, len(self.flattened_pages), excluded_keys)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/_writer.py", line 487, in _add_page
    "PageObject", page_org.clone(self, False, excluded_keys).get_object()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 307, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 419, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 356, in clone
    obj.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 307, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 419, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 356, in clone
    obj.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 307, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 419, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 356, in clone
    obj.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 307, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 419, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 356, in clone
    obj.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 307, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 419, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 348, in clone
    obj = self.get_object()
          ^^^^^^^^^^^^^^^^^
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 368, in get_object
    return self.pdf.get_object(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/_reader.py", line 450, in get_object
    retval = read_object(self.stream, self)  # type: ignore
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 1462, in read_object
    return DictionaryObject.read_from_stream(stream, pdf, forced_encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 643, in read_from_stream
    length = pdf.get_object(length)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/_reader.py", line 450, in get_object
    retval = read_object(self.stream, self)  # type: ignore

...

Full traceback attached (too long for GitHub)

traceback.txt
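
For anyone reading along: the last frames alternate between `read_from_stream` calling `pdf.get_object(length)` and `get_object` calling `read_object` again. That fits the diagnosis in the title, where a stream's /Length entry is an indirect reference back to the object that is currently being read. Below is a minimal sketch of that loop and of one possible guard. It uses a toy object table, not pypdf internals, and every name in it (`naive_length`, `resolve_length`, `CircularReferenceError`, the `("ref", n)` tuples) is hypothetical and only for illustration.

```python
# Toy model of indirect-reference resolution; NOT pypdf's implementation.
# An object table maps object numbers to either a plain int (a number object)
# or a dict with a "/Length" entry. ("ref", n) stands for "n 0 R".


class CircularReferenceError(Exception):
    """A reference points back at an object that is still being resolved."""


def naive_length(objects, obj_num):
    """Follow /Length references with no cycle check."""
    obj = objects[obj_num]
    if isinstance(obj, int):
        return obj
    length = obj["/Length"]
    if isinstance(length, tuple):
        return naive_length(objects, length[1])
    return length


def resolve_length(objects, obj_num, in_progress=None):
    """Same lookup, but remember which objects are currently being resolved."""
    if in_progress is None:
        in_progress = set()
    if obj_num in in_progress:
        raise CircularReferenceError(
            f"object {obj_num} is referenced while it is still being resolved"
        )
    in_progress.add(obj_num)
    obj = objects[obj_num]
    if isinstance(obj, int):
        return obj
    length = obj["/Length"]
    if isinstance(length, tuple):
        return resolve_length(objects, length[1], in_progress)
    return length


# Well-formed case: object 5 stores its length in a separate number object 6.
healthy = {5: {"/Length": ("ref", 6)}, 6: 42}

# Assumed shape of the broken file: object 5's /Length points at object 5.
broken = {5: {"/Length": ("ref", 5)}}
```

With the `healthy` table both functions return the length stored in object 6. With the `broken` table, `naive_length` recurses until Python raises `RecursionError`, while `resolve_length` fails immediately with a specific error that a caller could catch and handle. Whether a guard like this belongs in the reader, and what it should fall back to, is for the maintainers to decide.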

Labels

PdfReader: The PdfReader component is affected
is-robustness-issue: From a users perspective, this is about robustness
workflow-merge: From a users perspective, merging is the affected feature/workflow
