Closed
Labels: PdfReader (the PdfReader component is affected), is-robustness-issue (from a user's perspective, this is about robustness), workflow-merge (from a user's perspective, merging is the affected feature/workflow)
Description
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-6.6.72-1-lts-x86_64-with-glibc2.40
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.3.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none

Tested with Python 3.11 and 3.13.
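The two commands above can also be run as a single script. This is a minimal sketch: `environment_report` is a hypothetical helper name, and `pypdf._debug_versions` is the same private attribute the command above prints, so it may change between releases (the sketch falls back to `pypdf.__version__` if it is missing).

```python
import platform
import sys


def environment_report() -> str:
    """Hypothetical helper: gather the details requested in the Environment section."""
    # Same string that `python -m platform` prints.
    lines = [platform.platform(), f"python {sys.version.split()[0]}"]
    try:
        import pypdf
    except ImportError:
        lines.append("pypdf: not installed")
    else:
        # _debug_versions is private; fall back to the public version string.
        lines.append(str(getattr(pypdf, "_debug_versions", "pypdf==" + pypdf.__version__)))
    return "\n".join(lines)


print(environment_report())
```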
Code + PDF
This is a minimal, complete example that shows the issue:
from pypdf import PdfReader
from pypdf import PdfWriter
reader = PdfReader('pdf.pdf')
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)

PDF file that crashes: https://www.orne.gouv.fr/contenu/telechargement/19233/154986/file/Sp%C3%A9cial%20n%C2%B0%2015%20du%20jeudi%2022%20f%C3%A9vrier%202024.pdf
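A possible stopgap, not confirmed by the maintainers, is to copy pages one at a time and skip any page whose cloning fails. The helper `add_pages_skipping_failures` is hypothetical; it relies only on iterating the pages and calling `writer.add_page`, as the reproducer does. It assumes the error that finally propagates is a `RecursionError`, as in the first part of the traceback (the last exception is cut off in the report), so the caught types are a parameter.

```python
from typing import Iterable, List, Tuple, Type


def add_pages_skipping_failures(
    pages: Iterable[object],
    writer: object,
    errors: Tuple[Type[BaseException], ...] = (RecursionError,),
) -> List[int]:
    """Hypothetical helper: add each page to `writer`, skipping pages that raise.

    Returns the indices of skipped pages. The writer may still hold objects
    that were cloned before the error was raised.
    """
    skipped: List[int] = []
    for index, page in enumerate(pages):
        try:
            writer.add_page(page)
        except errors:
            # Record the failing page and continue with the next one.
            skipped.append(index)
    return skipped
```

With the reproducer's objects it would be called as `skipped = add_pages_skipping_failures(reader.pages, writer)`. Skipped pages are simply missing from the output, so this trades completeness for not crashing.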
Traceback
This is the complete traceback I see:
Traceback (most recent call last):
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 583, in read_from_stream
key = read_object(stream, pdf)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 1456, in read_object
return NameObject.read_from_stream(stream, pdf)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 858, in read_from_stream
name += read_until_regex(stream, NameObject.delimiter_pattern)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/_utils.py", line 240, in read_until_regex
m = regex.search(name + tok)
^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded while calling a Python object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/tmp.pbkfI82lWz/test.py", line 11, in <module>
writer.add_page(page)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/_writer.py", line 574, in add_page
return self._add_page(page, len(self.flattened_pages), excluded_keys)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/_writer.py", line 487, in _add_page
"PageObject", page_org.clone(self, False, excluded_keys).get_object()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 307, in clone
d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 419, in _clone
v.clone(pdf_dest, force_duplicate, ignore_fields)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 356, in clone
obj.clone(pdf_dest, force_duplicate, ignore_fields)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 307, in clone
d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 419, in _clone
v.clone(pdf_dest, force_duplicate, ignore_fields)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 356, in clone
obj.clone(pdf_dest, force_duplicate, ignore_fields)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 307, in clone
d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 419, in _clone
v.clone(pdf_dest, force_duplicate, ignore_fields)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 356, in clone
obj.clone(pdf_dest, force_duplicate, ignore_fields)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 307, in clone
d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 419, in _clone
v.clone(pdf_dest, force_duplicate, ignore_fields)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 356, in clone
obj.clone(pdf_dest, force_duplicate, ignore_fields)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 307, in clone
d__._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 419, in _clone
v.clone(pdf_dest, force_duplicate, ignore_fields)
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 348, in clone
obj = self.get_object()
^^^^^^^^^^^^^^^^^
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_base.py", line 368, in get_object
return self.pdf.get_object(self)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/_reader.py", line 450, in get_object
retval = read_object(self.stream, self) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 1462, in read_object
return DictionaryObject.read_from_stream(stream, pdf, forced_encoding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 643, in read_from_stream
length = pdf.get_object(length)
^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp.pbkfI82lWz/lib/python3.11/site-packages/pypdf/_reader.py", line 450, in get_object
retval = read_object(self.stream, self) # type: ignore
...
Full traceback attached (too long for GitHub).
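The visible frames suggest a loop: cloning a page follows an indirect reference, reading that object's dictionary resolves a stream `/Length` through `pdf.get_object(length)`, which reads another object, and so on until the stack overflows. The report does not establish whether the file contains a true reference cycle or just a very deep chain. As a generic illustration only (a toy model, not pypdf's code; `Ref`, `resolve` and `ReferenceCycleError` are hypothetical names), following references iteratively with a visited set turns a cycle into an explicit error and handles deep chains without hitting the recursion limit.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Ref:
    """Toy stand-in for an indirect reference such as `12 0 R`."""
    num: int


class ReferenceCycleError(ValueError):
    """Raised when resolving a reference revisits an object."""


def resolve(value: object, objects: Dict[int, object]) -> object:
    """Follow Ref chains in a loop instead of recursing; detect cycles."""
    seen: Set[int] = set()
    while isinstance(value, Ref):
        if value.num in seen:
            raise ReferenceCycleError(f"reference cycle through object {value.num}")
        seen.add(value.num)
        value = objects[value.num]
    return value
```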