Closed
Labels
PdfReader: The PdfReader component is affected
is-uncaught-exception: Use this label only for issues caused by broken PDF documents that cannot be recovered.
Description
Hi!
I've found an IndexError that occurs when the PDF file is relatively large. The necessary information is provided below.
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-5.15.0-56-generic-x86_64-with-glibc2.31
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.0.1, crypt_provider=('cryptography', '3.1'), PIL=none
commit 8e1799e
Code + PDF
This is a minimal, complete example that shows the issue:
#! /usr/bin/env python3

import pypdf
from pypdf.errors import EmptyFileError, PdfReadError, PdfStreamError
import sys

def TestOneInput(fname):
    try:
        pdf_reader = pypdf.PdfReader(fname)
        for page_number, page in enumerate(pdf_reader.pages):
            page.extract_text()
    except (EmptyFileError, PdfReadError, PdfStreamError):
        pass

if __name__ == "__main__":
    if len(sys.argv) < 2:
        exit(1)
    TestOneInput(sys.argv[1])
PoC
crash-e8a85d82de01cab5eb44e7993304d8b9d1544970.pdf
Traceback
This is the complete stderr I see:
entry <entry> in Xref table invalid but object found
...
entry <entry> in Xref table invalid; object not found
Traceback (most recent call last):
  File "/fuzz/./poc.py", line 18, in <module>
    TestOneInput(sys.argv[1])
  File "/fuzz/./poc.py", line 9, in TestOneInput
    pdf_reader = pypdf.PdfReader(fname)
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_reader.py", line 132, in __init__
    self._initialize_stream(stream)
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_reader.py", line 154, in _initialize_stream
    self.read(stream)
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_reader.py", line 615, in read
    self._read_xref_tables_and_trailers(stream, startxref, xref_issue_nr)
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_reader.py", line 871, in _read_xref_tables_and_trailers
    startxref = self._read_xref(stream)
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_reader.py", line 910, in _read_xref
    self._read_standard_xref_table(stream)
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_reader.py", line 781, in _read_standard_xref_table
    while line[0] in b"\x0D\x0A":
IndexError: index out of range
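For illustration, here is a minimal sketch of how the failing expression can raise, assuming `line` is an empty bytes object at that point (for example because a read hit the end of the stream). It does not use pypdf; `skip_line_breaks` is a hypothetical helper written only for this example, and the guard shown is one possible approach, not the library's actual fix.

```python
import io


def skip_line_breaks(stream: io.BytesIO) -> bytes:
    """Hypothetical helper: skip CR/LF bytes and return the first other byte.

    Returns b"" if the stream ends before any other byte is found.
    """
    byte = stream.read(1)
    # Test for the empty read first; b""[0] would raise IndexError.
    while byte and byte[0] in b"\x0D\x0A":
        byte = stream.read(1)
    return byte


# Unguarded form, mirroring the expression shown in the traceback.
line = io.BytesIO(b"").read(1)  # a read at end of stream returns b""
try:
    line[0] in b"\x0D\x0A"
    raised = False
except IndexError:
    raised = True
```

With the truthiness check in place, a stream that ends inside a run of line breaks yields `b""` instead of raising, so the caller can decide how to report the truncated xref table.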