Closed
Labels: is-robustness-issue (from a user's perspective, this is about robustness)
Description
The attached PDF cannot be parsed by pypdf: it raises `ValueError: negative seek value -1`. Chrome and macOS Preview open the PDF without any issue. The pdf-online validator flags the PDF as faulty:
Compliance | pdf1.2
-- | --
Result | Document does not conform to PDF/A.

Details:

```text
Validating file "malformed4.pdf" for conformance level pdf1.2
The file trailer dictionary is missing or invalid.
The key Count is required but missing.
The value of the key Count is 0 but must be 39.
The value of the key Count is 0 but must be 27.
The value of the key Count is 0 but must be 6.
The document does not conform to the requested standard.
The file format (header, trailer, objects, xref, streams) is corrupted.
The document doesn't conform to the PDF reference (missing required entries, wrong value types, etc.).
The document does not conform to the PDF 1.2 standard.
Done.
```
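The Count messages most likely refer to page-tree nodes; this is an assumption, since outline items also carry a /Count with different semantics. In a /Pages node, /Count is required and must equal the number of leaf /Page objects beneath it. The sketch below models the page tree as plain nested dicts, a hypothetical stand-in for parsed PDF objects (not pypdf's object model), and reports nodes whose declared /Count is missing or wrong:

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory model: each node is a dict with a "/Type";
# /Pages nodes have "/Kids" and a declared "/Count".
Node = Dict[str, Any]


def leaf_count(node: Node) -> int:
    """Count the /Page leaves at or below `node`."""
    if node.get("/Type") == "/Page":
        return 1
    return sum(leaf_count(kid) for kid in node.get("/Kids", []))


def count_mismatches(node: Node, path: str = "root") -> List[Tuple[str, Any, int]]:
    """Return (path, declared /Count, actual leaf count) for each bad /Pages node."""
    problems: List[Tuple[str, Any, int]] = []
    if node.get("/Type") == "/Pages":
        actual = leaf_count(node)
        declared = node.get("/Count")  # None models a missing /Count key
        if declared != actual:
            problems.append((path, declared, actual))
        for i, kid in enumerate(node.get("/Kids", [])):
            problems.extend(count_mismatches(kid, f"{path}/Kids[{i}]"))
    return problems


page = {"/Type": "/Page"}
tree = {
    "/Type": "/Pages",
    "/Count": 0,  # wrong: three pages sit below this node
    "/Kids": [page, page, {"/Type": "/Pages", "/Kids": [page]}],  # inner /Count missing
}
print(count_mismatches(tree))
```

Both failure modes the validator reports (a missing Count and a Count of 0 where pages exist) show up as separate entries.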
Ideally pypdf would parse this file, but IMO if it can't, it should at least raise an error that identifies the PDF as malformed (for example pypdf's PdfReadError) rather than a bare ValueError from a failed seek.
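As a rough illustration of the requested behaviour, here is a stdlib-only sketch that reads the `startxref` offset from the end of the file and raises a dedicated error when that offset cannot be valid, instead of letting a later seek fail. `MalformedPdfError` and `read_startxref` are hypothetical names, not pypdf API, and this is not how pypdf's reader (which can attempt to rebuild a broken xref table) actually works:

```python
import re


class MalformedPdfError(Exception):
    """Hypothetical error type signalling a structurally broken PDF."""


# A PDF normally ends with: startxref <byte offset> %%EOF
_STARTXREF_RE = re.compile(rb"startxref\s+(\d+)\s*%%EOF")


def read_startxref(data: bytes, tail: int = 1024) -> int:
    """Return the xref offset declared near the end of `data`.

    Raises MalformedPdfError when the trailer is missing or the offset
    cannot point at an xref section.
    """
    matches = list(_STARTXREF_RE.finditer(data[-tail:]))
    if not matches:
        raise MalformedPdfError("no 'startxref ... %%EOF' trailer found")
    offset = int(matches[-1].group(1))  # the last trailer wins
    # In a well-formed file, offset 0 is where the %PDF- header sits, so it
    # cannot be an xref section; offsets past the end are equally bogus.
    if offset <= 0 or offset >= len(data):
        raise MalformedPdfError(f"startxref offset {offset} is out of range")
    return offset


print(read_startxref(b"%PDF-1.2\nxref\nstartxref\n9\n%%EOF\n"))  # 9
try:
    read_startxref(b"%PDF-1.2\nstartxref\n0\n%%EOF\n")
except MalformedPdfError as exc:
    print(exc)  # startxref offset 0 is out of range
```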
Environment
Which environment were you using when you encountered the problem?
```bash
$ python -m platform
macOS-15.3.1-arm64-arm-64bit

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.3.0, crypt_provider=('cryptography', '43.0.0'), PIL=none
```

Code + PDF
This is a minimal, complete example that shows the issue:
```python
from pypdf import PdfReader

PdfReader("./bible.pdf")
```

Traceback
This is the complete traceback I see:
```
Traceback (most recent call last):
  File "/Users/shachar/repositories/sample/test.py", line 3, in <module>
    PdfReader("./bible.pdf")
  File "/Users/shachar/repositories/sample/venv/lib/python3.12/site-packages/pypdf/_reader.py", line 133, in __init__
    self._initialize_stream(stream)
  File "/Users/shachar/repositories/sample/venv/lib/python3.12/site-packages/pypdf/_reader.py", line 155, in _initialize_stream
    self.read(stream)
  File "/Users/shachar/repositories/sample/venv/lib/python3.12/site-packages/pypdf/_reader.py", line 613, in read
    xref_issue_nr = self._get_xref_issues(stream, startxref)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shachar/repositories/sample/venv/lib/python3.12/site-packages/pypdf/_reader.py", line 1037, in _get_xref_issues
    stream.seek(startxref - 1, 0) # -1 to check character before
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: negative seek value -1
```
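The failing line seeks to `startxref - 1` to inspect the byte before the xref offset, so a `startxref` of 0 becomes `seek(-1, 0)`, which Python's in-memory binary streams reject with exactly this `ValueError`. A minimal sketch of a guarded check, assuming an `io.BytesIO`-like stream; `byte_before` is a hypothetical helper, not pypdf's actual fix:

```python
import io
from typing import BinaryIO, Optional


def byte_before(stream: BinaryIO, offset: int) -> Optional[bytes]:
    """Return the byte just before `offset`, or None if nothing precedes it.

    Hypothetical helper: the traceback shows pypdf 5.3.0 calling
    stream.seek(startxref - 1, 0) without a guard, which fails for startxref == 0.
    """
    if offset <= 0:
        # Nothing precedes offset 0; seeking to -1 would raise ValueError.
        return None
    stream.seek(offset - 1, io.SEEK_SET)
    return stream.read(1)


data = io.BytesIO(b"%PDF-1.2\nxref\n")

# Reproduce the reported error with a plain BytesIO and startxref == 0.
startxref = 0
try:
    data.seek(startxref - 1, io.SEEK_SET)
except ValueError as exc:
    print(exc)  # negative seek value -1

print(byte_before(data, 0))  # None
print(byte_before(data, 9))  # b'\n' (the byte before "xref")
```

Returning `None` lets the caller treat a zero offset as an xref problem (and, for example, raise a malformed-PDF error) instead of crashing on the seek.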