Negative seek values when reading file with two xref tables #3151

@nihohit

Description

The attached PDF cannot be parsed by pypdf: PdfReader raises ValueError: negative seek value -1. Chrome and macOS Preview open the PDF without any issue. pdf-online's validator reports the PDF as faulty:

Compliance | pdf1.2
-- | --
Result | Document does not conform to PDF/A.

Details:

  Validating file "malformed4.pdf" for conformance level pdf1.2
  The file trailer dictionary is missing or invalid.
  The key Count is required but missing.
  The value of the key Count is 0 but must be 39.
  The value of the key Count is 0 but must be 27.
  The value of the key Count is 0 but must be 6.
  The document does not conform to the requested standard.
  The file format (header, trailer, objects, xref, streams) is corrupted.
  The document doesn't conform to the PDF reference (missing required entries, wrong value types, etc.).
  The document does not conform to the PDF 1.2 standard.
  Done.

It would be nice if this file could be parsed. If it can't be, IMO pypdf should at least raise an error that says the file is malformed, rather than a bare ValueError.
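To make the request concrete, here is a minimal sketch of the kind of guard I have in mind. It is not pypdf's actual code: `MalformedPdfError` and `seek_before_xref` are hypothetical names used only for illustration, and the byte string is a stand-in rather than the attached file. The idea is to validate the offset before seeking, so that a bad offset surfaces as a "malformed file" error.

```python
import io


class MalformedPdfError(Exception):
    """Hypothetical stand-in for a library-specific 'malformed file' error."""


def seek_before_xref(stream, startxref):
    """Return the byte just before startxref, or fail with a clear error.

    Hypothetical helper: checks the offset first instead of letting
    stream.seek() raise a low-level ValueError.
    """
    if startxref < 1:
        raise MalformedPdfError(
            f"startxref is {startxref}; no byte precedes the xref table"
        )
    stream.seek(startxref - 1, 0)
    return stream.read(1)


# Stand-in bytes, not the attached PDF.
stream = io.BytesIO(b"%PDF-1.2\nxref\n")
try:
    seek_before_xref(stream, 0)
except MalformedPdfError as exc:
    error_text = str(exc)
    print(error_text)
```

With a guard like this, callers could catch a single "malformed" exception type instead of having to know that a seek on the underlying stream can fail.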

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
macOS-15.3.1-arm64-arm-64bit

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.3.0, crypt_provider=('cryptography', '43.0.0'), PIL=none

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

PdfReader("./bible.pdf")

bible.pdf

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/Users/shachar/repositories/sample/test.py", line 3, in <module>
    PdfReader("./bible.pdf")
  File "/Users/shachar/repositories/sample/venv/lib/python3.12/site-packages/pypdf/_reader.py", line 133, in __init__
    self._initialize_stream(stream)
  File "/Users/shachar/repositories/sample/venv/lib/python3.12/site-packages/pypdf/_reader.py", line 155, in _initialize_stream
    self.read(stream)
  File "/Users/shachar/repositories/sample/venv/lib/python3.12/site-packages/pypdf/_reader.py", line 613, in read
    xref_issue_nr = self._get_xref_issues(stream, startxref)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shachar/repositories/sample/venv/lib/python3.12/site-packages/pypdf/_reader.py", line 1037, in _get_xref_issues
    stream.seek(startxref - 1, 0)  # -1 to check character before
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: negative seek value -1
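The last frame seeks to `startxref - 1`, and the reported value is -1, so `startxref` must have been 0 at that point. The same error can be reproduced with the standard library alone. This is only a sketch of the failure mode: it uses an in-memory stream with stand-in bytes, not the attached file and not pypdf itself.

```python
import io

# Stand-in bytes; the real input is the attached PDF.
stream = io.BytesIO(b"%PDF-1.2\n")

# Implied by the traceback: startxref - 1 == -1, so startxref == 0.
startxref = 0

try:
    stream.seek(startxref - 1, 0)  # same call shape as in the traceback
except ValueError as exc:
    message = str(exc)
    print(message)
```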

Labels: is-robustness-issue (From a user's perspective, this is about robustness)
