Uncaught exceptions on generic file containing 'startxref' string #626

@tin-z

Bug report
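Several issues were spotted in the PdfFileReader class: when given a generic (non-PDF) file that contains the string 'startxref', it raises exceptions that are not PyPdfError, so callers that only catch PyPdfError crash.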

The following code was used during tests:

import sys
from PyPDF2 import PdfFileReader
from PyPDF2.utils import PyPdfError

if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        try:
            r = PdfFileReader(f)
        except PyPdfError:
            # Library-level parse errors are expected for malformed input and ignored.
            pass

Output for bug1.pdf:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/pdf.py", line 1768, in read
    startxref = int(line)
ValueError: invalid literal for int() with base 10: b'startxref'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 13, in <module>
    r = PdfFileReader(f)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/pdf.py", line 1148, in __init__
    self.read(stream)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/pdf.py", line 1773, in read
    startxref = int(line[9:].strip())
ValueError: invalid literal for int() with base 10: b''
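
The chained failure above can be reproduced without PyPDF2. This is a minimal sketch, assuming the line being parsed is the bare keyword `startxref` with no offset after it (inferred from the traceback, not checked against pdf.py); `parse_int` is a hypothetical helper used only for illustration.

```python
# Assumed content of the line the reader tries to parse (inferred from the traceback).
line = b"startxref"


def parse_int(data):
    """Return int(data), or the ValueError message if parsing fails."""
    try:
        return int(data)
    except ValueError as exc:
        return str(exc)


# First attempt: the whole line is not a number.
first = parse_int(line)
# Fallback attempt: strip the 9-byte keyword, which leaves an empty byte string.
second = parse_int(line[9:].strip())

print(first)
print(second)
```

Both attempts fail with ValueError, which is not a PyPdfError, so the `except PyPdfError` clause in the test script does not catch it.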

Output for bug2.pdf:

Traceback (most recent call last):
  File "main.py", line 13, in <module>
    r = PdfFileReader(f)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/pdf.py", line 1148, in __init__
    self.read(stream)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/pdf.py", line 1948, in read
    stream.seek(-11, 1)
OSError: [Errno 22] Invalid argument
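
The OSError above is what a relative seek reports when it would land before the start of a file. A minimal sketch, assuming the stream position is fewer than 11 bytes from the start at the time of the call (the actual contents of bug2.pdf are not shown in this report, so the temporary file here is only a stand-in):

```python
import os
import tempfile

# Write a tiny stand-in file; its contents are an assumption, not bug2.pdf itself.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"startxref")
    path = tmp.name

seek_error = None
try:
    with open(path, "rb") as f:
        # Relative seek (whence=1) from position 0 to a negative offset.
        f.seek(-11, 1)
except OSError as exc:
    seek_error = exc
finally:
    os.remove(path)

print(repr(seek_error))
```

As with the ValueError case, OSError is outside the PyPdfError hierarchy, so it propagates to the caller.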

Solution
Validate the signature of the input file before parsing: it should be %PDF- or an accepted variant. Reference: https://en.wikipedia.org/wiki/List_of_file_signatures
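
A caller-side sketch of the proposed check, not PyPDF2's own behavior. The helper name `looks_like_pdf` and the `search_window` parameter are hypothetical. Searching the first 1024 bytes rather than requiring the marker at byte 0 is an assumption, made to tolerate files with leading junk; use `head.startswith(PDF_MAGIC)` for a strict check.

```python
import io

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(stream, search_window=1024):
    """Return True if the PDF header marker appears near the start of the stream.

    The stream position is restored before returning.
    """
    pos = stream.tell()
    stream.seek(0)
    head = stream.read(search_window)
    stream.seek(pos)
    return PDF_MAGIC in head


# A file that merely contains 'startxref' is rejected before any parsing.
generic = io.BytesIO(b"some text\nstartxref\n")
pdf_like = io.BytesIO(b"%PDF-1.4\n1 0 obj\n")

print(looks_like_pdf(generic))
print(looks_like_pdf(pdf_like))
```

In the test script above, this check could run on `f` before `PdfFileReader(f)` is called, skipping or reporting files that fail it.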

Labels: Has MCVE, is-bug
