Will hang on invalid PDFs #77

@wolever

Doing some testing, I noticed that PyPDF2 will hang if it encounters an invalid PDF… for example, the skipOverComment function:

def skipOverComment(stream):
    tok = stream.read(1)
    stream.seek(-1, 1)
    if tok == b_('%'):
        while tok not in (b_('\n'), b_('\r')):
            tok = stream.read(1)

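This will hang indefinitely if the stream ends before a newline: at end of stream, read(1) returns an empty string, which never matches the loop's exit condition.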
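As a rough illustration of the kind of fix this would need, here is a minimal sketch of a loop that also stops at end of stream. The name `skip_over_comment_safe` is hypothetical, and plain byte literals stand in for the `b_()` helper; this is not PyPDF2's actual code.

```python
import io


def skip_over_comment_safe(stream):
    """Skip a '%' comment, stopping at end of line or end of stream."""
    tok = stream.read(1)
    if not tok:
        return  # already at end of stream, nothing to skip
    stream.seek(-1, 1)  # un-read the byte we peeked at
    if tok == b"%":
        while tok not in (b"\n", b"\r"):
            tok = stream.read(1)
            if not tok:
                break  # end of stream inside a comment: stop instead of spinning


# A comment with no trailing newline no longer loops forever.
skip_over_comment_safe(io.BytesIO(b"% unterminated comment"))
```

The only change from the original is the pair of `if not tok` checks, which treat an empty read as a terminating condition.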

I would propose three courses of action:

  1. Wrap the stream in an object that raises an exception after a certain number of consecutive empty reads; e.g.:
class SafeStream(object):
    """Wrap a stream and raise after too many consecutive empty reads."""

    def __init__(self, stream):
        self.stream = stream
        self.seek = stream.seek
        self.tell = stream.tell
        self._empty_reads = 0

    def read(self, *args):
        res = self.stream.read(*args)
        if not res:  # matches both "" and b"" (end of stream)
            self._empty_reads += 1
            if self._empty_reads > 1000:
                raise Exception("too many empty reads")
        else:
            self._empty_reads = 0
        return res
  2. Add a script for automating fuzz testing to the repo

  3. Fix the bugs as the script from step (2) finds them
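To make step (2) more concrete, here is a rough sketch of what such a fuzz script could look like. Everything in it is an assumption for illustration: the names `mutate`, `fuzz`, and `Hang` are hypothetical, `parse` is any callable that takes the raw bytes of a file, and the timeout relies on `SIGALRM`, so it only works on Unix and in the main thread.

```python
import random
import signal


class Hang(BaseException):
    """Raised when parsing exceeds the time limit.

    Derives from BaseException so a broad `except Exception` inside
    the parser cannot swallow it.
    """


def _on_alarm(signum, frame):
    raise Hang()


def mutate(data, rng, n_flips=8):
    """Return a copy of data with a few random bytes overwritten."""
    buf = bytearray(data)
    for _ in range(min(n_flips, len(buf))):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(parse, seed_bytes, iterations=100, timeout=1.0, rng_seed=0):
    """Feed mutated inputs to parse() and return the inputs that hang."""
    rng = random.Random(rng_seed)  # fixed seed keeps runs reproducible
    hangs = []
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        for _ in range(iterations):
            sample = mutate(seed_bytes, rng)
            signal.setitimer(signal.ITIMER_REAL, timeout)  # arm the timer
            try:
                parse(sample)
            except Hang:
                hangs.append(sample)
            except Exception:
                pass  # a clean error on invalid input is acceptable
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
    finally:
        signal.signal(signal.SIGALRM, old_handler)
    return hangs
```

For PyPDF2, `parse` would presumably be a small function that wraps the sample in `io.BytesIO` and hands it to the reader; the samples returned by `fuzz` could then be saved as regression files for step (3).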

What do you think? Would you be open to patches for those?

Labels: is-bug (From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF); needs-pdf (The issue needs a PDF file to show the problem)