Reading large PDFs gives "PdfReadError: Unable to find 'endstream' marker after stream" #167

@watt2000

Hi,
When I try to read some large files (>10 MB), I get a "PyPDF2.utils.PdfReadError: Unable to find 'endstream' marker after stream" error.

Here is the stack trace:

  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 316, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 405, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 381, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 405, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 381, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 390, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 405, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 381, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 381, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 419, in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 381, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 410, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1384, in getObject
    retval = readObject(self.stream, self)
  File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 65, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 617, in readFromStream
    raise utils.PdfReadError("Unable to find 'endstream' marker after stream at byte %s." % utils.hexStr(stream.tell()))
PyPDF2.utils.PdfReadError: Unable to find 'endstream' marker after stream at byte 0x413068.

How can I read big files?
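
The trace shows the error being raised during `write`, when an object is fetched via `getObject`, and it reports a byte offset (0x413068). One possible cause, not confirmed for this file, is a stream whose data does not end where the parser expects, for example because its declared `/Length` is wrong. As a rough diagnostic sketch using only the standard library, the helper below (`describe_offset` is a hypothetical name, not part of PyPDF2) shows the raw bytes around a reported offset and where the next `endstream` keyword appears:

```python
def describe_offset(data: bytes, offset: int, context: int = 32) -> dict:
    """Summarise the raw bytes around `offset` in a PDF file's contents.

    Hypothetical diagnostic helper: it does not parse the PDF, it only
    looks at raw bytes, so a match inside binary stream data is possible.
    """
    start = max(0, offset - context)
    end = min(len(data), offset + context)
    return {
        "file_size": len(data),
        # False means the reported offset lies outside the file
        # (for example, a truncated download or copy).
        "offset_in_file": 0 <= offset < len(data),
        # Raw bytes surrounding the offset, for manual inspection.
        "context": data[start:end],
        # Position of the next "endstream" keyword at or after the
        # offset, or -1 if there is none.
        "next_endstream": data.find(b"endstream", offset),
    }
```

To try it on the failing file, read it in binary mode and pass the offset from the error message, for example `describe_offset(open("big.pdf", "rb").read(), 0x413068)`, where `big.pdf` is a placeholder path. If `next_endstream` is only a few bytes past the offset, the stream is probably slightly longer than declared; if it is -1 or the offset is outside the file, the file may be truncated.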

Labels: PdfReader (the PdfReader component is affected), is-robustness-issue (from a user's perspective, this is about robustness), needs-pdf (the issue needs a PDF file to show the problem)
