IndirectObject has no len() #399

@TZanke

Extracting text from a PDF fails with PyPDF2==2.4.2.

MCVE: Code + PDF

The PDF: pdf/1d652bd0d8c958b28b6b5a0e53cfe66e.pdf

>>> from PyPDF2 import PdfReader
>>> reader = PdfReader('pdf/1d652bd0d8c958b28b6b5a0e53cfe66e.pdf')
>>> reader.pages[1].extract_text()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1316, in extract_text
    return self._extract_text(self, self.pdf, space_width, PG.CONTENTS)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1138, in _extract_text
    content = ContentStream(content, pdf, "bytes")
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 1191, in __init__
    stream_data = stream.get_data()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 1157, in get_data
    decoded._data = decode_stream_data(self)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/filters.py", line 508, in decode_stream_data
    if len(filters) and not isinstance(filters[0], NameObject):
TypeError: object of type 'IndirectObject' has no len()
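
The last frame suggests that `filters` arrives in `decode_stream_data` as an unresolved `IndirectObject`, presumably because this PDF stores the stream's `/Filter` entry as an indirect reference, so `len(filters)` raises. Below is a minimal, library-free sketch of that failure mode and one possible way to guard against it. `FakeIndirectObject` and `resolve_filters` are hypothetical names used only for illustration; they are not part of PyPDF2 and this is not necessarily how the library fixes it.

```python
class FakeIndirectObject:
    """Hypothetical stand-in for an indirect reference: it has no __len__."""

    def __init__(self, target):
        self._target = target

    def get_object(self):
        # Follow the reference to the object it points at.
        return self._target


def resolve_filters(filters):
    """Return the filter names as a plain list, resolving a reference first."""
    # Assumption: anything exposing get_object() can be dereferenced.
    if hasattr(filters, "get_object"):
        filters = filters.get_object()
    if filters is None:
        return []
    # A single filter name may be given without an enclosing array.
    if isinstance(filters, str):
        return [filters]
    return list(filters)


ref = FakeIndirectObject(["/FlateDecode"])

try:
    len(ref)  # mirrors `len(filters)` in the traceback
    raised = False
except TypeError:
    raised = True

resolved = resolve_filters(ref)
```

Resolving the reference before calling `len()` is the only point of the sketch; the filter name `/FlateDecode` is just a sample value.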


Labels

Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
