Closed
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
Description
Extracting the text of a page from the PDF below fails with a TypeError in PyPDF2==2.4.2.
MCVE: Code + PDF
The PDF: pdf/1d652bd0d8c958b28b6b5a0e53cfe66e.pdf
>>> from PyPDF2 import PdfReader
>>> reader = PdfReader('pdf/1d652bd0d8c958b28b6b5a0e53cfe66e.pdf')
>>> reader.pages[1].extract_text()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1316, in extract_text
    return self._extract_text(self, self.pdf, space_width, PG.CONTENTS)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1138, in _extract_text
    content = ContentStream(content, pdf, "bytes")
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 1191, in __init__
    stream_data = stream.get_data()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 1157, in get_data
    decoded._data = decode_stream_data(self)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/filters.py", line 508, in decode_stream_data
    if len(filters) and not isinstance(filters[0], NameObject):
TypeError: object of type 'IndirectObject' has no len()
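The last frame suggests that `filters` is an indirect reference rather than a list of names at the point where `len()` is called. The sketch below reproduces that failure mode in isolation and shows one possible way to guard against it. It does not import PyPDF2: `NameObject` and `IndirectObject` here are simplified stand-ins written for illustration, and `normalize_filters` is a hypothetical helper, not a function in the library or the fix that was actually applied.

```python
# Stand-in classes: hypothetical simplifications, NOT PyPDF2's real classes.
class NameObject(str):
    """Mimics a PDF name such as /FlateDecode."""


class IndirectObject:
    """Mimics a reference that must be resolved before it can be used."""

    def __init__(self, target):
        self._target = target

    def get_object(self):
        return self._target


def normalize_filters(filters):
    """Return the filter entry as a plain list of names (hypothetical helper).

    Resolves an indirect reference first, then wraps a single name in a
    list, so that len() and indexing are safe afterwards.
    """
    if isinstance(filters, IndirectObject):
        filters = filters.get_object()
    if isinstance(filters, NameObject):
        filters = [filters]
    return list(filters)


direct = [NameObject("/FlateDecode")]
indirect = IndirectObject(direct)

# Calling len() on the unresolved reference fails like the traceback above.
try:
    len(indirect)
except TypeError as exc:
    print(exc)

# After resolving the reference, the usual checks work.
filters = normalize_filters(indirect)
print(len(filters), isinstance(filters[0], NameObject))
```

Resolving the reference before inspecting it keeps the rest of the check unchanged, which is why the sketch normalizes the value up front instead of special-casing each later use.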