I've got an IndexError when extracting text. The file opens fine in Chrome.
Environment
$ python -m platform
Linux-5.4.0-121-generic-x86_64-with-glibc2.31
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.4.2
Code + PDF
The file: pdf/5cf3eb1c20fb4bea8654f2a9b64b5a62.pdf
>>> from PyPDF2 import PdfReader
>>> reader = PdfReader('pdf/5cf3eb1c20fb4bea8654f2a9b64b5a62.pdf')
/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py:1229: PdfReadWarning: incorrect startxref pointer(1)
  warnings.warn(
>>> for page in reader.pages: print(page.extract_text())
[...]
Invalid FloatObject b'71.5131592.8861'
Invalid FloatObject b'58.1.5131592.63'
Invalid FloatObject b'71.5131592.8861'
Invalid FloatObject b'58.1.5131592.63'
Invalid FloatObject b'71.5131592.8861'
Invalid FloatObject b'58.1.5131592.63'
Invalid FloatObject b'71.5131592.8861'
Invalid FloatObject b'58.1.5131592.63'
[...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1507, in extract_text
    return self._extract_text(
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1441, in _extract_text
    process_operation(operator, operands)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1301, in process_operation
    float(operands[5]),
IndexError: list index out of range
To be exact, the failing call is print(reader.pages[10].extract_text()).
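To check whether page 10 is the only page affected, a minimal sketch that extracts text page by page and records failures instead of stopping at the first one may help. `extract_all` is a hypothetical helper, not part of PyPDF2; it only assumes each page object has an `extract_text()` method, as `reader.pages` does in the session above. It catches only `IndexError` (the exception this report hits), so any other error still propagates.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class HasExtractText(Protocol):
    """Anything with an extract_text() method, e.g. a PyPDF2 page object."""

    def extract_text(self) -> str: ...


def extract_all(pages: Iterable[HasExtractText]) -> tuple[list[str], dict[int, str]]:
    """Extract text from every page, collecting IndexError failures per page index.

    Hypothetical helper: failing pages get an empty string in `texts`
    and an entry in `failures` mapping page index -> repr of the error.
    """
    texts: list[str] = []
    failures: dict[int, str] = {}
    for index, page in enumerate(pages):
        try:
            texts.append(page.extract_text())
        except IndexError as exc:  # IndexError is what page 10 raises in this report
            texts.append("")
            failures[index] = repr(exc)
    return texts, failures
```

With the reader from the session above, `texts, failures = extract_all(reader.pages)` followed by `print(sorted(failures))` would list every page index that raises the same `IndexError`, which could narrow down whether the malformed numbers reported by the `Invalid FloatObject` warnings are confined to one page.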