Closed
Labels
- Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
- is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
- workflow-text-extraction: From a user's perspective, text extraction is the affected feature/workflow
Description
When I try to extract the text from the PDF below, I get:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1263, in extract_text
    return self._extract_text(self, self.pdf, space_width, PG.CONTENTS)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1245, in _extract_text
    process_operation(operator, operands)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1197, in process_operation
    text += operands[0].translate(cmap)
TypeError: a bytes-like object is required, not 'dict'
Fixing this issue would likely also fix #523
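For context, the error message is consistent with `operands[0]` being a `bytes` object while `cmap` is a dict: `str.translate` accepts a dict keyed by code point, but `bytes.translate` expects a 256-byte table. The sketch below reproduces the message without PyPDF2 or the PDF. `translate_operand` is a hypothetical helper, and decoding with Latin-1 is an assumption made for illustration, not necessarily how the library should (or does) handle it.

```python
from typing import Dict, Union


def translate_operand(operand: Union[str, bytes], cmap: Dict[int, str]) -> str:
    """Hypothetical helper (not part of PyPDF2): apply a dict-based
    character map to a text operand that may arrive as str or bytes."""
    if isinstance(operand, bytes):
        # Assumption: Latin-1 maps every byte to the code point of the
        # same value, so decoding never fails. The right encoding for a
        # real PDF depends on the font and is not known here.
        operand = operand.decode("latin-1")
    return operand.translate(cmap)


cmap = {65: "a"}  # made-up mapping for illustration: "A" -> "a"

# bytes.translate() rejects a dict, which yields the message in the traceback.
error_message = ""
try:
    b"AB".translate(cmap)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# With the type check in place, both operand types go through str.translate().
print(translate_operand("AB", cmap))
print(translate_operand(b"AB", cmap))
```

If this diagnosis is right, a fix in `process_operation` would need to make sure the operand is a `str` (or that the map is a bytes table) before calling `translate`.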
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-5.4.0-113-generic-x86_64-with-debian-bullseye-sid
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.0.0 (current main - 2.1.0)
Code
PDF: https://github.com/mstamy2/PyPDF2/files/3796761/17343_2008_Order_09-Jan-2019.pdf
from PyPDF2 import PdfReader
reader = PdfReader('17343_2008_Order_09-Jan-2019.pdf')
page = reader.pages[0]
page.extract_text()