Closed
Labels
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-text-extraction: From a user's perspective, text extraction is the affected feature/workflow
Description
Hi,
I am using PyPDF2 to extract text from a PDF file, and I am having problems with the Euro sign.
This is what the PDF looks like.

A copy/paste from Acrobat Reader properly gives back the Euro sign.
Extracting with pdftotext also yields the correct character:
PyPDF2, however, recognises it as a bullet (U+2022):
Is there anything I can do to fix this? I cannot find any encoding options to tweak in extractText.
Thanks for your help,
Andrea.
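
One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: in the PDF specification's encoding tables, byte 0x80 is the Euro sign in WinAnsiEncoding but the bullet (U+2022) in PDFDocEncoding, so decoding a WinAnsi-encoded string with the wrong table would produce exactly this symptom. The sketch below uses only the Python standard library and does not call PyPDF2; `PDFDOC_EXCERPT` is a hand-written, two-entry illustration and not part of any library.

```python
# The same byte read through two single-byte encodings used in PDF files.
RAW = b"\x80"

# WinAnsiEncoding agrees with Windows-1252 at this position; Python ships it as "cp1252".
as_winansi = RAW.decode("cp1252")

# Hypothetical excerpt of PDFDocEncoding, limited to the two entries relevant here.
PDFDOC_EXCERPT = {0x80: "\u2022", 0xA0: "\u20ac"}
as_pdfdoc = "".join(PDFDOC_EXCERPT[b] for b in RAW)

print(f"cp1252 reading:         U+{ord(as_winansi):04X}")
print(f"PDFDocEncoding reading: U+{ord(as_pdfdoc):04X}")
```

If this is indeed the cause, blindly replacing bullets with Euro signs in the extracted text would also corrupt genuine bullets, so the font's declared encoding in the affected file would need to be checked first.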