Euro sign not being recognized by extractText #443

Hi,

I am using PyPDF2 to extract text from a PDF file, and I am having problems with the Euro sign.

This is what the PDF looks like:
![image](https://user-images.githubusercontent.com/25104002/42809633-583fb8a8-89b6-11e8-9301-f9c9b4977ce4.png)

Copying and pasting from Acrobat Reader correctly gives back the Euro sign.
Extracting with pdftotext also yields the correct character:

![image](https://user-images.githubusercontent.com/25104002/42809762-a2bf13f6-89b6-11e8-884d-3db5aa9ab21d.png)

PyPDF2, however, recognises it as a bullet (U+2022):

![image](https://user-images.githubusercontent.com/25104002/42809833-ca4f6e7a-89b6-11e8-9ae0-b5276c85d837.png)

Is there anything I can do to fix this? I cannot find any encoding options to tweak in `extractText`.
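
One possible explanation, which I have not confirmed for this file: byte 0x80 is the Euro sign in WinAnsiEncoding (Windows-1252) but the bullet in PDFDocEncoding, so the string may be decoded with the wrong table. The sketch below shows that mismatch using only the standard library, along with a stopgap I could apply to the extracted text. `fix_euro` is a hypothetical helper of my own, not part of PyPDF2, and it assumes a bullet next to a number was meant to be a Euro sign.

```python
import re

# Byte 0x80 decoded as Windows-1252 (WinAnsiEncoding) is the Euro sign.
EURO = b"\x80".decode("cp1252")

# In PDFDocEncoding the same byte 0x80 maps to the bullet, U+2022.
BULLET = "\u2022"

# Assumption: a bullet directly before or after a number (with at most one
# space in between) is really a mis-decoded Euro sign.
_MISREAD_EURO = re.compile(
    BULLET + r"(?=\s?\d)"      # bullet followed by an amount
    r"|(?<=\d)" + BULLET +     # amount immediately followed by bullet
    r"|(?<=\d\s)" + BULLET     # amount, one space, then bullet
)


def fix_euro(text: str) -> str:
    """Replace bullets that sit next to a number with the Euro sign."""
    return _MISREAD_EURO.sub(EURO, text)


if __name__ == "__main__":
    print(fix_euro("Total: \u2022 12.50"))
    print(fix_euro("12,50 \u2022"))
    print(fix_euro("\u2022 first item"))
```

This is only a heuristic: a genuine bullet followed by a digit (for example a list item such as "• 3 apples") would also be rewritten, so a proper fix would have to come from decoding the string with the font's own encoding.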

Thanks for your help,

Andrea.

