reading spanish text - mark convert issue 

Hello,

I am using PyPDF2 to extract Spanish text with the code below. The problem is with the curly quotation marks “ ”. In the output of **extractText**, quoted text comes out in the form **ﬁ + something + ﬂ**, that is, the opening quote becomes the "fi" ligature character and the closing quote becomes the "fl" ligature character.

For example, “Quijote” becomes **ﬁ**Quijote**ﬂ** and “De la Mancha” becomes **ﬁ**De la Mancha**ﬂ**.

I tried to remove them like this: `page_text= re.sub(r"['',\“\”\Œ]",'',page_text) `, but it did not work.
Is there any way to prevent this? Thanks.

```
import PyPDF2

text = []

# Open the PDF in binary mode; the with-block closes the file automatically.
with open('X.pdf', 'rb') as pdfFileObj:
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj, strict=False)

    # Start at index 4, i.e. skip the first four pages.
    for p in range(4, pdfReader.numPages):
        pageObj = pdfReader.getPage(p)
        page_text = pageObj.extractText()
        text.append(page_text)
```
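
One possible workaround, sketched here as an assumption rather than a confirmed fix: the character class in the `re.sub` attempt above does not contain the two characters that actually appear in the output, ﬁ (U+FB01) and ﬂ (U+FB02), so it cannot match them. The helpers below (`fix_quotes` and `strip_quotes` are hypothetical names, not part of PyPDF2) either map those two characters back to curly quotes or drop them. This assumes the PDF never uses real fi/fl ligatures inside words; if it does, those would be changed too.

```python
import re

# Hypothetical post-processing helpers; not part of PyPDF2.
# Assumption: in this PDF, U+FB01 and U+FB02 only ever stand in for
# the opening and closing curly quotation marks.
QUOTE_FIXES = str.maketrans({
    "\ufb01": "\u201c",  # fi ligature -> left double quotation mark
    "\ufb02": "\u201d",  # fl ligature -> right double quotation mark
})


def fix_quotes(page_text):
    """Replace the two ligature characters with curly quotes."""
    return page_text.translate(QUOTE_FIXES)


def strip_quotes(page_text):
    """Remove the two ligature characters entirely."""
    return re.sub("[\ufb01\ufb02]", "", page_text)


if __name__ == "__main__":
    sample = "\ufb01Quijote\ufb02"
    print(fix_quotes(sample))
    print(strip_quotes(sample))
```

In the extraction loop, this would be applied to each page before it is stored, for example `text.append(fix_quotes(page_text))`.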


reading spanish text - mark convert issue #635
