TypeError: replace() argument 1 must be str, not bytes #1379

I'm trying to convert a PDF file to a text file, but the run fails with a `TypeError` while extracting the text of the third page (index 2).

## Environment

Which environment were you using when you encountered the problem?

```bash
$ python -m platform
Linux-5.10.124-linuxkit-x86_64-with-glibc2.31

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.11.0
```

## Code + PDF

This is a minimal, complete example that shows the issue:

```python
import PyPDF2

print(PyPDF2.__version__)


def hanleOnePage(pdfreader, pIdx, outputFile):
    """Extract the text of page pIdx and append it to outputFile."""
    print("hanle Page:%s" % (pIdx))
    pageobj = pdfreader.getPage(pIdx)

    # The TypeError from the traceback is raised inside this call
    text = pageobj.extractText()
    # print(text)

    # Append the page text; the with-block closes the file every time
    with open(outputFile, "a", encoding="utf-8") as file1:
        file1.write(text)


# Open the PDF in binary read mode ("rb")
# pdffileobj = open('01bool.pdf', 'rb')
pdffileobj = open('02voc.pdf', 'rb')

# Create a reader for the opened PDF file object
pdfreader = PyPDF2.PdfFileReader(pdffileobj)

# Number of pages in this PDF file
x = pdfreader.numPages
print("PDF:numPages:%s" % (x))

# Extract the pages one by one, in order
pIndex = 0
while pIndex < x:
    hanleOnePage(pdfreader, pIndex, "all.txt")
    pIndex += 1

pdffileobj.close()
```
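As a stopgap while the bug is open, the per-page loop can be made tolerant of pages whose extraction fails, so that one problematic font does not abort the whole conversion. This is a minimal sketch, assuming each page exposes `extract_text()` (the PyPDF2 2.x name behind the deprecated `extractText()`); `pdf_pages_to_text` and `HasExtractText` are hypothetical names, and skipping a failed page is a design choice of this sketch, not PyPDF2 behavior.

```python
from typing import Iterable, List, Protocol


class HasExtractText(Protocol):
    """Anything with an extract_text() method, e.g. a PyPDF2 2.x page."""

    def extract_text(self) -> str: ...


def pdf_pages_to_text(pages: Iterable[HasExtractText], output_file: str) -> List[int]:
    """Write the text of every page to output_file, one page per block.

    Pages whose extraction raises TypeError are skipped and reported;
    the indices of the skipped pages are returned.
    """
    failed: List[int] = []
    # Open the output once, in write mode, rather than reopening it per page
    with open(output_file, "w", encoding="utf-8") as out:
        for index, page in enumerate(pages):
            try:
                text = page.extract_text()
            except TypeError as exc:  # e.g. the str/bytes CMap error in this report
                print(f"page {index}: text extraction failed: {exc}")
                failed.append(index)
                continue
            out.write(text)
            out.write("\n")
    return failed
```

With PyPDF2 2.x this could be called as `pdf_pages_to_text(PyPDF2.PdfReader(pdffileobj).pages, "all.txt")`; only `TypeError` is caught so that unrelated errors still surface.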

Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
[02voc.pdf](https://github.com/py-pdf/PyPDF2/files/9712729/02voc.pdf)

## Traceback

This is the complete Traceback I see:

```
root@0cc46add0ae3:/home/learn/IR/irBooks# python tokens-step1.py
2.11.0
PDF:numPages:29
hanle Page:0
hanle Page:1
hanle Page:2
Traceback (most recent call last):
  File "/home/learn/IR/irBooks/tokens-step1.py", line 39, in <module>
    hanleOnePage(pdfreader, pIndex, "3.txt")
  File "/home/learn/IR/irBooks/tokens-step1.py", line 12, in hanleOnePage
    text=pageobj.extractText()
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_page.py", line 1865, in extractText
    return self.extract_text()
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_page.py", line 1818, in extract_text
    return self._extract_text(
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_page.py", line 1323, in _extract_text
    cmaps[f] = build_char_map(f, space_width, obj)
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_cmap.py", line 27, in build_char_map
    map_dict, space_code, int_entry = parse_to_unicode(ft, space_code)
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_cmap.py", line 193, in parse_to_unicode
    cm = prepare_cm(ft)
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_cmap.py", line 210, in prepare_cm
    .replace(b"beginbfchar", b"\nbeginbfchar\n")
TypeError: replace() argument 1 must be str, not bytes
```
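The last frame shows the failure mode: `prepare_cm` applies a `bytes`-based `.replace(b"beginbfchar", ...)`, but the object it is called on is a `str` (a `bytes` receiver given `str` arguments would raise a different message). The sketch below reproduces the error in plain Python and shows one defensive approach, normalizing the CMap data to `bytes` first. The helper name `normalize_cmap_data` and the Latin-1 encoding are assumptions for illustration, not PyPDF2's actual code or fix; Latin-1 only round-trips a `str` whose characters are all below U+0100.

```python
from typing import Union


def normalize_cmap_data(data: Union[str, bytes]) -> bytes:
    """Return CMap data as bytes so that bytes-based edits work.

    Hypothetical helper: Latin-1 maps code points 0-255 to single bytes,
    so a str that was decoded byte-for-byte is restored unchanged.
    """
    if isinstance(data, str):
        data = data.encode("latin-1")
    # Same kind of edit as the failing line in PyPDF2's prepare_cm
    return data.replace(b"beginbfchar", b"\nbeginbfchar\n")


# Reproduce the error from the traceback: str.replace given bytes arguments
try:
    "1 beginbfchar".replace(b"beginbfchar", b"\nbeginbfchar\n")
except TypeError as exc:
    print(exc)  # replace() argument 1 must be str, not bytes
```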
