encoding error #260

@snowlord

# coding: utf-8

from PyPDF2 import PdfFileReader


def main():
    fname = "E:\\b.pdf"
    with open(fname, 'rb') as f:
        readpdf = PdfFileReader(f)
        # getPage() is zero-based, so index 1 is the second page
        page1 = readpdf.getPage(1)

        print(page1.extractText())


if __name__ == "__main__":
    main()

When I extract text from a PDF file written in Chinese, it fails with:
UnicodeEncodeError: 'gbk' codec can't encode character '\xfd' in position 11: illegal multibyte sequence
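
The traceback names the 'gbk' codec, which suggests the failure happens when print() encodes the extracted text for a GBK console rather than inside the extraction itself. A minimal sketch of two possible workarounds follows, assuming that diagnosis is right. The helper names to_console_safe and write_utf8 are hypothetical and are not part of PyPDF2.

```python
import io
import sys


def to_console_safe(text, encoding=None):
    """Replace characters the target encoding cannot represent with '?'.

    `encoding` defaults to the console encoding reported by sys.stdout
    (an assumption; falls back to UTF-8 when none is reported).
    """
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError
    return text.encode(encoding, errors="replace").decode(encoding)


def write_utf8(text, path):
    """Write the text to a UTF-8 file, bypassing the console encoding entirely."""
    with io.open(path, "w", encoding="utf-8") as out:
        out.write(text)
```

In the script above this would be used as print(to_console_safe(page1.extractText())). Replacing characters only hides the symptom: if the extracted text itself is wrong for this CJK PDF, the output will still be garbled, just without the exception.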

Labels

is-cjk-issue: Issue related to CJK (Chinese-Japanese-Korean)
workflow-text-extraction: From a user's perspective, text extraction is the affected feature/workflow
