Closed
Labels
is-cjk-issue: Issue related to CJK (Chinese-Japanese-Korean)
workflow-text-extraction: From a user's perspective, text extraction is the affected feature/workflow
Description
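# coding: utf-8
from PyPDF2 import PdfFileReader

def main():
    fname = "E:\\b.pdf"
    with open(fname, 'rb') as f:
        readpdf = PdfFileReader(f)
        # Page indices are zero-based, so getPage(1) returns the second page.
        page1 = readpdf.getPage(1)
        print(page1.extractText())

if __name__ == "__main__":
    main()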
When I extract text from a PDF file written in Chinese, it fails with:
UnicodeEncodeError: 'gbk' codec can't encode character '\xfd' in position 11: illegal multibyte sequence
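A possible workaround, sketched under an assumption: the error is an encode error, so it is most likely raised by print() when the console encoding is gbk and cannot represent a character in the extracted string, rather than by the extraction itself. The helper name safe_print below is hypothetical and not part of PyPDF2; it replaces unencodable characters with '?' instead of crashing.

```python
import io
import sys


def safe_print(text, stream=None):
    """Write text to a stream, replacing characters its encoding cannot represent.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    if stream is None:
        stream = sys.stdout
    # Fall back to UTF-8 if the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Round-trip through the target encoding so unencodable characters become '?'.
    safe = text.encode(encoding, errors="replace").decode(encoding)
    stream.write(safe + "\n")


if __name__ == "__main__":
    # Simulate a gbk console with an in-memory stream.
    buf = io.BytesIO()
    gbk_console = io.TextIOWrapper(buf, encoding="gbk", newline="")
    safe_print("abc\xfd", gbk_console)
    gbk_console.flush()
    print(buf.getvalue())
```

In the script above, print(page1.extractText()) would become safe_print(page1.extractText()). On Python 3.7 or later, calling sys.stdout.reconfigure(encoding="utf-8") before printing, or writing the text to a file opened with encoding="utf-8", may be alternatives. Note that this only avoids the crash: a character such as '\xfd' in text that should be Chinese suggests the extracted string may itself be wrongly decoded, which this sketch does not fix.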