Closed
Labels
- is-cjk-issue: Issue related to CJK (Chinese-Japanese-Korean)
- workflow-text-extraction: From a user's perspective, text extraction is the affected feature/workflow
Description
In my tests, pure English content in a PDF is extracted without problems.
However, nothing readable comes out when extracting text from a page in Chinese.
I suspect the cause is the text encoding.
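To illustrate the kind of encoding mismatch suspected here, this is a minimal, self-contained sketch in plain Python (no PyPDF2 involved). It shows that the UTF-8 bytes for two Chinese characters round-trip cleanly with the right codec but turn into unreadable characters when decoded with the wrong one (Latin-1). It demonstrates only the general mechanism; it does not establish that this is what actually happens inside PyPDF2 for the page in question.

```python
# Illustration only: an encoding mismatch turns readable text into garbage.
original = u"\u4e2d\u6587"  # the two characters of "中文" ("Chinese")

# Encode the text into UTF-8 bytes, as a producer might store it.
utf8_bytes = original.encode("utf-8")

# Decoding with the same codec recovers the original text.
correct = utf8_bytes.decode("utf-8")

# Decoding with the wrong codec still "succeeds" (Latin-1 maps every byte
# to some character) but yields unreadable mojibake instead of Chinese.
garbled = utf8_bytes.decode("latin-1")

assert correct == original
assert garbled != original
```

Note that the wrong codec raises no error here, which is one reason this kind of problem can silently produce unreadable output rather than an exception.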
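I tried changing the u_ function at the following line to the code below:
https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/utils.py#L246

    import sys

    def u_(s):
        if sys.version_info[0] < 3:
            # Python 2: decode the byte string as UTF-8 into a unicode object
            return unicode(s, encoding='utf-8')
        else:
            # Python 3: str is already Unicode, so return it unchanged
            return s

But it doesn't work.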
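For comparison, here is a sketch of a helper in the same spirit as the modified u_ function above, written so it runs on both Python 2 and Python 3. The name to_text is hypothetical (it is not part of PyPDF2), and the sketch assumes the incoming bytes really are UTF-8: it decodes byte strings and passes text strings through unchanged.

```python
def to_text(s, encoding="utf-8"):
    """Hypothetical helper (not PyPDF2 API): return s as a text string."""
    if isinstance(s, bytes):
        # Byte strings are decoded; this assumes they really use `encoding`.
        return s.decode(encoding)
    # Already text: return it unchanged, like the Python 3 branch of u_.
    return s


# Usage: the UTF-8 bytes for "中文" decode to the expected characters.
text = to_text(b"\xe4\xb8\xad\xe6\x96\x87")
assert text == u"\u4e2d\u6587"

# Text input passes through untouched.
assert to_text(u"\u4e2d\u6587") == u"\u4e2d\u6587"

# Bytes in a different encoding (GBK, chosen only as an example) are not
# valid UTF-8, so decoding them raises instead of giving readable text.
gbk_failed = False
try:
    to_text(u"\u4e2d\u6587".encode("gbk"))
except UnicodeDecodeError:
    gbk_failed = True
assert gbk_failed
```

The last check shows why changing the codec in a helper like this only helps if it matches the encoding of the data that actually reaches it; the sketch does not determine what that encoding is for the Chinese page.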
My environment:
- Python 2.7.10
- OS X El Capitan
- PyPDF2 version 1.25.1
Thank you.