Crash during page text extraction #2975
Closed
Labels
is-robustness-issue: From a user's perspective, this is about robustness
workflow-text-extraction: From a user's perspective, text extraction is the affected feature/workflow
I was trying to extract text from the first two pages of the PDF when the error occurred. I have a sample workaround at neeraj9@75b4e42 that gets past the error.
Environment
OS: Windows 11 version 23H2
Python: Python 3.12
Code + PDF
This is a minimal, complete example that shows the issue:
Sample workaround:
neeraj9@75b4e42
PDF causing error:
9E5E080E-C8DB-4A6B-822B-9A67DC04E526-120438.pdf
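For anyone trying to narrow down which page triggers the crash, here is a minimal sketch of per-page extraction that records the failure instead of aborting. It is not the reproduction snippet from this report and not the workaround in neeraj9@75b4e42. `extract_first_pages` is a hypothetical helper name, and it assumes a reader object shaped like pypdf's `PdfReader` (a `pages` sequence whose items have `extract_text()`).

```python
from typing import Any, List, Optional, Tuple

# One entry per page: (page index, extracted text or None, error message or None)
PageResult = Tuple[int, Optional[str], Optional[str]]


def extract_first_pages(reader: Any, count: int = 2) -> List[PageResult]:
    """Try to extract text from the first `count` pages of `reader`.

    Assumption: `reader.pages` supports len() and integer indexing, and
    each page has an `extract_text()` method, as pypdf's PdfReader does.
    """
    results: List[PageResult] = []
    total = min(count, len(reader.pages))
    for index in range(total):
        try:
            text = reader.pages[index].extract_text()
            results.append((index, text, None))
        except Exception as exc:
            # Deliberately broad: the goal is to find the failing page,
            # not to handle one specific error type.
            results.append((index, None, f"{type(exc).__name__}: {exc}"))
    return results
```

With pypdf installed, this could be called as `extract_first_pages(pypdf.PdfReader("sample.pdf"))`, where `sample.pdf` stands in for the attached file. The error string for the failing page can then be compared with the traceback.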
Traceback
This is the complete traceback I see: