Closed
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
is-robustness-issue: From a users perspective, this is about robustness
Description
Hi all,
I am converting a PDF file to text for processing. The code was working fine, but recently it started emitting warnings like the one below and no longer extracts any text:
PdfReadWarning: Superfluous whitespace found in object header b'1' b'0' [pdf.py:1666]
MCVE
from PyPDF2 import PdfReader
reader = PdfReader("TN_24.08.2020.pdf")
text = reader.pages[0].extract_text()
assert "Directorate" in text, text

My PDF file and processing code are attached:
pdf2txt.py.txt
TN_24.08.2020.pdf
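To check whether the warning and the empty result come from the same call, a sketch like the one below could record warnings while extracting. The helper names `extract_with_warnings` and `extract_first_page` are hypothetical, and the sketch assumes `PdfReadWarning` is reported through Python's standard `warnings` module. A version that routes it through `logging`, or installs its own warning handler, would need a different hook.

```python
import warnings


def extract_with_warnings(extract):
    """Call extract() and return (text, list of warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including repeats of the same message.
        warnings.simplefilter("always")
        text = extract()
    return text, [str(w.message) for w in caught]


def extract_first_page(path):
    """Extract text from the first page, mirroring the MCVE above."""
    # Imported lazily so the helper above does not require PyPDF2.
    from PyPDF2 import PdfReader

    return PdfReader(path).pages[0].extract_text()
```

Calling `extract_with_warnings(lambda: extract_first_page("TN_24.08.2020.pdf"))` would return the extracted text together with any recorded messages, which makes it possible to tell an empty extraction apart from one that only produced warnings.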
Thanks in advance