Closed
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-text-extraction: From a user's perspective, text extraction is the affected feature/workflow
Description
I'm trying to automate sorting PDFs by the date printed on each one. The problem I keep running into is that the slashes in the dates are consistently extracted as 1s. That wouldn't matter 90% of the time, but unfortunately it makes many January and November dates indistinguishable:
1/11/2022
11/1/2022
Both end up as 111112022
I asked for new PDFs to use a zero-padded format such as 01/11/2022, but they aren't able to do that. Is there a way to fix this?
from PyPDF2 import PdfReader

reader = PdfReader("TestPackingSlip637860440227283947.pdf")
print(f"Total pages= {len(reader.pages)}")
# Print the extracted text of every page, numbering pages from 1
for i, page in enumerate(reader.pages, start=1):
    print(f"Page: {i}")
    print(page.extract_text())

The information in the PDF I'm uploading is randomized and does not represent anyone's real info.
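To show why the extracted text can't simply be cleaned up afterwards, here is a minimal sketch that lists every date a garbled string could stand for. It is not part of PyPDF2: `candidate_dates` is a hypothetical helper, and it assumes the dates are written M/D/YYYY without zero padding, with a four-digit year, and that each slash came out as a single `1`.

```python
from datetime import date


def candidate_dates(garbled: str) -> list:
    """Return every M/D/YYYY date that would be extracted as `garbled`
    if each slash were replaced by the digit 1 (assumed failure mode)."""
    matches = []
    year_text = garbled[-4:]
    # Shortest possible input is "1" + "1" + "1" + "1" + "2022" (8 characters).
    if len(garbled) < 8 or not year_text.isdigit():
        return matches
    year = int(year_text)
    for month in range(1, 13):
        for day in range(1, 32):
            try:
                d = date(year, month, day)  # skips impossible dates such as 2/30
            except ValueError:
                continue
            # Rebuild what the broken extraction would produce for this date.
            if f"{month}1{day}1{year}" == garbled:
                matches.append(d)
    return matches


print(candidate_dates("111112022"))  # two candidates: 1/11/2022 and 11/1/2022
print(candidate_dates("31512022"))   # one candidate: 3/5/2022
```

When the helper returns a single date the value can be recovered, but for strings like 111112022 it returns both January 11 and November 1, so the ambiguity has to be fixed in the text extraction itself.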