Inspiration

Every day, medical and clerical staff lose time typing patient or customer information from IDs into spreadsheets, and a small typo during data entry can lead to a larger problem. I wanted to build a project that automates this entire process and reduces manual labor.

What it does

AutoFill DL OCR takes an image of a driver’s license or ID, preprocesses it to improve OCR (optical character recognition) accuracy, and extracts useful fields such as name, DOB, address, and ZIP code. The program then automatically appends this structured data to an Excel sheet. It also uses the watchdog library to monitor the folder it is run in, so new images can be processed automatically.
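As a rough illustration of the folder monitoring described above, here is a minimal sketch using watchdog's `FileSystemEventHandler` and `Observer`. The class name `NewImageHandler`, the `watch` helper, the `process` callback, and the list of image suffixes are assumptions made for this example, not the project's actual code.

```python
import time
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

# Assumed set of image types; the real project may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


class NewImageHandler(FileSystemEventHandler):
    """Calls `process(path)` whenever a new image file appears."""

    def __init__(self, process):
        super().__init__()
        self.process = process  # hypothetical callback, e.g. the OCR pipeline

    def on_created(self, event):
        # Ignore new sub-folders; only files are of interest.
        if event.is_directory:
            return
        path = Path(event.src_path)
        if path.suffix.lower() in IMAGE_SUFFIXES:
            self.process(path)


def watch(folder, process):
    """Watch `folder` (non-recursively) until interrupted with Ctrl+C."""
    observer = Observer()
    observer.schedule(NewImageHandler(process), str(folder), recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```

Calling `watch(".", print)` would simply print the path of each new image dropped into the current folder; in the real tool the callback would run the OCR and Excel steps instead.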

How I built it

Python is the core of the project, handling the logic and file management that tie everything together. Tesseract OCR, accessed through pytesseract, handles text recognition and retrieves the raw text from driver’s license images. To improve accuracy, Pillow (PIL) preprocesses each image with contrast enhancement, sharpening, and resizing. Regular expressions then parse the raw OCR output into organized fields such as First Name, Last Name, and Date of Birth. Pandas writes those fields to an Excel file, which keeps the data organized and easy to access. As an optional extension, Watchdog runs as a background monitoring service that watches a folder and automatically processes any new images added to it, so no manual step is needed.
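To make the regex step more concrete, here is a small sketch of turning raw OCR text into named fields. The sample text, the `LN`/`FN`/`DOB` labels, the patterns, and the function name `parse_fields` are all hypothetical; real licenses vary by state, and the project's own patterns may look quite different.

```python
import re

# Hypothetical patterns for an assumed labeled layout (not a real state format).
FIELD_PATTERNS = {
    "last_name": re.compile(r"^LN[ :]+([A-Z' -]+?)[ \t]*$", re.MULTILINE),
    "first_name": re.compile(r"^FN[ :]+([A-Z' -]+?)[ \t]*$", re.MULTILINE),
    "dob": re.compile(r"\bDOB[ :]*(\d{2}[/-]\d{2}[/-]\d{4})"),
    "address": re.compile(r"^(\d+ [A-Z0-9 .]+)$", re.MULTILINE),
    "zip": re.compile(r"\b[A-Z]{2}\s+(\d{5})(?:-\d{4})?\b"),
}

# Made-up OCR output used only to exercise the patterns.
SAMPLE_TEXT = (
    "SAMPLE STATE DRIVER LICENSE\n"
    "LN DOE\n"
    "FN JANE\n"
    "DOB 01/02/1990\n"
    "123 MAIN ST\n"
    "DOVER, DE 19901\n"
)


def parse_fields(raw_text):
    """Return a dict of extracted fields; a field is None when no pattern matches."""
    text = raw_text.upper()  # OCR casing is inconsistent, so normalize first
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1).strip() if match else None
    return fields


if __name__ == "__main__":
    print(parse_fields(SAMPLE_TEXT))
```

Returning `None` for a missing field, rather than raising, lets a later step decide whether to skip the row or flag it for manual review.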

Challenges I ran into

The hardest part of this project was that OCR is never completely reliable, so the image preprocessing steps, such as contrast settings, binarization thresholds, and resizing, took a lot of trial and error before they produced dependable results. Another challenge was testing across different state licenses. I primarily used Delaware and Maryland IDs, but licenses from other states have different structures, fonts, and security patterns, so the same OCR settings don’t always work. For example, fields such as name and DOB can appear in different positions depending on the state, which creates edge cases that need updated or state-specific parsing rules. To work around this, I treated testing as an iterative process: run the OCR, review the misread fields, then update either the regex patterns or the preprocessing steps. Lastly, flexibility for end users was a significant consideration. Instead of hardcoding file paths, I built the system so that users can specify their own image and Excel locations, which makes the tool more user-friendly and portable across computers.
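One common way to avoid hardcoded paths is a small command-line interface. The sketch below uses Python's standard `argparse` module; the argument names (`image`, `--excel`), the default workbook name, and the function names are assumptions for illustration, not the project's actual interface.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a CLI that takes the image and Excel locations from the user."""
    parser = argparse.ArgumentParser(
        description="Extract fields from an ID image and append them to Excel."
    )
    parser.add_argument("image", type=Path, help="path to the ID image")
    parser.add_argument(
        "--excel",
        type=Path,
        default=Path("licenses.xlsx"),  # hypothetical default workbook name
        help="Excel file to append to (default: licenses.xlsx)",
    )
    return parser


def parse_args(argv=None):
    """Parse `argv` (or sys.argv when None) into a namespace of Path objects."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Would read {args.image} and append to {args.excel}")
```

Because both values arrive as `pathlib.Path` objects, the same script works on any machine without edits to the source.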

Accomplishments that I'm proud of

I built the entire end-to-end pipeline: moving an image through OCR processing, turning the raw text into structured data, and writing it to Excel. The tool was designed to be general and user-friendly, so anyone who wants to run it only needs to point to their own image and Excel file without worrying about the underlying logic. I was happy that the project had real-world impact, decreasing the manual data-entry burden for a franchise by about 40% while saving time and limiting errors. I estimated this by tallying the total paperwork that needs to be completed (medicine orders, bloodwork orders, customer entry, etc.) and calculating the share of that total made up by customer entry, which came to about 40%. Automating customer entry removes that share from the manual workload.

What I learned

Throughout this project, I realized just how critical OCR preprocessing is to achieving accurate results. Even seemingly insignificant changes, like tweaking contrast, refining binarization thresholds, or resizing images, had a dramatic impact on recognition quality. It showed me how much input quality matters when working with real-world data. Beyond the OCR, I learned how to structure a Python project not as a class assignment but as something meant for real, professional use. This meant building reliable workflows with directory watchers, automating Excel integration, and implementing careful exception handling to create a process that is both robust and user-friendly. More importantly, it taught me how to think like a developer working on a tool that others might depend on, where maintainability, flexibility, and reliability are just as important as getting the code to run.
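For readers curious what these preprocessing steps can look like, here is a minimal Pillow sketch that resizes, boosts contrast, sharpens, and binarizes an image. The function name `preprocess` and the default values for `scale`, `contrast`, and `threshold` are illustrative assumptions, not the settings I ended up with; they would need tuning per license type.

```python
from PIL import Image, ImageEnhance, ImageFilter, ImageOps


def preprocess(image, scale=2.0, contrast=1.8, threshold=150):
    """Return a black-and-white, upscaled copy of `image` for OCR.

    All default values are placeholders chosen for illustration only.
    """
    # Grayscale first so later steps work on a single channel.
    gray = ImageOps.grayscale(image)

    # Upscale so small text has more pixels per character.
    new_size = (int(gray.width * scale), int(gray.height * scale))
    resized = gray.resize(new_size, Image.LANCZOS)

    # Increase contrast, then sharpen character edges.
    boosted = ImageEnhance.Contrast(resized).enhance(contrast)
    sharpened = boosted.filter(ImageFilter.SHARPEN)

    # Binarize: pixels above the threshold become white, the rest black.
    return sharpened.point(lambda p: 255 if p > threshold else 0)
```

The returned image could then be passed to `pytesseract.image_to_string` to get the raw text. Changing `threshold` by even a small amount can change which characters survive binarization, which is why this step took so much trial and error.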

What's next for AutoFill DL OCR

Looking ahead, there are a few ways to take this project to the next level. One is improving OCR accuracy with deep learning models, along with more precise parsing and recognition using Tesseract associative recognition, for complicated cases or low-quality ID images. Another is a web app, which would be much simpler to use: users could upload images through a browser instead of running scripts in the terminal. A simple React frontend could even make this a drag-and-drop process, with a Flask or FastAPI backend running the OCR pipeline and returning the results as structured data in a single response. Not only would this make the system easier to use, but it could also scale to other ID formats, such as passports, insurance cards, or other government-issued documents, widening its real-world usage. For instance, a more robust OCR engine, such as EasyOCR or PaddleOCR, could extract information from less structured documents like insurance cards, and PassportEye could process the MRZ on passports. Together, these enhancements form a clear roadmap for scaling the project into a more versatile and professional tool.
