Handwriting Recognition Project

Inspiration

Our initial spark of inspiration came from one member’s internship experience digitizing archive files. This led us to explore computer vision with handwriting recognition. We also thought this project would be a fun and interactive showcase at the hackathon expo. Additionally, we had no prior experience with machine learning models and computer vision, making it an exciting challenge.

Initial Approach

We planned to train an open-source machine learning model on a large handwriting dataset. Our search on Kaggle yielded several promising datasets. However, we quickly realized that training a model from scratch was too ambitious for the timeframe, especially since we wanted to incorporate user interaction. Thus, we pivoted to using a pre-trained model.

Choosing an OCR Model

Our research led us to Tesseract, an open-source optical character recognition (OCR) engine. However, initial tests showed that Tesseract was not accurate enough for handwriting. We then explored other handwriting-specific models, eventually discovering TrOCR, a transformer-based model with several pre-trained versions for English handwriting.
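Running a pre-trained TrOCR checkpoint is only a few lines with the Hugging Face transformers library. The sketch below uses one of Microsoft's published English handwriting checkpoints; the `recognize` helper name is ours, and a blank page stands in for a real photo.

```python
# Minimal TrOCR inference sketch, assuming Hugging Face transformers.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

MODEL = "microsoft/trocr-base-handwritten"
processor = TrOCRProcessor.from_pretrained(MODEL)
model = VisionEncoderDecoderModel.from_pretrained(MODEL)

def recognize(image: Image.Image) -> str:
    """Run one image through TrOCR and return the decoded text."""
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# A blank white page stands in for a captured photo here.
print(recognize(Image.new("RGB", (384, 384), "white")))
```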

Computer Vision with OpenCV

On the input side, we used OpenCV, a mature computer-vision library that fit our needs well. It let us capture and process frames from a live camera feed, making the project real-time and interactive; each captured image was preprocessed before being sent to the model.

Preprocessing Steps

  1. Auto-cropping: The model struggled to find text in images with busy backgrounds. To solve this, we used OpenCV to detect the paper's bounding box and crop the image to it.
  2. De-skewing: The model had issues with tilted text. We applied transformations to correct the skew.
  3. Contrast Adjustment: Shadows and paper textures impacted accuracy. Increasing contrast helped letters stand out and removed printed lines.

Graphical User Interface (GUI)

We built a GUI to enhance usability and display statistics about the recognized text. Initially, we used command-line interactions, but this proved slow and cumbersome.

Key Features

  • Camera Feed with Countdown: Displays a live preview with a countdown before capturing the image.
  • Image Approval & Manual Cropping: Users can view and manually adjust the cropped image.
  • Real-time Processing Feedback: Displays the manipulated image after auto-cropping, de-skewing, and contrast adjustments.
  • Retake Option: Allows users to retake images as needed.
  • Text Analysis: Runs the recognized text through the Natural Language Toolkit (NLTK) for part-of-speech breakdown.
  • Spell Check: Implements a spell-checking library for refining output.

Implementation

We built the GUI using CustomTkinter, a modern theming library built on Python's Tkinter. Though we had prior experience with React, we chose CustomTkinter for how quickly it produces clean, styled interfaces. Integrating OpenCV's video feed with Tkinter presented a unique challenge but was ultimately successful.

Final Thoughts

Most changes to our project's scope happened during the planning phase, which let us implement every feature we set out to build. Given more time, we would have refined the user experience further by polishing the interface and debugging edge cases.
