An end-to-end document processing pipeline that combines deep learning with an agentic orchestration layer. Built for MIS 382N Advanced Machine Learning at UT Austin.
Upload a receipt, invoice, or business letter and the system will:
- Extract text using OCR (EasyOCR)
- Classify the document type (ResNet18 ensemble trained on RVL-CDIP)
- Pull out structured fields like vendor, date, totals (LayoutLMv3)
- Make an approval decision based on business rules
- Flag anomalies and route edge cases to human review
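The approval step can be pictured as a small rule function. What follows is a minimal sketch, not the notebook's actual rule engine: the thresholds `AUTO_APPROVE_LIMIT` and `MIN_CONFIDENCE`, the `Decision` class, and the `decide` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds for illustration; the notebook's real rules may differ.
AUTO_APPROVE_LIMIT = 500.00
MIN_CONFIDENCE = 0.80


@dataclass
class Decision:
    status: str                                  # "approve" or "review"
    reasons: list = field(default_factory=list)  # human-readable explanations


def decide(fields, classifier_confidence):
    """Apply simple business rules to the extracted fields."""
    reasons = []
    # Missing required fields are edge cases that go to a human.
    for name in ("vendor", "date", "total"):
        if fields.get(name) in (None, ""):
            reasons.append(f"missing field: {name}")
    # So is a document the classifier was unsure about.
    if classifier_confidence < MIN_CONFIDENCE:
        reasons.append("low classifier confidence")
    if reasons:
        return Decision("review", reasons)

    if fields["total"] <= AUTO_APPROVE_LIMIT:
        return Decision("approve", ["total within auto-approve limit"])
    return Decision("review", ["total exceeds auto-approve limit"])
```

Returning the reasons alongside the status keeps each decision explainable when it lands in a human review queue.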
Document → OCR Agent → Router Agent → Field Agent → Decision Agent → HITL Manager

Each of the first four agents wraps one component:
- OCR Agent: EasyOCR
- Router Agent: ResNet18 ensemble
- Field Agent: LayoutLMv3
- Decision Agent: rule engine

Each agent is independent and logs its decisions for a full audit trail.
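One way to picture independent agents with an audit trail is a shared context that each agent reads from and a runner that records every output. This is a rough sketch under assumptions: `Context`, `run_pipeline`, and the stand-in agents below are hypothetical and return canned values instead of calling EasyOCR, ResNet18, or LayoutLMv3.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    document: str
    data: dict = field(default_factory=dict)   # outputs keyed by agent name
    audit: list = field(default_factory=list)  # ordered log of every step


def run_pipeline(document, agents):
    """Run each (name, agent) pair in order and log its output."""
    ctx = Context(document)
    for name, agent in agents:
        output = agent(ctx)
        ctx.data[name] = output
        ctx.audit.append({"agent": name, "output": output})
    return ctx


# Stand-in agents with canned outputs (the real ones would call the models).
def ocr_agent(ctx):
    return "ACME STORE TOTAL 12.50"


def router_agent(ctx):
    return "receipt" if "TOTAL" in ctx.data["ocr"] else "letter"


def field_agent(ctx):
    return {"vendor": "ACME STORE", "total": 12.50}


def decision_agent(ctx):
    return "approve" if ctx.data["fields"]["total"] < 100 else "review"


AGENTS = [
    ("ocr", ocr_agent),
    ("router", router_agent),
    ("fields", field_agent),
    ("decision", decision_agent),
]
```

Calling `run_pipeline("receipt.png", AGENTS)` returns a context whose `audit` list holds one entry per agent, in execution order.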
| File | What's in it |
|---|---|
| `AML_Document_Processing_Pipeline_v2.ipynb` | Main notebook: OCR, LayoutLM, CNN, agentic layer, Gradio UI |
| `RVL_CDIP_Classification.ipynb` | ResNet18 training on document images |
- Open in Google Colab (GPU runtime recommended)
- Run the cells in order
- Phases 10-11 launch a Gradio demo with a shareable link
The `.pt` model files aren't in the repo (too large). Train them yourself or grab them from Google Drive.
- PyTorch + torchvision (ResNet18)
- HuggingFace Transformers (LayoutLMv3)
- EasyOCR
- Gradio for the demo UI
- Good old regex for fallback field extraction
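As an illustration of the regex fallback, here is a minimal sketch that pulls a vendor, date, and total out of raw OCR text. The patterns and the `extract_fields` name are assumptions made for this example, not the notebook's actual expressions, and the "first non-empty line is the vendor" rule is only a heuristic.

```python
import re

# Illustrative patterns: numeric dates like 01/05/2024 or 2024-01-05,
# and an amount following the standalone word "total".
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4}-\d{2}-\d{2})\b")
TOTAL_RE = re.compile(r"\btotal\b[^\d\n]*(\d+(?:,\d{3})*\.\d{2})", re.IGNORECASE)


def extract_fields(ocr_text):
    """Best-effort field extraction from OCR text; missing fields are None."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    date = DATE_RE.search(ocr_text)
    totals = TOTAL_RE.findall(ocr_text)
    return {
        # Heuristic: receipts usually print the vendor name first.
        "vendor": lines[0] if lines else None,
        "date": date.group(1) if date else None,
        # Take the last "total" match, since the final total tends to come last.
        "total": float(totals[-1].replace(",", "")) if totals else None,
    }
```

The `\b` before `total` keeps "Subtotal" lines from matching, so only the final amount is picked up.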
A few things that would make this better:
- Isolation Forest for anomaly detection: anomaly flags are currently rule-based, and an ML detector would likely catch more edge cases
- Fine-tune LayoutLMv3 on real SROIE data: we trained on synthetic receipts, so real data should help
- Add an XGBoost approval predictor that learns from historical approval decisions
- Better OCR preprocessing: deskewing and noise removal before OCR
- Async processing - handle batch uploads without blocking
- Model versioning - track which model made each decision
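For the model versioning item, one lightweight option is to fingerprint each weights file and attach the fingerprint to every decision record. This is only a sketch of that idea, using the standard library; `model_version`, `tag_decision`, and the record layout are hypothetical and not part of the current notebook.

```python
import hashlib


def model_version(path, chunk_size=1 << 20):
    """Return a short SHA-256 fingerprint of a model file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:12]


def tag_decision(decision, model_paths):
    """Copy a decision record and add the fingerprint of each model used."""
    versions = {name: model_version(p) for name, p in model_paths.items()}
    return {**decision, "model_versions": versions}
```

Because the fingerprint comes from the file contents, retraining a model changes its version automatically, with no manual bookkeeping.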
Built by the RogueTex crew for our grad ML class.
If you're grading this: yes, all the phases work. Run it in Colab with a GPU.