Document ML Agent

An end-to-end document processing pipeline that combines deep learning with an agentic orchestration layer. Built for MIS 382N Advanced Machine Learning at UT Austin.

What it does

Upload a receipt, invoice, or business letter and the system will:

Extract text using OCR (EasyOCR)
Classify the document type (ResNet18 ensemble trained on RVL-CDIP)
Pull out structured fields like vendor, date, totals (LayoutLMv3)
Make an approval decision based on business rules
Flag anomalies and route edge cases to human review
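
The last two steps can be pictured as a small rule engine. The sketch below is illustrative only: the field names, the 500.00 auto-approve limit, and the `decide` helper are assumptions made for this example, not the rules used in the notebook.

```python
from dataclasses import dataclass, field

# Hypothetical threshold; the real business rules live in the notebook.
AUTO_APPROVE_LIMIT = 500.00


@dataclass
class Decision:
    status: str                                   # "approve", "reject", or "review"
    reasons: list = field(default_factory=list)   # why the status was chosen


def decide(fields: dict, limit: float = AUTO_APPROVE_LIMIT) -> Decision:
    """Apply simple rules to extracted fields and return a decision."""
    # Missing fields are treated as anomalies and routed to a human.
    missing = [
        f"missing field: {name}"
        for name in ("vendor", "date", "total")
        if fields.get(name) in (None, "")
    ]
    if missing:
        return Decision("review", missing)

    total = fields["total"]
    if total < 0:
        return Decision("reject", ["negative total"])
    if total > limit:
        return Decision("review", [f"total {total:.2f} exceeds limit {limit:.2f}"])
    return Decision("approve", ["all rules passed"])
```

Returning the reasons alongside the status keeps each decision explainable when a document lands in the human review queue.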

Architecture

Document → OCR Agent → Router Agent → Field Agent → Decision Agent → HITL Manager
              ↓            ↓              ↓              ↓
          EasyOCR     ResNet18      LayoutLMv3     Rule Engine
                      Ensemble

Each agent is independent and logs its decisions for a full audit trail.
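
One way to picture the agent chain and its audit trail is a loop over independent callables that each log what they produced. This is a minimal sketch, not the notebook's code: `Context`, `run_pipeline`, and the stub agents are hypothetical names, and the stubs return canned values in place of the real models.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Context:
    """State passed from agent to agent, plus the audit log."""
    data: dict = field(default_factory=dict)
    audit: list = field(default_factory=list)


def run_pipeline(ctx: Context, agents: list) -> Context:
    """Run each (name, fn) agent in order and record what it added."""
    for name, agent in agents:
        output = agent(ctx.data)      # each agent returns only its new keys
        ctx.data.update(output)
        ctx.audit.append({
            "agent": name,
            "output": output,
            "at": datetime.now(timezone.utc).isoformat(),
        })
    return ctx


# Stub agents standing in for EasyOCR, the ResNet18 ensemble,
# LayoutLMv3, and the rule engine.
def ocr_agent(data: dict) -> dict:
    return {"text": "ACME STORE TOTAL 12.50"}


def router_agent(data: dict) -> dict:
    return {"doc_type": "receipt"}


def field_agent(data: dict) -> dict:
    return {"total": 12.50}


def decision_agent(data: dict) -> dict:
    return {"decision": "approve" if data["total"] < 500 else "review"}


AGENTS = [
    ("ocr", ocr_agent),
    ("router", router_agent),
    ("field", field_agent),
    ("decision", decision_agent),
]

ctx = run_pipeline(Context(), AGENTS)
```

After the run, `ctx.audit` holds one timestamped entry per agent in execution order, which is the kind of record an audit trail needs.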

Notebooks

| File | What's in it |
| --- | --- |
| `AML_Document_Processing_Pipeline_v2.ipynb` | Main notebook: OCR, LayoutLM, CNN, agentic layer, Gradio UI |
| `RVL_CDIP_Classification.ipynb` | ResNet18 training on document images |

Running it

Open in Google Colab (GPU runtime recommended)
Run the cells in order
Phases 10 and 11 launch a Gradio demo with a shareable link

The .pt model files aren't in the repo (too large). Train them yourself or grab them from Google Drive.

Tech stack

PyTorch + torchvision (ResNet18)
HuggingFace Transformers (LayoutLMv3)
EasyOCR
Gradio for the demo UI
Good old regex for fallback field extraction
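
As a rough idea of what regex fallback extraction can look like when the model output is unusable, here is a small sketch. The patterns, the "first non-empty line is the vendor" heuristic, and the `extract_fields` name are assumptions for illustration, not the notebook's actual patterns.

```python
import re

# Matches dates such as 01/15/2024, 1-5-24, or 2024-01-15.
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4}-\d{2}-\d{2})\b")

# Matches "Total: $43.30" but not "Subtotal", thanks to the word boundary.
TOTAL_RE = re.compile(
    r"\btotal\b\s*[:\-]?\s*\$?\s*(\d+(?:,\d{3})*(?:\.\d{2})?)",
    re.IGNORECASE,
)


def extract_fields(ocr_text: str) -> dict:
    """Pull vendor, date, and total out of raw OCR text with regexes."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    date = DATE_RE.search(ocr_text)
    totals = TOTAL_RE.findall(ocr_text)
    return {
        # Heuristic: receipts usually print the vendor name first.
        "vendor": lines[0] if lines else None,
        "date": date.group(1) if date else None,
        # Use the last "total" on the page, which is usually the grand total.
        "total": float(totals[-1].replace(",", "")) if totals else None,
    }


sample = "ACME HARDWARE\n01/15/2024\nSubtotal: $40.00\nTax: $3.30\nTOTAL: $43.30\n"
fields = extract_fields(sample)
```

Fields that cannot be found come back as `None`, so a downstream step can treat them as missing rather than guessing.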

Next steps

A few things that would make this better:

Isolation Forest for anomaly detection - currently using rules, ML would catch more edge cases
Fine-tune LayoutLMv3 on real SROIE data - the current model was trained on synthetic receipts, so real data should help
Add XGBoost approval predictor - learn from historical approval decisions
Better OCR preprocessing - deskewing, noise removal before OCR
Async processing - handle batch uploads without blocking
Model versioning - track which model made each decision
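
For the async processing item, one possible shape is to push each blocking pipeline call into a worker thread and cap how many run at once. This is only a sketch of the idea: `process_document` is a hypothetical placeholder for the real pipeline call, and the concurrency limit of 4 is an arbitrary choice.

```python
import asyncio


def process_document(path: str) -> dict:
    """Placeholder for the blocking OCR -> classify -> extract -> decide call."""
    return {"path": path, "status": "processed"}


async def process_batch(paths: list, max_concurrent: int = 4) -> list:
    """Run blocking pipeline calls in worker threads, a few at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path: str) -> dict:
        async with semaphore:
            # to_thread keeps the event loop free while the models run.
            return await asyncio.to_thread(process_document, path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(worker(p) for p in paths))


results = asyncio.run(process_batch(["a.png", "b.png", "c.png"]))
```

Inside a notebook such as Colab an event loop is already running, so `await process_batch(...)` would be used in place of `asyncio.run(...)`.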

Team

Built by the RogueTex crew for our grad ML class.

If you're grading this: yes, all the phases work. Run it in Colab with a GPU.

