PayBack: AI-Powered Medical Bill Decoder

Inspiration

By commonly cited estimates, up to 80% of medical bills contain errors. Disputing one requires CPT expertise, knowledge of Medicare rates, and federal statute citations, none of which most patients have. Hospitals know every insurer's negotiated rate; patients know none of them. We built PayBack to close that gap.

What We Learned

Medical billing is a translation problem. CPT 99284 is opaque; "ER visit, high severity, urgent evaluation" is actionable. PayBack’s value is translating billing language into plain language.

The data exists. The Hospital Price Transparency Rule (effective January 2021) requires hospitals to publish their negotiated rates. DoltHub hosts the transparency-in-pricing dataset: millions of rates, queryable via SQL. We use it as our benchmark source.
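As a rough illustration of what "queryable via SQL" can look like, the sketch below builds a read-only query URL for one CPT code. It is not PayBack's actual query. The URL shape (`/api/v1alpha1/<owner>/<database>/<branch>?q=<sql>`), the `dolthub` owner, and the `main` branch are assumptions to check against DoltHub's documentation, and the `rates` table and its column names are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed base path of DoltHub's read-only SQL API.
DOLTHUB_API = "https://www.dolthub.com/api/v1alpha1"


def build_rate_query_url(cpt_code: str,
                         owner: str = "dolthub",
                         database: str = "transparency-in-pricing",
                         branch: str = "main") -> str:
    """Return a query URL that asks for benchmark rows for one CPT code."""
    # CPT codes are five alphanumeric characters. Rejecting anything else
    # keeps the value safe to embed in the SQL string below.
    if not (len(cpt_code) == 5 and cpt_code.isascii() and cpt_code.isalnum()):
        raise ValueError(f"not a CPT code: {cpt_code!r}")
    # Hypothetical table and column names, for illustration only.
    sql = (
        "SELECT payer, negotiated_rate FROM rates "
        f"WHERE billing_code = '{cpt_code}' LIMIT 50"
    )
    return f"{DOLTHUB_API}/{owner}/{database}/{branch}?{urlencode({'q': sql})}"
```

Calling `build_rate_query_url("99284")` returns a URL that any HTTP client can fetch; the real schema would replace the placeholder table and columns.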

How We Built It

Tech stack: FastAPI, React, LangGraph, Google Gemini 2.0 Flash / 2.5 Pro, DoltHub API, Actian VectorAI DB, sentence-transformers, MongoDB Atlas.

LangGraph pipeline (StateGraph):
ocr → extract → query → rules → END

  1. OCR — Gemini 2.0 Flash reads PDF/image bills and extracts raw text (handles low-quality scans).
  2. Extract — Gemini structures output into typed line items: CPT codes, descriptions, quantities, billed amounts, facility, insurance, diagnosis codes.
  3. Query — Async DoltHub lookups per CPT code; insurance-payer match or market-average fallback; concurrent requests via semaphore.
  4. Rules — Deterministic Python engine (duplicates, quantity anomalies, extreme markup, facility fees, self-referral, discharge-day billing) plus Gemini-assisted checks (relationship, upcoding, unbundling).
  5. Vector DB similarity — Actian VectorAI DB with all-MiniLM-L6-v2 embeddings (384d, normalized for cosine similarity). Precedent collection seeded with historical dispute cases; rules engine fetches top‑3 similar cases and injects them as context for Gemini prompts.
  6. Dispute letter — Gemini 2.5 Pro generates a formal dispute citing flags, DoltHub benchmarks, and precedent language. Stored in MongoDB with full case metadata.
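The Query step (step 3) can be sketched as follows: a minimal example of semaphore-bounded concurrent lookups with a payer match and a market-average fallback. The `fetch` callable, its `(cpt_code, payer)` signature, the `max_concurrent` default, and the result shape are all hypothetical stand-ins for the real DoltHub client, not PayBack's code.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical lookup signature: (cpt_code, payer) -> rate or None.
# Passing payer=None asks for the market average.
Fetch = Callable[[str, Optional[str]], Awaitable[Optional[float]]]


async def lookup_benchmarks(cpt_codes, payer, fetch: Fetch, max_concurrent: int = 5):
    """Return {cpt_code: {"rate", "source"}} using bounded concurrency."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def lookup(code):
        async with semaphore:  # at most max_concurrent lookups in flight
            rate = await fetch(code, payer) if payer else None
            source = "payer"
            if rate is None:
                # No payer-specific row: fall back to the market average.
                rate = await fetch(code, None)
                source = "market_average" if rate is not None else None
        return code, {"rate": rate, "source": source}

    # Deduplicate while preserving order so each CPT code is fetched once.
    unique_codes = list(dict.fromkeys(cpt_codes))
    results = await asyncio.gather(*(lookup(code) for code in unique_codes))
    return dict(results)
```

Injecting `fetch` keeps the concurrency logic testable without network access; a stub coroutine that reads from a dictionary is enough to exercise both the payer path and the fallback path.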

Data flow: Upload → PDF conversion (PyPDF2/pdf2image) → LangGraph astream → status polling → Results UI with precedent search.

User-triggered second pass: On the Results page, a "Re-check rules" button lets users run the rules engine again on demand (POST /bills/{bill_id}/rerun-rules). The second pass re-runs deterministic checks, precedent search, and Gemini-assisted analysis, then updates flags and summary in place.
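To make the "updates flags and summary in place" behaviour concrete, here is a minimal sketch of a re-runnable deterministic pass covering two of the checks named above (duplicates and extreme markup). The dictionary shapes, the function names, and the 3x `MARKUP_RATIO` threshold are assumptions for illustration; the precedent search and Gemini-assisted checks are left out.

```python
from collections import Counter

MARKUP_RATIO = 3.0  # assumed threshold for "extreme markup", not PayBack's value


def check_duplicates(items):
    """Flag every repeat of the same CPT code and billed amount."""
    flags, seen = [], Counter()
    for index, item in enumerate(items):
        key = (item["cpt"], item["billed"])
        seen[key] += 1
        if seen[key] > 1:  # the first occurrence is not flagged
            flags.append({"rule": "duplicate", "line": index, "cpt": item["cpt"]})
    return flags


def check_markup(items, benchmarks):
    """Flag lines billed above MARKUP_RATIO times the benchmark."""
    flags = []
    for index, item in enumerate(items):
        benchmark = benchmarks.get(item["cpt"])
        if benchmark is None:
            continue  # no benchmark available, so nothing to compare against
        allowed = MARKUP_RATIO * benchmark * item.get("quantity", 1)
        if item["billed"] > allowed:
            flags.append({"rule": "extreme_markup", "line": index, "cpt": item["cpt"]})
    return flags


def rerun_rules(bill, benchmarks):
    """Recompute flags and summary, overwriting results of the previous pass."""
    items = bill["items"]
    flags = check_duplicates(items) + check_markup(items, benchmarks)
    bill["flags"] = flags
    bill["summary"] = f"{len(flags)} flag(s) across {len(items)} line item(s)"
    return bill
```

Because the pass rebuilds `flags` from scratch rather than appending, running it twice on the same bill yields the same result, which is what makes an on-demand re-check safe.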

Challenges

  • Benchmark consistency: The same CPT code on different line items can return different DoltHub rows, so we normalize to the minimum benchmark per CPT.
  • Vector DB integration: Actian VectorAI (gRPC) is separate from the Gemini stack, so sentence-transformers runs locally to produce the embeddings.
  • Severity calibration: We combined flag severity with overcharge thresholds: HIGH ≥ $200, MEDIUM ≥ $50, and LOW for small amounts and for unbundling/duplicate flags.
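The benchmark normalization and severity thresholds above can be sketched in a few lines. This is one reading of the description, not the project's code: it assumes benchmark rows arrive as `(cpt_code, rate)` pairs and that unbundling and duplicate flags are always LOW regardless of amount. The function names and rule labels are hypothetical.

```python
def normalize_benchmarks(rows):
    """Collapse (cpt_code, rate) rows to one benchmark per CPT: the minimum."""
    best = {}
    for cpt, rate in rows:
        if cpt not in best or rate < best[cpt]:
            best[cpt] = rate
    return best


# Assumed reading: these flag types stay LOW whatever the dollar amount.
LOW_ONLY_RULES = {"unbundling", "duplicate"}


def severity(rule, overcharge):
    """Map a flag type and its overcharge in dollars to HIGH, MEDIUM or LOW."""
    if rule in LOW_ONLY_RULES:
        return "LOW"
    if overcharge >= 200:
        return "HIGH"
    if overcharge >= 50:
        return "MEDIUM"
    return "LOW"
```

Taking the minimum makes the benchmark deterministic across line items that share a CPT code, whichever row a given lookup happens to return first.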
