RedlineAI
RedlineAI ingests real-world contracts (PDF/DOCX/scans), classifies clauses, extracts key terms, flags risks against your house policy, proposes redlines, and pushes alerts (email/SMS/calls/calendar).
It runs on FastAPI + TiDB Serverless (SQL + Vector + FTS) + OpenAI with a practical, agentic pipeline.
# 🌱 Inspiration
Legal review weeks are chaos: scattered PDFs, manual searches, forgotten renewal windows, and “where’s that indemnity clause?” at 11:58 PM.
We wanted a reviewer-first system that:
- Understands contracts, not just text.
- Explains risk with citations to policy.
- Suggests concrete edits (redlines) you can paste into Word.
- Remembers deadlines and pings the right humans automatically.
# 🧠 What it does
- Ingest: OCR, parse, chunk by headings/clauses; create embeddings + FTS.
- Classify: Clause types (Auto-Renewal, Indemnity, DPA, SLA Uptime, etc.).
- Extract: Dates, thresholds, renewal windows, liability caps, uptime %, notice periods.
- Assess risk: Compare to policy; score and explain; link to rules.
- Redline: Suggest strict/medium/soft rewrite alternatives.
- Summarize: One-page exec brief; “what’s unusual, what’s due.”
- Search/Q&A: Hybrid semantic + keyword (contract-scoped).
- Alert: Email/SMS/voice calls when severity ≥ threshold or deadlines approach.
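The alert rule in the last bullet can be sketched as a small predicate. This is a minimal illustration, not the project's actual code: the function name `should_alert`, the default threshold of 5, and the 30-day lead window are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_alert(
    severity: int,
    due_at: Optional[datetime],
    now: datetime,
    min_severity: int = 5,   # assumed default threshold
    lead_days: int = 30,     # assumed lead window before a deadline
) -> bool:
    """Return True when a risk is severe enough or its deadline is close."""
    if severity >= min_severity:
        return True
    if due_at is None:
        return False
    # Deadlines inside the lead window (including overdue ones) trigger an alert.
    return due_at - now <= timedelta(days=lead_days)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(should_alert(7, None, now))                      # True: severity alone
print(should_alert(2, now + timedelta(days=10), now))  # True: deadline is near
print(should_alert(2, now + timedelta(days=90), now))  # False: neither applies
```

Keeping the rule as a pure function of `severity`, `due_at`, and `now` makes it easy to unit-test without touching the database or any notification provider.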
# 🧰 How we built it
Stack
- API: FastAPI + Uvicorn
- DB: TiDB Serverless (transactional SQL + Vector + Full-Text Search)
  - Tables: `contracts`, `tidb_vector_langchain` (embeddings), `clauses`, `risks`, `alerts`, `audit_log`, `users`
- Embeddings: OpenAI via `langchain-openai`
- Vector store: `TiDBVectorStore` (stores document, embedding, meta)
- LLM: OpenAI (classification, extraction, risk rationale, redlines)
- Storage: S3 (original files) with presigned GET
- Notifications: SendGrid (email), Twilio (SMS + voice), Google Calendar (optional)
- Agent runtime: LangGraph (ingest pipeline), lightweight services for processing/alerts
Endpoints (core)
- `POST /api/v1/ingest`: parse + chunk + embed (+S3 if logged-in); idempotent by file hash
- `POST /api/v1/contracts/{id}/process?use_llm=true`: classify/extract/assess/write risks (idempotent, `force=true`)
- `GET /api/v1/contracts/{id}/risks?min_severity=5&clause_type=Auto-Renewal`
- `POST /api/v1/contracts/{id}/qa`: MMR retrieval + answers with citations
- `GET /api/v1/contracts/{id}/summary`
- `GET /api/v1/alerts/due`
- `POST /api/v1/alerts/dispatch`: agentic notifications
- Users:
  - `GET /api/v1/users/{user_id}/contracts`
  - `GET /api/v1/users/{user_id}/contracts/{contract_id}/presign`
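To illustrate what "idempotent by file hash" can look like for the ingest endpoint, here is a minimal sketch. It assumes the check is a lookup on the SHA-256 of the uploaded bytes (the data model has a `sha256` column on `contracts`); the `IngestRegistry` class is hypothetical and uses an in-memory dict where the real service would query the database.

```python
import hashlib
from typing import Dict, Tuple


class IngestRegistry:
    """In-memory stand-in for a lookup on contracts.sha256 (hypothetical)."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, int] = {}
        self._next_id = 1

    def ingest(self, data: bytes) -> Tuple[int, bool]:
        """Return (contract_id, created). Re-uploading the same bytes is a no-op."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_hash.get(digest)
        if existing is not None:
            # Same file seen before: skip parsing, chunking, and embedding.
            return existing, False
        contract_id = self._next_id
        self._next_id += 1
        self._by_hash[digest] = contract_id
        return contract_id, True


registry = IngestRegistry()
print(registry.ingest(b"contract bytes"))   # (1, True)
print(registry.ingest(b"contract bytes"))   # (1, False)
print(registry.ingest(b"other contract"))   # (2, True)
```

Hashing the raw bytes means a renamed copy of the same file maps to the same contract, while any byte-level change (for example a re-scan) is treated as a new document.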
Data model (simplified)
contracts(id, user_id, tenant, doc_type, original_filename, file_url, sha256, uploaded_at, …)
tidb_vector_langchain(id, embedding, document, meta JSON) ← meta.contract_id, meta.chunk_index, meta.page, …
clauses(id, contract_id, chunk_id, clause_type, confidence, extracted_json)
risks(id, contract_id, clause_id, severity, rule_id, rationale, suggested_fix)
alerts(id, contract_id, risk_id, kind, severity, message, channel_json, due_at, status, …)
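The `meta.contract_id` field on `tidb_vector_langchain` rows is what allows retrieval to be scoped to a single contract. The sketch below shows the idea in plain Python: filter by contract first, then rank by cosine similarity. It is a stand-in for a vector-store query, not the project's retrieval code; the function names, the two-dimensional embeddings, and the sample rows are invented for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_contract(rows: List[Dict], query_vec: Sequence[float],
                    contract_id: int, k: int = 3) -> List[Dict]:
    """Rank only the chunks that belong to the given contract."""
    scoped = [r for r in rows if r["meta"].get("contract_id") == contract_id]
    ranked = sorted(scoped, key=lambda r: cosine(r["embedding"], query_vec),
                    reverse=True)
    return ranked[:k]


# Made-up rows shaped like tidb_vector_langchain(id, embedding, document, meta).
rows = [
    {"id": "a", "embedding": [1.0, 0.0], "document": "chunk A",
     "meta": {"contract_id": 1, "chunk_index": 0}},
    {"id": "b", "embedding": [0.9, 0.1], "document": "chunk B",
     "meta": {"contract_id": 2, "chunk_index": 0}},
    {"id": "c", "embedding": [0.0, 1.0], "document": "chunk C",
     "meta": {"contract_id": 1, "chunk_index": 1}},
]

hits = search_contract(rows, [1.0, 0.0], contract_id=1, k=2)
print([h["id"] for h in hits])  # ['a', 'c']
```

Row `b` is a close match to the query but belongs to another contract, so it never reaches the ranking step. Applying the filter before ranking, rather than after, is what prevents a near-identical clause from a different document from crowding out the results.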
# 😵‍💫 Challenges
- **PDFs are messy**: mixed fonts, headers/footers, TOCs — chunking by headings + layout metadata helped a lot.
- **Latency**: embeddings + LLM can be slow — we parallelized where safe, cached embeddings, and streamed UI updates.
- **Notifications**: to avoid spamming people, the `alerts` table tracks `status`, `channel_json`, and `due_at`, and uses unique keys to dedupe.
- **Policy drift**: we version rules (`rule_id`) and log to `audit_log` for repeatability.
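The dedupe idea for notifications can be sketched with a unique key plus an insert that ignores conflicts. The example uses SQLite from the standard library purely so it runs anywhere; the project itself uses TiDB, where the MySQL-style equivalents would be `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE`. The choice of `(contract_id, risk_id, kind)` as the unique key, the `queue_alert` helper, and the `kind` value are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE alerts (
        id INTEGER PRIMARY KEY,
        contract_id INTEGER NOT NULL,
        risk_id INTEGER NOT NULL,
        kind TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',
        UNIQUE (contract_id, risk_id, kind)  -- assumed dedupe key
    )
    """
)


def queue_alert(contract_id: int, risk_id: int, kind: str) -> bool:
    """Insert an alert once; return False if the same alert already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO alerts (contract_id, risk_id, kind) VALUES (?, ?, ?)",
        (contract_id, risk_id, kind),
    )
    conn.commit()
    return cur.rowcount == 1


print(queue_alert(1, 10, "renewal_deadline"))  # True: first time, row inserted
print(queue_alert(1, 10, "renewal_deadline"))  # False: duplicate, ignored
```

Pushing the dedupe into a database constraint means concurrent workers cannot both queue the same alert, which an application-level "check then insert" would not guarantee.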
---
# 🧪 What we learned
- **One database is cleaner**: TiDB’s SQL + Vector + FTS removed glue code and frustration.
- **Typed agent steps**: JSON schemas per step tame LLM variability and make retries sane.
- **Contract-scoped RAG matters**: restrict retrieval by `meta.contract_id` to avoid cross-document leakage.
- **Idempotency everywhere**: `sha256` ingest, `force=true` process, alert upserts → less production pain.
- **People want evidence**: every risk points back to the clause, policy rule, and a suggested fix.
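As a rough illustration of typed agent steps, the sketch below validates a step's JSON output against a fixed set of fields and retries on failure. It uses a hand-rolled field/type check as a simplified stand-in for a real JSON Schema validator. The field names mirror the `risks` table, but their types, the helper names, and the retry count are assumptions.

```python
import json
from typing import Callable, Dict

# Assumed shape of the risk-assessment step's output.
RISK_FIELDS: Dict[str, type] = {
    "severity": int,
    "rule_id": str,
    "rationale": str,
    "suggested_fix": str,
}


def parse_step_output(raw: str, fields: Dict[str, type]) -> dict:
    """Parse JSON and check that each required field has the exact expected type."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("step output must be a JSON object")
    for name, expected in fields.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if type(data[name]) is not expected:  # strict: True is not accepted as int
            raise ValueError(f"wrong type for field: {name}")
    return data


def run_with_retries(call: Callable[[], str], fields: Dict[str, type],
                     attempts: int = 3) -> dict:
    """Call the step until its output validates, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return parse_step_output(call(), fields)
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"step failed after {attempts} attempts") from last_error


# Fake model replies: the first is malformed, the second is valid.
replies = iter([
    "not json",
    '{"severity": 7, "rule_id": "R-1", "rationale": "r", "suggested_fix": "f"}',
])
result = run_with_retries(lambda: next(replies), RISK_FIELDS)
print(result["severity"])  # 7
```

Because validation failures surface as a single exception type, the retry loop stays generic and the same wrapper can guard every step in the pipeline.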
---
# 🚀 What’s next for RedlineAI
- Smarter clause similarity detection.
- More integrations (Slack, Teams, Jira).
- Fine-tuned models for niche contract types.
- Multi-tenant dashboards with audit + reporting.
- Workflow APIs for plugging into enterprise CLM.