RedlineAI

RedlineAI ingests real-world contracts (PDF/DOCX/scans), classifies clauses, extracts key terms, flags risks against your house policy, proposes redlines, and pushes alerts (email/SMS/calls/calendar).

It runs on FastAPI + TiDB Serverless (SQL + Vector + FTS) + OpenAI with a practical, agentic pipeline.

# 🌱 Inspiration

Legal review weeks are chaos: scattered PDFs, manual searches, forgotten renewal windows, and “where’s that indemnity clause?” at 11:58 PM.

We wanted a reviewer-first system that:

Understands contracts, not just text.
Explains risk with citations to policy.
Suggests concrete edits (redlines) you can paste into Word.
Remembers deadlines and pings the right humans automatically.

# 🧠 What it does

Ingest: OCR, parse, chunk by headings/clauses; create embeddings + FTS.
Classify: Clause types (Auto-Renewal, Indemnity, DPA, SLA Uptime, etc.).
Extract: Dates, thresholds, renewal windows, liability caps, uptime %, notice periods.
Assess risk: Compare to policy; score and explain; link to rules.
Redline: Suggest strict/medium/soft rewrite alternatives.
Summarize: One-page exec brief; “what’s unusual, what’s due.”
Search/Q&A: Hybrid semantic + keyword (contract-scoped).
Alert: Email/SMS/voice calls when severity ≥ threshold or deadlines approach.
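
To make the Ingest step more concrete, here is a minimal sketch of heading-based chunking. It assumes the contract has already been converted to plain text and that clauses start with numbered headings such as "1. Term". The `HEADING_RE` pattern, the `Chunk` type, and `chunk_by_headings` are illustrative names, not RedlineAI's actual code.

```python
import re
from dataclasses import dataclass

# Assumed heading shape: "1. Term", "12.3 Indemnity" (number, then a capitalized title).
HEADING_RE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+([A-Z][^\n]{0,80})$")


@dataclass
class Chunk:
    chunk_index: int
    heading: str
    text: str


def chunk_by_headings(document: str) -> list[Chunk]:
    """Split plain contract text into one chunk per numbered heading."""
    chunks: list[Chunk] = []
    heading = "Preamble"  # label for text that appears before the first heading
    body: list[str] = []

    def flush() -> None:
        # Emit the section collected so far, skipping empty sections.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(len(chunks), heading, text))

    for line in document.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            heading = f"{match.group(1)} {match.group(2).strip()}"
            body = []
        else:
            body.append(line)
    flush()
    return chunks
```

Each resulting chunk would then be embedded and stored with its `chunk_index`. Real PDFs need layout metadata on top of this, since a regex alone will misread some body lines as headings.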

# 🧰 How we built it

Stack

API: FastAPI + Uvicorn
DB: TiDB Serverless (transactional SQL + Vector + Full-Text Search)
- Tables: contracts, tidb_vector_langchain (embeddings), clauses, risks, alerts, audit_log, users
Embeddings: OpenAI via langchain-openai
Vector store: TiDBVectorStore (stores document, embedding, meta)
LLM: OpenAI (classification, extraction, risk rationale, redlines)
Storage: S3 (original files) with presigned GET
Notifications: SendGrid (email), Twilio (SMS + voice), Google Calendar (optional)
Agent runtime: LangGraph (ingest pipeline), lightweight services for processing/alerts

Endpoints (core)

POST /api/v1/ingest: parse + chunk + embed (also uploads the original to S3 when the user is logged in); idempotent by file hash
POST /api/v1/contracts/{id}/process?use_llm=true: classify/extract/assess/write risks (idempotent; pass force=true to reprocess)
GET /api/v1/contracts/{id}/risks?min_severity=5&clause_type=Auto-Renewal
POST /api/v1/contracts/{id}/qa: MMR retrieval + answers with citations
GET /api/v1/contracts/{id}/summary
GET /api/v1/alerts/due
POST /api/v1/alerts/dispatch: agentic notifications
Users:
- GET /api/v1/users/{user_id}/contracts
- GET /api/v1/users/{user_id}/contracts/{contract_id}/presign
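
The "idempotent by file hash" behavior of the ingest endpoint can be sketched as follows. This is a simplified illustration, not the real handler: the in-memory dictionary stands in for a lookup on `contracts.sha256`, and the `ingest` function name and return shape are assumptions.

```python
import hashlib

# In-memory stand-in for the contracts table, keyed by sha256 (illustration only).
_contracts_by_hash: dict[str, int] = {}


def ingest(file_bytes: bytes) -> tuple[int, bool]:
    """Return (contract_id, created); re-uploading identical bytes reuses the id."""
    digest = hashlib.sha256(file_bytes).hexdigest()
    existing = _contracts_by_hash.get(digest)
    if existing is not None:
        # Same file seen before: skip parsing and embedding entirely.
        return existing, False
    contract_id = len(_contracts_by_hash) + 1
    # A real pipeline would parse, chunk, and embed the file here.
    _contracts_by_hash[digest] = contract_id
    return contract_id, True
```

Hashing the raw bytes means a renamed copy of the same file still maps to the existing contract, while any edit to the document produces a new one.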

Data model (simplified)

contracts(id, user_id, tenant, doc_type, original_filename, file_url, sha256, uploaded_at, …)

tidb_vector_langchain(id, embedding, document, meta JSON)
  ← meta.contract_id, meta.chunk_index, meta.page, …

clauses(id, contract_id, chunk_id, clause_type, confidence, extracted_json)

risks(id, contract_id, clause_id, severity, rule_id, rationale, suggested_fix)

alerts(id, contract_id, risk_id, kind, severity, message, channel_json, due_at, status, …)
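
As a rough illustration of how rows in `alerts` might be selected for dispatch, the sketch below picks pending alerts that are either severe enough or close to their deadline. The `Alert` class mirrors only a few columns, and the status values, the default threshold of 7, and the 14-day window are assumptions rather than RedlineAI's real settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    id: int
    contract_id: int
    severity: int
    due_at: Optional[datetime]
    status: str  # "pending" and "sent" are assumed values


def alerts_due(alerts, now, min_severity=7, window=timedelta(days=14)):
    """Return pending alerts that are severe or whose deadline is within the window."""
    due = []
    for alert in alerts:
        if alert.status != "pending":
            continue  # already handled; never notify twice
        severe = alert.severity >= min_severity
        # Overdue deadlines have a negative delta, so they also qualify.
        deadline_near = alert.due_at is not None and alert.due_at - now <= window
        if severe or deadline_near:
            due.append(alert)
    return due
```

In the actual service this filter would more likely be a SQL `WHERE` clause on `status`, `severity`, and `due_at`; the Python version just makes the rule explicit.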

# 😵‍💫 Challenges

- **PDFs are messy**: mixed fonts, headers/footers, TOCs — chunking by headings + layout metadata helped a lot.  
- **Latency**: embeddings + LLM can be slow — we parallelized where safe, cached embeddings, and streamed UI updates.  
- **Notifications**: ensuring we don’t spam — alerts table has `status`, `channel_json`, `due_at`, and unique keys to dedupe.  
- **Policy drift**: we version rules (`rule_id`) and log to `audit_log` for repeatability.  
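
The dedupe approach for notifications can be illustrated with a unique key plus an insert that ignores duplicates. The sketch uses Python's built-in `sqlite3` as a stand-in for TiDB so it runs anywhere; the choice of `(contract_id, risk_id, kind)` as the unique key and the `queue_alert` helper are assumptions. On TiDB, which is MySQL-compatible, the equivalent would be `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE alerts (
        id INTEGER PRIMARY KEY,
        contract_id INTEGER NOT NULL,
        risk_id INTEGER NOT NULL,
        kind TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',
        UNIQUE (contract_id, risk_id, kind)  -- assumed dedupe key
    )
    """
)


def queue_alert(contract_id: int, risk_id: int, kind: str) -> bool:
    """Insert an alert once; return False when the same alert already exists."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO alerts (contract_id, risk_id, kind) VALUES (?, ?, ?)",
        (contract_id, risk_id, kind),
    )
    conn.commit()
    # rowcount is 1 for a new row and 0 when the unique key rejected the insert.
    return cursor.rowcount == 1
```

Because the database enforces uniqueness, a retried or re-run processing job cannot queue the same notification twice.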

---

# 🧪 What we learned

- **One database is cleaner**: TiDB’s SQL + Vector + FTS removed glue code and frustration.  
- **Typed agent steps**: JSON schemas per step tame LLM variability and make retries sane.  
- **Contract-scoped RAG matters**: restrict retrieval by `meta.contract_id` to avoid cross-document leakage.  
- **Idempotency everywhere**: `sha256` ingest, `force=true` process, alert upserts → less production pain.  
- **People want evidence**: every risk points back to the clause, policy rule, and a suggested fix.  
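
To show what contract-scoped retrieval means in practice, here is a small pure-Python sketch that filters chunks by `meta.contract_id` before ranking them by cosine similarity. The row shape follows the `tidb_vector_langchain` columns, but the `retrieve` and `cosine` helpers and the toy two-dimensional embeddings are illustrative assumptions. In the real system the filter would be pushed into the vector-store query rather than applied in Python.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, rows, contract_id, k=3):
    """Rank only chunks from the given contract; other contracts are never scored."""
    scoped = [r for r in rows if r["meta"].get("contract_id") == contract_id]
    ranked = sorted(
        scoped,
        key=lambda r: cosine(query_vec, r["embedding"]),
        reverse=True,
    )
    return ranked[:k]
```

Filtering before ranking matters: a near-identical clause from another contract can otherwise outrank the right one and end up cited in an answer.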

---

# 🚀 What’s next for RedlineAI

- Smarter clause similarity detection.  
- More integrations (Slack, Teams, Jira).  
- Fine-tuned models for niche contract types.  
- Multi-tenant dashboards with audit + reporting.  
- Workflow APIs for plugging into enterprise CLM.

Built With

css
fastapi
langchain
langgraph
openai
python
react
tailwind
tidb
tidbvectordb

Updates

Philip Awobusuyi started this project — Sep 16, 2025 01:59 AM EDT
