rwang5412/Donna

Intake Screening Pipeline

Automates everything a law-firm intake screener does up to the accept/reject decision, then presents one structured recommendation for a human to approve or override. A stateful LangGraph pipeline does the work; Claude (Opus 4.7) handles entity extraction and the reasoning; a local Chroma + sentence-transformers store handles conflict retrieval (no second API key).

What it deliberately does not do: make the final call (a human does), give the prospective client legal advice (that would risk unauthorized practice of law, UPL), or learn online. The profitability signal arrives months later and is reconciled offline with the reconcile command.

Pipeline

START → extract → conflicts ─┬─ hard conflict ───────────────→ synthesize → human_review → END
                             └─ else → deadline → value ─────→ ┘
Node          What it does
extract       Claude parses the raw transcript into structured facts (messages.parse, structured outputs).
conflicts     Embeds claimant + adverse parties, retrieves nearest firm parties from Chroma, flags hard/soft conflicts.
deadline      Computes the statute-of-limitations filing deadline from case type + jurisdiction + accrual date.
value         Claude produces a rough pre-screen value estimate; firm economics computed deterministically.
synthesize    Combines signals into a deterministic ACCEPT/REJECT/REVIEW + score, with a Claude-written rationale/risks.
human_review  Pauses the graph (interrupt) for a human decision; records it to close the loop later.

A hard conflict short-circuits straight to synthesize (no point valuing a case the firm can't ethically take). The graph is stateful: a run pauses at human_review and is resumed by thread_id with the human's decision.
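
The short-circuit comes down to a routing function evaluated after the conflicts node. The following is a minimal sketch, not the code in graph.py: it assumes the state carries a `conflict_level` field (a hypothetical name; the real schema lives in state.py) and that the returned strings match the node names in the table above.

```python
from typing import Literal, TypedDict


class IntakeState(TypedDict, total=False):
    # Hypothetical field: "hard", "soft", or "none". The real schema is in state.py.
    conflict_level: str


def route_after_conflicts(state: IntakeState) -> Literal["synthesize", "deadline"]:
    """Return the name of the node to run after `conflicts`."""
    if state.get("conflict_level") == "hard":
        # The firm cannot ethically take the matter, so skip deadline and value.
        return "synthesize"
    # Soft or no conflict: continue through the full screening path.
    return "deadline"
```

In LangGraph, a function like this would typically be registered with `add_conditional_edges("conflicts", route_after_conflicts)` on the `StateGraph`; whether graph.py wires it exactly this way is an assumption.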

Setup

cd network
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt          # first install pulls torch (sentence-transformers)
pip install -U anthropic                 # ensure messages.parse / Opus 4.7 support

cp .env.example .env                      # then add your ANTHROPIC_API_KEY

Run

# 1. Seed the firm's conflict store (clients + prior adverse parties)
python -m intake.cli seed

# 2. Screen a transcript — prints the structured recommendation, then pauses
python -m intake.cli run data/sample_transcripts/clean_case.txt

# 3. Record a human decision (approve or override) in one shot
python -m intake.cli run data/sample_transcripts/clean_case.txt \
    --decision ACCEPT --reviewer "J. Okafor"

Sample transcripts exercise each path:

  • clean_case.txt — strong PI case, clear liability → ACCEPT
  • conflict_case.txt — names Meridian Health Systems, an existing client → REJECT (hard conflict)
  • expired_case.txt — CA premises fall from Jan 2023, past the 2-yr SOL → REJECT (expired)
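
The expired path can be checked by hand. The sketch below is an illustration of the arithmetic only, not the logic in deadlines.py: it assumes a limitation period in whole years, picks January 15 as the accrual day (the sample only says "Jan 2023"), and ignores tolling, discovery rules, and court-holiday adjustments. The names `filing_deadline` and `is_expired` are hypothetical.

```python
from datetime import date


def filing_deadline(accrual: date, limit_years: int) -> date:
    """Add a limitation period in whole years to the accrual date."""
    try:
        return accrual.replace(year=accrual.year + limit_years)
    except ValueError:
        # Accrual fell on Feb 29 and the target year is not a leap year.
        # Falling back to Feb 28 is an assumed convention, not a legal rule.
        return accrual.replace(year=accrual.year + limit_years, day=28)


def is_expired(accrual: date, limit_years: int, today: date) -> bool:
    """True when `today` is strictly after the filing deadline."""
    return today > filing_deadline(accrual, limit_years)


accrual = date(2023, 1, 15)  # assumed day within "Jan 2023"
print(filing_deadline(accrual, 2))               # 2025-01-15
print(is_expired(accrual, 2, date(2025, 6, 1)))  # True
```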

Closing the loop (§6 — offline)

Decisions are appended to data/decisions.jsonl. When realized economics land months later in data/realized.jsonl ({"matter_ref": ..., "realized_net_to_firm_usd": ...}), reconcile predicted vs. realized:

python -m intake.cli reconcile

This reports prediction error and accept-vs-profitable rates to recalibrate the screening thresholds — reviewed by a human, never fed back into the live graph automatically.
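
As a rough illustration of what such a reconciliation involves, the sketch below joins the two JSONL streams on matter_ref and summarizes the error. It is not the code in outcomes.py. The realized fields come from the format shown above; the decision fields `decision` and `predicted_net_to_firm_usd` are assumed names, and the records are made up for the example.

```python
import json
from statistics import mean


def load_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def reconcile(decisions: list[dict], realized: list[dict]) -> dict:
    """Join on matter_ref and summarize predicted-vs-realized error."""
    actual = {r["matter_ref"]: r["realized_net_to_firm_usd"] for r in realized}
    errors = []
    accepted = profitable = 0
    for d in decisions:
        ref = d["matter_ref"]
        if ref not in actual:
            continue  # outcome has not landed yet
        # "predicted_net_to_firm_usd" and "decision" are assumed field names.
        errors.append(d["predicted_net_to_firm_usd"] - actual[ref])
        if d["decision"] == "ACCEPT":
            accepted += 1
            if actual[ref] > 0:
                profitable += 1
    return {
        "matched": len(errors),
        "mean_abs_error_usd": mean(abs(e) for e in errors) if errors else None,
        "accepted_profitable_rate": profitable / accepted if accepted else None,
    }


# Made-up records, for illustration only.
decisions_text = """
{"matter_ref": "M-1", "decision": "ACCEPT", "predicted_net_to_firm_usd": 40000}
{"matter_ref": "M-2", "decision": "ACCEPT", "predicted_net_to_firm_usd": 10000}
{"matter_ref": "M-3", "decision": "REJECT", "predicted_net_to_firm_usd": -5000}
"""
realized_text = """
{"matter_ref": "M-1", "realized_net_to_firm_usd": 30000}
{"matter_ref": "M-2", "realized_net_to_firm_usd": -2000}
"""
report = reconcile(load_jsonl(decisions_text), load_jsonl(realized_text))
print(report)
```

Matters without a realized outcome are skipped rather than counted as errors, which keeps the report honest while outcomes are still arriving.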

Layout

intake/
  state.py        Pydantic schemas + LangGraph state
  llm.py          Anthropic SDK wrapper (model, caching, structured parse)
  rag.py          Chroma + sentence-transformers conflict store
  deadlines.py    Statute-of-limitations table + computation
  valuation.py    Value prediction + firm economics
  nodes.py        LangGraph node functions
  graph.py        Graph wiring + checkpointer
  pipeline.py     start_intake / resume_with_decision
  seed_data.py    Firm parties for the conflict store
  outcomes.py     Offline predicted-vs-realized reconciliation
  cli.py          Command-line entry point

Notes

  • Embeddings are local. If sentence-transformers can't load, rag.py falls back to a deterministic hashing embedding so the pipeline still runs (lower retrieval quality). intake.cli seed prints which embedder is active.
  • State persistence. The graph uses an in-process MemorySaver, so a paused intake resumes within the same process. Swap it for SqliteSaver in graph.py to persist across processes / serve from an API.
  • The SOL table is illustrative, not a maintained legal reference.
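
For readers unfamiliar with the fallback mentioned in the first note, a deterministic hashing embedding can be sketched as below. This is one common way to build such an embedder (hashed character trigrams with a sign bit), not necessarily what rag.py does; the function names, the 256-dimension default, and the trigram choice are all assumptions.

```python
import hashlib
import math


def hash_embed(text: str, dim: int = 256) -> list[float]:
    """Map text to a unit-length vector using hashed character trigrams."""
    normalized = " ".join(text.lower().split())
    vec = [0.0] * dim
    if not normalized:
        return vec  # empty input maps to the zero vector
    for i in range(max(len(normalized) - 2, 1)):
        gram = normalized[i:i + 3]
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket for this trigram
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # sign bit reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


query = hash_embed("Meridian Health Systems")
print(cosine(query, hash_embed("Meridian Health Systems Inc")))
print(cosine(query, hash_embed("Acme Trucking LLC")))
```

Because the vector depends only on the text and SHA-256, the same party name always maps to the same embedding across runs and machines. The trade-off is that it captures spelling overlap only, not meaning, which is consistent with the lower retrieval quality noted above.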

About

UCSB network hackathon project
