Inspiration

Trial lawyers spend $50K to $500K per case hiring forensic animators, medical illustrators, and expert witnesses, all working in silos. When one fact changes, every vendor redoes their work independently. Paralegals cross-reference documents by hand for hours hunting contradictions. We realized this is a multimodal AI problem. Every piece of evidence (PDFs, audio, dashcam clips, photos) is unstructured data that needs to be parsed, connected, and synthesized. Gemini's native multimodal understanding makes it possible to replace six separate vendors in minutes.

What it does

Lawyers drop files onto an infinite canvas where Gemini auto-parses, labels, and summarizes each one, clustering nodes by type and drawing animated threads between related evidence. Three intelligence systems work behind the scenes: dimension discovery identifies what facts matter for any case type with zero configuration, contradiction detection compares only facts sharing the same dimension and entity from different sources to eliminate 95% of false positives, and gap detection finds evidence referenced in documents but never uploaded. From there, Clarion streams a fully cited trial report with AI-generated images, video scene reconstructions from witness testimony, and counter-arguments, all editable in real time through a Gemini Live API voice agent.
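The gap-detection idea above reduces to a set difference: exhibits that parsed documents reference versus files actually on the canvas. A minimal sketch, with hypothetical names (`find_gaps`, the example filenames) that are not from the actual codebase:

```python
# Hypothetical sketch of gap detection: evidence referenced in documents
# but never uploaded is flagged as missing.

def find_gaps(referenced: set[str], uploaded: set[str]) -> set[str]:
    """Return evidence IDs cited in parsed documents but absent from the canvas."""
    return referenced - uploaded

referenced = {"police_report.pdf", "dashcam_0312.mp4", "mri_scan.dcm"}
uploaded = {"police_report.pdf", "mri_scan.dcm"}
print(sorted(find_gaps(referenced, uploaded)))  # ['dashcam_0312.mp4']
```

In practice the referenced set would come from Gemini's parse of each document, but the core check stays this simple.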

How we built it

We split the project into three services sharing one Pydantic schema. The backend uses FastAPI and Gemini to handle multimodal evidence parsing, dynamic citation indexing, contradiction detection, and gap analysis. The engine takes the analyzed evidence and generates interleaved report blocks with AI video reconstructions from witness testimony, streamed to the frontend via SSE. The experience layer is a Next.js app built around a React Flow infinite canvas, a streaming report viewer, and a Gemini Live API voice agent with seven function-calling tools for hands-free editing.
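The shared schema that all three services depend on might look something like the following. This is an illustrative sketch only: the real project defines it with Pydantic, and every field name here is an assumption, but a frozen stdlib dataclass with `Literal` types shows the same shape:

```python
# Illustrative shape of the shared evidence schema (the project uses
# Pydantic; field names here are assumptions, not the actual schema).
from dataclasses import dataclass
from typing import Literal

EvidenceType = Literal["pdf", "audio", "video", "photo"]

@dataclass(frozen=True)  # treat the schema as immutable once defined
class EvidenceFact:
    source_id: str           # which uploaded file the fact came from
    dimension: str           # discovered dynamically per case, not hardcoded
    entity: str              # who or what the fact is about
    claim: str               # the fact itself, used for contradiction checks
    evidence_type: EvidenceType

fact = EvidenceFact("police_report.pdf", "vehicle_speed",
                    "defendant", "traveling at 45 mph", "pdf")
print(fact.dimension)  # vehicle_speed
```

Because all three services import one definition, a change to the schema propagates (or breaks) everywhere at once, which is exactly why defining it first mattered.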

Challenges we ran into

Our biggest early mistake was refactoring the shared schema from Enums to Literals midway through, which broke every downstream service and taught us to define the schema first and treat it as immutable. We also learned that hardcoded dimensions don't generalize when our 21 fixed categories turned out to be useless for anything beyond car accidents, so we let Gemini discover dimensions dynamically from the actual evidence. Contradiction detection initially produced overwhelming false positives until we added a structural filter that only compares facts sharing the same dimension and entity from different sources, cutting 95% of noise. On the voice agent side, Gemini kept hearing itself through the browser speakers and responding in a loop, which we solved with echo cancellation and mic muting during agent output.
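The structural filter for contradiction detection can be sketched as a grouping step: bucket facts by (dimension, entity), then generate candidate pairs only across different sources. A minimal sketch with illustrative names (`candidate_pairs` and the dict keys are assumptions):

```python
# Sketch of the structural pre-filter: only facts sharing a dimension and
# entity, but coming from different sources, become contradiction candidates.
from collections import defaultdict
from itertools import combinations

def candidate_pairs(facts):
    """facts: list of dicts with 'dimension', 'entity', 'source', 'claim' keys."""
    buckets = defaultdict(list)
    for f in facts:
        buckets[(f["dimension"], f["entity"])].append(f)
    for group in buckets.values():
        for a, b in combinations(group, 2):
            if a["source"] != b["source"]:  # same-source facts are never compared
                yield a, b

facts = [
    {"dimension": "speed", "entity": "defendant", "source": "police_report", "claim": "45 mph"},
    {"dimension": "speed", "entity": "defendant", "source": "witness_a", "claim": "70 mph"},
    {"dimension": "speed", "entity": "plaintiff", "source": "witness_a", "claim": "30 mph"},
]
print(len(list(candidate_pairs(facts))))  # 1 — only the two defendant-speed facts
```

Only the surviving pairs go to Gemini for semantic comparison, which is where the reduction in false positives comes from.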

Accomplishments that we're proud of

- Works for any litigation type with zero configuration
- Near-zero false positive contradiction detection
- AI video reconstructions from witness descriptions
- Voice-powered report editing via Gemini Live API function calling

What we learned

- The shared schema is the most important file. Define it before anything else.
- Gemini's native multimodal understanding eliminates separate STT/TTS/vision pipelines.
- Dynamic discovery beats hardcoded taxonomies.
- Live API function calling makes voice agents that do things, not just talk.

What's next for Clarion

- Batch Gemini calls to halve analysis latency
- Multi-case entity linking
- Courtroom export formats (PDF, PowerPoint)
- Deposition prep with auto-generated questions targeting contradictions
