Inspiration

Three months ago, when Team USA competed at the Olympic Winter Games, a curling stone got pulled from play. Nobody at home understood why. Hundreds of people asked the same question online. A dozen news sites wrote whole articles trying to explain one rule.

That single moment is the entire premise of Laurel. The viewer sees the moment. The viewer does not understand the moment. By the time the explainer article is written, the broadcast has moved on.

This happens hundreds of times every Games, across every sport, on every broadcast. We wanted to close the gap to seconds.

What it does

Laurel is a second-screen companion you keep on your phone or laptop. When something happens that you do not understand, you point your camera at the TV, share your laptop screen, or upload a screenshot. Laurel:

  1. Identifies the sport with Gemini Vision across the captured frames.
  2. Retrieves the relevant rule and historical context from a curated knowledge base of six sports (three Olympic, three Paralympic).
  3. Streams a grounded explanation in 3 to 5 seconds: the rule, what typically happens next, and why it matters.
  4. Lets you ask follow-ups by tapping a suggested chip, typing, or speaking.
  5. Generates a sharable link with a rich preview so the family group chat does not have to stay confused either.

Two scenarios drove the design:

  • Confused viewer. A judge ruling negates the highlight. Capture the frame, get a clear rule-based explanation in under five seconds.
  • "Was that a big deal?" A world record falls. Capture the frame, get historical comparison and a sharable link.

Olympic and Paralympic content is treated as structurally equal: same retrieval pipeline, same depth of explanation, same matter-of-fact tone.

How we built it

Frontend. Next.js 16 App Router, React 19, Tailwind 4, deployed on Vercel. Three capture modes (camera, screen share, upload) feed a MomentViewer that consumes a Server-Sent Events stream from the backend. Voice input is the browser-native Web Speech API.

Backend. Python 3.12 + FastAPI + uv, containerized and deployed on Cloud Run. Two endpoints power the entire experience: /api/explain (captured frames in, streamed explanation out) and /api/follow-up (moment ID + history + question in, streamed next turn out). Both return SSE, so the user sees the first token within ~600ms.
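To make the streaming shape concrete, here is a minimal sketch of SSE framing using only the standard library. The function names, the "token" and "done" event names, and the {"text": ...} payload are assumptions for illustration, not Laurel's actual wire format. In a FastAPI app, a generator like this would typically be wrapped in a StreamingResponse with media_type="text/event-stream".

```python
import json
from typing import Iterable, Iterator


def sse_event(data: dict, event: str = "token") -> str:
    """Format one Server-Sent Events frame: event name, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_explanation(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per token, then a final 'done' frame.

    The event names and payload shape are hypothetical.
    """
    for token in tokens:
        yield sse_event({"text": token})
    yield sse_event({}, event="done")


if __name__ == "__main__":
    for frame in stream_explanation(["The stone ", "was removed ", "because..."]):
        print(frame, end="")
```

Because each frame is flushed as soon as its token exists, the client can paint the first words while the rest of the answer is still being generated.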

AI. Gemini 2.5 Flash on Vertex AI for both vision and text. Multi-frame Vision returns structured JSON. Synthesis uses a system prompt that locks Laurel to conditional phrasing, forbids naming individuals, requires grounding in retrieved context, and treats Para sports as equal in significance.

Knowledge base. Six hand-authored markdown files. On boot, the backend chunks each file by section, embeds each chunk with text-embedding-005, and holds the vectors in an in-memory index. Per-query retrieval scopes results to the identified sport.
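The chunk, embed, and scoped-retrieval flow can be sketched as follows. This is an illustration under stated assumptions: sections are assumed to start at "## " headings, and toy_embed is a bag-of-words stand-in for the real text-embedding-005 call so the example runs offline. All names here (Chunk, chunk_by_section, retrieve) are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    sport: str
    heading: str
    text: str
    vector: Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_by_section(sport: str, markdown: str) -> list[Chunk]:
    """Split one sport's markdown at '## ' headings and embed each section."""
    chunks = []
    for section in re.split(r"^## ", markdown, flags=re.MULTILINE)[1:]:
        heading, _, body = section.partition("\n")
        chunks.append(Chunk(sport, heading.strip(), body.strip(), toy_embed(section)))
    return chunks


def retrieve(index: list[Chunk], sport: str, query: str, k: int = 2) -> list[Chunk]:
    """Rank only the identified sport's chunks by similarity to the query."""
    q = toy_embed(query)
    scoped = [c for c in index if c.sport == sport]
    return sorted(scoped, key=lambda c: cosine(q, c.vector), reverse=True)[:k]
```

Filtering by sport before ranking keeps a rule from one sport from surfacing in another sport's explanation, and with only six files a linear scan over an in-memory list is fast enough that no vector database is needed.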

Sharing. Captured frames live in Cloud Storage; moment metadata in Firestore. Each /m/<id> permalink generates an OpenGraph image dynamically.

Stub-first developer experience. The backend ships with a deterministic stub Gemini client that runs the full request flow without GCP credentials. Anyone can clone the repo and walk through the entire UX before connecting a Google Cloud project.
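A minimal sketch of the stub-or-real switch, assuming the choice is driven by a single environment variable. The variable name LAUREL_GCP_PROJECT, the class names, and the stream_text method are hypothetical, and the real Vertex AI call is deliberately left out.

```python
import os
from collections.abc import Mapping
from typing import Iterator, Optional, Protocol


class GeminiClient(Protocol):
    """The interface both the stub and the real client satisfy."""

    def stream_text(self, prompt: str) -> Iterator[str]: ...


class StubGeminiClient:
    """Deterministic stand-in: the same prompt always yields the same tokens."""

    def stream_text(self, prompt: str) -> Iterator[str]:
        yield from ["[stub] ", "Explanation ", "for: ", prompt]


class VertexGeminiClient:
    """Placeholder for the real client; the Vertex AI call is omitted here."""

    def __init__(self, project: str) -> None:
        self.project = project

    def stream_text(self, prompt: str) -> Iterator[str]:
        raise NotImplementedError("wire up the Vertex AI SDK here")


def get_client(env: Optional[Mapping[str, str]] = None) -> GeminiClient:
    """Return the real client when a project is configured, else the stub."""
    env = os.environ if env is None else env
    project = env.get("LAUREL_GCP_PROJECT")  # hypothetical variable name
    return VertexGeminiClient(project) if project else StubGeminiClient()
```

Callers depend only on the shared interface, so the request flow is identical whether or not credentials are present.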

Challenges we ran into

  1. Compliance pivot mid-build. Re-reading the rules late surfaced that Games footage, athlete NIL, NGB names, and any non-Google corporate brand were forbidden in submission materials. We pivoted: built two self-produced 3D broadcast mockups in Three.js, swept the KB and prompts of every athlete name and NGB reference, and re-shot the demo against the post-pivot prod stack.
  2. iOS Safari capture quirks. getDisplayMedia is not implemented on iOS Safari. We probe browser support after mount and hide the Share Screen chip on devices where it would throw. iOS Safari also auto-zooms on focus into any input with a font size under 16px, which broke the share gesture during testing.
  3. Streaming latency budgeting. Live demos are unforgiving. We needed first-token-on-screen in under a second, so we switched the explain pipeline to true SSE and tightened the system prompt to minimize lead-in tokens. End-to-end now lands at ~600ms to first token and 3 to 5s for the full explanation.
  4. Glow ring visibility in dark mode. Outset box-shadow was clipped by the parent's overflow: hidden. Switched to a stacked inset shadow with a static fallback that paints before the first animation frame.

Accomplishments we're proud of

  • Two scenarios end-to-end, in seconds. Both work on phone and laptop with grounded, conditionally phrased answers and rule citations.
  • Equal Olympic and Paralympic coverage as a structural choice, not a footnote. Three sports each, same pipeline, same depth, same tone.
  • Stub-first dev experience. Anyone can clone the repo and walk through the entire UX (including the demo scenarios) without a single GCP credential.
  • Compliance built in, not bolted on. System prompt forbids naming individuals regardless of source. We swept the live app for NGB references, prescribed Games terminology, and corporate brands, then verified end-to-end with an automated transcription audit of the demo video before submission.
  • Self-produced 3D broadcast mockups. Built a Vite + React Three Fiber site with two cinematic 3D scenes so we could demonstrate the product without using any Games footage.

What we learned

  • Read the rules first, twice. The compliance pivot cost half a build day. Doing the second read up front would have saved a re-shoot and a re-render.
  • Stub-first pays for itself. The same pattern (get_client() returns stub or Vertex based on env vars) extended cleanly to storage and meant zero code changes between dev and prod.
  • Streaming feels like magic. Spinners feel like waiting. Same total latency, completely different perception.
  • Compliance is a feature, not a constraint. Forbidding names, requiring conditional phrasing, and treating Para sports as equal all improved the product. Laurel sounds informed and humble instead of breathless.

What's next

  • Broadcast graphic OCR. Reading lower-third graphics turns a Vision-only signal into Vision + OCR fusion.
  • Live officiating feeds. Wiring in real-time call data turns "I think this is a double-touch" into "this is officially logged."
  • Multi-language explanations. Same KB, twelve languages, voice-grade output via Gemini's multilingual capability.
  • Coverage expansion. Six sports today; the KB pattern (one markdown file per sport) scales linearly.
  • Native broadcaster integration. The architecture can ship as a tab inside an official Games companion experience.
  • On-device sport classification. Distilling identification into a small on-device model would drop end-to-end to sub-second.

Built With

  • artifact-registry
  • cloud-build
  • cloud-run
  • cloud-storage
  • fastapi
  • firestore
  • gemini
  • google-cloud
  • nextjs
  • pytest
  • python
  • react
  • react-markdown
  • react-three-fiber
  • remotion
  • server-sent-events
  • tailwindcss
  • text-embedding-005
  • three.js
  • typescript
  • uv
  • vercel
  • vertex-ai
  • web-speech-api