Inspiration

Every week, millions of people listen to deep science podcasts — Lex Fridman, Huberman Lab, Sean Carroll — and hear fascinating claims they can't easily verify. A guest says "bioelectric patterns control organ development" and the listener thinks that's incredible, but is it true? They'd have to find the right papers, parse dense academic prose, and connect it back to what they heard. Almost nobody does. The same research effort gets duplicated — or more often, abandoned — by thousands of listeners independently. We wanted to build the bridge between "I heard it on a podcast" and "here's the paper."

What it does

Noeron is an AI research companion that listens alongside you. As a podcast plays, it identifies scientific claims in real time, retrieves relevant papers from a curated corpus, and generates context cards that link what the host said to the evidence behind it. Beyond this passive surfacing, users can chat with the AI (with visible thinking traces), generate mini-podcasts that debate specific claims, create slide decks, visualize the knowledge landscape of a research field, and quiz themselves on what they learned.
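Conceptually, a context card ties one spoken claim to the evidence behind it. A minimal sketch of that shape in Python — field names here are illustrative assumptions, not Noeron's actual schema:

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    """One supporting paper excerpt behind a claim (hypothetical fields)."""
    title: str
    doi: str
    snippet: str        # passage that supports or contests the claim
    similarity: float   # cosine similarity between claim and chunk embedding


@dataclass
class ContextCard:
    """Links a claim heard in the episode to the literature."""
    claim: str          # assertion extracted from the transcript
    timestamp_s: int    # where in the episode the claim was made
    speaker: str        # from diarization, e.g. "Guest"
    verdict: str        # e.g. "supported" / "contested" / "no evidence found"
    citations: list[Citation] = field(default_factory=list)


card = ContextCard(
    claim="Bioelectric patterns control organ development",
    timestamp_s=1845,
    speaker="Guest",
    verdict="supported",
)
```

The frontend then renders these cards in sync with playback, so evidence appears moments after the claim is spoken.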

How we built it

Gemini 3 is the core engine. We use its 1M-token context window to load entire podcast transcripts alongside 150+ papers simultaneously, context caching to process the paper corpus once and query it thousands of times (25x cost reduction), and its thinking levels to distinguish genuine scientific claims from casual conversation. The ingestion pipeline: AssemblyAI transcribes and diarizes podcasts into 60-second windows; GROBID converts PDFs into structured JSON; papers are chunked (400 tokens, 50-token overlap) and embedded with Gemini's text-embedding-004 into Supabase pgvector. The frontend is Next.js + TypeScript + Tailwind, and the backend exposes tools via FastMCP (Model Context Protocol) running over a FastAPI server. The demo video was built with Remotion.
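The 400-token / 50-token-overlap chunking step can be sketched in a few lines. This simplified version counts tokens by whitespace splitting, whereas the real pipeline would use the model's tokenizer:

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split a paper section into overlapping chunks.

    The overlap keeps sentences that straddle a chunk boundary
    retrievable from both neighboring chunks.
    """
    tokens = text.split()  # stand-in for a real tokenizer
    if not tokens:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Each chunk is then embedded (with text-embedding-004) and upserted into pgvector alongside its paper ID and section metadata, so retrieval can point back to the exact passage a citation came from.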

Challenges we ran into

Claim detection was harder than expected — distinguishing a genuine scientific assertion ("gap junctions propagate bioelectric signals across cell networks") from casual commentary ("that's really interesting") required careful prompt engineering with Gemini's thinking levels. PDF parsing at scale was brutal: academic papers have wildly inconsistent formatting, and GROBID output needed extensive post-processing. Context caching was critical for cost but tricky to get right — we had to architect the pipeline so the expensive paper corpus gets cached once while per-query synthesis stays dynamic. And making 150+ papers genuinely searchable (not just retrievable) meant iterating heavily on chunking strategy and embedding quality.
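The claim-detection prompting we converged on asks the model to label each transcript window explicitly and return structured output. A simplified sketch of the prompt construction — the wording and JSON schema here are illustrative, not our production prompt:

```python
import json

CLAIM_PROMPT = """You are screening a podcast transcript for scientific claims.
A claim is a falsifiable assertion about how the world works
(e.g. "gap junctions propagate bioelectric signals across cell networks").
Casual commentary ("that's really interesting"), questions, and
autobiographical remarks are NOT claims.

Transcript window:
{window}

Return JSON: {{"claims": [{{"text": "...", "confidence": 0.0}}]}}"""


def build_claim_prompt(window: str) -> str:
    """Fill the screening prompt with one 60-second transcript window."""
    return CLAIM_PROMPT.format(window=window)


def parse_claims(response_text: str) -> list[dict]:
    """Parse the model's JSON reply into candidate claims."""
    return json.loads(response_text).get("claims", [])
```

The filled prompt is sent to Gemini with a thinking budget enabled; the few-shot contrast between a real assertion and filler commentary did most of the work in cutting false positives.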

Accomplishments that we're proud of

The system actually works end-to-end: you press play on a podcast episode and relevant papers start appearing, linked to specific claims with real citations. The context caching architecture makes this economically viable — processing the full corpus once means each listener's queries cost pennies, not dollars. The mini-podcast feature (where two AI hosts debate a scientific claim) turned out surprisingly compelling. And the knowledge cartography view — visualizing 150+ papers as research territories — gives you an instant map of an entire field that would take weeks to build manually.

What we learned

Long-context models change what's architecturally possible. Being able to load an entire transcript plus hundreds of papers into a single context window eliminates a whole class of retrieval problems. Context caching is the unlock for making this economically sustainable — without it, the per-query cost would be prohibitive. We also learned that the gap between "informal explanation" and "formal literature" is an underserved design space — people are hungry for tools that meet them where they are (listening to podcasts) rather than where academics are (reading journals).

What's next for Noeron

Expand beyond bioelectricity to any science podcast episode — the pipeline is domain-agnostic; it just needs a paper corpus. Add collaborative research notebooks so listeners can build on each other's explorations. Integrate with podcast apps directly so papers surface in the player itself. And build out the "one person's deep synthesis becomes reusable for thousands" vision: when one user explores a claim deeply, that research trail becomes available to every future listener of the same episode.

Built With

gemini, assemblyai, grobid, fastapi, fastmcp, next.js, typescript, tailwind, supabase, pgvector, remotion
