Inspiration
Every day, roughly 7,000 new research papers are published. By the time you finish reading this sentence, another has dropped. Here's the nightmare: many of them contradict each other. One study says caffeine boosts memory. Another says it does nothing. A third says it only works if you're sleep-deprived. Scientists, policymakers, and journalists are drowning in information but starving for intelligence. Existing tools like Perplexity retrieve text. They don't resolve conflicts, they don't explain why studies disagree, and they don't tell you which evidence to trust. We built EvidenceSynthesis because we were tired of "it depends" being the only honest answer to complex questions.
What it does
EvidenceSynthesis is a research intelligence platform that performs cross-document reasoning with explicit uncertainty quantification. Upload 2-5 research papers on any topic. Our system:

- Reads like a researcher: Gemini 3.0 Pro natively understands PDFs, including text, tables, figures, and scanned pages (not just OCR)
- Detects contradictions automatically: identifies when studies make opposing claims on the same outcome measure
- Explains root causes: tells you why they disagree (different populations, methodologies, measurement biases)
- Quantifies uncertainty: confidence scores for every synthesis step, plus sensitivity analysis showing how conclusions shift when specific studies are excluded
- Visualizes evidence consensus: an interactive Agreement Map showing which papers agree, which conflict, and why

The "What If" mode lets you remove studies in real time and watch confidence levels change. This is sensitivity analysis no existing tool provides.
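The "What If" recomputation can be sketched as a simple weighted aggregation. This is an illustrative model only, not our production scoring logic: the `Study` shape, the sample-size weighting, and the function names are assumptions made for the sketch.

```typescript
// Illustrative sketch of "What If" sensitivity analysis: confidence is
// recomputed client-side from whichever studies remain included.
// The Study shape and the weighting scheme are assumptions for this sketch.
interface Study {
  id: string;
  supports: boolean;   // does the study support the claim?
  sampleSize: number;  // larger samples get more weight
}

// Confidence here = weighted share of evidence supporting the claim.
function synthesisConfidence(studies: Study[]): number {
  const total = studies.reduce((sum, s) => sum + s.sampleSize, 0);
  if (total === 0) return 0;
  const supporting = studies
    .filter((s) => s.supports)
    .reduce((sum, s) => sum + s.sampleSize, 0);
  return supporting / total;
}

// "What If": exclude a study and recompute.
function excludeStudy(studies: Study[], id: string): Study[] {
  return studies.filter((s) => s.id !== id);
}

const studies: Study[] = [
  { id: "A", supports: true, sampleSize: 200 },
  { id: "B", supports: false, sampleSize: 50 },
];
const before = synthesisConfidence(studies);                   // 200/250 = 0.8
const after = synthesisConfidence(excludeStudy(studies, "B")); // 200/200 = 1.0
```

Removing the small dissenting study B pushes confidence from 0.8 to 1.0, which is exactly the kind of shift the interactive mode surfaces.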
How we built it
Architecture:

- Frontend: Next.js 14 (App Router), TypeScript 5.3, Tailwind CSS, shadcn/ui
- State management: Zustand (sliced stores), React Query (5-minute staleTime caching)
- Visualization: React Flow (agreement networks), Recharts (evidence quality)
- AI engine: Google Gemini 3.0 Pro with native multimodal PDF understanding
- Backend: Firebase (Firestore, Authentication, Hosting)
- Development: Antigravity agentic platform, with AI agents building the full-stack architecture

Technical deep dive: The critical innovation is native multimodal PDF processing. We don't extract text with pdf-parse (which fails on tables, figures, and scanned pages). Instead, we send PDFs directly to Gemini 3.0 as base64-encoded documents. Gemini reads:

- Text layers and OCR'd content
- Tables with numerical data
- Figure captions and visual context
- Complex layouts and formatting

This enables true cross-document synthesis. When Paper A claims "caffeine improves reaction time (p<0.01, n=200)" and Paper B claims "no significant effect (p=0.34, n=50)", our system doesn't just spot the contradiction. It reasons about why (different sample sizes, populations, and measurement protocols) and scores which evidence is stronger. Strict Zod validation ensures the structural integrity of all AI outputs, and the production build compiles with zero TypeScript errors.
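The base64 hand-off can be sketched as follows. The part shape mirrors the Gemini API's inline-data convention for documents; treat the exact field names and any model wiring as assumptions to verify against the current SDK docs. The sketch only shows the encoding step.

```typescript
// Sketch: preparing a PDF for native multimodal upload.
// We base64-encode the raw bytes and wrap them in an inlineData part,
// the shape the Gemini API expects for inline documents (field names
// follow the public API convention; verify against the current SDK).
function pdfToInlinePart(pdfBytes: Uint8Array) {
  return {
    inlineData: {
      mimeType: "application/pdf",
      data: Buffer.from(pdfBytes).toString("base64"),
    },
  };
}

// Example: the first bytes of any PDF are the "%PDF" magic header.
const header = new Uint8Array([0x25, 0x50, 0x44, 0x46]); // "%PDF"
const part = pdfToInlinePart(header);
// part.inlineData.data === "JVBERg=="
```

Because the model receives the document itself rather than an extracted string, tables, figures, and scanned pages survive the hand-off intact.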
Challenges we ran into
Challenge 1: PDF processing hell. We started with pdf-parse. It failed on 60% of real research papers: scanned documents, complex tables, embedded figures. Judges would upload a paper and see "Failed to process." Disaster.
Solution: Purged pdf-parse entirely and switched to Gemini 3.0's native multimodal PDF understanding. The system now reads documents as visual-plus-textual artifacts, not just extracted strings.

Challenge 2: Static export vs. dynamic routes. Next.js `output: 'export'` breaks dynamic routes (`/dashboard/[id]`), and Firebase Hosting requires static files.
Solution: Refactored to query parameters (`/dashboard?id=xxx`) with Suspense boundaries, maintaining SPA behavior while satisfying static-hosting constraints.

Challenge 3: Synthesis consistency. Early Gemini outputs varied wildly: sometimes detecting contradictions, sometimes missing obvious conflicts.
Solution: Brutal prompt engineering with explicit reasoning chains. We required structured JSON with confidence calibration and added Zod validation to reject malformed outputs and retry.

Challenge 4: Demo reliability. A live PDF upload could fail mid-demo due to network issues, file size, or edge cases.
Solution: Built a dual-mode system: live multimodal upload for real use, plus a pre-seeded "Quick Demo" with three caffeine studies for an instant, guaranteed showcase. Both use the identical synthesis engine.
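The validate-and-retry loop from Challenge 3 looks roughly like this. To keep the sketch dependency-free, a hand-rolled type guard stands in for the Zod schema's `safeParse`; the `SynthesisResult` shape, retry count, and function names are assumptions, not our exact production schema.

```typescript
// Sketch of the reject-and-retry loop around model output.
// In production the guard is a Zod schema (schema.safeParse); here a
// plain type guard keeps the example self-contained. Shapes are
// illustrative, not the exact production schema.
interface SynthesisResult {
  claim: string;
  confidence: number; // calibrated 0..1
  contradictions: string[];
}

function isSynthesisResult(value: unknown): value is SynthesisResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.claim === "string" &&
    typeof v.confidence === "number" &&
    v.confidence >= 0 &&
    v.confidence <= 1 &&
    Array.isArray(v.contradictions)
  );
}

// Retry the model call until its JSON validates, up to maxRetries.
async function synthesizeWithRetry(
  callModel: () => Promise<string>,
  maxRetries = 3,
): Promise<SynthesisResult> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const parsed: unknown = JSON.parse(await callModel());
      if (isSynthesisResult(parsed)) return parsed;
    } catch {
      // malformed JSON: fall through and retry
    }
  }
  throw new Error("Model output failed validation after retries");
}
```

Rejecting out-of-range confidence values (not just malformed JSON) is what keeps the calibration honest: a syntactically valid response with `confidence: 2` gets retried, not displayed.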
Accomplishments that we're proud of
- True multimodal understanding: We don't extract text and discard visuals. Gemini processes the full PDF: tables, figures, scans, equations. This is research-grade document intelligence, not chatbot retrieval.
- Cross-document reasoning: Most AI tools answer "What does this paper say?" We answer "What does the evidence say, where does it conflict, and why?" This requires synthesizing claims across sources, not summarizing each one individually.
- Uncertainty quantification: Every synthesis includes confidence scores, methodological limitations, and explicit uncertainty flags. We tell users when we don't know. This is scientific integrity encoded in software.
- 48-hour production build: Zero TypeScript errors. Strict Zod validation. Firebase deployment. Mobile responsive. Dark mode. Guided tour. Built with Antigravity AI agents orchestrating the architecture.
- "What If" sensitivity analysis: Real-time client-side recomputation showing how conclusions change when studies are excluded. This is meta-analytic thinking made interactive.
What we learned
- PDFs are visual documents, not text files. The research community uses complex layouts, embedded tables, and scanned legacy papers. Text extraction alone misses half the signal; multimodal AI is essential for research tools.
- Synthesis > summarization. Retrieving 10 paper abstracts is easy. Understanding that 3 agree, 4 contradict, and 3 are irrelevant requires reasoning. The hard problem is judgment, not memory.
- Uncertainty is a feature, not a bug. Users don't want false confidence. Explicitly modeling uncertainty (confidence scores, limitations, sensitivity analysis) builds trust and enables better decisions.
- Agentic development 10x's speed. Antigravity agents handled the boilerplate architecture while we focused on prompt engineering and product logic: 48 hours to a production-grade full-stack app.
- Demo reliability beats demo flash. A working "Quick Demo" with pre-loaded data impresses more than a broken live upload. Build for the worst-case presentation scenario.
What's next for Evidence Synthesis
Immediate (post-hackathon):

- Citation network analysis (which papers cite which, detecting citation bias)
- Automated meta-analysis effect-size calculation
- Export to PRISMA-compliant systematic-review formats

Short-term (3 months):

- Integration with Zotero/Mendeley for the research workflow
- Collaborative synthesis (teams annotating contradictions together)
- An API for research institutions to integrate into existing tools

Long-term (12 months):

- Real-time monitoring of new publications in a field, alerting researchers to contradictions with their existing synthesis
- Domain-specific fine-tuning (medical trials, climate science, economics)
- Evidence-based policy brief generation with uncertainty quantification for government agencies
Built With
- 3.0
- antigravity
- api.
- firebase
- gemini
- multimodal
- next.js-14
- typescript