Inspiration
An estimated 300 million people worldwide live with a rare disease, and the average patient waits 5-7 years and sees 8 doctors before getting a correct diagnosis. 95% of rare diseases have no FDA-approved treatment, making early diagnosis critical. We built RarePath because clinicians — especially in resource-limited settings — shouldn't have to memorize 4,335 rare conditions to help their patients. We wanted to put the entire Orphanet disease database, real-time medical literature, and multilingual voice briefings into a single tool that works in under 15 seconds.
What it does
RarePath is an AI-powered diagnostic assistant that maps patient symptoms to rare diseases. A clinician enters symptoms by typing or speaking into an Omi AI wearable. The system:
- Standardizes symptoms to Human Phenotype Ontology (HPO) codes (19,944 medical terms) using 3-tier fuzzy matching
- Cross-references against 4,335 Orphanet rare diseases via a reverse index
- Generates a ranked differential diagnosis (top 5 candidates with probabilities) using an LLM
- Recommends confirmatory tests with cost estimates optimized for resource-limited settings
- Pulls live research citations (clinical trials, recent advances, specialist centers) via Perplexity Sonar
- Delivers on-demand audio briefings in 10 languages for patients and families
- Exports professional clinical PDF reports
- Visualizes privacy-preserving federated learning across 7 simulated global hospital nodes
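The core lookup behind steps two and three is a precomputed reverse index from HPO codes to diseases. A minimal sketch (the disease and HPO entries below are illustrative placeholders, not real Orphanet data):

```python
# Minimal sketch of the HPO -> disease reverse index.
# Entries are illustrative placeholders, not real Orphanet data.
from collections import defaultdict

# Forward map: each disease lists the HPO codes of its symptoms
disease_to_hpo = {
    "ORPHA:100": ["HP:0002315", "HP:0001250"],   # headache, seizure
    "ORPHA:200": ["HP:0001250", "HP:0000365"],   # seizure, hearing impairment
}

# Precompute the reverse index once at startup, so each symptom
# lookup at diagnosis time is a single dict access
hpo_to_diseases = defaultdict(set)
for disease, codes in disease_to_hpo.items():
    for code in codes:
        hpo_to_diseases[code].add(disease)

def candidate_diseases(hpo_codes):
    """Rank diseases by how many of the patient's HPO codes they share."""
    counts = defaultdict(int)
    for code in hpo_codes:
        for disease in hpo_to_diseases.get(code, ()):
            counts[disease] += 1
    return sorted(counts.items(), key=lambda kv: -kv[1])
```

The ranked candidate list (not a diagnosis by itself) is what gets handed to the LLM for the actual differential.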
How we built it
Backend: Python FastAPI server handling the full diagnostic pipeline. We built a fuzzy symptom matcher using rapidfuzz (token_sort_ratio with prefix, substring, and fuzzy tiers) that maps casual language like "bad headaches" to formal HPO codes like HP:0002315. A precomputed reverse index maps HPO codes to Orphanet diseases for instant lookup. The LLM (Groq Llama 3.3 70B) receives the matched HPO terms and generates a structured differential diagnosis with probabilities, explanations, and confirmatory tests.
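The tiered matching logic can be sketched as follows. This uses stdlib difflib as a stand-in for rapidfuzz's token_sort_ratio, and a two-term vocabulary in place of the full 19,944-term HPO, so it illustrates the tier ordering rather than the production matcher:

```python
# Sketch of the 3-tier matcher (prefix -> substring -> fuzzy), with
# difflib standing in for rapidfuzz's token_sort_ratio and a tiny
# illustrative vocabulary in place of the full HPO.
from difflib import SequenceMatcher

HPO_TERMS = {
    "Headache": "HP:0002315",
    "Sensorineural hearing impairment": "HP:0000407",
}

def token_sort_ratio(a, b):
    """Compare token-sorted lowercase strings, scaled 0-100 like rapidfuzz."""
    sort = lambda s: " ".join(sorted(s.lower().split()))
    return 100.0 * SequenceMatcher(None, sort(a), sort(b)).ratio()

def match_symptom(text, threshold=75):
    text = text.lower().strip()
    for term, code in HPO_TERMS.items():           # tier 1: prefix
        if term.lower().startswith(text):
            return code
    for term, code in HPO_TERMS.items():           # tier 2: substring
        if term.lower() in text or text in term.lower():
            return code
    best = max(HPO_TERMS, key=lambda t: token_sort_ratio(text, t))
    return HPO_TERMS[best] if token_sort_ratio(text, best) >= threshold else None

print(match_symptom("bad headaches"))  # -> HP:0002315 (substring tier)
```

The threshold of 75 is the value we settled on after tuning (see Challenges below); anything without a match above it returns nothing rather than a bad guess.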
Federated Learning: We implemented real Federated Averaging (FedAvg) in pure NumPy — the hospital nodes are simulated, but the training itself is real. A 2-layer neural network (8,701 HPO features, 128 hidden units, 200 disease classes) trains across 7 hospital nodes with non-IID data splits. Each hospital gets ~60% of diseases, with geographic overlap. Synthetic patients are generated by sampling 40-80% of each disease's HPO symptoms. Model weights are averaged each round, so no patient data is ever shared. The dashboard animates the full 20-round training process with a live convergence chart.
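A minimal FedAvg round looks like this. The layer sizes here are tiny stand-ins for the 8,701-128-200 network, and local_update is a placeholder for a hospital's actual gradient step:

```python
# Minimal FedAvg round in pure NumPy. Layer sizes are tiny stand-ins
# for the real 8701-128-200 network; local_update is a placeholder
# for a hospital's actual local training step.
import numpy as np

rng = np.random.default_rng(0)

def init_model(n_in=8, n_hidden=4, n_out=3):
    return {
        "W1": rng.normal(0, 0.1, (n_in, n_hidden)),
        "W2": rng.normal(0, 0.1, (n_hidden, n_out)),
    }

def local_update(model, lr=0.01):
    """Placeholder for one hospital's local training (random step here)."""
    return {k: v - lr * rng.normal(size=v.shape) for k, v in model.items()}

def fedavg(models, n_samples):
    """Average each weight matrix across nodes, weighted by local dataset size."""
    total = sum(n_samples)
    return {
        k: sum(n * m[k] for n, m in zip(n_samples, models)) / total
        for k in models[0]
    }

global_model = init_model()
for round_ in range(3):                      # 20 rounds in the real dashboard
    locals_ = [local_update(global_model) for _ in range(7)]  # 7 hospital nodes
    global_model = fedavg(locals_, n_samples=[120, 80, 100, 90, 60, 110, 70])
```

Only the weight matrices cross node boundaries — the synthetic patients never leave their hospital, which is the whole privacy argument.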
Voice: ElevenLabs Multilingual v2 generates audio on demand. For non-English languages, the LLM translates the clinical summary first, then ElevenLabs synthesizes it. Supports English, Spanish, Hindi, French, German, Portuguese, Chinese, Japanese, Korean, and Arabic.
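The orchestration is simple: translate only when needed, then synthesize. In this sketch, translate() and synthesize() are hypothetical stand-ins for the LLM and ElevenLabs calls (not their real APIs), and the language codes are our assumed ISO codes for the ten languages listed:

```python
# Sketch of the translate-then-synthesize flow. translate() and
# synthesize() are hypothetical stand-ins for the LLM and ElevenLabs
# calls; the ISO codes below are assumed for the ten supported languages.
SUPPORTED = {"en", "es", "hi", "fr", "de", "pt", "zh", "ja", "ko", "ar"}

def voice_briefing(summary, lang, translate, synthesize):
    """Translate the clinical summary if needed, then synthesize audio."""
    if lang not in SUPPORTED:
        raise ValueError(f"unsupported language: {lang}")
    text = summary if lang == "en" else translate(summary, lang)
    return synthesize(text, lang)
```

Keeping translation and synthesis as separate steps is what lets ElevenLabs Multilingual v2 receive clean target-language text rather than English with a language hint.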
Omi Integration: The Omi AI wearable streams real-time transcripts to our FastAPI webhook via ngrok. Transcript segments accumulate within a conversation session, and symptoms are continuously mapped to HPO terms as the clinician speaks.
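Because segment structure varies by trigger, the webhook normalizes every payload before mapping symptoms. The exact shapes below are assumptions based on what we observed (a bare list of segments, a dict with a "segments" key, or a single segment dict):

```python
# Sketch of the flexible payload parsing the webhook needed. The payload
# shapes are assumptions from observed Omi behavior: a bare segment list,
# a dict wrapping "segments", or a single segment dict.
def extract_segments(payload):
    """Normalize an incoming webhook payload to a list of transcript strings."""
    if isinstance(payload, list):                      # bare list of segments
        segments = payload
    elif isinstance(payload, dict) and "segments" in payload:
        segments = payload["segments"]                 # wrapped list
    elif isinstance(payload, dict) and "text" in payload:
        segments = [payload]                           # single segment
    else:
        return []
    return [s.get("text", "") for s in segments if isinstance(s, dict)]
```

Each normalized transcript string is then appended to the session buffer and fed through the HPO matcher.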
Frontend: React 19 with Vite, custom dark-mode CSS (no component library). Features include side-by-side diagnosis comparison, animated SVG network topology for the federation dashboard, and a language button grid for voice briefings.
Research: Perplexity Sonar API fetches real-time literature for the top diagnosis — active clinical trials, recent publications, treatment advances, and specialist centers with cited sources.
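The research query is a single chat-completions call against Perplexity's OpenAI-compatible API. This sketch only constructs the request (no network call); the "sonar" model name and prompt wording are assumptions for illustration:

```python
# Sketch of the Perplexity Sonar research query: request construction
# only, no network call. The "sonar" model name and prompt wording
# are illustrative assumptions.
import json

PERPLEXITY_URL = "https://api.perplexity.ai/chat/completions"

def build_research_request(diagnosis):
    return {
        "model": "sonar",
        "messages": [
            {"role": "system",
             "content": "You are a medical research assistant. Cite sources."},
            {"role": "user",
             "content": f"For the rare disease {diagnosis}, summarize active "
                        "clinical trials, recent publications, treatment "
                        "advances, and specialist centers."},
        ],
    }

body = json.dumps(build_research_request("Fabry disease"))
# POST `body` to PERPLEXITY_URL with an Authorization: Bearer <key>
# header, e.g. via requests.post(PERPLEXITY_URL, data=body, headers=...)
```

We only fetch research for the top-ranked diagnosis, which keeps the end-to-end latency inside the 15-second budget.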
Challenges we ran into
Getting the Omi wearable webhook working locally required debugging through multiple layers: ngrok tunneling, payload format mismatches (the Omi API sends segments in different structures depending on the trigger), FastAPI route conflicts, and a uid parameter mismatch between what Omi sends and what the frontend polls. The fuzzy HPO matching was tricky to tune: too aggressive, and casual speech like "I feel tired" matches dozens of irrelevant terms; too conservative, and real symptoms get missed. We landed on a threshold of 75 with token_sort_ratio. Port conflicts from earlier hackathon processes forced us to switch from port 8003 to 8005 mid-build.
Accomplishments that we're proud of
The full pipeline — from speaking symptoms into a wearable to receiving a ranked diagnosis with live research citations and a voice briefing in your language — works end-to-end in under 15 seconds. The federated learning dashboard runs real FedAvg training on actual Orphanet disease data, not a canned animation. The 3-tier HPO matcher handles both medical terminology ("bilateral sensorineural hearing loss") and casual descriptions ("can't hear well") and maps them to the same standardized codes. The system recommends low-cost confirmatory tests first, prioritizing global health equity.
What we learned
Rare disease data is surprisingly well-structured thanks to Orphanet and HPO, but bridging the gap between how patients describe symptoms and formal medical ontology is the hardest NLP problem in this space. Federated learning convergence is significantly harder with non-IID data — our model hit 78.6% accuracy across 200 disease classes, but IID splits would likely reach 90%+. Real-time webhook integration with hardware devices requires flexible payload parsing because the same API can send different formats depending on the trigger event. On-demand voice generation is far more practical than pre-generating audio for every language — latency is acceptable and you avoid wasting API calls.
What's next for RarePath
- Clinical validation with real diagnostic cases and physician feedback
- EHR integration via FHIR for seamless workflow embedding
- Expanding beyond Orphanet to cover additional rare disease databases (OMIM, GARD)
- Deploying real federated learning across partner institutions
- Mobile app for point-of-care use in rural clinics
- FDA pathway exploration for clinical decision support classification
Built With
- elevenlabs
- fastapi
- fedavg
- groq
- hpo
- javascript
- numpy
- omi
- orphanet
- perplexity
- pydantic
- python
- rapidfuzz