Inspiration
Canada is facing a quiet crisis. There are not enough physicians, the ones we have are burning out, and nearly two hours of every doctor's day disappears into paperwork rather than patient care. Across Canada's 92,000 physicians, that adds up to tens of millions of hours of lost clinical time every year. We built ClinicEar around the idea of physician mobility — a doctor should be able to walk into any clinic, anywhere, connect to the local patient database, and start delivering care immediately. No learning a new documentation system. No catching up on reports at the end of a long day. ClinicEar handles the notes so the doctor can focus on what they actually came to do.
What it does
ClinicEar listens to a doctor-patient consultation in real time and automatically handles all clinical documentation, so the physician can stay fully present in the conversation rather than splitting attention between the patient and a keyboard.
- Live transcript — streams in real time with Doctor/Patient speaker labels
- SOAP note generation — produces a structured clinical note with ICD-10 suggestions, confidence scores per section, and documentation gap flags
- IBM watsonx.ai audit — scores the note 0–100 for completeness and flags non-standard terminology
- Patient summary — translates the clinical findings into plain language the patient can keep for their own records
- Multilingual delivery — if a preferred language is on file, the patient summary is automatically delivered in their mother tongue, so medical terminology is never a barrier to understanding
- Automatic export — the finished SOAP note is written directly into the patient's record when the session ends
- Database-agnostic — plugs into existing healthcare record systems with minimal setup, so there is no lengthy onboarding when moving between clinics or hospitals
How we built it
- Backend — FastAPI + Python, WebSocket proxy to ElevenLabs Scribe v2 for real-time transcription and speaker diarization
- SOAP generation — LLM via OpenRouter / Hugging Face orchestrated through Railtracks, with a structured extraction prompt enforcing strict JSON output
- Audit — IBM watsonx.ai running Llama 3.3 70B, with automatic fallback to the inference server
- Translation — OpenAI detects the patient's preferred language from their record and translates the summary before delivery
- Email delivery — Resend handles patient summary emails with the translated content attached
- Frontend — React 18 + TypeScript + Vite + Tailwind, WebSocket client for live transcript streaming
- Auth + DB — Supabase for JWT-gated endpoints (access / refresh tokens), patient records, and consultation history
- Deployment — Vercel (frontend) + Koyeb (backend)
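The structured-extraction step in the SOAP pipeline can be sketched as a strict JSON validation pass over the LLM's raw output. The section names and fence handling below are illustrative, not our exact prompt contract:

```python
import json

# Hypothetical required shape of the generated note; the real schema
# enforced by our extraction prompt may differ.
REQUIRED_SECTIONS = ("subjective", "objective", "assessment", "plan")

def parse_soap_note(raw: str) -> dict:
    """Parse an LLM response into a SOAP dict, enforcing strict JSON."""
    text = raw.strip()
    # Models sometimes wrap JSON in ```json fences; strip them first.
    if text.startswith("```"):
        lines = text.splitlines()
        text = "\n".join(lines[1:-1])
    note = json.loads(text)  # raises ValueError on malformed output
    missing = [s for s in REQUIRED_SECTIONS if not note.get(s)]
    if missing:
        raise ValueError(f"SOAP note missing sections: {missing}")
    return note
```

Rejecting incomplete output here, before it ever reaches the audit or export steps, is what lets the rest of the pipeline assume a well-formed note.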
Challenges we ran into
We faced unpredictable hallucinations from the ElevenLabs real-time speech-to-text API. During brief pauses or moments of ambient noise, the transcription model would occasionally return completely unrelated text, seemingly auto-completing sentences or inferring context that was never spoken. Because these "phantom" segments came back as valid transcriptions, they risked injecting false information into the clinical notes.
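One way to guard against phantom segments is a heuristic filter that drops transcriptions produced from near-silent audio or with implausibly dense text. The thresholds and segment fields below are hypothetical, for illustration only:

```python
# Hypothetical thresholds; real values would be tuned on recorded sessions.
MAX_CHARS_PER_SECOND = 25.0   # denser than plausible human speech
MIN_AUDIO_RMS = 0.01          # below this, treat the chunk as silence

def is_phantom(text: str, duration_s: float, audio_rms: float) -> bool:
    """Flag segments likely hallucinated during pauses or ambient noise."""
    if audio_rms < MIN_AUDIO_RMS and text.strip():
        return True  # speech "transcribed" from near-silence
    if duration_s > 0 and len(text) / duration_s > MAX_CHARS_PER_SECOND:
        return True  # implausibly many characters for the audio length
    return False

def filter_segments(segments):
    """Drop phantom segments before they reach the SOAP pipeline."""
    return [s for s in segments
            if not is_phantom(s["text"], s["duration_s"], s["rms"])]
```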
We also wanted to use ElevenLabs tooling for our real-time speech-to-text panel, but differentiating between the doctor and the patient requires speaker diarization: the AI process of partitioning multi-speaker audio into homogeneous segments, effectively answering "who spoke when". Unfortunately, ElevenLabs does not offer a real-time speech-to-text endpoint with speaker diarization, which made it difficult to implement our vision using ElevenLabs technology alone.
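Once diarized segments are available, the generic speaker IDs still have to be mapped to clinical roles. A simple sketch, under the assumption that the doctor opens the consultation (the segment shape here is illustrative):

```python
# Sketch of mapping diarization labels (e.g. "speaker_0") to roles.
# Assumption: the first speaker heard is the doctor; any other
# speaker ID is labelled as the patient.

def assign_roles(segments):
    """Attach a Doctor/Patient role to each diarized segment."""
    doctor_id = None
    labelled = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        if doctor_id is None:
            doctor_id = seg["speaker"]  # first voice = doctor (assumed)
        role = "Doctor" if seg["speaker"] == doctor_id else "Patient"
        labelled.append({**seg, "role": role})
    return labelled
```

The first-speaker assumption is obviously fragile; a production version would confirm the mapping with the physician or with a short enrollment step.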
Integrating IBM watsonx.ai for the clinical note auditing feature also presented unexpected challenges, particularly its strict environment configuration requirements and occasional API timeouts within the hackathon setup.
Accomplishments that we're proud of
It was immensely satisfying to see real-time speaker diarization working locally for the first time. Getting there took a lot of experimentation and failed ideas, such as profiling the doctor's voice before recording to detect whether the doctor was the one speaking. That approach involved embedding the voice with various embedding models and running similarity searches against the audio inputs received during recording. Unfortunately, it proved far too unreliable to continue pursuing.
To guarantee high availability and consistent auditing, we implemented a robust fallback mechanism. If the watsonx API is misconfigured, unavailable, or encounters runtime errors, the system automatically fails over and routes the audit request to a secondary inference server via OpenRouter. This dual-layered architecture ensures that SOAP note quality scores, completeness checks, and flagged-term evaluations run reliably under all conditions.
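The failover described above reduces to a small wrapper around the two backends. Here `primary` and `fallback` stand in for the real watsonx and OpenRouter clients, which are assumptions of this sketch:

```python
# Minimal failover sketch: try the primary watsonx audit, and route to
# the OpenRouter-backed inference server on any failure.

def audit_note(note: str, primary, fallback) -> dict:
    """Run the audit via the primary backend; fail over on any error."""
    try:
        result = primary(note)
        result["backend"] = "watsonx"
        return result
    except Exception:
        # Misconfiguration, timeouts, and runtime errors all land here.
        result = fallback(note)
        result["backend"] = "openrouter"
        return result
```

Tagging the result with the backend that produced it makes it easy to monitor how often the fallback path is actually exercised.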
What we learned
- Reliable AI output is harder than it looks. We developed three extensive prompts through rigorous testing to ensure reproducible, clinically accurate results — getting confidence scores to vary meaningfully, ICD-10 codes to be correct, and documentation gaps to actually catch missing information took far more iteration than expected.
- Conversational AI is still a wide open space. Real-time transcription with accurate speaker diarization turned out to be one of our biggest technical challenges — most APIs offered one or the other, not both. We had to get creative with how we chunked and processed audio to achieve the effect we needed.
- Fallback design is not optional. When you depend on multiple third-party APIs during a live demo, every external call needs a graceful degradation path. Building that in early saved us more than once.
- Rapid collaboration under pressure is a skill. Navigating merge conflicts efficiently across a 36-hour sprint, with parallel workstreams across backend, frontend, prompts, and design, forced us to communicate clearly and make fast architectural decisions we could all commit to.
What's next for ClinicEar
- Real-time two-way translation with audio playback — using ElevenLabs TTS, the doctor and patient could speak in their own languages and hear each other in theirs, eliminating the language barrier entirely during the consultation itself and coupling naturally with the translated patient summary on the way out
- Expanded language support — moving beyond Latin-character languages to cover Mandarin, Arabic, Hindi, and other scripts, making ClinicEar genuinely useful in the diverse communities it is designed to serve
- Real EHR integration — our current database connection is a working mock; the next step is a live HL7 FHIR API integration for true plug-and-play compatibility with existing hospital and clinic systems
- Voice commands — letting physicians flag corrections or add notes hands-free during the consultation, without ever touching a keyboard
- Mobile app — a lightweight PWA or Capacitor wrapper so physicians working in the field are not tied to a desktop browser
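For the planned FHIR integration, exporting a finished note could take the shape of a FHIR R4 `DocumentReference` resource posted to the EHR's FHIR endpoint. The resource below is an illustrative sketch, not a validated profile; the LOINC code and plain-text payload are assumptions a real integration would replace per the target EHR's implementation guide:

```python
import base64

def soap_note_to_document_reference(note_text: str, patient_id: str) -> dict:
    """Build an illustrative FHIR R4 DocumentReference for a SOAP note.

    LOINC 11488-4 ("Consult note") and the base64 text/plain payload
    are sketch-level choices, not a certified mapping.
    """
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "type": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "11488-4",
                "display": "Consult note",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "content": [{
            "attachment": {
                "contentType": "text/plain",
                "data": base64.b64encode(note_text.encode()).decode(),
            }
        }],
    }
```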
Built With
- css
- elevenlabs
- fastapi
- html
- ibm-watson
- koyeb
- node.js
- openrouter
- python
- react
- resend
- supabase
- tailwindcss
- typescript
- uvicorn
- vercel
- vite