Inspiration
This project addresses a real-world, high-stakes issue: unintelligible emergency radio communication. Our team demonstrates an offline pipeline that enhances and reconstructs speech in disaster scenarios, prototyping a tool that could reduce miscommunication, speed up response times, and potentially save lives.
What it does
First responders and field teams rely on radios that produce noisy, clipped, and hard-to-understand audio. This leads to:
- Misheard instructions
- Missed location or hazard details
- Slower and less effective response

Internet access is often unavailable in these environments, making cloud solutions unreliable.
ClearComms solves this by running transcription and structuring fully on device.
How we built it
- Offline Speech Recognition
Audio is transcribed locally using an optimized Whisper model (e.g. Whisper Base En from Qualcomm AI Hub, or Whisper large-v3-turbo).
Engineering focus includes:
- Running Whisper with ONNX Runtime (QNN/NPU on Qualcomm hardware)
- Model optimization and quantization
- Parameter tuning for noisy radio audio
- Low-latency on-device inference
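A minimal sketch of the execution-provider selection this setup implies: prefer ONNX Runtime's Qualcomm QNN provider (NPU) and fall back to CPU. The provider names follow ONNX Runtime's conventions; the model filename and the commented-out session call are illustrative assumptions, not the project's actual code.

```python
# Preferred order: Qualcomm QNN (NPU) first, CPU as the safe fallback.
PREFERRED = ["QNNExecutionProvider", "CPUExecutionProvider"]

def pick_providers(available):
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in PREFERRED if p in available]
    return chosen or ["CPUExecutionProvider"]

# With ONNX Runtime installed, this would be used roughly as:
#   import onnxruntime as ort
#   session = ort.InferenceSession(
#       "whisper_base_en.onnx",  # hypothetical exported model path
#       providers=pick_providers(ort.get_available_providers()),
#   )
```

Keeping CPU in the list means the same build runs on machines without Qualcomm hardware, just more slowly.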
- Structured Incident Extraction
The transcript can be processed by a local LLM (e.g. LLaMA via Qualcomm Genie) to turn raw speech into structured outputs. This makes communication faster to interpret and act on.
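One way to make the LLM step reliable is to prompt for JSON and validate the result before acting on it. The prompt wording, the key names (`location`, `hazard`, `action`), and the function names below are assumptions for illustration; the actual Genie/LLaMA call is not shown.

```python
import json

# Hypothetical prompt template asking the local LLM for structured output.
PROMPT = (
    'Return only JSON with keys "location", "hazard", "action" '
    "describing this radio transcript:\n{transcript}"
)
REQUIRED_KEYS = {"location", "hazard", "action"}

def build_prompt(transcript: str) -> str:
    """Fill the transcript into the extraction prompt."""
    return PROMPT.format(transcript=transcript)

def parse_incident(raw: str) -> dict:
    """Parse the model's JSON output and reject it if keys are missing."""
    record = json.loads(raw)
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return record
```

Validating the output this way lets the pipeline fall back to the raw transcript when the model produces malformed JSON.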
- Offline End-to-End Pipeline

Radio Audio
  ↓
Whisper (ONNX, on device)
  ↓
Transcript
  ↓
Local LLaMA (Genie) [optional]
  ↓
Structured incident / action summary

Everything runs fully offline.
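The pipeline above can be sketched as a simple composition of stages. The stage functions here are stubbed placeholders: in the real system `transcribe` would wrap Whisper via ONNX Runtime and `structure` would wrap the local LLaMA (Genie) step.

```python
from typing import Callable, Optional, Union

def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    structure: Optional[Callable[[str], dict]] = None,
) -> Union[str, dict]:
    """Audio -> transcript -> (optional) structured summary, all on device."""
    transcript = transcribe(audio)
    if structure is None:  # the structuring step is optional
        return transcript
    return structure(transcript)
```

Passing the stages in as functions keeps the orchestration independent of the model backends, so either stage can be swapped without touching the pipeline.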
Built With
- fastapi
- llama
- python
- pytorch
- react
- typescript
- whisper