Inspiration
Journaling is a powerful tool for mental clarity, but it is usually a one-way street. We wanted to transform the static, passive diary into an active, intelligent companion. Inspired by the need for radical privacy in healthcare, we built Vera: a "Digital Confidant" that doesn't just store your words; it hears the emotion behind them. We wanted to create a space where users can be their most vulnerable without fear of their mental health data ever touching a cloud server.
What it does
Vera is a privacy-first, on-device mental health companion that bridges the gap between a private diary and a responsive therapist.
Acoustic Emotion Analysis: Using a Wav2Vec2 model converted to Core ML for on-device audio processing, Vera detects the emotional tone and sentiment of a user's voice in real time, understanding the "how" behind their words.
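The final step of that pipeline, turning the classifier's raw logits into an emotion label, can be sketched as follows. This is an illustrative sketch in Python, not the app's Swift code, and the label set `EMOTIONS` is a hypothetical example; the actual fine-tuned model's classes may differ.

```python
import math

# Hypothetical label set for illustration; the real model's classes may differ.
EMOTIONS = ["calm", "happy", "sad", "angry", "anxious"]

def softmax(logits):
    """Convert raw classifier logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_emotion(logits):
    """Return the top emotion label and its confidence score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return EMOTIONS[best], probs[best]
```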
Voice Pipeline: Vera listens using Apple's SFSpeechRecognizer for fully on-device speech-to-text transcription with automatic silence detection. For high-quality text-to-speech, we deployed Kokoro TTS natively on-device via the FluidAudio framework, since the built-in system voices sounded too flat for a companion that needs to sound warm and human.
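The "automatic silence detection" step can be illustrated with a simple energy-based heuristic: end the utterance once enough consecutive audio frames fall below an RMS threshold. This Python sketch is illustrative only (the function names and thresholds are assumptions, not the app's actual implementation):

```python
import math

def rms(frame):
    """Root-mean-square energy of one audio frame (a list of float samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_end_of_speech(frames, threshold=0.01, silent_frames_needed=30):
    """Return the index of the frame at which the utterance is considered
    finished: the point where `silent_frames_needed` consecutive frames
    have fallen below the energy threshold. Returns None if speech never
    trails off into silence."""
    silent = 0
    for i, frame in enumerate(frames):
        if rms(frame) < threshold:
            silent += 1
            if silent >= silent_frames_needed:
                return i
        else:
            silent = 0  # any loud frame resets the silence counter
    return None
```

With, say, 10 ms frames and `silent_frames_needed=30`, this fires after roughly 300 ms of quiet, a reasonable end-of-utterance pause.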
Intelligent Memory (RAG): Vera uses a fully on-device Retrieval-Augmented Generation pipeline to ground responses in real health knowledge. User queries are embedded in real-time using a MiniLM-L6-v2 sentence-transformer model (exported to Core ML), then matched against a pre-computed vector index of health documents via Accelerate-powered cosine similarity search. The top-matching chunks are injected into each conversation turn with an adaptive token budget that tapers across turns to balance context richness with conversation history. Everything runs locally; no data leaves the device.
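The retrieval step above can be sketched in a few lines. In the app this runs in Swift with Accelerate; this Python sketch only illustrates the two core ideas: cosine similarity over a pre-computed index reduces to a dot product once vectors are normalized, and the per-turn RAG token budget tapers geometrically. The function names and the `base`/`floor`/`decay` constants are illustrative assumptions.

```python
import math

def normalize(v):
    """L2-normalize a vector so cosine similarity becomes a dot product."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def top_k(query_vec, index, k=3):
    """Rank (vector, chunk) pairs in the index by cosine similarity
    to the query embedding and return the k best matches."""
    q = normalize(query_vec)
    scored = [(sum(a * b for a, b in zip(q, normalize(doc))), chunk)
              for doc, chunk in index]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]

def token_budget(turn, base=512, floor=128, decay=0.75):
    """Adaptive budget: shrink the RAG context allowance each turn so
    growing conversation history still fits in the LLM context window."""
    return max(floor, int(base * decay ** turn))
```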
On-Device LLM: Conversations are powered by NVIDIA Nemotron-Mini-4B-Instruct, a 4-billion parameter language model running entirely on-device via llama.cpp (Q4_K_M 4-bit quantization, ~2.7 GB). The model uses a persistent KV cache for multi-turn context, Nemotron chat formatting, and automatic context-overflow recovery, delivering real-time, private conversational AI without any server dependency.
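The "Nemotron chat formatting" mentioned above refers to the `<extra_id_0>`/`<extra_id_1>` role tags from the Nemotron-Mini model card. A minimal sketch of multi-turn prompt assembly, with the exact whitespace treated as an assumption rather than a guaranteed match for the app's implementation:

```python
def build_prompt(system, turns):
    """Assemble a multi-turn prompt in the Nemotron-Mini instruct format.
    `turns` is a list of (user_text, assistant_text_or_None) pairs; the
    final turn's assistant slot is left open for the model to generate."""
    parts = [f"<extra_id_0>System\n{system}\n\n"]
    for user, assistant in turns:
        parts.append(f"<extra_id_1>User\n{user}\n<extra_id_1>Assistant\n")
        if assistant is not None:
            parts.append(f"{assistant}\n")
    return "".join(parts)
```

With a persistent KV cache, only the newly appended turn needs to be re-tokenized and prefilled on each exchange; earlier turns stay cached.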
Apple Watch Health Integration: Vera syncs with HealthKit to pull real-time biometric data from Apple Watch: heart rate, HRV, sleep duration, step count, and respiratory rate. Vera then surfaces insights about these metrics to the user in conversation.
Picture-in-Picture: Vera supports Picture-in-Picture mode, so the conversation stays with you even when you switch to other apps. The floating overlay shows live conversation status, waveform bars, and mood color, so you never lose your connection with Vera while multitasking.
Active Support: Instead of just recording text, Vera engages in a therapeutic dialogue, helping users process their feelings by reflecting on their unique history.
Zero-Cloud Privacy: By processing all audio and data locally, Vera ensures that a user's most intimate thoughts remain strictly on their device: a "Mental Health Vault."
How we built it
We focused on a high-performance, native stack to ensure seamless local execution:
Audio & Speech: We used SFSpeechRecognizer for high-accuracy local transcription, a Core ML-exported Wav2Vec2 model for emotion classification on vocal patterns, and Kokoro TTS via FluidAudio for natural-sounding on-device voice synthesis.
Knowledge Engine: We implemented a local RAG pipeline using MiniLM-L6-v2 embeddings on Core ML with Accelerate-powered vector search. This allows the app to query a mental health guideline vault to provide responses grounded in robust advice and insight.
Intelligence: The conversational layer is powered by NVIDIA Nemotron-Mini-4B-Instruct running via llama.cpp, ensuring the "therapist" logic never requires an internet connection.
Biometrics: HealthKit integration pulls real-time Apple Watch data (HR, HRV, sleep, steps, respiratory rate) so Vera can reference actual numbers and provide insight to the user.
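One way Vera can "reference actual numbers" is by comparing a recent reading against the user's own baseline. The heuristic below is purely illustrative (the function name and 15% threshold are assumptions, and this is not medical logic from the app):

```python
def hrv_insight(recent_hrv_ms, baseline_hrv_ms, drop_threshold=0.15):
    """Flag a notable drop in heart-rate variability versus the user's
    baseline; a sustained drop can accompany stress or poor recovery.
    Illustrative heuristic only, not medical advice."""
    if baseline_hrv_ms <= 0:
        return None  # no valid baseline to compare against
    change = (recent_hrv_ms - baseline_hrv_ms) / baseline_hrv_ms
    if change <= -drop_threshold:
        return f"HRV is {abs(change):.0%} below your baseline."
    return None
```

The returned string can then be woven into the LLM prompt so the conversation is grounded in the user's real data.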
Frontend: A clean, minimal SwiftUI interface designed to reduce cognitive load and keep the focus on the user's journey, with Picture-in-Picture support via AVKit so Vera stays present across apps.
Challenges we ran into
The primary technical hurdle was implementing the various ML models and the RAG pipeline on-device. Managing model weights, vector embeddings, and efficient inference within the processing and memory constraints of a mobile device required a very lean architecture. We also had to port the weights of a fine-tuned Wav2Vec2 emotion-detection model to Core ML so it could accurately distinguish subtle vocal shifts without relying on massive, cloud-based GPU clusters. Running five concurrent ML workloads (speech recognition, emotion classification, RAG embeddings, a 4B-parameter LLM, and TTS synthesis) on a single mobile device required careful orchestration of the CPU, GPU, and Neural Engine to avoid thermal throttling and memory pressure.
Accomplishments that we're proud of
We are incredibly proud of achieving Zero-Leak Privacy. Demonstrating a fully functional "Therapist-Diary" with RAG that works entirely in Airplane Mode was our "Eureka" moment.
What we learned
We gained deep experience optimizing Core ML models to run efficiently on the Apple Neural Engine, and learned how to build cascaded model pipelines on edge devices efficiently while preserving a good user experience.
What's next for Vera
The future of Vera is about deeper integration into the user's life.
Actionable Outputs: Integrating with services like Spotify to suggest mood-shifting music based on the detected vocal sentiment, and generating a personalized self-care plan at the end of each session.
Cross-Device Sync: Implementing encrypted, peer-to-peer syncing so users can access their "Vera Vault" across their devices without cloud intermediaries.
Built With
- coreml
- edge-inference
- fluidaudio
- llama.cpp
- nvidia-nemo
- sfspeech
- swift