FlowState

FlowState is a voice-first focus companion that listens and watches how you work to detect distraction, idleness, and fatigue. It suggests timely breaks or task switches and helps you stay locked in without rigid timers or pressure.


Inspiration

When you’re working, it’s easy to drift without even realizing it. We wanted something that actually holds you accountable while you work, not just a timer you ignore. FlowState watches and listens while you work, helps manage your task list, and nudges you when you lose focus. If you’re distracted, it suggests a break or a task switch. It also talks to you using voice, so it feels more like an assistant keeping you locked in than another productivity app.


What it does

FlowState listens and watches you in real time to track focus, distraction, and fatigue. It manages your task list, maintains a live productivity score, and suggests breaks or task switches when your focus drops. You can talk to it naturally to add or remove tasks, start or end breaks, and switch focus modes.
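
The live productivity score could be maintained in many ways; a minimal sketch (our own simplification, not the exact formula FlowState uses) is an exponential moving average over per-interval focus observations:

```python
# Hypothetical sketch of a live productivity score: an exponential moving
# average over focus observations (1.0 = focused, 0.0 = distracted).
# The alpha value and update interval are assumptions for illustration.
def update_score(score: float, focused: bool, alpha: float = 0.1) -> float:
    """Blend the latest focus observation into the running score."""
    observation = 1.0 if focused else 0.0
    return (1 - alpha) * score + alpha * observation

# A run of distracted intervals pulls the score down smoothly
# rather than cliff-dropping it, which keeps nudges gentle.
score = 1.0
for focused in [True, True, False, False, False]:
    score = update_score(score, focused)
```

The smoothing means a single glance at your phone doesn't tank the score, but sustained distraction does, which is when a break or task switch gets suggested.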


How we built it

We use OpenCV and MediaPipe for real-time camera analysis to detect presence, distraction, and fatigue. These signals feed into a TypeScript frontend that displays focus levels, session state, and task management.
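
One common way to derive a fatigue signal from face landmarks is the eye aspect ratio (EAR) over the six-point eye contour that MediaPipe Face Mesh exposes. The sketch below is a simplified, stdlib-only illustration of that idea; the landmark indices, threshold, and frame count are assumptions, not FlowState's exact tuning:

```python
import math

Point = tuple[float, float]

def eye_aspect_ratio(p1: Point, p2: Point, p3: Point,
                     p4: Point, p5: Point, p6: Point) -> float:
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|).

    p1/p4 are the eye corners, p2/p3 and p6/p5 the upper and lower lid
    points. The ratio drops toward 0 as the eye closes.
    """
    def dist(a: Point, b: Point) -> float:
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

def is_drowsy(ears: list[float], threshold: float = 0.2, frames: int = 3) -> bool:
    """Flag fatigue when EAR stays low for several consecutive frames,
    so a normal blink doesn't trigger a break suggestion."""
    return len(ears) >= frames and all(e < threshold for e in ears[-frames:])
```

Requiring several consecutive low-EAR frames is what separates a blink from actual drowsiness; presence and distraction signals can be debounced the same way.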

Voice input is handled through open mic or push-to-talk and sent through ElevenLabs for speech-to-text. Gemini processes the transcribed input to understand user intent and convert natural speech into structured commands like adding or removing tasks or controlling breaks. Together, ElevenLabs and Gemini allow users to interact with FlowState hands-free while staying focused on their work.


Challenges we ran into

Interpreting casual speech reliably was difficult. Gemini sometimes hallucinated or misidentified tasks when users referenced them indirectly. We also ran into issues with ElevenLabs occasionally transcribing speech into the wrong language, which required tighter constraints and validation.


Accomplishments that we’re proud of

We’re especially proud of the real-time facial detection system and the end-to-end pipeline that connects camera signals, voice input, Gemini reasoning, and live UI updates into a seamless feedback loop.


What we learned

We learned the differences between Gemini’s generative and non-generative models and how to use them for both reasoning and decision-making. We also learned how to extract meaningful focus signals from camera input and reliably turn speech into structured commands for an LLM.


What’s next for FlowState

A mobile app version with notification-based nudges and full voice interaction :)
