Inspiration

From "Open-Loop Reminders" to "Closed-Loop Control": Solving the Global Intention-Execution Gap

Our inspiration stems from a universal pain point: "The Intention-Execution Gap." Approximately 366 million adults worldwide have ADHD. For the ADHD brain, the core problem has never been "not knowing what to do," but rather "being unable to start doing it."

Traditional productivity tools (such as alarms and to-do lists) are essentially "open-loop systems." They assume that users will act once they receive a reminder — but this is ineffective for the ADHD brain. This disconnect not only renders tools useless but actually triggers "alarm fatigue" and a deep sense of shame.

The Birth of Lumi

What we set out to build is not another to-do list, but a "Personal AI Behavioral Coach."

Core Theoretical Foundation: From Atomic Habits to Cybernetics

Our theoretical inspiration draws from two seemingly different yet deeply connected fields:

  1. James Clear's Atomic Habits: This book reveals the Four Laws of Behavior Change — make the cue obvious, make the habit attractive, make the action easy, and make the reward satisfying. This provides the methodological framework for our AI "tool layer."
  2. Norbert Wiener's Cybernetics: The core idea of cybernetics is regulating system behavior through feedback loops. A thermostat maintains a stable room temperature not because it issues a single command, but because it continuously senses the current temperature, compares it against the target, and then dynamically adjusts its heating or cooling strategy.

We combined these two with modern Coding Agent architecture and arrived at a key insight: If we can build closed-loop feedback Agents in software systems, why can't we build an equivalent closed-loop Agent system for human behavior?

Thus, Lumi was born — a Closed-Loop Behavioral Agent that can sense human states, compare the gap between reality and goals, and dynamically adjust intervention strategies, aiming to truly bridge the chasm between "I want to do it" and "I did it."


What It Does

Lumi is an AI behavioral coach powered by Gemini 3. It doesn't just remind you; it accompanies you through the entire process until action happens. Its core philosophy: the AI's cyclical feedback process is essentially the same as a Coding Agent's operational loop, so we use the same framework and treat human behavior data as an API.

Lumi's complete workflow can be divided into three core stages: Perception & Planning, Tools & Environment, and Feedback Loop.

1. Perception & Planning

Lumi has built a multi-dimensional context-awareness system:

  • The Goal Store: The user's original goals (e.g., "I want to go to bed early").
  • Three-Dimensional Database:
    1. Interaction Memory: Your historical conversations and contracts with the AI.
    2. Behavior Log: Real-time feedback and operational records during task execution.
    3. Physiological Data: Objective body states integrated via Apple Health.

AI Goal Decomposition & Negotiation: When a user sets an ambitious goal, Gemini 3 acts as a "planner," breaking it down into tiny, low-resistance atomic actions (Atomic Habits) based on historical behavioral data.

For example, "go to bed early" gets compiled into:

  • 22:00 — Receive Lumi's reminder, start preparing
  • 22:15 — Put down the phone
  • 22:25 — Go brush your teeth
  • 22:40 — Get into bed, begin pre-sleep relaxation activity

Each step is specific, actionable, and low in cognitive load. This is the process of transforming a vague intention into precise instructions.
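The compiled plan above can be sketched as a small data structure. This is an illustrative shape only: the `AtomicStep`/`CompiledPlan` names and the `nextStep` helper are our invention, while in Lumi the steps would come from Gemini 3 conditioned on historical behavior.

```typescript
// Hypothetical shape of the planner's output: a vague goal compiled
// into timestamped, low-resistance atomic steps.
interface AtomicStep {
  time: string;   // "HH:MM", zero-padded local time
  action: string; // one concrete, low-cognitive-load instruction
}

interface CompiledPlan {
  goal: string;
  steps: AtomicStep[];
}

// Example compilation of "go to bed early" (static here for clarity).
const bedtimePlan: CompiledPlan = {
  goal: "go to bed early",
  steps: [
    { time: "22:00", action: "Receive Lumi's reminder, start preparing" },
    { time: "22:15", action: "Put down the phone" },
    { time: "22:25", action: "Go brush your teeth" },
    { time: "22:40", action: "Get into bed, begin pre-sleep relaxation" },
  ],
};

// Find the next step due at or after a given clock time. Zero-padded
// "HH:MM" strings compare correctly as plain strings.
function nextStep(plan: CompiledPlan, now: string): AtomicStep | undefined {
  return plan.steps.find((s) => s.time >= now);
}
```

Keeping each step as a single sentence is deliberate: the instruction that reaches the user should carry no planning overhead.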

2. Tools & Environment

Once it understands you, Lumi needs to "do something" to drive you into action. These tools are derived directly from the behavior change methodology in Atomic Habits, encoded as AI-callable intervention mechanisms that enable deep interactions between AI and humans.

Tool 1: Proactive VoIP Call Intervention — "The First AI That Calls You"

Instead of sending easily-ignored push notifications, Lumi directly places a real-time video call to you.

Why a call? Push notifications are trivially swiped away, while an incoming call is the hardest signal to dismiss. During the call, Lumi can sense your hesitation and emotions, gently interrupting your procrastination like a real coach and guiding you to take the first step.

Tool 2: App Lockdown & Commitment Mechanism — "An Innovative Application of Behavioral Engineering"

If you hang up Lumi's call and continue scrolling through short videos, Lumi doesn't give up. It uses the iOS Screen Time API to lock the "temptation apps" you've pre-designated (such as TikTok, social media, etc.).

But this isn't a crude, brute-force block. To unlock, you must complete a "Consequence Commitment" ritual: read aloud or type a statement about the consequences of your behavior (e.g., "I am willing to accept the consequences of not going to bed on time, which will damage my performance tomorrow"). The system verifies your spoken statement via Azure Speech Recognition before unlocking.

The psychological significance of this design is: it's not punishment — it's forced metacognition — making you clearly aware of the choice you're making and its costs, rather than unconsciously sliding into procrastination.
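The unlock check can be sketched as a fuzzy match between the required statement and what the user actually said. This is a minimal sketch under our own assumptions: the real pipeline runs Azure Speech Recognition upstream, and the word-overlap heuristic and 0.8 threshold here are illustrative, not Lumi's actual matching logic.

```typescript
// Lowercase, strip punctuation, split into words.
function normalize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .split(/\s+/)
    .filter(Boolean);
}

// Unlock only when most of the required statement's words were spoken.
// `transcript` is assumed to come from the speech-recognition step.
function commitmentSatisfied(
  required: string,
  transcript: string,
  threshold = 0.8
): boolean {
  const want = normalize(required);
  const said = new Set(normalize(transcript));
  const hits = want.filter((w) => said.has(w)).length;
  return hits / want.length >= threshold;
}
```

A tolerance threshold matters here: speech recognition is lossy, and the point is forced metacognition, not a verbatim recitation test.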

Tool 3: Body Double Mode — "Digital Companionship for Activation"

For the ADHD community, "Body Doubling" (having someone nearby while working) is a widely validated and effective strategy. Many people don't lack the desire to act — they simply can't initiate alone.

Lumi's Body Double mode uses voice and video to make you feel like someone is working alongside you. While you're focused, Lumi suspends the expensive voice model and switches to playing white noise animations, providing low-cost digital companionship. The moment you speak, the AI performs a "Hot Reconnection" with full context, seamlessly resuming the conversation — preserving the "someone is here with me" experience while saving approximately 90% of token costs.

Tool 4: Apple HomeKit Environment Control — "Change the Environment, Change the Behavior"

Atomic Habits emphasizes: rather than relying on willpower, design your environment. Through Apple HomeKit APIs, Lumi directly controls the smart lighting in your home. For example, during your "bedtime preparation" phase, it automatically dims the lights to a warm tone, sending your body an environmental signal that says "it's time to rest."

Tool 5: Visual Verification & Gamified Rewards — "Trust, but Verify"

Leveraging Gemini's multimodal visual verification capabilities, Lumi can verify task completion when you claim you've finished. For example, if you say "I've brushed my teeth," Lumi may ask you to take a photo of your toothbrush, or verify through visual/audio analysis. Upon verification, the system grants gamified coin rewards, reinforcing a positive behavioral loop.

3. Reasoning & Feedback Loop

This is Lumi's core brain. The AI reflects based on human interaction feedback, subjective feelings (Ripple Memory), objective behavior (Task Log), and physiological data (Apple Health).

Detective-Style Reasoning Engine:

Lumi's thinking process resembles a detective investigating a case, following a four-step logic:

  1. Gap Analysis: At this moment, why don't the user's "goal card" and "reality snapshot card" match up?
  2. Root Cause Diagnosis: Using physiological data (e.g., heart rate, sleep) to determine whether it's because "they're too tired" (physiological) or "they're procrastinating" (psychological).
  3. Decision Generation: Based on the diagnosis, decide the strategy. Continue pushing, or suggest resting?
  4. Dynamic Replanning: Transform vague goals into specific action instructions (e.g., "It's 10:15 now. Please put down your phone and start meditating").

Cross-Data Insight Weekly Report: Generates visualized causal analysis reports (e.g., "Friday's social event caused your sleep regression"), helping users build long-term awareness.


How We Built It

We directly encoded behavioral psychology principles into the technical architecture.

1. Core Architecture: 3-Layer AI Collaboration

To resolve the contradiction between real-time responsiveness and logical depth in a single model, we designed a unique "3-Layer Architecture" through the coordination of Gemini Live (WebSocket) and Gemini 3 (Reasoning):

📐 3-Layer AI Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Gemini Live (2.0/2.5 Flash) - Perception Layer     │
│ - Handles real-time voice conversation & emotional support   │
│ - Maintains WebSocket persistent connection                  │
│ - Feature: Ultra-fast response, avoids complex tool calls    │
│   to prevent disconnection                                   │
└───────────────┬─────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Gemini 3 Flash - Intent Detection Layer (Router)   │
│ - Rapidly analyzes conversational context                    │
│ - Determines whether user intent requires external tool call │
└───────────────┬─────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Gemini 3 Flash - Execution & Planning Layer        │
│ - Executes complex analysis functions                        │
│ - Structured data extraction & database read/write           │
│ - Memory injection: dynamic context injection via            │
│   parallel-running AI                                        │
└─────────────────────────────────────────────────────────────┘

2. Tech Stack Details

  • Frontend Interaction: Gemini Live as the fast multimodal frontend, enabling sub-second voice interaction.
  • Backend Logic: Gemini 3 Pro model as the brain, responsible for analyzing user resistance patterns, physiological states, and generating strategies.
  • Infrastructure: Supabase for data storage, TypeScript for the web client, Swift (iOS Native APIs) for the mobile app.
  • Tool Integration: Deep integration with iOS Screen Time API, CallKit, HomeKit, and Apple HealthKit.

Accomplishments That We're Proud Of

1. Proposing the "Human-as-Tool" Paradigm

In our design documentation, we explicitly introduced the "Human-as-Tool" paradigm shift.

The typical Agent logic is: humans give instructions to AI, and AI calls tools (like calculators, search engines) to execute.

Lumi's logic is reversed: The AI (as an orchestrator) treats the "human" as an API tool to be called upon to execute specific physical actions in order to achieve goals.

2. Technical & Experience Breakthroughs

  • The First AI Agent That Proactively Calls You: Breaking the paradigm of apps "passively waiting," achieving real-time voice intervention with sub-second latency.
  • "Voice Commitment" Unlock Mechanism: An innovative application of behavioral engineering. Requiring users to speak the consequences aloud before unlocking is more educationally meaningful and psychologically impactful than a simple block.
  • Campfire Mode "Hot Reconnection" Technology: Successfully achieved seamless switching between local white noise and cloud-based LLM, preserving the "someone is here with me" experience while saving 90% of token costs.
  • AI Coach Reflection Layer: The AI self-evolves based on feedback. It knows when to push and when to let go (based on physiological data), achieving truly personalized strategies.

Challenges We Ran Into

1. Gemini Live's Function Calling Dilemma & Architecture Overhaul

We initially attempted to use the Gemini 2.5 Live Stream model directly for tool calls. However, we discovered that complex Function Calling caused the AI to disconnect, with poor execution stability.

Solution: This forced us to develop the "3-Layer AI Architecture" described above. By decoupling the "conversation layer" from the "execution layer" and introducing parallel-running AI for dynamic memory injection, we elegantly solved the challenge of dynamic context invocation with Gemini Live.

2. Balancing the "Good AI" vs. "Bad AI" (Meta-Cognition)

How do you make forceful interventions not feel like an annoying butler?

Solution: We had to teach the AI "metacognition" capabilities. Using Apple Watch physiological data as the "Source of Truth," the AI can distinguish whether a user is genuinely physically exhausted (override rules activate, suggesting rest) or simply psychologically procrastinating (execution strategy activates, continue pushing). This data-driven empathy is Lumi's core moat.


What We Learned

  1. The Power of Interdisciplinary Thinking: When we aligned the principles of Behavioral Science with Computer Science, we discovered striking similarities (e.g., human behavior vs. API calls), which provided an entirely new perspective for product design.
  2. Practical Skills in Vibe Coding: When building high-emotion-engagement Agents, tuning the AI's "tone" and "vibe" is just as important as writing logical code.
  3. Hands-On Experience with Cutting-Edge AI Agents: We gained deep understanding of the real-time capabilities, context management, and tool-calling boundaries of multimodal models.

What's Next for HumanOS (Lumi)

Lumi is just the beginning of HumanOS:

  1. Full-Spectrum Health Ecosystem: Further integrate diet, exercise, and additional Apple Health data. By combining medical data, lifestyle habits, and physiological data, enable disease prediction and longevity management — expanding from single-task management to a comprehensive HumanOS.
  2. Social Accountability System: Introduce a friend-ranking and mutual accountability mechanism similar to Duolingo. Allow friends to set shared goals and monitor each other, leveraging social pressure to further increase execution rates.
