Google Gemini Challenge 2026 submission
EdelWise is a mobile AI assistant that helps elderly users learn to use smartphone apps through real-time voice guidance. Powered by Gemini Live API, it watches the user's screen and speaks step-by-step instructions in natural, adaptive conversation.
The agent guides rather than operates: instead of automating tasks for the user, EdelWise teaches them to complete each task themselves, building confidence and digital literacy.
"I want to send a photo to my friend on Instagram."
EdelWise sees the user's screen, identifies the current state, and speaks: "Great! I can see you've opened Instagram. Now tap the paper airplane icon at the top right corner to open your messages."
- Real-time voice guidance — Gemini Live API for natural, conversational coaching
- Screen understanding — Captures and analyzes screenshots to know exactly where the user is
- Step-by-step task tracking — Progress visualization with advance/retry/abandon logic
- Adaptive pacing — Adjusts instruction detail based on user ability and responses
- Elderly-accessible UI — Large text, high contrast, haptic feedback, minimal cognitive load
- Resilient connectivity — Automatic reconnection, session recovery, and TTS fallback
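
The advance/retry/abandon logic named in the feature list could look roughly like the sketch below. Everything here is illustrative: `TaskProgress`, `decideOutcome`, `applyOutcome`, and the `MAX_RETRIES` limit of 3 are assumptions made for this example and are not taken from the EdelWise codebase.

```typescript
// Hypothetical step tracker; names and the retry limit are assumptions.
type StepOutcome = "advance" | "retry" | "abandon";

interface TaskProgress {
  steps: string[];      // ordered instructions for the task
  currentIndex: number; // index of the step the user is on
  retries: number;      // failed attempts on the current step
  status: "in_progress" | "completed" | "abandoned";
}

const MAX_RETRIES = 3; // assumed number of failed attempts before abandoning

function startTask(steps: string[]): TaskProgress {
  const status = steps.length > 0 ? "in_progress" : "completed";
  return { steps, currentIndex: 0, retries: 0, status };
}

// Decide what happens after the user attempts the current step.
function decideOutcome(progress: TaskProgress, stepSucceeded: boolean): StepOutcome {
  if (stepSucceeded) return "advance";
  return progress.retries + 1 >= MAX_RETRIES ? "abandon" : "retry";
}

// Return a new progress object; the input is never mutated.
function applyOutcome(progress: TaskProgress, outcome: StepOutcome): TaskProgress {
  if (progress.status !== "in_progress") return progress;
  if (outcome === "retry") {
    return { ...progress, retries: progress.retries + 1 };
  }
  if (outcome === "abandon") {
    return { ...progress, status: "abandoned" };
  }
  // "advance": move to the next step, or finish if this was the last one.
  const next = progress.currentIndex + 1;
  if (next >= progress.steps.length) {
    return { ...progress, retries: 0, status: "completed" };
  }
  return { ...progress, currentIndex: next, retries: 0 };
}
```

Keeping the decision (`decideOutcome`) separate from the state change (`applyOutcome`) makes it possible to swap in a different policy, for example one that adapts the retry limit to the user's pace, without touching the progress bookkeeping.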
| Layer | Technology |
|---|---|
| Frontend | React Native 0.81 + Expo SDK 54, React 19, Expo Router 6, TypeScript |
| Backend | Node.js + WebSocket + Gemini Live API + Google Cloud TTS |
| Database | Supabase (PostgreSQL) + Edge Functions |
| Native | Android MediaProjection for screen capture |
- Node.js 20+
- npm
- Expo CLI (`npx expo`)
- Supabase CLI (`npx supabase`)
- Google Cloud project with:
  - Gemini API key
  - Cloud Text-to-Speech API enabled
git clone https://github.com/wenn00/EdelWise-GoogleGeminiChallenge.git
cd EdelWise-GoogleGeminiChallenge
npm install

Create .env in the project root (frontend):
SUPABASE_URL=<your-supabase-url>
SUPABASE_ANON_KEY=<your-supabase-anon-key>
GEMINI_API_KEY=<your-gemini-api-key>
API_BASE_URL=http://localhost:8080
WS_URL=ws://localhost:8080/v1/ws

Create backend/.env:
GEMINI_API_KEY=<your-gemini-api-key>
SUPABASE_URL=<your-supabase-url>
SUPABASE_SERVICE_ROLE_KEY=<your-supabase-service-role-key>
SESSION_JWT_SECRET=<random-secret>
PORT=8080

npx supabase start # Start local Supabase
npx supabase db push # Apply migrations
npx supabase functions serve # Start edge functions

cd backend
npm install
npm run dev

npx expo start
# Press 'a' for Android, 'i' for iOS, 'w' for Web

├── app/ # Expo Router screens
│ ├── (tabs)/ # Bottom tab navigator (Home + History)
│ ├── guidance/ # Guidance session flow
│ └── task-select.tsx # App and task selection
├── backend/ # Node.js WebSocket server
│ ├── server.ts # Entry point
│ ├── services/ # Gemini Live, TTS, step engine, prompt builder
│ └── prompts/ # System prompt for agent behavior
├── components/ # React Native components
│ └── ui/ # Elderly-accessible UI primitives
├── services/ # Frontend service layer
│ ├── websocket-manager.ts
│ ├── audio-capture.ts # PCM 16kHz microphone streaming
│ ├── audio-player.ts # TTS playback with jitter buffer
│ ├── session-orchestrator.ts
│ └── ...
├── supabase/ # Database migrations & edge functions
├── modules/ # Native modules (screen capture)
├── context/ # React context (session state)
├── hooks/ # Custom hooks
├── types/ # TypeScript type definitions
└── constants/ # Theme, colors, config
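
The tree lists `audio-capture.ts` as streaming 16 kHz PCM from the microphone. As a rough illustration of what that format involves, the sketch below converts floating-point samples to signed 16-bit little-endian PCM. It assumes the capture layer yields mono `Float32Array` samples in the range [-1, 1]; the function names are hypothetical and the actual module may obtain PCM directly from the native recorder.

```typescript
// Assumed sample rate, matching the "PCM 16kHz" note in the project tree.
const SAMPLE_RATE_HZ = 16000;

// Convert float samples in [-1, 1] to signed 16-bit little-endian PCM bytes.
// Out-of-range samples are clamped rather than wrapped.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Negative values scale to -32768, positive values to 32767.
    const value = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
    view.setInt16(i * 2, value, true); // true = little-endian
  }
  return bytes;
}

// Duration in milliseconds of a mono 16-bit PCM chunk of the given byte length.
function chunkDurationMs(pcmByteLength: number): number {
  return ((pcmByteLength / 2) * 1000) / SAMPLE_RATE_HZ;
}
```

At this sample rate a 640-byte chunk holds 320 samples, which is 20 ms of audio; small chunks of that order keep latency low on a streaming connection.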
┌──────────────┐     WebSocket      ┌──────────────────┐
│  Mobile App  │ ◄────────────────► │  Backend Server  │
│  (Expo/RN)   │  audio + screens   │  (Node.js + WS)  │
└──────┬───────┘                    └────────┬─────────┘
       │                                     │
       │ UI / Audio                          │ Gemini Live API
       │ Haptics                             │ Google Cloud TTS
       ▼                                     ▼
┌──────────────┐                    ┌──────────────────┐
│   User's     │                    │   Gemini + TTS   │
│   Phone      │                    │  (Google Cloud)  │
└──────────────┘                    └────────┬─────────┘
                                             │
                                    ┌────────▼─────────┐
                                    │     Supabase     │
                                    │  (PostgreSQL +   │
                                    │  Edge Functions) │
                                    └──────────────────┘
Flow:
- User speaks a goal → mic audio streams to backend via WebSocket
- Backend forwards audio to Gemini Live API for understanding
- Gemini analyzes user intent + screenshot context → generates guidance
- Backend evaluates step progress, converts response to speech via Cloud TTS
- Audio instruction streams back to the mobile app
- App displays visual overlay + plays voice guidance
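
The first steps of the flow above send two kinds of data over one WebSocket: microphone audio and screenshots. One way to keep them apart is a small tagged JSON envelope, sketched below. The message names and fields (`audio_chunk`, `screenshot`, `sessionId`, `pcmBase64`, `jpegBase64`) are assumptions for illustration only; the real EdelWise wire format may differ.

```typescript
// Hypothetical client-to-backend messages; not the project's actual protocol.
type ClientMessage =
  | { type: "audio_chunk"; sessionId: string; pcmBase64: string }
  | { type: "screenshot"; sessionId: string; jpegBase64: string };

// Serialize a message for sending as a WebSocket text frame.
function encodeMessage(msg: ClientMessage): string {
  return JSON.stringify(msg);
}

// Parse and validate an incoming frame; returns null for anything malformed
// so the caller can ignore bad input instead of crashing the session.
function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const rec = data as Record<string, unknown>;
  const type = rec["type"];
  const sessionId = rec["sessionId"];
  const pcmBase64 = rec["pcmBase64"];
  const jpegBase64 = rec["jpegBase64"];
  if (typeof sessionId !== "string") return null;
  if (type === "audio_chunk" && typeof pcmBase64 === "string") {
    return { type: "audio_chunk", sessionId, pcmBase64 };
  }
  if (type === "screenshot" && typeof jpegBase64 === "string") {
    return { type: "screenshot", sessionId, jpegBase64 };
  }
  return null;
}
```

On the receiving side, a `switch` on `msg.type` can route audio to the speech pipeline and screenshots to the screen-analysis step, with the compiler checking that every message kind is handled.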
npx expo start # Start dev server
npx expo start --android # Android
npx expo start --ios # iOS
npx expo start --web # Web
npx expo lint # ESLint
npm run typecheck # TypeScript type checking

- hjcloog: Frontend architecture, UI components, E2E integration
- wenn00: Backend services, Gemini integration, iOS fixes
Built for the Google Gemini Challenge 2026.