About the project
Inspiration:
The user is asked a series of questions, and the environment changes based on the mood of each response: the sky darkens if the user responds angrily and brightens if they respond joyfully. One issue: generating new world models continuously is exceptionally demanding, so the world models themselves remain static. We addressed this by preloading four emotion-mapped environments and switching between them.
SensAI is a WebXR immersive experience that transforms the user's emotional state into a living, responsive 3D environment. Built for the Pico 4 VR headset, it uses voice-driven conversation, real-time cognitive inference, and Gaussian splat environments to create a world that changes with the user's emotions.
Core Features
1. Voice-Activated Conversation System
The experience begins when the user speaks the wake phrase "Hello World Model." This activates a guided conversation loop in which the model asks emotionally targeted questions and listens for the user's natural spoken responses. The system uses the Web Speech API for speech-to-text and the browser's SpeechSynthesis for text-to-speech, all running through the VR headset's built-in microphone and speakers. Turn-taking is clearly managed: the model speaks (blue indicator), then the user responds (green indicator), with silence detection automatically ending each turn.
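The turn-taking loop above can be sketched as a small state machine. The phase and event names below are illustrative, not the project's actual identifiers:

```typescript
// Hypothetical sketch of the conversation turn-taking loop.
type Phase = "waitingForWake" | "modelSpeaking" | "userListening" | "processing";
type TurnEvent = "wakeDetected" | "ttsFinished" | "silenceDetected" | "brainReplied";

// One transition per (phase, event) pair; any other event keeps the phase.
const transitions: Record<Phase, Partial<Record<TurnEvent, Phase>>> = {
  waitingForWake: { wakeDetected: "modelSpeaking" },  // "Hello World Model"
  modelSpeaking: { ttsFinished: "userListening" },    // blue -> green indicator
  userListening: { silenceDetected: "processing" },   // silence ends the turn
  processing: { brainReplied: "modelSpeaking" },      // speak the next question
};

function nextPhase(phase: Phase, event: TurnEvent): Phase {
  return transitions[phase][event] ?? phase;
}
```

In the real app, events like `silenceDetected` would be raised from the Web Speech API's recognition callbacks rather than passed in directly.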
2. Real-Time Cognitive Inference (Brain API)
User responses are sent to a Python FastAPI backend powered by OpenAI GPT-4o-mini. The Brain API analyzes the speech content and returns scores across four cognitive dimensions — reflection, defensiveness, curiosity, and stress — along with a dominant state, a voice reflection (the model's empathetic response), and a follow-up question. A confidence threshold ensures that only clear emotional signals trigger a scene change, avoiding flicker on ambiguous input.
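A minimal sketch of the confidence gate on the client side, assuming a response shape and a 0.6 threshold that are not confirmed by the project:

```typescript
type CognitiveState = "reflection" | "defensiveness" | "curiosity" | "stress";

// Assumed shape of a Brain API response, per the description above.
interface BrainResult {
  scores: Record<CognitiveState, number>;
  dominantState: CognitiveState;
  voiceReflection: string;   // the model's empathetic response
  followUpQuestion: string;
  confidence: number;        // 0..1
}

const CONFIDENCE_THRESHOLD = 0.6; // assumed value, not the project's

// Returns the state the scene should switch to, or null to keep the current scene.
function sceneTarget(result: BrainResult): CognitiveState | null {
  return result.confidence >= CONFIDENCE_THRESHOLD ? result.dominantState : null;
}
```

Gating on confidence rather than on the raw dominant score is what prevents a borderline reading from toggling the environment back and forth.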
3. Emotion-Reactive Gaussian Splat Environments
The 3D environment is built from Gaussian splat (.spz) files rendered via SparkJS. Five distinct Venice-based environments are preloaded at startup — one neutral and four mapped to cognitive states. When the Brain API detects a dominant emotion above the confidence threshold, the scene switches instantly by toggling splat visibility. Because all environments are resident in GPU memory from the start, transitions are immediate, with no loading delay.
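The instant switch can be illustrated by flipping visibility flags on preloaded objects. The `SplatHandle` stub below stands in for whatever SparkJS actually exposes, which is an assumption:

```typescript
interface SplatHandle { visible: boolean; } // stand-in for a loaded splat object

class SplatSceneManager {
  private scenes = new Map<string, SplatHandle>();

  constructor(names: string[]) {
    // Everything is resident from startup, so a switch is just a flag flip.
    for (const name of names) this.scenes.set(name, { visible: name === "neutral" });
  }

  switchTo(name: string): void {
    if (!this.scenes.has(name)) return; // unknown state: keep the current scene
    for (const [key, splat] of this.scenes) splat.visible = key === name;
  }

  visibleScene(): string | undefined {
    for (const [key, splat] of this.scenes) if (splat.visible) return key;
    return undefined;
  }
}
```

Because no asset loading happens at switch time, `switchTo` is synchronous, which is what makes the transition feel instantaneous in the headset.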
4. Structured Question Sequence
The conversation follows a fixed four-question sequence designed to surface each emotion in order: Defensiveness → Curiosity → Stress → Reflection. Each question is crafted to elicit a specific emotional response. After all four turns, the environment returns to neutral and the system re-enters wake-word listening mode for another session.
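The fixed sequence and the reset to neutral might look like this; only the Defensiveness → Curiosity → Stress → Reflection order comes from the project, and the helper itself is hypothetical:

```typescript
// Order from the project description; everything else here is illustrative.
const SEQUENCE = ["defensiveness", "curiosity", "stress", "reflection"] as const;

// State targeted on a given zero-based turn; after the fourth turn the
// environment returns to neutral and wake-word listening resumes.
function stateForTurn(turn: number): string {
  return turn < SEQUENCE.length ? SEQUENCE[turn] : "neutral";
}
```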
5. In-VR Voice Status HUD
A heads-up display is pinned to the bottom-left of the user's field of view and follows head movement. It provides real-time feedback on every phase of the interaction:
- Status indicator — color-coded dot and label showing the current phase (starting up, waiting for wake word, listening, processing, model speaking, error)
- Model text — the exact question or reflection the model is speaking, displayed in blue so the user can read along
- User transcript — live transcription of what the microphone is hearing, shown in green
- Scene status — the detected emotion and confidence score when a scene change occurs
A matching HTML overlay provides the same information on desktop for development and testing.
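One way to back the color-coded status dot is a phase-to-style lookup. Blue for the model and green for the user come from the description; the hex values, labels, and phase keys below are invented for illustration:

```typescript
interface HudStyle { label: string; color: string; }

// Assumed phase keys and colors; only blue=model / green=user is from the project.
const HUD_STATUS: Record<string, HudStyle> = {
  startingUp:     { label: "Starting up",         color: "#888888" },
  waitingForWake: { label: "Say the wake phrase", color: "#cccc33" },
  listening:      { label: "Listening",           color: "#33cc66" }, // green: user turn
  processing:     { label: "Processing",          color: "#cc8833" },
  modelSpeaking:  { label: "Model speaking",      color: "#3399ff" }, // blue: model turn
  error:          { label: "Error",               color: "#cc3333" },
};

function hudFor(phase: string): HudStyle {
  return HUD_STATUS[phase] ?? HUD_STATUS.error; // fall back to the error style
}
```

Driving both the in-VR HUD and the desktop HTML overlay from one lookup like this keeps the two displays in sync by construction.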
6. VR Panel UI with Manual Controls
A 3D panel rendered in VR space provides manual override buttons for each cognitive state (Reflection, Defensiveness, Curiosity, Stress). These allow toggling between states without voice input, which is useful for testing and demonstration. Desktop HTML buttons mirror this functionality.
7. WebXR Immersive VR with Locomotion
The experience runs as a full WebXR immersive VR session with hand-tracking support, locomotion (teleport and continuous movement over an invisible floor plane), and object grabbing. The IWSDK framework handles session management, input, and the ECS architecture.
Infrastructure
- Dev server: Vite with mkcert for local HTTPS, served on port 8081
- Brain server: Python FastAPI with uvicorn on port 8000, CORS-enabled
- Tunnel: ngrok forwards HTTPS traffic to the local dev server for headset access
- Environment configuration: .env files for API keys (OpenAI, optional TTS, optional World Labs)
Future-Ready
The codebase includes scaffolding for features not yet active:
- World Labs Marble integration — client code for programmatic world generation from text prompts (both frontend and backend)
- Premium TTS — OpenAI voice synthesis
- GLTF object injection — plugin and config ready for loading 3D models per cognitive state
- Controller button mapping — WebXR gamepad input system for direct state switching via Pico 4 controller buttons
Built With
- Frontend: Vite + TypeScript + Three.js (super-three) for build tooling and 3D rendering
- WebXR framework: IWSDK for the VR session
- ecs
- emotion
- hand-tracking
- https
- locomotion
- lod
- ngrok
- panel-ui
- Splat rendering: SparkJS for Gaussian splat loading
- speech-to-speech
- uikitml
- vr