Flow
speak a concept, step inside it in 3d
Inspiration
education is stuck with textbooks and powerpoints. what if you could just say "show me ancient rome" and walk around inside it? we wanted spatial learning that feels like stepping into the concept itself.
What It Does
flow converts voice commands into explorable 3d gaussian splat environments. you speak or type a concept, wait ~5 minutes while 6 apis chain together, then explore a photorealistic space in first person with educational overlays. press 't' mid-exploration to ask questions about what you're seeing and get voice responses.
the flow:
- deepgram captures your voice → gemini orchestrates educational content
- gemini generates cinematic image → marble converts to gaussian splat
- sparkjs renders .spz file at 60fps → you wasd around with collision detection
- screenshot → gemini vision → elevenlabs narration for contextual q&a
the scene library checks local files first (free), then mongodb-saved scenes, then generates a new world. generation is rate limited to prevent api abuse; admins bypass the cooldown.
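the lookup order above can be sketched roughly like this. everything here is a stand-in: `localScenes`, `savedScenes`, and `generateScene` are hypothetical names for the bundled file cache, the mongodb collection, and the full generation pipeline.

```typescript
type Scene = { concept: string; splatUrl: string };

// hypothetical in-memory stand-ins for the local file cache and mongodb
const localScenes = new Map<string, Scene>();
const savedScenes = new Map<string, Scene>();

// placeholder for the full 6-api pipeline (~5 minutes in reality)
async function generateScene(concept: string): Promise<Scene> {
  return { concept, splatUrl: `https://storage.example/${concept}.spz` };
}

async function resolveScene(concept: string): Promise<Scene> {
  const key = concept.toLowerCase().trim();
  return (
    localScenes.get(key) ??       // bundled scenes: free, instant
    savedScenes.get(key) ??       // previously generated, saved in mongodb
    (await generateScene(key))    // last resort: run the whole pipeline
  );
}
```

the `??` chain short-circuits, so the expensive generation step only runs when both caches miss.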
How We Built It
- frontend: react + typescript + three.js + sparkjs for gaussian splat rendering
- backend: express + socket.io for websocket pipeline updates
- storage: vultr object storage for .spz files, mongodb for scene metadata
- apis: deepgram stt → gemini orchestration + image gen → marble 3d conversion → elevenlabs tts
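the websocket progress updates can be sketched as a loop over named stages. the `emit` callback is a hypothetical stand-in for a real socket.io connection; only the stage names come from our pipeline.

```typescript
type Stage =
  | "orchestrating"
  | "generating_image"
  | "creating_world"
  | "loading_splat"
  | "complete";

async function runPipeline(
  emit: (stage: Stage) => void,
  work: Partial<Record<Stage, () => Promise<void>>>
): Promise<void> {
  const stages: Stage[] = [
    "orchestrating",
    "generating_image",
    "creating_world",
    "loading_splat",
  ];
  for (const stage of stages) {
    emit(stage);            // e.g. socket.emit("pipeline:progress", stage)
    await work[stage]?.();  // the deepgram/gemini/marble/spark step runs here
  }
  emit("complete");
}
```

emitting the stage name before each step is what lets the client show live progress during a ~5-minute generation instead of a silent spinner.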
the pipeline runs async with real-time progress updates (orchestrating → generating_image → creating_world → loading_splat → complete). collision uses sphere-based raycasting against glb meshes. voice q&a screenshots your current view, sends it to gemini vision, and responds through elevenlabs.
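the core of the wall-sliding behavior is plain vector math: when a movement vector points into a wall, drop the component along the wall normal and keep the tangential part. a minimal sketch (the real code raycasts a sphere against the glb meshes to find the wall normal):

```typescript
type Vec3 = { x: number; y: number; z: number };

const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;

// project `move` onto the plane of a wall with unit normal `n`,
// so the player slides along the wall instead of stopping dead
function slideAlongWall(move: Vec3, n: Vec3): Vec3 {
  const d = dot(move, n);
  if (d >= 0) return move; // moving away from the wall: nothing to clip
  return {
    x: move.x - d * n.x,
    y: move.y - d * n.y,
    z: move.z - d * n.z,
  };
}
```

with a wall normal of +x, a diagonal move like (-1, 0, 1) loses its into-the-wall x component and becomes (0, 0, 1), which is the smooth sliding feel we were after.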
Challenges We Ran Into
- the deepgram websocket died instantly until we explicitly declared linear16 pcm at 48khz mono
- gemini model compatibility issues, solved with a backend proxy and a fallback model chain
- the marble api's cors policy blocked client calls, so we built an express proxy for the full async workflow
- collision detection needed multiple raycasts per frame for smooth wall sliding
- converting data uris to file objects for formdata upload to the backend
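the data-uri conversion from the last point boils down to splitting off the base64 payload and decoding it. a node-flavored sketch using `Buffer` (in the browser you'd build a `Blob`/`File` from the same decoded bytes); the function name is ours, not a library api:

```typescript
// split a base64 data uri into its mime type and raw bytes,
// ready to append to formdata for the backend upload
function dataUriToBytes(uri: string): { mime: string; bytes: Buffer } {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(uri);
  if (!match) throw new Error("not a base64 data uri");
  const [, mime, b64] = match;
  return { mime, bytes: Buffer.from(b64, "base64") };
}
```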
Accomplishments That We're Proud Of
a 6-api integration with real-time websocket feedback for 5-minute world generation. the scene library prevents redundant api calls. gaussian splats render at 60fps with collision detection. voice q&a uses gemini vision to answer based on what you're actually looking at. production-ready with rate limiting, auth, and error handling.
What We Learned
gaussian splatting enables photorealistic browser 3d without traditional meshes. websockets are essential for long async operations. gemini image quality holds up for 3d conversion when prompts are optimized. a backend proxy solves cors and enables rate limiting. the scene library pays off fast for popular concepts.
What's Next
- improved collision mesh processing
- multi-user collaborative exploration
- vr/ar support
- an ai tutoring guide that follows you through scenes
- educator tools for custom experiences
- a community marketplace for user-generated worlds
Sponsor API Integration
- deepgram: streaming stt with flux model, voice q&a capture, command pattern matching
- gemini: orchestrates educational content, generates images via 2.0-flash-exp-image-generation, vision api for screenshot analysis, fallback model chain
- elevenlabs: educational narration, voice q&a responses, integrated real-time audio
- mongodb atlas: stores scenes with metadata, scene library queries, user collections
- vultr: object storage for .spz files, thumbnails, collider meshes, cors proxy
Built With
- css
- deepgram
- elevenlabs
- express.js
- firebase
- framer
- gemini
- html
- javascript
- mongodb
- motion
- node.js
- react
- socket.io
- sparkjs
- tailwind
- three.js
- typescript
- vite
- vultr
- webgl
- websocket
- worldlabsmarble