Inspiration
World models like Marble can generate a photorealistic 3D environment from a text prompt in seconds. But once you put on the headset, you're alone in it. The AI that built the world immediately forgot it existed — there's no intelligence inside, no guidance, nothing that understands the space you're standing in.
We kept asking: what if the world knew you were there? What if there was something already inside it when you arrived, ready to explore with you? That's the gap Scout fills. Generated worlds are stunning but passive. We wanted to make them alive.
What it does
Scout is an AI agent that lives inside a Marble-generated 3D world and explores it with you. You speak a mission — "scout this space station for entry points" — and the world generates around that prompt. You step inside via a PICO headset, and Scout immediately gets to work: it reasons about the spatial structure of the environment, narrates what it finds in real time, and drops 3D pins at points of interest exactly where it's describing them.
The loop is: voice input, world generation, spatial reasoning, waypoint placement, voice output. That's a complete perceive-reason-act cycle happening inside a generated environment. You're not just looking at a world model — you have an intelligent partner operating inside it with you.
How we built it
We built Scout entirely in WebXR: no Unity, no APK, and it runs directly in the PICO browser. That decision saved the project given the 28-hour timeline.
The Marble API generates Gaussian splat worlds from a text prompt, which we load directly into a Three.js WebXR scene. A lightweight Node.js proxy handles API calls to avoid CORS issues in the headset browser. Voice input is transcribed by the Web Speech API, and the transcript is passed to Claude along with a structured world descriptor prompt. Claude returns JSON containing a narration script and normalized 3D waypoints. Those coordinates get mapped into real positions in the scene, pins render in 3D space, and ElevenLabs reads the narration back through the headset audio. The whole pipeline runs end to end in a few seconds.
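To make the "Claude returns JSON" step concrete, here is a minimal sketch of validating that response before anything is rendered. The field names (`narration`, `waypoints`, `label`, `x`, `y`, `z`) and the 0 to 1 range are assumptions for illustration, not our exact schema. The idea is simply to drop malformed waypoints rather than place a pin somewhere nonsensical.

```typescript
// Hypothetical response shape: field names and the 0..1 range are
// assumptions for illustration, not the exact schema used in Scout.
interface Waypoint {
  label: string;
  x: number; // normalized, 0..1
  y: number;
  z: number;
}

interface ScoutResponse {
  narration: string;
  waypoints: Waypoint[];
}

// True only for finite numbers inside the normalized range.
function isUnit(n: unknown): n is number {
  return typeof n === "number" && Number.isFinite(n) && n >= 0 && n <= 1;
}

// Returns null when the payload is unusable; silently skips bad waypoints.
function parseScoutResponse(raw: string): ScoutResponse | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the model returned something that is not JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const { narration, waypoints: rawWaypoints } = data as Record<string, unknown>;
  if (typeof narration !== "string" || !Array.isArray(rawWaypoints)) return null;

  const waypoints: Waypoint[] = [];
  for (const item of rawWaypoints) {
    if (typeof item !== "object" || item === null) continue;
    const { label, x, y, z } = item as Record<string, unknown>;
    if (typeof label === "string" && isUnit(x) && isUnit(y) && isUnit(z)) {
      waypoints.push({ label, x, y, z });
    }
  }
  return { narration, waypoints };
}
```

A `null` result can be treated as "ask the model again," while a response with zero valid waypoints can still be narrated without placing pins.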
Challenges we ran into
CORS was the first wall — both the Marble and Claude APIs needed a server-side proxy to work inside the PICO browser environment, which ate a couple of hours early on.
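For anyone hitting the same wall, a rough sketch of the header logic such a proxy needs is below. It assumes the proxy runs on a different origin from the WebXR page and uses a hypothetical allowlist of page origins; the function name and the allowed methods are illustrative, not taken from our code. The header names themselves are the standard CORS response headers.

```typescript
// Assumption: the proxy is served from a different origin than the WebXR
// page, so it has to opt that page's origin in explicitly.
function corsHeaders(
  requestOrigin: string | undefined,
  allowedOrigins: readonly string[],
): Record<string, string> {
  // Unknown or missing origin: send no CORS headers, so the browser blocks it.
  if (requestOrigin === undefined || allowedOrigins.indexOf(requestOrigin) === -1) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    // The response differs per origin, so caches must key on it.
    "Vary": "Origin",
  };
}
```

The proxy would attach these headers to both the `OPTIONS` preflight response and the forwarded API response.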
The harder problem was spatial grounding. Claude returns normalized waypoint coordinates, but Marble worlds have no built-in metadata or spatial map — the world is just a Gaussian splat render. Getting the agent's abstract reasoning about "north corridor" or "entry point" to land on positions that actually made sense in the scene required careful prompt engineering and a coordinate mapping layer we had to build from scratch.
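The core of a mapping layer like this can be sketched as a clamp plus a linear interpolation into a bounding box. This is a simplified illustration, not our full layer: it assumes waypoints are normalized to 0..1 per axis and that an axis-aligned bounding box for the walkable part of the scene is already known (for example, estimated from the loaded splat). `Vec3`, `Bounds`, and `mapWaypoint` are names made up for this sketch.

```typescript
interface Vec3 {
  x: number;
  y: number;
  z: number;
}

// Assumed: an axis-aligned box around the usable part of the scene.
interface Bounds {
  min: Vec3;
  max: Vec3;
}

// Keep model output inside 0..1 so a stray value cannot leave the scene.
function clamp01(t: number): number {
  return Math.min(1, Math.max(0, t));
}

function lerp(a: number, b: number, t: number): number {
  return a + (b - a) * t;
}

// Map a normalized waypoint to scene coordinates inside `bounds`.
function mapWaypoint(normalized: Vec3, bounds: Bounds): Vec3 {
  return {
    x: lerp(bounds.min.x, bounds.max.x, clamp01(normalized.x)),
    y: lerp(bounds.min.y, bounds.max.y, clamp01(normalized.y)),
    z: lerp(bounds.min.z, bounds.max.z, clamp01(normalized.z)),
  };
}
```

The result can be copied straight into a Three.js position, for example `pin.position.set(p.x, p.y, p.z)`. Clamping trades accuracy for safety: an out-of-range coordinate ends up on the boundary of the box instead of outside the world.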
Web Speech API reliability inside the PICO browser was also inconsistent, so we built a text input fallback that ended up being cleaner than expected for the demo.
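A fallback like this can be driven by a small decision function. The sketch below is an assumption about how one might structure it, not our exact code: it checks for the standard `SpeechRecognition` constructor or the prefixed `webkitSpeechRecognition`, and switches to text after a hypothetical number of failed recognition attempts.

```typescript
type InputMode = "voice" | "text";

// Minimal view of the global object; in the browser this would be `window`.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

// `maxFailures` is an illustrative threshold, not a measured value.
function pickInputMode(
  globals: SpeechGlobals,
  voiceFailures: number,
  maxFailures = 2,
): InputMode {
  const supported =
    typeof globals.SpeechRecognition === "function" ||
    typeof globals.webkitSpeechRecognition === "function";
  if (!supported) return "text";
  // Recognition exists but keeps failing: stop fighting it and show the text box.
  return voiceFailures >= maxFailures ? "text" : "voice";
}
```

Counting failures rather than only checking for support matters when the API is present but unreliable, which is the situation described above.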
Accomplishments that we're proud of
It works on the headset. That was not a given at 11 AM Saturday with zero setup and no Unity experience on the team. The decision to go WebXR instead of fighting the Unity-Android-PICO pipeline was the right call, and we're proud we made it quickly.
More than the technical side, we're proud that the agentic loop is real. Scout isn't a chatbot dressed up as an agent. It genuinely perceives the world structure, reasons about it, and takes physical action in the scene by placing markers in 3D space. Those are the full criteria for Agentic Mission Control, and we actually hit them.
What we learned
World models are more powerful as context than as scenery. Marble isn't just generating a backdrop — it's generating a spatial environment that an AI agent can reason about. That's a fundamentally different value proposition than "pretty VR environment," and it only clicked for us once we were deep in the build.
What's next for Scout
For game design, a developer could generate a level and ask Scout to find balance issues: choke points, dead zones, cover that's too strong on one side. For architecture, you generate a building from plans and ask Scout to flag accessibility problems or emergency exit coverage. For training simulations — military, medical, emergency response — Scout becomes a guide that adapts in real time based on what the trainee does.
The version we shipped today forgets everything when the session ends. Give Scout persistent world memory and it becomes something much more interesting: an agent that actually knows the history of a place.