Inspiration
We were inspired by the idea of bringing any image or idea to life as an explorable world. Whether it's stepping into a historical photograph, wandering through a painting, exploring a fantasy landscape from your imagination, or experiencing a scene from a book—we wanted to create a tool that could transform any static moment into an immersive, interactive experience. History education is one powerful use case we envision, whether in museums or classrooms, but the possibilities are endless.
What it does
If the Cuban Missile Crisis had escalated instead of de-escalated, what would the world look like today? Rewrite turns history into an explorable, living world. A teacher enters a VR environment generated in real time by diffusion models, while an LLM agent reasons about historical trends and counterfactual decisions to evolve the world dynamically. The experience is streamed live to students via Zoom, who can propose alternate timelines that directly influence what the teacher sees, with a narration agent explaining the changes as they happen.
How we built it
Rewrite is built on Hunyuan Worldplay 1.5 as the core model, which generates environments in real time from initial images or text prompts. On the frontend, a mobile phone mounted on the VR headset uses its built-in IMU sensors to track the teacher's position and head orientation. These signals are transformed into movement commands the model can interpret, such as WASD-style directional inputs. The VR experience is implemented in Unity using C#. Python handles model inference, while a TypeScript backend connects visual context to audio generation with Suno AI. The system continuously generates the next video frame, with user movements influencing the prompt stream in real time.
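As a rough illustration of the IMU-to-command step, here is a minimal Python sketch. The function name, thresholds, and command strings are all hypothetical stand-ins, not our actual pipeline, which tunes these values against the phone's sensor noise:

```python
# Hypothetical deadzone; the real system calibrates this per device.
YAW_DEADZONE_DEG = 15.0

def imu_to_commands(yaw_deg, pitch_deg, accel_forward):
    """Map raw phone IMU readings to WASD-style commands for the world model.

    yaw_deg:       head rotation left/right, in degrees (negative = left)
    pitch_deg:     head tilt up/down, in degrees (unused in this sketch)
    accel_forward: forward acceleration in g, used as a crude step detector
    """
    cmds = []
    if accel_forward > 0.5:          # acceleration spike -> step forward
        cmds.append("W")
    if yaw_deg > YAW_DEADZONE_DEG:   # head turned right past the deadzone
        cmds.append("D")
    elif yaw_deg < -YAW_DEADZONE_DEG:
        cmds.append("A")
    return cmds
```

Each tick, the resulting commands are folded into the prompt stream that conditions the next generated frame.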
Challenges we ran into
Getting streaming to work across all components while maintaining performance was a major challenge. Our initial inference ran at around 1 fps, even on an NVIDIA DGX Spark. To reach real-time performance, we deployed the model on Modal across four H200 GPUs with tensor parallelism, and applied extensive low-level optimizations, including splitting frames into chunks, FlashAttention, and SageAttention. In the end, we achieved an inference speed of 12 fps, which is sufficient for real-time streaming. We also ran into networking issues when first connecting to the NVIDIA DGX, which we ultimately resolved by using Tailscale to connect machines across subnets.
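The chunking idea can be sketched in a few lines of Python. This is a simplified illustration, not our Modal deployment: `generate_chunk` is a hypothetical placeholder for the model call, and the point is only that viewers start receiving frames as soon as the first chunk finishes rather than waiting for a full clip:

```python
def stream_in_chunks(generate_chunk, total_frames, chunk_size=8):
    """Yield frames chunk by chunk so streaming starts before the whole clip exists.

    generate_chunk(start, n) is assumed to return a list of n frames
    beginning at frame index `start`.
    """
    for start in range(0, total_frames, chunk_size):
        n = min(chunk_size, total_frames - start)
        # Frames are forwarded downstream as soon as this chunk is ready.
        yield from generate_chunk(start, n)
```

Downstream, the Zoom stream consumes this generator, so end-to-end latency is bounded by one chunk rather than one clip.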
Accomplishments that we're proud of
We're proud of achieving real-time performance. Going from 1 fps to a 12 fps explorable experience was a big technical challenge. Successfully implementing the streaming architecture between all our components (video generation, audio synthesis, VR rendering, and user controls) was another major accomplishment that required careful coordination and optimization.
What we learned
This project gave us deep insights into streaming video and audio at scale, and we learned a lot about running extremely compute-intensive ML models in production-like environments. We gained hands-on experience with the practical challenges of real-time generative AI.
What's next for Rewrite
We want to give users even more control over their worlds. Our next steps include allowing users to directly prompt the environment in real-time with custom commands and integrating VR motion tracking directly with the world controls, so head movements and hand gestures can influence exploration. The goal is to create a fully interactive platform where anyone can explore and shape the worlds they imagine.


