Current video generation models cannot simulate physical consequences of 3D actions like forces and robotic manipulations, as they lack structural understanding of how actions affect 3D scenes. We present RealWonder, the first real-time system for action-conditioned video generation from a single image. Our key insight is using physics simulation as an intermediate bridge: instead of directly encoding continuous actions, we translate them through physics simulation into visual representations (optical flow and RGB) that video models can process. RealWonder integrates three components: 3D reconstruction from single images, physics simulation, and a distilled video generator requiring only 4 diffusion steps. Our system achieves 13.2 FPS at 480×832 resolution, enabling interactive exploration of forces, robot actions, and camera controls on rigid objects, deformable bodies, fluids, and granular materials.
RealWonder: Real-Time Physical Action-Conditioned Video Generation
Project Page | Paper
Wei Liu*, Ziyu Chen*, Zizhang Li, Yue Wang, Hong-Xing (Koven) Yu†, Jiajun Wu†
Stanford University, University of Southern California
*Equal contribution †Equal advising
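The abstract describes translating actions through physics simulation into optical flow that a video model can consume. As a rough illustration of that idea only, the sketch below turns the 2D image-plane positions of simulated points at two consecutive time steps into a sparse per-pixel flow field. The function name `sparse_flow` and its inputs are hypothetical and are not taken from the RealWonder code; the actual system obtains point motion from its own simulator and camera model.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]            # (x, y) in pixel coordinates
Flow = Optional[Tuple[float, float]]   # (dx, dy), or None where no point landed


def sparse_flow(
    prev_pts: Sequence[Point],
    next_pts: Sequence[Point],
    height: int,
    width: int,
) -> List[List[Flow]]:
    """Rasterize point displacements into a height x width flow grid.

    Hypothetical helper: prev_pts[i] and next_pts[i] are assumed to be the
    projected positions of the same simulated point at steps t and t+1.
    """
    if len(prev_pts) != len(next_pts):
        raise ValueError("prev_pts and next_pts must have the same length")

    flow: List[List[Flow]] = [[None] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in zip(prev_pts, next_pts):
        col, row = int(round(x0)), int(round(y0))
        # Points that project outside the image are skipped.
        if 0 <= row < height and 0 <= col < width:
            flow[row][col] = (x1 - x0, y1 - y0)
    return flow


# Two points inside a 3x4 image, one point outside it.
grid = sparse_flow(
    prev_pts=[(1.0, 2.0), (3.0, 0.0), (9.0, 9.0)],
    next_pts=[(2.5, 2.0), (3.0, 1.0), (9.0, 8.0)],
    height=3,
    width=4,
)
```

Pixels that no simulated point lands on stay `None` here; a dense conditioning signal would need those gaps filled, which this sketch does not attempt.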
conda env create -f default.yml
conda activate realwonder
cd submodules/sam_3d_objects
export PIP_EXTRA_INDEX_URL="https://pypi.ngc.nvidia.com https://download.pytorch.org/whl/cu121"
pip install -e '.[dev]'
pip install -e '.[p3d]'
export PIP_FIND_LINKS="https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.5.1_cu121.html"
pip install -e '.[inference]'
./patching/hydra
cd ../..
pip install 'huggingface-hub[cli]<1.0'
TAG=hf
hf download --repo-type model --local-dir checkpoints/${TAG}-download --max-workers 1 facebook/sam-3d-objects
mv checkpoints/${TAG}-download/checkpoints checkpoints/${TAG}
rm -rf checkpoints/${TAG}-download
cd submodules/sam2
pip install -e .
cd checkpoints && ./download_ckpts.sh && cd ..
cd ../..
cd submodules/Genesis
git checkout 3aa206cd84729bc7cc14fb4007aeb95a0bead7aa
pip install -e .
cd ../..
pip install -r requirements.txt
hf download ziyc/realwonder --include "Realwonder-Distilled-AR-I2V-Flow/*" --local-dir ckpts/
hf download alibaba-pai/Wan2.1-Fun-V1.1-1.3B-InP --local-dir wan_models/Wan2.1-Fun-V1.1-1.3B-InP

Tested on NVIDIA H200 GPU with CUDA 12.1.
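Before launching the demo, it can help to confirm that the download steps above produced the expected directories. The following is a minimal sketch, assuming every command above was run from the repository root so that the `--local-dir` targets resolve relative to it. The helper `missing_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Iterable, List

# Directories the download commands above write to, relative to the repo root
# (assumption: the commands were run from the repo root as listed).
EXPECTED_DIRS = (
    "checkpoints/hf",
    "ckpts/Realwonder-Distilled-AR-I2V-Flow",
    "wan_models/Wan2.1-Fun-V1.1-1.3B-InP",
)


def missing_dirs(root: str, expected: Iterable[str] = EXPECTED_DIRS) -> List[str]:
    """Return the expected directories that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).is_dir()]
```

Calling `missing_dirs(".")` from the repository root returns an empty list when all three directories are present, and otherwise lists the ones that still need to be downloaded.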
pip install -r demo_web/requirements.txt
cd demo_web
python app.py \
--demo_data demo_data/lamp \
--checkpoint_path /path/to/checkpoint.pt

Run physics simulation:
python case_simulation.py --config_path demo_data/lamp/config.yaml

Run video generation from simulation results:
python infer_sim.py \
--checkpoint_path ckpts/Realwonder-Distilled-AR-I2V-Flow/sink_size=1-attn_size=21-frame_per_block=3-denoising_steps=4/step=000800.pt \
--sim_data_path result/lamp/final_sim \
--output_path result/lamp/final_sim/final.mp4

@misc{realwonder2026,
title={RealWonder: Real-Time Physical Action-Conditioned Video Generation},
author={Liu, Wei and Chen, Ziyu and Li, Zizhang and Wang, Yue and Yu, Hong-Xing and Wu, Jiajun},
year={2026},
eprint={2603.05449},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.05449},
}