CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video
Lingen Li, Guangzhi Wang, Xiaoyu Li, Zhaoyang Zhang, Qi Dou, Jinwei Gu, Tianfan Xue, Ying Shan
CVPR 2026
CubeComposer turns perspective videos into native 4K 360° videos without memory blow‑up.
TL;DR: Generate one cubemap face per time window with an effective and efficient context mechanism. This turns perspective video into native 4K 360° video without the memory blow‑up or the low‑res‑then‑upscale compromise.
CubeComposer generates 360° video in a cubemap face‑wise spatio‑temporal autoregressive manner. Each step generates one face over a fixed temporal window, which greatly reduces peak memory and enables native 2K/3K/4K generation.
For more details, please visit our project page.
ODVista 360 dataset
Download the ODVista 360 dataset according to its license and place it, for example, at `/path/to/ODVista360`.
The expected structure is:
```
ODVista360/
  train/HR/...
  val/HR/...
  test/HR/...
```

You also need to set `ODV_ROOT_DIR` in `run.sh` to this path later.

For running the perspective-to-360° video generation test, please also extract the test caption zip file (`ODVista360-test-captions.zip`) and put the caption folder into the `test/HR/` folder of ODVista 360 on your disk. We will release our filtered 4K360Vid dataset with face-wise captions later.
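Before editing `run.sh`, a small shell helper like the following (hypothetical; not part of this repo) can sanity-check that the dataset root has the expected layout:

```shell
# Hypothetical helper (not part of this repo): verify the ODVista 360 layout.
check_odv_layout() {
  root="$1"
  for split in train val test; do
    if [ ! -d "$root/$split/HR" ]; then
      echo "missing: $root/$split/HR"
      return 1
    fi
  done
  echo "layout ok"
}

# Example usage (path is a placeholder):
# check_odv_layout /path/to/ODVista360
```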
Clone this repo and change directory to the project root. Ensure you have installed ffmpeg and added it to your PATH (needed for video saving). Then, run:
```
conda create -n cubecomposer python=3.10
conda activate cubecomposer
pip install -r requirements.txt
```

This installs all required packages. Note that `requirements.txt` assumes a Linux platform with CUDA 12.4; modify it as needed for other platforms or CUDA versions.
The project already contains modified, embedded versions of diffsynth and equilib. You do not need to install these separately in your Python environment.
There are two types of weights:
- Wan2.2 base model cache (used via the `diffsynth` cache)
- CubeComposer checkpoint (our weights)
Set BASE_MODEL_PATH to your diffsynth cache directory. If this cache is empty, the code will automatically download the Wan weights there on first use.
Example:
```
BASE_MODEL_PATH="/path/to/diffsynth/cache"   # set this in run.sh
```

Download our CubeComposer checkpoints from Hugging Face.
We provide two variants in a single model repo:
https://huggingface.co/TencentARC/CubeComposer
```
CubeComposer/
├── cubecomposer-3k/
│   ├── model.safetensors
│   └── args.json
└── cubecomposer-4k/
    ├── model.safetensors
    └── args.json
```
The cubecomposer-3k variant is used for 2K/3K generation (internally using a cubemap size of 512/768, temporal window length of 9 frames),
and cubecomposer-4k is used for 4K generation (cubemap size 960, temporal window length of 5 frames).
You can either:

- download a checkpoint (e.g. `.safetensors`) and `args.json` locally and point the script to them, or
- let the test script automatically download the correct pair from the Hugging Face repo by using `--test_mode`.
If you prefer manual paths, place the checkpoint and args.json somewhere accessible and set the corresponding variables in run.sh:
```
CHECKPOINT_PATH="/path/to/your/cubecomposer_checkpoint.safetensors"
ARGS_JSON="/path/to/your/args.json"
```
Perspective-to-360° video generation is driven by the script run.sh, which calls the Python test entry (run.py in this repo).
This test script automatically extracts perspective videos from the 360° videos in the ODVista 360 dataset and uses them as the input videos.
Open run.sh and configure:
- `BASE_MODEL_PATH` – `diffsynth` model cache directory
- `ARGS_JSON` – (optional) path to `args.json`. If omitted or the path does not exist, the script will auto‑download the proper `args.json` from Hugging Face based on `TEST_MODE`.
- `CHECKPOINT_PATH` – (optional) path to the CubeComposer checkpoint (`.safetensors`) you want to test. If omitted or the path does not exist, the script will auto‑download the proper checkpoint from Hugging Face based on `TEST_MODE`.
- `ODV_ROOT_DIR` – ODVista 360 dataset root (e.g. `/path/to/ODVista360`)
- `TEST_OUTPUT_DIR` – where to save generated videos
- `NUM_SAMPLES`, `START_IDX` – which test samples to process
- `NUM_INFERENCE_STEPS`, `CFG_SCALE` – inference hyper‑parameters
- `TEST_MODE` – target resolution mode, one of `2k`, `3k`, `4k` (default `3k`)
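Taken together, a filled-in configuration block in `run.sh` might look like the following. All paths are placeholders, and the two inference hyper-parameter values are illustrative examples, not recommended defaults:

```shell
# Illustrative run.sh configuration; all paths are placeholders.
BASE_MODEL_PATH="/path/to/diffsynth/cache"
ARGS_JSON=""                  # empty -> auto-download based on TEST_MODE
CHECKPOINT_PATH=""            # empty -> auto-download based on TEST_MODE
ODV_ROOT_DIR="/path/to/ODVista360"
TEST_OUTPUT_DIR="./outputs"
NUM_SAMPLES=20
START_IDX=0
NUM_INFERENCE_STEPS=50        # illustrative value
CFG_SCALE=5.0                 # illustrative value
TEST_MODE="3k"                # one of: 2k, 3k, 4k
```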
TEST_MODE controls which CubeComposer variant is used and the corresponding cubemap size:
- `2k`/`3k` → use `cubecomposer-3k` (cubemap size 768)
- `4k` → use `cubecomposer-4k` (cubemap size 960)
If ARGS_JSON or CHECKPOINT_PATH are empty or invalid, the script will fall back to the 3K model (cubecomposer-3k) by default.
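The mode-to-variant rule above can be sketched as a small shell function. This is a paraphrase of the selection rules described in this README, not the actual `run.py` logic:

```shell
# Sketch of the TEST_MODE -> checkpoint-variant mapping; not the actual run.py code.
variant_for_mode() {
  case "$1" in
    2k|3k) echo "cubecomposer-3k" ;;   # cubemap size 768
    4k)    echo "cubecomposer-4k" ;;   # cubemap size 960
    *)     echo "cubecomposer-3k" ;;   # anything else falls back to the 3K model
  esac
}
```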
The trajectory file (camera path) is passed via `--trajectory_file` in `run.sh`. By default, we use:

```
./input/trajectory_rotation_fov90_2wp_20samples.json
```
You can replace this path with any other trajectory JSON you prepare (see below).
After editing run.sh, simply run:
```
bash run.sh
```

`run.py` will:

- load model/dataset arguments from `ARGS_JSON`
- load the CubeComposer checkpoint from `CHECKPOINT_PATH`
- read the trajectory definitions from `--trajectory_file` and enforce them for each sample
- run panoramic video generation on ODVista 360 `test/HR`
- save outputs under `TEST_OUTPUT_DIR`, including:
  - input perspective videos
  - generated equirectangular videos
  - generated cubemap faces
  - `generation_info.json` with generation order and camera trajectory per sample
If you want custom camera paths, you can use export_trajectory.py, for example:
```
python export_trajectory.py \
    --num_samples 20 \
    --trajectory_mode rotation \
    --fov_x 90 \
    --num_waypoints 2 \
    --output_json trajectory_rotation_new.json
```

Then point `--trajectory_file` in `run.sh` to this new JSON.
If you find our work helpful in your research, please star this repo and cite:
@article{li2026cubecomposer,
title={CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video},
author={Li, Lingen and Wang, Guangzhi and Li, Xiaoyu and Zhang, Zhaoyang and Dou, Qi and Gu, Jinwei and Xue, Tianfan and Shan, Ying},
journal={arXiv preprint arXiv:2603.04291},
year={2026}
}
This repository builds upon the excellent open-source repos Wan2.2, DiffSynth-Studio, and equilib. We gratefully acknowledge the original authors for making their code publicly available.
This repository is released under the terms of the LICENSE file.
By cloning, downloading, using, or distributing this repository or any of its code, models, or weights, you agree to comply with the terms and conditions specified in the LICENSE.

