# [CVPR 2026] SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation

Yu Yuan, Tharindu Wickremasinghe, Zeeshan Nadir, Xijun Wang, Yiheng Chi, Stanley H. Chan

This repository is the official implementation of SeeU.
- [Mar 17, 2026]: Code released.
- [Feb 21, 2026]: SeeU has been accepted by CVPR 2026!
- [Dec 3, 2025]: Paper available on arXiv.
- [Dec 3, 2025]: SeeU45 dataset released.
- [Dec 1, 2025]: Project page released.
- Tested with CUDA 12.6, 64-bit Python 3.10, and PyTorch 2.6.0; other environments may also work.
- Install the packages with the following commands:
```bash
conda create -n seeu python=3.10
conda activate seeu
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
git clone https://github.com/pandayuanyu/SeeU.git
cd SeeU
pip install -r requirements.txt
```

Please download our pre-processed SeeU45 Dataset from Hugging Face and put it under the folder `preproc`:
```bash
pip install "huggingface_hub[hf_transfer]"
hf download pandaphd/SeeU45_PreProcessed --repo-type dataset --local-dir preproc/SeeU45
```

We use the butterfly scene as a demonstration to complete the two-stage training process:
(1) from 2D to discrete 4D, and
(2) from discrete 4D to continuous 4D.
```bash
python train.py --work-dir /path-to/output_butterfly/ data:custom --data.data-dir /path-to/preproc/SeeU45/ --data.scene butterfly --data.depth-type megasam_depth
```

The following script demonstrates how to render projected frames and inpainting masks from the trained continuous 4D representation, given a predefined temporal configuration.
```bash
python inference_video_lapse.py --work-dir /path-to/output_butterfly/ --port 5005 --data.data-dir /path-to/preproc/SeeU45/ --data.scene butterfly --camera.mode continuous --gt.gt-dir /path-to/dataset/SeeU45_GT/butterfly/
```

- You can modify the temporal setup in `inference_video_lapse.py` under:
```python
class VideoConfig:
    num_past_extrap_frames: int = 5    # number of extrapolated frames in the past
    num_replay_frames: int = 71        # number of in-between frames
    num_future_extrap_frames: int = 5  # number of extrapolated future frames
```
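The three counts partition the rendered timeline: replay frames cover the observed time span, while past and future frames extrapolate beyond it. As a rough illustration (not the repository's actual sampling code; `build_timeline` is a hypothetical helper assuming uniform spacing), the default 5 + 71 + 5 configuration yields 81 frames:

```python
def build_timeline(num_past=5, num_replay=71, num_future=5,
                   t_start=0.0, t_end=1.0):
    """Hypothetical sketch: normalized timestamps for past-extrapolated,
    replayed, and future-extrapolated frames (uniform spacing assumed)."""
    step = (t_end - t_start) / (num_replay - 1)  # spacing of the observed frames
    past = [t_start - step * k for k in range(num_past, 0, -1)]    # before the clip
    replay = [t_start + step * k for k in range(num_replay)]       # within the clip
    future = [t_end + step * k for k in range(1, num_future + 1)]  # after the clip
    return past + replay + future

times = build_timeline()
assert len(times) == 81  # 5 + 71 + 5, the recommended total
```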
- We recommend using 81 total frames for final experiments, to stay consistent with the following video inpainting stage.
- If ground-truth frames are not available, you can safely remove `--gt.gt-dir /path-to/dataset/SeeU45_GT/butterfly/`.

The following script demonstrates how to render projected frames and inpainting masks from the trained continuous 4D representation under different camera trajectories.
```bash
python inference_video_lapse.py --work-dir /path-to/output_butterfly/ --port 5005 --data.data-dir /path-to/preproc/SeeU45/ --data.scene butterfly --camera.mode reference --camera.traj dolly-right --gt.gt-dir /path-to/dataset/SeeU45_GT/butterfly/
```

- Supported camera trajectories are set via `--camera.traj`; current options include `fixed`, `tilt-up`, `pan-right`, `dolly-up`, `dolly-right`, and `dolly-out`.
- You can extend or customize trajectories by modifying `flow3d/trajectories_4D_motion.py`.
- If ground-truth frames are not available, you can safely remove `--gt.gt-dir /path-to/dataset/SeeU45_GT/butterfly/`.

We also provide additional scripts for visualizing 4D tracks.
```bash
python render_tracks.py --work-dir /path-to/output_butterfly/ --data.data-dir /path-to/preproc/SeeU45/ --data.scene butterfly
```

- You can modify the visualization setup in `render_tracks.py` under:
```python
class VideoConfig:
    # View 2 config
    view2: CameraView2Config = field(default_factory=CameraView2Config)
    # track selection
    grid_selection_step: int = 80             # sampling density (e.g., visualize one track per N 3D Gaussians)
    opacity_selection_threshold: float = 0.8  # opacity threshold for filtering tracks
```

Once the projected frames (`src_video.mp4`) and inpainting masks (`src_mask.mp4`) are obtained from Section 4.1 or 4.2 above, we perform in-context video inpainting based on a given text prompt using VACE.
Install VACE first (the environment is already integrated, so there is no need to create a separate environment for VACE). Then run the following scripts:
```bash
cd /path-to/VACE/
# inpainting inference
python vace/vace_wan_inference.py --src_video /path-to/output_butterfly/videos/2026-xx-xx-xxxxxx_0p_81r_0f_Cam_Ref_dolly-out/src_video.mp4 --src_mask /path-to/output_butterfly/videos/2026-xx-xx-xxxxxx_0p_81r_0f_Cam_Ref_dolly-out/src_mask.mp4 --prompt "A close-up video of a butterfly flapping its wings. The background has stones and gravel. Restore the masked regions of the video with the background of the stones. Make the colors and background behind the butterfly realistic and continuous."
```
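For intuition, `src_mask.mp4` marks the pixels the inpainting model should synthesize, while the rest of each frame is preserved. The toy sketch below illustrates that idea on a single grayscale frame; the white-means-inpaint convention and the `apply_inpaint_mask` helper are illustrative assumptions, not VACE's actual API:

```python
def apply_inpaint_mask(frame, mask, fill=0):
    """Toy sketch: blank out the pixels the mask marks for inpainting.

    frame: H x W list of pixel values; mask: H x W list of 0/255 values,
    where 255 (white) marks regions the model should regenerate (assumed convention).
    """
    return [
        [fill if m == 255 else p for p, m in zip(frow, mrow)]
        for frow, mrow in zip(frame, mask)
    ]

frame = [[10, 20], [30, 40]]
mask = [[0, 255], [255, 0]]  # inpaint the top-right and bottom-left pixels
print(apply_inpaint_mask(frame, mask))  # [[10, 0], [0, 40]]
```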
Our data preprocessing pipeline is primarily adapted from Shape-of-Motion, with a few modifications.
The difference is that we additionally leverage MegaSaM to extract more robust camera parameters and depth maps, which improves the overall stability and quality of downstream 4D reconstruction.
```bash
# 1. create a new venv
# 2. enter the preprocessing folder
cd /home/.../SeeU/preproc/
# 3. install dependencies
./setup_dependencies.sh
# 4. get foreground masks with SAM
python mask_app.py --root_dir /home/.../SeeU/preproc/data/
# 5. run the preprocessing pipeline
python process_custom.py --img-dirs /home/.../SeeU/preproc/data/images/** --gpus 0
# 6. install MegaSaM and enter its folder
cd /home/.../SeeU/preproc/mega-sam/
# 7. modify the settings of the following scripts and run them
bash ./mono_depth_scripts/run_mono-depth_demo.sh
bash ./tools/evaluate_demo.sh
bash ./cvd_opt/cvd_opt_demo.sh
```
After preprocessing, please follow the directory structure of [SeeU45_PreProcessed](https://huggingface.co/datasets/pandaphd/SeeU45_PreProcessed/tree/main) to organize your data.
- We recommend creating a separate environment for data preprocessing, as it involves running multiple third-party models. Our base environment (compatible with all preprocessing models): CUDA 12.1, Python 3.10, PyTorch 2.2.0, torchvision 0.17.0, transformers 4.57.0, xformers 0.0.24.
- For efficient training, we suggest using custom datasets with 10–40 frames that contain clear foreground motion.
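If your source clip is longer than that, one simple way to bring it into the 10–40 frame range is to subsample frames at even intervals before preprocessing. This is a generic sketch; `evenly_spaced_indices` is a hypothetical helper, not part of the repository:

```python
def evenly_spaced_indices(total: int, target: int):
    """Pick `target` frame indices evenly spread over `total` frames,
    always keeping the first and last frame."""
    if target >= total:
        return list(range(total))
    step = (total - 1) / (target - 1)
    return [round(i * step) for i in range(target)]

indices = evenly_spaced_indices(120, 30)  # e.g., keep 30 of 120 frames
assert indices[0] == 0 and indices[-1] == 119
```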
The evaluation code is in the `comp_metrics` folder.
This project is released for academic use. We disclaim responsibility for user-generated content. Users are solely liable for their actions. The project contributors are not legally affiliated with, nor accountable for, users' behaviors. Use the generative model responsibly, adhering to ethical and legal standards.
We thank Shape-of-Motion, MegaSaM, and VACE for their amazing work.
If you find this project helpful or insightful, please cite our paper:
```bibtex
@article{Yuan_2025_SeeU,
  title={{SeeU}: Seeing the Unseen World via 4D Dynamics-aware Generation},
  author={Yuan, Yu and Wickremasinghe, Tharindu and Nadir, Zeeshan and Wang, Xijun and Chi, Yiheng and Chan, Stanley H.},
  journal={arXiv preprint arXiv:2512.03350},
  year={2025}
}
```

If you have any questions or comments, feel free to contact me by email (mryuanyu@outlook.com). Suggestions and collaborations are also highly welcome!
