RoboVIP/RoboVIP_VDM

RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

Paper Website

We propose RoboVIP, a multi-view, inpainting-based video diffusion model conditioned on identity references, which augments robot manipulation data in both simulation and real-world robot setups.

🔥 Update | 🔧 Installation | 💻 Inference Augmentation | 🧩 Dataset Preprocessing | 🔥 Train

Update 🔥🔥🔥

  • Release the paper
  • Release the Video Diffusion Model weights and Inference Code
  • Release a less GPU-memory-intensive version (<80GB) of Bridge RLDS
  • Release the preprocessing code of the dataset
  • Release the training code for the Video Diffusion Model
  • Release the simulation testing
  • Release the training code for simulation

⭐ If you like RoboVIP, please help ⭐⭐star⭐⭐ this repo. Thanks! 🤗



Under review. Code will be released soon.


Citation 📚

If you make use of our work, please cite our paper.

@article{wang2026robovip,
  title={RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation},
  author={Wang, Boyang and Zhang, Haoran and Zhang, Shujie and Hao, Jinkun and Jia, Mingda and Lv, Qi and Mao, Yucheng and Lyu, Zhaoyang and Zeng, Jia and Xu, Xudong and others},
  journal={arXiv preprint arXiv:2601.05241},
  year={2026}
}

Acknowledgment 🤗

RoboVIP is built on diffusers and RoboEngine. We thank the authors for sharing their awesome codebases.
