renwang435/multigen

The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio

Renhao Wang, Haoran Geng, Tingle Li, Philipp Wu, Feishi Wang, Gopala Anumanchipalli, Trevor Darrell, Boyi Li, Pieter Abbeel, Jitendra Malik, Alexei A. Efros

[arXiv] [BibTeX]

Code Structure

Our code release consists of two main sections. The first uses physics-based simulation to generate motion-planned pouring trajectories. The second trains a video-to-audio diffusion model that produces audio synchronized with pouring videos; it also includes inference code for generating audio tracks from the simulated videos produced by the first section.
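
The hand-off between the two sections can be pictured with a minimal sketch. Everything in it is hypothetical: the names `SimulatedVideo`, `AudioTrack`, `simulate_pouring`, and `generate_audio`, the 30 fps frame rate, and the 16 kHz sample rate are illustrative assumptions, not this repository's API. The stubs stand in for the simulator and the diffusion model; the only real logic is the duration bookkeeping that keeps a generated audio track the same length as its video.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimulatedVideo:
    """Hypothetical container for the output of the simulation section."""
    num_frames: int
    fps: float


@dataclass
class AudioTrack:
    """Hypothetical container for the output of the video-to-audio section."""
    samples: List[float]
    sample_rate: int


def simulate_pouring(num_frames: int = 120, fps: float = 30.0) -> SimulatedVideo:
    # Stub for section 1: a real run would roll out a motion-planned
    # pouring trajectory in simulation and render it to video frames.
    return SimulatedVideo(num_frames=num_frames, fps=fps)


def expected_audio_samples(video: SimulatedVideo, sample_rate: int) -> int:
    # Audio and video cover the same duration, so the sample count is
    # (frames / fps) seconds times the audio sample rate.
    return round(video.num_frames / video.fps * sample_rate)


def generate_audio(video: SimulatedVideo, sample_rate: int = 16000) -> AudioTrack:
    # Stub for section 2: a real run would sample from a trained
    # video-to-audio diffusion model. Here we return silence of the
    # matching length to show only the interface.
    n = expected_audio_samples(video, sample_rate)
    return AudioTrack(samples=[0.0] * n, sample_rate=sample_rate)


video = simulate_pouring()
audio = generate_audio(video)
duration_s = len(audio.samples) / audio.sample_rate
print(f"video: {video.num_frames / video.fps:.2f} s, audio: {duration_s:.2f} s")
```

In this sketch the second stage depends only on the rendered video and a target sample rate, which mirrors the description above: inference takes simulated video from the first section and returns an audio track for it.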

Citing MultiGen

@inproceedings{
    wang2025the,
    title={The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio},
    author={Renhao Wang and Haoran Geng and Tingle Li and Philipp Wu and Feishi Wang and Gopala Anumanchipalli and Trevor Darrell and Boyi Li and Pieter Abbeel and Jitendra Malik and Alexei A Efros},
    booktitle={9th Annual Conference on Robot Learning},
    year={2025},
    url={https://openreview.net/forum?id=a9RXjOt5bU}
}

About

The Sound of Simulation (CoRL 2025 Best Paper Finalist)
