The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio

Renhao Wang, Haoran Geng, Tingle Li, Feishi Wang, Gopala Anumanchipalli, Boyi Li, Trevor Darrell, Pieter Abbeel, Jitendra Malik, Alexei A. Efros

[arXiv] [BibTeX]

Code Structure

Our code release consists of two main sections. The first uses physics-based simulation to generate motion-planned pouring trajectories. The second trains a video-to-audio diffusion model that generates pouring audio synchronized with video. It also includes inference code that generates audio tracks for the simulated videos produced by the first section.

Citing MultiGen

@inproceedings{wang2025the,
    title={The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio},
    author={Renhao Wang and Haoran Geng and Tingle Li and Philipp Wu and Feishi Wang and Gopala Anumanchipalli and Trevor Darrell and Boyi Li and Pieter Abbeel and Jitendra Malik and Alexei A Efros},
    booktitle={9th Annual Conference on Robot Learning},
    year={2025},
    url={https://openreview.net/forum?id=a9RXjOt5bU}
}
