Renhao Wang, Haoran Geng, Tingle Li, Feishi Wang, Gopala Anumanchipalli, Boyi Li, Trevor Darrell, Pieter Abbeel, Jitendra Malik, Alexei A. Efros
Our code release consists of two main sections. The first covers physics-based simulation for generating motion-planned pouring trajectories. The second covers training a video-to-audio diffusion model that produces audio synchronized with pouring videos; it also includes inference code for generating audio tracks from the simulated video produced in the first section.
@inproceedings{
wang2025the,
title={The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio},
author={Renhao Wang and Haoran Geng and Tingle Li and Philipp Wu and Feishi Wang and Gopala Anumanchipalli and Trevor Darrell and Boyi Li and Pieter Abbeel and Jitendra Malik and Alexei A Efros},
booktitle={9th Annual Conference on Robot Learning},
year={2025},
url={https://openreview.net/forum?id=a9RXjOt5bU}
}