- We propose a novel framework, SAM2-LOVE, which is the first to leverage SAM2 for pixel-wise understanding in language-aided audio-visual scenes (LAVS) by designing a multimodal fusion module.
- We develop novel token propagation and accumulation strategies to improve the spatio-temporal comprehension of the promptable token.
- Extensive experiments on the Ref-AVS dataset demonstrate the superiority of our method, and ablation studies highlight the simplicity and effectiveness of its modules.
Our work builds primarily on EVF-SAM, SAM2, and Ref-AVS. We are sincerely grateful for their excellent work.
If you find our paper and code helpful for your research, please consider starring our repository ⭐ and citing our work ✍️.
```bibtex
@inproceedings{wang2025sam2,
  title={SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes},
  author={Wang, Yuji and Xu, Haoran and Liu, Yong and Li, Jiaze and Tang, Yansong},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={28932--28941},
  year={2025}
}
```
