VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
Yuxuan Wang*, Yiqi Song*, Cihang Xie, Yang Liu, Zilong Zheng
ICCV 2025
PDF | Code | Homepage | Cite
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
Yuxuan Wang, Yueqian Wang, Bo Chen, Tong Wu, Dongyan Zhao, Zilong Zheng
CVPR 2025
PDF | Code (OmniMMI) | Code (M4) | Homepage | Cite
Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge
Yuxuan Wang, Yueqian Wang, Pengfei Wu, Jianxin Liang, Dongyan Zhao, Zilong Zheng
EMNLP 2024
PDF | Code & Demo | Cite
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yuxuan Wang, Alan Yuille, Zhuowan Li, Zilong Zheng
COLM 2024
PDF | Code | Cite
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
Yuxuan Wang, Zilong Zheng, Xueliang Zhao, Jinpeng Li, Yueqian Wang, Dongyan Zhao
ACL 2023
PDF | Code | Homepage | Cite
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
Yueqian Wang, Xiaojun Meng, Yuxuan Wang, Jianxin Liang, Jiansheng Wei, Huishuai Zhang, Dongyan Zhao
EMNLP 2025 Findings
PDF | Code | Cite
The AI Hippocampus: How Far are We From Human Memory?
Zixia Jia*, Jiaqi Li*, Yipeng Kang*, Yuxuan Wang*, Tong Wu, Quansen Wang, Xiaobo Wang, Shuyi Zhang, Junzhe Shen, Qing Li, Siyuan Qi, Yitao Liang, Di He, Zilong Zheng, Song-Chun Zhu
TMLR 2025
PDF | Code | Cite