Keda Tao

I'm Keda Tao, a first-year PhD student at Westlake University in a joint program with Zhejiang University, advised by Prof. Huan Wang. Previously, I received my B.E. degree from XDU in 2025.

My research interests include multimodal large language models (e.g., VideoLLMs and OmniLLMs), efficient AI, low-level vision, and generative models.

I am currently working as a research intern at Ant Group, focusing on omnimodal understanding, agents, and agentic model training. I am deeply grateful to my advisor and all collaborators for their guidance and support. Please feel free to reach out to me via email for any inquiries or potential collaborations.

Email / ENCODE LAB / Scholar / GitHub


🔥 News

  • 2025.12 [2025 Final Update] 🤖 [Preprint] We release OmniAgent, a new work on omnimodal understanding. OmniAgent is an audio-guided active perception agent for omnimodal audio-video understanding. It outperforms Gemini2.5-Flash, GPT-4o, and Qwen3-Omni on several benchmarks. [Website]
  • 2025.11 🌟 [Preprint] We have released a new work, OmniZip, an audio-guided token compression method for fast OmniLLMs. [Repo]
  • 2025.10 [Preprint] We have released the preprints of StreamingTOM and RLKV.
  • 2025.09 🎉 [NeurIPS'25] Poison as Cure and HoliTom have been accepted to NeurIPS 2025!
  • 2025.08 🌟 [Survey] We are excited to present the first systematic review of multimodal long-context token compression methods. [arXiv] [Repo]
  • 2025.07 🎉 [Award] Received the "2025 Westlake University Xinrui Award" (西湖大学博士研究生新锐奖).
  • 2025.05 [Preprint] We have released the preprint of HoliTom: "Holistic Token Merging for Fast Video Large Language Models".
  • 2025.03 [Preprint] We introduce VidKV, a plug-and-play 1.x-bit KV cache quantization method for VideoLLMs. [Code]
  • 2025.02 🎉 [CVPR'25] DyCoke has been accepted to CVPR'25! DyCoke is a plug-and-play token compression method for fast VideoLLMs.
  • 2025.02 [Preprint] We have released the preprint of our paper Poison as Cure. We propose a novel visual adversarial perturbation (VAP) method to mitigate hallucination.
  • 2025.01 🎉 [ICLR'25] MGFR has been accepted to ICLR 2025 as a Spotlight! The Reface-HQ dataset has also been released!
  • 2024.11 [Preprint] We have released the preprint of our paper DyCoke.

📖 Publications

arXiv 2025
OmniAgent
OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding
Keda Tao, Wenjie Du, Bohan Yu, Weiqiang Wang, Jian Liu, Huan Wang
arXiv, 2025
arXiv 2025
OmniZip
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
Keda Tao, Kele Shao, Bohan Yu, Weiqiang Wang, Jian Liu, Huan Wang
arXiv, 2025
ICLR 2025 Spotlight
MGFR
Overcoming False Illusions in Blind Face Restoration with Multi-Modal Guided Diffusion Model
Keda Tao, Jinjin Gu, Yulun Zhang, Xiucheng Wang, Nan Cheng
ICLR, 2025 (Spotlight)
CVPR 2025
DyCoke
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
Keda Tao, Can Qin, Haoxuan You, Yang Sui, Huan Wang
CVPR, 2025
arXiv 2025
Survey
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
Kele Shao*, Keda Tao*, Kejia Zhang, Sicheng Feng, Mu Cai, Yuzhang Shang, Haoxuan You, Can Qin, Yang Sui, Huan Wang
arXiv, 2025
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
Keda Tao, Haoxuan You, Yang Sui, Can Qin, Huan Wang
arXiv, 2025
[arXiv] [Github] [Page]
Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs
Kejia Zhang, Keda Tao, Jiasheng Tang, Huan Wang
NeurIPS, 2025
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao, Keda Tao, Can Qin, Haoxuan You, Yang Sui, Huan Wang
NeurIPS, 2025
StreamingTOM: Streaming Token Compression for Efficient Video Understanding
Xueyi Chen, Keda Tao, Kele Shao, Huan Wang
arXiv, 2025
TARS: MinMax Token-Adaptive Preference Strategy for MLLM Hallucination Reduction
Kejia Zhang, Keda Tao, Zhiming Luo, Chang Liu, Jiasheng Tang, Huan Wang
arXiv, 2025
[arXiv]
Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
Wenjie Du, Li Jiang, Keda Tao, Xue Liu, Huan Wang
arXiv, 2025
RadioDiff: An Effective Generative Diffusion Model for Sampling-Free Dynamic Radio Map Construction
Xiucheng Wang*, Keda Tao*, Nan Cheng, Zhisheng Yin, Zan Li, Yuan Zhang, Xuemin (Sherman) Shen
TCCN, 2024
PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents
Haoyu Chen, Keda Tao, Yizao Wang, Xinlei Wang, Lei Zhu, Jinjin Gu
arXiv, 2025
[arXiv]
Is Oracle Pruning the True Oracle?
Sicheng Feng, Keda Tao, Huan Wang
arXiv, 2025

🌟 Professional Services

  • Journal Reviewer - TMM, etc.
  • Conference Reviewer - CVPR, ECCV, ICCV, ICLR, PRCV, etc.

This webpage is built upon the source code of Wenjie Du.