Zehong Ma

Ph.D. Student

Peking University

Research Interests

Image/Video Generation
Multi-modal Large Language Model
Open-Vocabulary Recognition

About

I am a fourth-year Ph.D. student in the VMC group, supervised by Professor Shiliang Zhang at the School of Computer Science, Peking University, Beijing, China.

My research interests lie in multi-modal understanding and generation, including multi-modal large language models, image/video generation, and open-vocabulary recognition. I am seeking job opportunities in 2026. Please feel free to email me if you are interested in my research.

News

2025-09

🎉 MagCache, on fast video generation, has been accepted to NeurIPS 2025

2025-05

🎉 EMLoC, on long-context learning, has been accepted to ICML 2025

2025-02

🎉 MMRef, on multi-modal representation learning, has been accepted by IEEE TMM

2024-03

🎉 OVMR, on open-vocabulary recognition, has been accepted to CVPR 2024

Education

Peking University

2022 - Present

Ph.D., School of Computer Science, supervised by Prof. Shiliang Zhang

Northwestern Polytechnical University

2018 - 2022

B.Sc., School of Software; research practice advised by Prof. Peng Wang

Publications

MagCache: Fast Video Generation with Magnitude-Aware Cache

Zehong Ma, Longhui Wei, Feng Wang, Shiliang Zhang, Qi Tian

Neural Information Processing Systems (NeurIPS) 2025

DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

Zehong Ma, Longhui Wei, Shuai Wang, Shiliang Zhang, Qi Tian

2025

Efficient Multi-modal Long Context Learning for Training-free Adaptation

Zehong Ma, Shiliang Zhang, Longhui Wei, Qi Tian

International Conference on Machine Learning (ICML) 2025

Multi-Modal Reference Learning for Fine-Grained Text-to-Image Retrieval

Zehong Ma, Hao Chen, Wei Zeng, Limin Su, Shiliang Zhang

IEEE Transactions on Multimedia 2025

OVMR: Open-Vocabulary Recognition with Multi-Modal References

Zehong Ma, Shiliang Zhang, Longhui Wei, Qi Tian

Computer Vision and Pattern Recognition (CVPR) 2024

Honors and Awards

Top Ten Students of the Year

2025

NERCVT, Peking University

Merit Student

2025

Peking University

China National Scholarship

2019, 2020, 2021

Outstanding Student Model of Northwestern Polytechnical University

2020

National Champion, China Robotics Competition (Basketball Robot)

2020

National First Prize of DJI RoboMaster Competition

2020

Services

Reviewer

2023 - Present

NeurIPS, CVPR, TIP, TMM, TMLR, CVIU