Wentao (Tony) Ma

@ BosonAI
@ University of Toronto

MLLM for Video / Audio Understanding and Generation

University of Toronto
College St, Toronto, ON, CA, M7A 1A2
Email: tonyyyma [at] gmail [dot] com

Open to [PhD / Research Engineer] positions


Introduction

My research focuses on Multi-Modal LLMs (MLLMs). I enjoy exploring and improving the capabilities of MLLMs on video and audio, and applying them to other fields such as robotics.

Currently, I'm an MLE at @BosonAI, advised by Alex Smola and Mu Li. We are developing efficient and expressive foundation models for audio understanding and generation.

Before that, I received my Master's degree from the University of Toronto, supervised by Dr. Zhijing Jin. During that time, I worked closely with Wenhu Chen on video understanding. I also studied at Imperial College London, supervised by Edward Johns, where we validated and improved the multi-modal pattern learning ability of VLMs and applied them to robotics. I received my bachelor's degree in Computer Science from ShenYuan Honors College, Beihang University.

I enjoy photography and am a member of Toronto Photo Walk (ToPW). I'm also interested in all kinds of sports, including snowboarding and tennis.

News

Publications

Higgs-Audio V2.5 Voice Model

Wentao Ma (Core Contributor), Boson AI Team

Technical Blog

[Blog]

VideoScore2: Think before You Score in Generative Video Evaluation

Xuan He*, Dongfu Jiang*, Ping Nie, Minghao Liu, Wentao Ma, Junru Lin, and Others

Preprint

[paper] [website]

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Jialin Yang*, Dongfu Jiang*, Lipeng He, Sherman Siu, Wentao Ma, Zhiheng Lyu, and Others

Transactions on Machine Learning Research (TMLR), 2025

[paper] [website] [benchmark]

VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

Wentao Ma*, Weiming Ren*, Yiming Jia, Zhuofeng Li, Ping Nie, Ge Zhang, Wenhu Chen

Preprint

[paper] [website] [benchmark] [Leaderboard]

ProT-GFDM: A Generative Fractional Diffusion Model for Protein Generation

Xiao Liang*, Wentao Ma*, Eric Paquet, Herna Lydia Viktor, Wojtek Michalowski

Computational and Structural Biotechnology Journal (CSBJ), 2025

[paper]

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Weiming Ren, Wentao Ma, Huan Yang, Cong Wei, Ge Zhang, Wenhu Chen

International Conference on Computer Vision (ICCV), 2025

[paper] [website]

Paint2Plan: Image Painting Enables Imitation Learning with VLMs

Tony Ma, Teyun Kwon, Edward Johns

Preprint, 2024

[paper] [website]

LLM Echo Chamber: personalized and automated disinformation

Tony Ma, Yves-Alexandre de Montjoye

Machine Learning and Cyber Security Symposium (MLCSS), Imperial, 2024

[paper] [code] [video]

Boosting Transferability of Adversarial Patches with Visual Relations

Tony Ma, Songze Li, Yisong Xiao, Shunchang Liu

Conference on Computer Vision and Pattern Recognition (CVPR), AdvVision Workshop, 2023

[paper]

Experience

Boson AI

Machine Learning Engineer

Alignment for Audio Understanding and Generation models

May.2025 - Present [website]

Vector Institute

Machine Learning Associate

Designed a Geo-filtering RAG system with Global Spatial Technology Solutions (GSTS)

Jan.2025 - Apr.2025 [website]

SONY

Edge AI Engineer Intern

Video Object Tracking / Model Quantization / Edge Computing

Sep.2022 - Feb.2023 [website] [Project]

TikTok

Software Engineer Intern

iOS development for TikTok Pay

May.2022 - Aug.2022 [website]

Selected Certifications and Awards

AWS Certified Solutions Architect (Associate) --- 2026
Mitacs Research Funding --- 2025-2026
Distinction @ Imperial College London --- 2024
Outstanding Graduate --- 2023
Scholarship for Academic Excellence --- 2020/2021/2022
Scholarship for Discipline Competitions --- 2020/2021/2022
Excellent Student Leader --- 2020

Community Service

Reviewer: ICLR workshops


© Wentao Ma | Template from Dr. YueMing Jin | Last updated: Jan 2026