About

Currently, I am a second-year master’s student at the Institute of Computing Technology, Chinese Academy of Sciences. Prior to this, I earned my B.Eng. from Huazhong University of Science and Technology. My research focuses on Computer Vision and Vision-Language Models. I’m also passionate about the open-source community.

News

[Jan. 2026] Our paper “Revisiting Multimodal Positional Encoding in Vision-Language Models” has been accepted to ICLR 2026. [Paper] [GitHub]
[Nov. 2025] Our Qwen3-VL technical report has been released. [Paper] [GitHub]
[May 2025] Our paper RefHCM has been released and accepted by TMM. [Paper] [Code]

Education

Institute of Computing Technology, Chinese Academy of Sciences: Master’s student (2024.9–present)
Huazhong University of Science and Technology: Bachelor’s student (2020.9–2024.6)

Internship

Qwen Team, Alibaba Cloud (2025.4–2025.9): Core contributor to the Qwen3-VL series, participating in multimodal positional encoding research, inference infrastructure, and model release.

Open Source

Here are some open-source contributions I’m proud of. I’m grateful to everyone involved in these projects; collaborating with this community has been an incredible experience. 🫡

Transformers: Added support for Qwen3-VL and Qwen3.5.
vLLM: Added support for Qwen3-VL and Qwen3.5.
llama.cpp: Added support for Qwen3-VL and Qwen3.5.
MLX community: Contributed enhancements such as mlx-vlm #722 and mlx-lm #869.

Publications

Vision-Language Models

Revisiting Multimodal Positional Encoding in Vision-Language Models
Jie Huang*, Xuejing Liu*, Sibo Song, Ruibing Hou, Hong Chang, Junyang Lin, Shuai Bai
International Conference on Learning Representations (ICLR), 2026.
[Paper] [Code]
Qwen3-VL Technical Report
Core Contributor
arXiv preprint, 2025.
[Paper] [Code]

Human-Centric Perception

RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios
Jie Huang, Ruibing Hou, Jiahe Zhao, Hong Chang, Shiguang Shan
IEEE Transactions on Multimedia (TMM), 2025.
[Paper] [Code]

Adversarial Robustness

Stealthy and Effective Physical Adversarial Attacks in Autonomous Driving
Man Zhou, Wenyu Zhou, Jie Huang, Junhui Yang, Minxin Du, Qi Li
IEEE Transactions on Information Forensics and Security (TIFS), 2024.
[Paper]

JJJYmmm (Jie Huang)