About
Currently, I am a second-year master’s student at the Institute of Computing Technology, Chinese Academy of Sciences. Prior to this, I earned my B.Eng. from Huazhong University of Science and Technology. My research focuses on Computer Vision and Vision-Language Models. I’m also passionate about the open-source community.
News
- [Jan. 2026] Our paper “Revisiting Multimodal Positional Encoding in Vision-Language Models” has been accepted by ICLR 2026. [Paper] [GitHub]
- [Nov. 2025] Our Qwen3-VL technical report has been released. [Paper] [GitHub]
- [May 2025] Our paper RefHCM has been released and accepted by TMM. [Paper] [Code]
Education
Institute of Computing Technology, Chinese Academy of Sciences: Master’s student (2024.9–present)
Huazhong University of Science and Technology: Bachelor’s student (2020.9–2024.6)
Internship
- Qwen Team, Alibaba Cloud (2025.4–2025.9): Core contributor to the Qwen3-VL series, participating in multimodal positional encoding research, inference infrastructure, and model release.
Open Source
Here are some open-source contributions I’m proud of. I’m grateful to everyone involved in these projects; collaborating with this community has been an incredible experience. 🫡
- Transformers: Added support for Qwen3-VL and Qwen3.5.
- vLLM: Added support for Qwen3-VL and Qwen3.5.
- llama.cpp: Added support for Qwen3-VL and Qwen3.5.
- MLX community: Contributed enhancements such as mlx-vlm #722 and mlx-lm #869.
Publications
Vision-Language Models
- Revisiting Multimodal Positional Encoding in Vision-Language Models
Jie Huang*, Xuejing Liu*, Sibo Song, Ruibing Hou, Hong Chang, Junyang Lin, Shuai Bai
International Conference on Learning Representations (ICLR), 2026.
[Paper] [Code]
- Qwen3-VL Technical Report
Core Contributor
arXiv preprint, 2025.
[Paper] [Code]
Human-Centric Perception
- RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios
Jie Huang, Ruibing Hou, Jiahe Zhao, Hong Chang, Shiguang Shan
IEEE Transactions on Multimedia (TMM), 2025.
[Paper] [Code]
Adversarial Robustness
- Stealthy and Effective Physical Adversarial Attacks in Autonomous Driving
Man Zhou, Wenyu Zhou, Jie Huang, Junhui Yang, Minxin Du, Qi Li
IEEE Transactions on Information Forensics and Security (TIFS), 2024.
[Paper]
