About

I am currently a second-year master’s student at the Institute of Computing Technology, Chinese Academy of Sciences. Before that, I earned my B.Eng. from Huazhong University of Science and Technology. My research focuses on computer vision and vision-language models. I’m also passionate about the open-source community.

News

  • [Jan. 2026] Our paper “Revisiting Multimodal Positional Encoding in Vision-Language Models” has been accepted to ICLR 2026. [Paper] [GitHub]
  • [Nov. 2025] Our Qwen3-VL technical report has been released. [Paper] [GitHub]
  • [May 2025] Our paper RefHCM has been released and accepted by TMM. [Paper] [Code]

Education

  • Institute of Computing Technology, Chinese Academy of Sciences: Master’s student (2024.9–present)

  • Huazhong University of Science and Technology: Bachelor’s student (2020.9–2024.6)

Internship

  • Qwen Team, Alibaba Cloud (2025.4–2025.9): Core contributor to the Qwen3-VL series, working on multimodal positional encoding research, inference infrastructure, and model release.

Open Source

Here are some open-source contributions I’m proud of. I’m grateful to everyone involved in these projects; collaborating with this community has been an incredible experience. 🫡

Publications

Vision-Language Models

  1. Revisiting Multimodal Positional Encoding in Vision-Language Models
    Jie Huang*, Xuejing Liu*, Sibo Song, Ruibing Hou, Hong Chang, Junyang Lin, Shuai Bai
    International Conference on Learning Representations (ICLR), 2026.
    [Paper] [Code]
  2. Qwen3-VL Technical Report
    Core Contributor
    arXiv preprint, 2025.
    [Paper] [Code]

Human-Centric Perception

  1. RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios
    Jie Huang, Ruibing Hou, Jiahe Zhao, Hong Chang, Shiguang Shan
    IEEE Transactions on Multimedia (TMM), 2025.
    [Paper] [Code]

Adversarial Robustness

  1. Stealthy and Effective Physical Adversarial Attacks in Autonomous Driving
    Man Zhou, Wenyu Zhou, Jie Huang, Junhui Yang, Minxin Du, Qi Li
    IEEE Transactions on Information Forensics and Security (TIFS), 2024.
    [Paper]