About Me

Hello, my name is Zhentao Tan. I received my Bachelor’s degree from the University of Science and Technology of China (USTC) in 2017 and my Ph.D. from USTC in 2022, under the supervision of Professor Nenghai Yu. From 2022 to 2024, I conducted postdoctoral research at the Alibaba-USTC Joint Postdoctoral Research Station, under the joint guidance of Professor Nenghai Yu of USTC; Jieping Ye, Vice President of Alibaba Cloud and IEEE Fellow; and Le Lu, IEEE Fellow and member of the MICCAI Council. I now work at Alibaba Cloud Apsara Lab. My research interests include large language models, multimodal models, image and video synthesis, and visual image analysis. I have published more than 20 papers in top conferences and journals in computer vision, image processing, and artificial intelligence, such as CVPR, ECCV, ICLR, SIGGRAPH, EMNLP, TPAMI, and TIP.

Currently, I am dedicated to the fundamental research and practical application of large-scale models. This includes exploring the mechanisms and characteristics of structures such as MoE, training foundation models to adapt to tasks in different domains, constructing better data to drive the training of large-scale models, and achieving a better balance between the performance and efficiency of large-scale models. If you are also interested in large-scale model technology, please feel free to contact me about collaboration or internships!

More contact information: GMAIL, OUTLOOK.

Main Publications

[AAAI Oral 2026] Flora: Effortless Context Construction to Arbitrary Length and Scale, Tianxiang Chen, Zhentao Tan, Xiaofan Bo, Yue Wu, Tao Gong, Qi Chu, Jieping Ye, Nenghai Yu.

[ArXiv 2025] Fun-ASR Technical Report, Keyu An, Yanni Chen, Chong Deng, Changfeng Gao, Zhifu Gao, Bo Gong, Xiangang Li, Yabin Li, Xiang Lv, Yunjie Ji, Yiheng Jiang, Bin Ma, Haoneng Luo, Chongjia Ni, Zexu Pan, Yiping Peng, Zhendong Peng, Peiyao Wang, Hao Wang, Wen Wang, Wupeng Wang, Biao Tian, Zhentao Tan, Nan Yang, Bin Yuan, et al.

[TIP 2024] Exploring the Application of Large-scale Pre-trained Models on Adverse Weather Removal, Zhentao Tan, Yue Wu, Qiankun Liu, Qi Chu, Le Lu, Jieping Ye, Nenghai Yu.

[ICLR 2024] Boosting Vanilla Lightweight Vision Transformers via Re-parameterization, Zhentao Tan, Xiaodan Li, Yue Wu, Qi Chu, Le Lu, Nenghai Yu, Jieping Ye.

[EMNLP Findings 2024] Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection, Tianxiang Chen, Zhentao Tan, Tao Gong, Yue Wu, Qi Chu, Bin Liu, Jieping Ye, Nenghai Yu.

[TPAMI 2024] Transformer based Pluralistic Image Completion with Reduced Information Loss, Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu.

[TGRS 2024] MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection, Tianxiang Chen, Zi Ye, Zhentao Tan, Tao Gong, Yue Wu, Qi Chu, Bin Liu, Nenghai Yu, Jieping Ye.

[CVPR 2024] SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models, Feifei Wang, Zhentao Tan, Tianyi Wei, Yue Wu, Qidong Huang.

[CVPR 2024] Towards More Unified In-context Visual Understanding, Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu.

[TPAMI 2023] Semantic Probability Distribution Modeling for Diverse Semantic Image Synthesis, Zhentao Tan, Qi Chu, Menglei Chai, Dongdong Chen†, Jing Liao, Qiankun Liu, Bin Liu, Gang Hua, Nenghai Yu.

[ECCV Oral 2022] UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection, Wanyi Zhuang, Qi Chu, Zhentao Tan, Qiankun Liu, Haojie Yuan, Changtao Miao, Zixiang Luo, Nenghai Yu.

[CVPR 2022] Reduce Information Loss in Transformers for Pluralistic Image Inpainting, Qiankun Liu, Zhentao Tan, Dongdong Chen, Qi Chu, Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan, Nenghai Yu.

[CVPR 2021] Diverse Semantic Image Synthesis via Probability Distribution Modeling, Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Bin Liu, Gang Hua, Nenghai Yu.

[TPAMI 2021] Efficient Semantic Image Synthesis via Class-Adaptive Normalization, Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Gang Hua, Nenghai Yu.

[SIGGRAPH 2020] MichiGAN: Multi-Input-Conditioned Hair Image Generation for Portrait Editing, Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Lu Yuan, Sergey Tulyakov, Nenghai Yu.