


Can Huang 0002
Person information
- affiliation: Bytedance, Shanghai, China
Other persons with the same name
- Can Huang (disambiguation page)
- Can Huang 0001: Peking University, Beijing, China
- Can Huang 0003: Fudan University, Shanghai, China
- Can Huang 0004: Chongqing Municipal Research Institute of Design, China
- Can Huang 0005: North China University of Technology, Beijing, China
- Can Huang 0006: Lawrence Livermore National Laboratory, Livermore, CA, USA
- Can Huang 0007: Texas A&M University, College Station, TX, USA
- Can Huang 0008: Wuhan Electronic Information Institute, Wuhan, China
- Can Huang 0009: Stanford University, Stanford, CA, USA
- (and 1 more)
2020 – today
- 2026
[c11]Weitao Jia, Jinghui Lu, Haiyang Yu, Siqi Wang, Guozhi Tang, An-Lan Wang, Weijie Yin, Dingkang Yang, Yuxiang Nie, Bin Shan, Hao Feng, Irene Li, Kun Yang, Han Wang, Jingqun Tang, Teng Fu, Changhong Jin, Chao Feng, Xiaohui Lv, Can Huang:
MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement. AAAI 2026: 31283-31291
[i26]Hao Feng, Wei Shi, Ke Zhang, Xiang Fei, Lei Liao, Dingkang Yang, Yongkun Du, Xuecheng Wu, Jingqun Tang, Yang Liu, Hong Chen, Can Huang:
Dolphin-v2: Universal Document Parsing via Scalable Anchor Prompting. CoRR abs/2602.05384 (2026)
[i25]Hanshen Zhu, Yuliang Liu, Xuecheng Wu, An-Lan Wang, Hao Feng, Dingkang Yang, Chao Feng, Can Huang, Jingqun Tang, Xiang Bai:
TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering. CoRR abs/2602.20903 (2026)
- 2025
[c10]Xiang Fei, Jinghui Lu, Qi Sun, Hao Feng, Yanjie Wang, Wei Shi, An-Lan Wang, Jingqun Tang, Can Huang:
Advancing Sequential Numerical Prediction in Autoregressive Models. ACL (2) 2025: 562-574
[c9]Jinghui Lu, Haiyang Yu, Yanjie Wang, Yongjie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, Hao Liu, Can Huang:
A Bounding Box is Worth One Token - Interleaving Layout and Text in a Large Language Model for Document Understanding. ACL (Findings) 2025: 7252-7273
[c8]Jingqun Tang, Qi Liu, Yongjie Ye, Jinghui Lu, Shu Wei, An-Lan Wang, Chunhui Lin, Hao Feng, Zhen Zhao, Yanjie Wang, Yuliang Liu, Hao Liu, Xiang Bai, Can Huang:
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. ACL (Findings) 2025: 7748-7763
[c7]Hao Feng, Shu Wei, Xiang Fei, Wei Shi, Yingdong Han, Lei Liao, Jinghui Lu, Binghong Wu, Qi Liu, Chunhui Lin, Jingqun Tang, Hao Liu, Can Huang:
Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting. ACL (Findings) 2025: 21919-21936
[c6]An-Lan Wang, Jingqun Tang, Lei Liao, Hao Feng, Qi Liu, Xiang Fei, Jinghui Lu, Han Wang, Hao Liu, Yuliang Liu, Xiang Bai, Can Huang:
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? EMNLP 2025: 22991-23001
[i24]Ling Fu, Biao Yang, Zhebin Kuang, Jiajun Song, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, Mingxin Huang, Zhang Li, Guozhi Tang, Bin Shan, Chunhui Lin, Qi Liu, Binghong Wu, Hao Feng, Hao Liu, Can Huang, Jingqun Tang, Wei Chen, Lianwen Jin, Yuliang Liu, Xiang Bai:
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning. CoRR abs/2501.00321 (2025)
[i23]Haiyang Yu, Jinghui Lu, Yanjie Wang, Yang Li, Han Wang, Can Huang, Bin Li:
EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models. CoRR abs/2503.04058 (2025)
[i22]Han Wang, Yongjie Ye, Bingru Li, Yuxiang Nie, Jinghui Lu, Jingqun Tang, Yanjie Wang, Can Huang:
Vision as LoRA. CoRR abs/2503.20680 (2025)
[i21]An-Lan Wang, Jingqun Tang, Liao Lei, Hao Feng, Qi Liu, Xiang Fei, Jinghui Lu, Han Wang, Weiwei Liu, Hao Liu, Yuliang Liu, Xiang Bai, Can Huang:
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? CoRR abs/2505.11015 (2025)
[i20]Xiang Fei, Jinghui Lu, Qi Sun, Hao Feng, Yanjie Wang, Wei Shi, An-Lan Wang, Jingqun Tang, Can Huang:
Advancing Sequential Numerical Prediction in Autoregressive Models. CoRR abs/2505.13077 (2025)
[i19]Hao Feng, Shu Wei, Xiang Fei, Wei Shi, Yingdong Han, Lei Liao, Jinghui Lu, Binghong Wu, Qi Liu, Chunhui Lin, Jingqun Tang, Hao Liu, Can Huang:
Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting. CoRR abs/2505.14059 (2025)
[i18]Jinghui Lu, Haiyang Yu, Siliang Xu, Shiwei Ran, Guozhi Tang, Siqi Wang, Bin Shan, Teng Fu, Hao Feng, Jingqun Tang, Han Wang, Can Huang:
Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning. CoRR abs/2505.15154 (2025)
[i17]Xiang Fei, Siqi Wang, Shu Wei, Yuxiang Nie, Wei Shi, Hao Feng, Can Huang:
Post-Completion Learning for Language Models. CoRR abs/2507.20252 (2025)
[i16]Weitao Jia, Jinghui Lu, Haiyang Yu, Siqi Wang, Guozhi Tang, An-Lan Wang, Weijie Yin, Dingkang Yang, Yuxiang Nie, Bin Shan, Hao Feng, Irene Li, Kun Yang, Han Wang, Jingqun Tang, Teng Fu, Changhong Jin, Chao Feng, Xiaohui Lv, Can Huang:
MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement. CoRR abs/2508.09670 (2025)
[i15]Haiyang Yu, Yuchuan Wu, Fan Shi, Lei Liao, Jinghui Lu, Xiaodong Ge, Han Wang, Minghan Zhuo, Xuecheng Wu, Xiang Fei, Hao Feng, Guozhi Tang, An-Lan Wang, Hanshen Zhu, Yangfan He, Quanhuan Liang, Liyuan Meng, Chao Feng, Can Huang, Jingqun Tang, Bin Li:
Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning. CoRR abs/2509.09731 (2025)
[i14]Yuxiang Nie, Han Wang, Yongjie Ye, Haiyang Yu, Weitao Jia, Tao Zeng, Hao Feng, Xiang Fei, Yang Li, Xiaohui Lv, Guozhi Tang, Jingqun Tang, Jinghui Lu, Zehui Dai, Jiacong Wang, Dingkang Yang, An-Lan Wang, Can Huang:
ChineseVideoBench: Benchmarking Multi-modal Large Models for Chinese Video Question Answering. CoRR abs/2511.18399 (2025)
- 2024
[j3]Hao Feng, Qi Liu, Hao Liu, Jingqun Tang, Wengang Zhou, Houqiang Li, Can Huang:
DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding. Sci. China Inf. Sci. 67(12) (2024)
[c5]Zhen Zhao, Jingqun Tang, Chunhui Lin, Binghong Wu, Can Huang, Hao Liu, Xin Tan, Zhizhong Zhang, Yuan Xie:
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer. CVPR 2024: 15567-15576
[c4]Jinghui Lu, Yanjie Wang, Ziwei Yang, Xuejing Liu, Brian Mac Namee, Can Huang:
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition. NeurIPS 2024
[c3]Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Binghong Wu, Lei Liao, Shu Wei, Yongjie Ye, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang:
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy. NeurIPS 2024
[c2]Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shu Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, Yuan Xie:
Harmonizing Visual Text Comprehension and Generation. NeurIPS 2024
[i13]Jinghui Lu, Ziwei Yang, Yanjie Wang, Xuejing Liu, Brian Mac Namee, Can Huang:
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition. CoRR abs/2402.04838 (2024)
[i12]Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang:
TextSquare: Scaling up Text-Centric Visual Instruction Tuning. CoRR abs/2404.12803 (2024)
[i11]Jingqun Tang, Qi Liu, Yongjie Ye, Jinghui Lu, Shu Wei, Chunhui Lin, Wanqing Li, Mohamad Fitri Faiz Bin Mahmood, Hao Feng, Zhen Zhao, Yanjie Wang, Yuliang Liu, Hao Liu, Xiang Bai, Can Huang:
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. CoRR abs/2405.11985 (2024)
[i10]Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lei Liao, Yongjie Ye, Hao Liu, Houqiang Li, Can Huang:
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy. CoRR abs/2406.01326 (2024)
[i9]Jinghui Lu, Haiyang Yu, Yanjie Wang, Yongjie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, Hao Liu, Can Huang:
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding. CoRR abs/2407.01976 (2024)
[i8]Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shu Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, Yuan Xie:
Harmonizing Visual Text Comprehension and Generation. CoRR abs/2407.16364 (2024)
[i7]Bin Shan, Xiang Fei, Wei Shi, An-Lan Wang, Guozhi Tang, Lei Liao, Jingqun Tang, Xiang Bai, Can Huang:
MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark. CoRR abs/2410.11538 (2024)
[i6]Han Wang, Yuxiang Nie, Yongjie Ye, Guanyu Deng, Yanjie Wang, Shuai Li, Haiyang Yu, Jinghui Lu, Can Huang:
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM. CoRR abs/2412.09530 (2024)
- 2023
[j2]Dingkang Yang, Yang Liu, Can Huang, Mingcheng Li, Xiao Zhao, Yuzheng Wang, Kun Yang, Yan Wang, Peng Zhai, Lihua Zhang:
Target and source modality co-reinforcement for emotion understanding from asynchronous multimodal sequences. Knowl. Based Syst. 265: 110370 (2023)
[j1]Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin:
SPTS v2: Single-Point Scene Text Spotting. IEEE Trans. Pattern Anal. Mach. Intell. 45(12): 15665-15679 (2023)
[c1]Mingxin Huang, Jiaxin Zhang, Dezhi Peng, Hao Lu, Can Huang, Yuliang Liu, Xiang Bai, Lianwen Jin:
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer. ICCV 2023: 19438-19448
[i5]Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin:
SPTS v2: Single-Point Scene Text Spotting. CoRR abs/2301.01635 (2023)
[i4]Mingxin Huang, Jiaxin Zhang, Dezhi Peng, Hao Lu, Can Huang, Yuliang Liu, Xiang Bai, Lianwen Jin:
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer. CoRR abs/2308.10147 (2023)
[i3]Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, Can Huang:
UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding. CoRR abs/2308.11592 (2023)
[i2]Hao Feng, Qi Liu, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang:
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding. CoRR abs/2311.11810 (2023)
[i1]Zhen Zhao, Jingqun Tang, Chunhui Lin, Binghong Wu, Hao Liu, Zhizhong Zhang, Xin Tan, Can Huang, Yuan Xie:
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer. CoRR abs/2311.13120 (2023)
last updated on 2026-04-15 00:58 CEST by the dblp team
all metadata released as open data under CC0 1.0 license







