OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

This repository hosts OCRBench, OCRBench v2, and MDPBench.

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios
Zhang Li*, Zhibo Lin*, Qiang Liu, Ziyang Zhang, Shuo Zhang, Zidun Guo, Jiajun Song, Jiarui Zhang, Xiang Bai, Yuliang Liu

MDPBench is the first benchmark for multilingual digital and photographed document parsing. Document parsing has made remarkable strides, yet almost exclusively on clean, digital, well-formatted pages in a handful of dominant languages. No systematic benchmark exists to evaluate how models perform on digital and photographed documents across diverse scripts and low-resource languages. MDPBench comprises 3,400 document images spanning 17 languages (Simplified Chinese, Traditional Chinese, English, Arabic, German, Spanish, French, Hindi, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Thai, Vietnamese), diverse scripts, and varied photographic conditions, with high-quality annotations produced through a rigorous pipeline of expert model labeling, manual correction, and human verification. To ensure fair comparison and prevent data leakage, we maintain separate public and private evaluation splits. Our comprehensive evaluation of both open-source and closed-source models uncovers a striking finding: while closed-source models (notably Gemini3-Pro) prove relatively robust, open-source alternatives suffer dramatic performance collapse, particularly on non-Latin scripts and real-world photographed documents, with an average drop of 17.8% on photographed documents and 14.0% on non-Latin scripts. These results reveal significant performance imbalances across languages and conditions, and point to concrete directions for building more inclusive, deployment-ready parsing systems.
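As a rough illustration of how an average drop between digital and photographed documents could be computed, here is a minimal Python sketch. It assumes the drop is the absolute difference in score points averaged over models; the description above does not specify the exact definition. The function name `average_drop`, the dictionary layout, and the scores are all hypothetical placeholders, not MDPBench results or part of its toolkit.

```python
from statistics import mean


def average_drop(scores):
    """Mean absolute drop (in score points) from digital to photographed pages.

    `scores` maps a model name to a dict with "digital" and "photographed"
    scores. This layout is an assumption made for illustration only.
    """
    drops = [s["digital"] - s["photographed"] for s in scores.values()]
    return mean(drops)


# Hypothetical placeholder scores, NOT results reported by MDPBench.
example_scores = {
    "model_a": {"digital": 80.0, "photographed": 60.0},
    "model_b": {"digital": 70.0, "photographed": 60.0},
}

print(average_drop(example_scores))  # 15.0
```

The same helper could be reused for the Latin versus non-Latin comparison by swapping the two keys for the corresponding per-script scores.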

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Ling Fu, Zhebin Kuang, Jiajun Song, Mingxin Huang, Biao Yang, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, Zhang Li, Guozhi Tang, Bin Shan, Chunhui Lin, Qi Liu, Binghong Wu, Hao Feng, Hao Liu, Can Huang, Jingqun Tang, Wei Chen, Lianwen Jin, Yuliang Liu, Xiang Bai

OCRBench v2 is a large-scale bilingual text-centric benchmark. It currently offers the most comprehensive set of tasks (4× more than the previous multi-scene benchmark, OCRBench), the widest scenario coverage (31 diverse scenarios, including street scenes, receipts, formulas, and diagrams), and thorough evaluation metrics. It contains 10,000 human-verified question-answer pairs, with a high proportion of difficult samples. More details can be found in the OCRBench v2 README.

OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models
Yuliang Liu, Zhang Li, Mingxin Huang, Biao Yang, Wenwen Yu, Chunyuan Li, Xucheng Yin, Cheng-lin Liu, Lianwen Jin, Xiang Bai

OCRBench is a comprehensive evaluation benchmark designed to assess the OCR capabilities of Large Multimodal Models. It comprises five components: Text Recognition, Scene Text-Centric VQA, Document-Oriented VQA, Key Information Extraction, and Handwritten Mathematical Expression Recognition. The benchmark includes 1,000 question-answer pairs, and all answers have been manually verified and corrected to ensure a more precise evaluation. More details can be found in the OCRBench README.
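To make the five-component structure concrete, the following Python sketch tallies correct answers per component. It is only an illustration under stated assumptions: the record fields (`component`, `prediction`, `answers`) are hypothetical, and the matching rule (case-insensitive substring containment) is an assumption chosen for simplicity. The official evaluation scripts in the OCRBench folder may use different field names and matching rules.

```python
from collections import defaultdict


def is_correct(prediction, answers):
    """Return True if any reference answer appears in the prediction.

    Case-insensitive substring matching is an assumed rule for this sketch.
    """
    pred = prediction.strip().lower()
    return any(ans.strip().lower() in pred for ans in answers)


def tally(records):
    """Count (correct, total) per component from a list of result records."""
    totals = defaultdict(lambda: [0, 0])
    for record in records:
        bucket = totals[record["component"]]
        bucket[0] += int(is_correct(record["prediction"], record["answers"]))
        bucket[1] += 1
    return {name: (correct, total) for name, (correct, total) in totals.items()}


# Hypothetical records for illustration; not taken from the benchmark.
example_records = [
    {"component": "Text Recognition", "prediction": "The word is HELLO.", "answers": ["hello"]},
    {"component": "Text Recognition", "prediction": "world", "answers": ["word"]},
    {"component": "Key Information Extraction", "prediction": "Total: 12.50", "answers": ["12.50"]},
]

print(tally(example_records))
```

Summing the per-component correct counts gives an overall score out of the total number of question-answer pairs.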

News

2026.04.01 🚀 We release MDPBench, a benchmark for multilingual document parsing in real-world scenarios.
2026.03.31 🚀 The leaderboard has been updated to the latest release: Leaderboard (2026.03).
2025.09.30 🚀 The leaderboard has been updated (2025.09).
2025.09.18 🚀 OCRBench v2 has been accepted by NeurIPS 2025 Datasets & Benchmarks Track.
2025.06.21 🚀 We release the private dataset of OCRBench v2 and will update the leaderboard every quarter.
2024.12.31 🚀 OCRBench v2 is released.
2024.12.11 🚀 OCRBench has been accepted by Science China Information Sciences.
2024.05.19 🚀 We release DTVQA to explore the capabilities of Large Multimodal Models on dense text.
2024.05.01 🚀 Thanks to SWHL for releasing ChineseOCRBench.
2024.03.26 🚀 OCRBench is now supported in lmms-eval.
2024.03.12 🚀 We plan to construct OCRBench v2 to include more OCR tasks and data. Any contribution will be appreciated.
2024.02.25 🚀 OCRBench is now supported in VLMEvalKit.

Other Related Multilingual Datasets

Data	Link	Description
EST-VQA Dataset (CVPR 2020, English and Chinese)	Link	On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering.
Swahili Dataset (ICDAR 2024)	Link	The First Swahili Language Scene Text Detection and Recognition Dataset.
Urdu Dataset (ICDAR 2024)	Link	Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering.
MTVQA (9 languages)	Link	MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
EVOBC (Oracle Bone Script Evolution Dataset)	Link	We systematically collected ancient characters from authoritative texts and websites spanning six historical stages.
HUST-OBC (Oracle Bone Script Character Dataset)	Link	For deciphering oracle bone script characters.

Citation

If you wish to refer to the baseline results published here, please use the following BibTeX entries:

@article{Liu_2024,
    title={OCRBench: on the hidden mystery of OCR in large multimodal models},
    volume={67},
    ISSN={1869-1919},
    url={http://dx.doi.org/10.1007/s11432-024-4235-6},
    DOI={10.1007/s11432-024-4235-6},
    number={12},
    journal={Science China Information Sciences},
    publisher={Springer Science and Business Media LLC},
    author={Liu, Yuliang and Li, Zhang and Huang, Mingxin and Yang, Biao and Yu, Wenwen and Li, Chunyuan and Yin, Xu-Cheng and Liu, Cheng-Lin and Jin, Lianwen and Bai, Xiang},
    year={2024},
    month=dec }
  
@inproceedings{fu2024ocrbenchv2improvedbenchmark,
    title={OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning}, 
    author={Ling Fu and Biao Yang and Zhebin Kuang and Jiajun Song and Yuzhe Li and Linghao Zhu and Qidi Luo and Xinyu Wang and Hao Lu and Mingxin Huang and Zhang Li and Guozhi Tang and Bin Shan and Chunhui Lin and Qi Liu and Binghong Wu and Hao Feng and Hao Liu and Can Huang and Jingqun Tang and Wei Chen and Lianwen Jin and Yuliang Liu and Xiang Bai},
    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    year={2025}
}

@misc{li2026mdpbenchbenchmarkmultilingualdocument,
      title={MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios}, 
      author={Zhang Li and Zhibo Lin and Qiang Liu and Ziyang Zhang and Shuo Zhang and Zidun Guo and Jiajun Song and Jiarui Zhang and Xiang Bai and Yuliang Liu},
      year={2026},
      eprint={2603.28130},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.28130}, 
}

