Overview

Updates

18 June: Congratulations to our Best Paper and Challenge Winners!

Best Paper Award:
READ: Recursive Autoencoders for Document Layout Generation.
Akshay Gadi Patil, Omri Ben-Eliezer, Or Perel, and Hadar Averbuch-Elor

DocVQA Challenge Task 1:

  • Winner: PingAn-OneConnect-Gammalab-DQA Team
    Han Qiu, Guoqiang Xu, Chenjie Cao, Chao Gao, Dexun Wang, Fengxin Yang, Xiao Xie, Yu Qiu, and Ziqi Zheng
  • 1st Runner-up: Structural LM Team
    Chenliang Li, Bin Bi, Ming Yan, Wei Wang, and Songfang Huang
  • 2nd Runner-up: IIE.SECAI-CUC Team
    Yudi Chen, Youhui Guo, Gangyan Zeng, Jianjian Cao, Qiming Peng, and Sijin Wu

DocVQA Challenge Task 2:

  • Winner: PingAn-OneConnect-Gammalab-DQA Team
    Han Qiu, Guoqiang Xu, Chenjie Cao, Chao Gao, Dexun Wang, Fengxin Yang, Xiao Xie, Yu Qiu, and Ziqi Zheng
  • 1st Runner-up: iFLYTEK-DOCR Team
    Chenyu Liu, Fengren Wang, Jiajia Wu, Jinshui Hu, Bing Yin, and Cong Liu


12 June: Schedule updated, now available!
4 May: Added Accepted Paper List
14 April: Clarification: Camera Ready Deadline is April 17, 2020 [April 15 on the emailed document was in error]
10 April: Notification to Authors Pushed to April 13, 2020
18 March: Paper Submission Deadline Extended to March 31, 2020
11 Feb: Updated challenge information and website: https://rrc.cvc.uab.es/?ch=17
10 Feb: Call for papers and submission site now available! Workshop Date: Full day June 15, 2020
17 Jan: Deadlines for paper submissions added! Check back soon for competition dates!
6 Dec: Web site is now live! Check back soon for new updates!

Overview

Understanding written communication through vision is a key aspect of human civilization and should also be an important capability of intelligent agents aspiring to function in man-made environments. As such, the analysis of written communication in images is one of the oldest fields in computer vision. There are two major application domains for such methods: document understanding and scene text understanding. Document understanding goes over and above OCR, requiring layout analysis, handwritten text recognition, symbolic language interpretation, and more, on anything from administrative and historical documents to mixed-type documents such as maps and diagrams. Understanding text in the wild, on the other hand, has already enabled applications such as image-based translation and autonomous navigation. In both cases, dealing with such high-level semantic content requires specific treatment that often departs from other areas of computer vision. While work on these problems has been carried out for many years, with synergies between the document understanding and computer vision communities, reaching the high accuracy required by many applications remains a practical challenge.

The cross-fertilization between techniques used in computer vision and document understanding has led to some of the recent advances in both fields. CNNs were first developed for recognizing handwritten digits in MNIST and have since revolutionized computer vision. Similarly, RNNs were first used in areas such as handwriting and speech recognition and are now an important part of the computer vision toolbox. Conversely, advances in deep learning made in areas such as object recognition are now critical to improving the state of the art in both document understanding and scene text recognition.

The goal of this workshop is to raise awareness of the aforementioned topics in the broader computer vision community, and to drive a new wave of progress by cross-pollinating more ideas between text/documents and mainstream computer vision. The workshop will comprise the following:

  • Invited and peer-reviewed papers related to the call for papers below
  • Keynote addresses by top researchers in the field
  • A challenge on Document VQA, aiming to advance the state of the art in text and document understanding

Call for Papers

The topics of interest for the workshop include but are not limited to the following:

● Document structure and layout learning

● Document modeling and representations

● Cleansing and image enhancement techniques for scanned documents

● Semantic understanding of scene or document content

● Table identification and extraction from business documents

● Multi-lingual scene text and document understanding

● Character and text recognition

● Scene text detection and recognition

● Scene text Visual Question Answering 

● Document Visual Question Answering 

● Visual document retrieval

● Handwriting recognition 

● Online handwriting recognition

● Signature detection and verification

● Graphics recognition 

● Applications of document analysis

● Document simulation and synthesis

Paper Submissions

We are soliciting full-length paper submissions in PDF format, following the CVPR 2020 submission guidelines. All submissions will be handled electronically via the workshop’s CMT website. Papers are limited to eight pages, including figures and tables, in the CVPR style; additional pages containing only cited references are allowed. Please follow the CVPR 2020 main conference formatting instructions.

Papers that are not properly anonymized, that do not use the template, or that exceed eight pages (excluding references) will be rejected without review.

Submission Site: https://cmt3.research.microsoft.com/WTDDLE2020

Deadlines

Call for Papers Deadlines:
Paper Submission Deadline: March 31, 2020 (extended from March 20, 2020)
Notification to Authors: April 13, 2020 (pushed from April 10, 2020)
Workshop Camera Ready Due: April 17, 2020
Workshop Date: Full day June 15, 2020

Submission Site: https://cmt3.research.microsoft.com/WTDDLE2020

Challenge Dates:
Training Set Available: March 13, 2020
Test Set Available: April 15, 2020
Submission of Results: April 30, 2020
Workshop Date: Full day June 15, 2020

Challenge Site: https://rrc.cvc.uab.es/?ch=17

Contact

Email: cvpr2020.document.workshop [at] gmail.com


Sponsors

 
