Inspiration

Understanding and extracting information from paper or electronic business documents, such as receipts, invoices, tax forms, reports, and manuals, is an essential task for blind workers in remote and traditional offices all over the world.

img Blind office worker

For example, a 2022 workplace technology study of blind employees by the American Foundation for the Blind (AFB) found that:

Many employees must organize/submit receipts and/or complete expense reports. Of 310 participants, just over half (n=163, 52.6%) reported that they had to organize, submit, and complete receipts and/or expense reports.

While assistive technology such as OCR and computer-vision-based smartphone apps has made huge strides in making business documents accessible, the study found that in many cases blind workers who cannot see or read paper receipts are still unable to complete these tasks independently:

The 163 participants were provided a list of statements that described their experience with these tasks with multiple responses permitted. Their responses included that they:
* Requested receipts be sent electronically (n=104)
* Used a person to assist with viewing and organizing receipts (n=83)
* Independently completed the process to submit an expense report (n=67)
* Used an app or OCR to view and organize receipts (n=58)
* Required assistance to complete the process to submit an expense report (n=58)
* Used a visual interpreting service when viewing and organizing receipts (n=36)
* Had someone else complete the process to submit an expense report because the process was not accessible (n=29)
* Were able to see and organize receipts (n=26)

Only 58 of the 163 participants (about 36%) reported that they used an app or OCR to view and organize receipts, and only 67 of the 163 (about 41%) reported that they independently completed the process of submitting an expense report.

According to the study participants, understanding the information in documents such as training or HR manuals is also a challenge:

When I cannot read materials [at work trainings], I can't participate. Sometimes people assume I can't participate because I'm blind, when the real issue is that the materials either were not provided or aren't accessible. All of this hinders my career advancement and my ability to learn new skills and technology. I have to constantly find new ways of obtaining the information everyone else already has either by advocating for the materials to be made accessible or by finding another course that might be accessible. This causes stress and frustration, I feel like I'm behind in my work and that I can't measure up due to a lack of information. I know I've done my best and this isn't my fault, and I have had to ask supervisors and other officials to address the issues several times. It has been years, sometimes, before anything has changed, and some courses still aren't accessible as of this writing.

Today, much of the research into assistive technology that helps blind people understand print and text focuses on smartphone apps, such as Envision App, Seeing AI, and OneStep Reader, that use the phone's camera to read short documents like receipts and menus or longer documents like letters and books.

But smartphone apps aren't always appropriate, for instance in an office environment where blind workers must work for long periods with business documents, forms, and manuals, and enter the information those documents contain into other business programs.

img Blind worker in office. From AFB Workplace Tech Study

Documents in electronic format are vastly more accessible to people with visual disabilities. But reading and working with electronic documents is still a challenge. For example, the following video shows how an untagged PDF document sounds in a screen reader:

img https://www.youtube.com/watch?v=GaNwnsT4B5s

PDF documents must be properly tagged to make them accessible:

Just providing the PDF document isn’t enough. It must be made accessible. Tag headings to allow a person using assistive technology to navigate a large document rather than having to read through 50 pages to find the section of interest. Provide descriptive and contextual alt-text for images, graphs, charts, and infographics. Make all links descriptive and functional. Correctly tag lists and tables so they are not a string of words, but so that the relationship between the various elements is clear. Provide tooltips for form fields so it is clear what information needs to be input, and in what format.

But even with tagged PDF documents, navigating them in a traditional PDF viewer can be challenging. GUI environments like Windows, macOS, and Android, and document formats like HTML and PDF, have made huge strides in becoming accessible to visually impaired and disabled users. But today's user interfaces still make fundamental assumptions about how information is presented and how users interact with it, which can make using desktop or web applications, forms, and documents frustrating and time-consuming for sight-impaired users.

People who are blind tend to visualize content as a linear path rather than as a 2D view of a document or page. But navigation in desktop and web applications is still spatially oriented around a persistent visual surface: sighted users can see, immediately memorize, and rank by importance navigation elements like windows, menus, trees, buttons, text, and headers, while using a visual pointer like a mouse cursor to select the element or content they need. When tables or forms are presented visually, the layout itself carries an important part of their meaning, and applications rely on the user's ability to quickly grasp how that layout prioritizes information and the steps needed to complete a task or process.
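When a table is read aloud in a linear order, each cell loses the column header that a sighted reader picks up from the layout. One common remedy is to restate the header with every cell, one row at a time. The sketch below illustrates that idea only; it is not taken from Victor, and the `linearize_table` function and the sample receipt rows are hypothetical.

```python
from typing import List


def linearize_table(headers: List[str], rows: List[List[str]]) -> List[str]:
    """Turn a 2D table into one announcement line per row.

    Each cell is prefixed with its column header, so the relationship a
    sighted reader infers from the visual layout is spoken explicitly.
    (Hypothetical helper for illustration.)
    """
    lines = []
    for row_number, row in enumerate(rows, start=1):
        # zip stops at the shorter list, so a ragged row simply omits missing cells.
        cells = [f"{header}: {value}" for header, value in zip(headers, row)]
        lines.append(f"Row {row_number} of {len(rows)}. " + ", ".join(cells))
    return lines


if __name__ == "__main__":
    # Made-up line items from a receipt.
    headers = ["Item", "Quantity", "Price"]
    rows = [["Coffee", "2", "7.00"], ["Bagel", "1", "3.25"]]
    for line in linearize_table(headers, rows):
        print(line)
```

Each printed line (for example `Row 1 of 2. Item: Coffee, Quantity: 2, Price: 7.00`) can be read by a screen reader on its own, without the listener having to remember which column they are in.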

Non-visual users who rely on assistive technology like screen readers must often wade through a sea of elements and text before finding the content they want, and must rely on slow trial and error, repetition, and memory to navigate and understand a document GUI efficiently.

There is a definite need for assistive technology that enables blind employees to efficiently navigate, query, and understand structured business documents, so that they can work as efficiently as their sighted colleagues.

What it does

Victor Document Understanding is a desktop auditory user interface that helps blind employees navigate, understand, and query business documents like invoices, receipts, tax forms, training manuals, and the other structured documents they must work with during their day. In contrast to apps like OneStep Reader or Envision App, Victor Document Understanding is specifically designed to understand, navigate, and query structured multipage business documents that are laid out using fields and tables, and to incorporate structured knowledge about business processes and document handling, stored in knowledge bases that the blind employee can access. Victor uses Azure Cognitive Services Form Recognizer to extract structured information from electronic documents or scanned paper documents, without requiring the documents to be tagged.

img https://challengepost-s3-challengepost.netdna-ssl.com/photos/production/software_photos/001/930/055/datas/original.png
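Form Recognizer returns extracted content as named fields plus tables. The sketch below shows how such fields could be stepped through one line at a time and jumped to by name, which is the kind of navigation described above. It assumes the fields have already been extracted into a plain name-to-value dictionary; the `FieldNavigator` class, its commands, and the sample values are hypothetical, and no Azure SDK calls are shown.

```python
from typing import Dict, List, Optional


class FieldNavigator:
    """Step through extracted document fields, one announcement line at a time.

    Hypothetical helper: assumes fields were already extracted
    (e.g. by a form-recognition service) into a name -> value mapping.
    """

    def __init__(self, fields: Dict[str, str]) -> None:
        self._fields = fields
        self._names: List[str] = list(fields)  # keeps the extraction order
        self._position = -1  # before the first field

    def _announce(self) -> str:
        name = self._names[self._position]
        return f"Field {self._position + 1} of {len(self._names)}. {name}: {self._fields[name]}"

    def next(self) -> str:
        """Move to the next field, or report the end of the document."""
        if self._position + 1 >= len(self._names):
            return "End of document."
        self._position += 1
        return self._announce()

    def previous(self) -> str:
        """Move to the previous field, or report the start of the document."""
        if self._position <= 0:
            return "Start of document."
        self._position -= 1
        return self._announce()

    def find(self, query: str) -> Optional[str]:
        """Jump to the first field whose name contains the query (case-insensitive)."""
        query = query.lower()
        for index, name in enumerate(self._names):
            if query in name.lower():
                self._position = index
                return self._announce()
        return None


if __name__ == "__main__":
    # Made-up receipt values with illustrative field names.
    receipt = {"MerchantName": "Contoso Cafe", "TransactionDate": "2022-03-14", "Total": "10.25"}
    nav = FieldNavigator(receipt)
    print(nav.next())
    print(nav.find("total"))
    print(nav.previous())
```

Every response is a single self-describing line ("Field 3 of 3. Total: 10.25"), so a screen reader user always knows where they are without a visual overview of the page.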

Knowledge about documents, business processes, employee training, and so on can be stored in knowledge bases using Azure QnA Maker, as an accessible alternative to employee training manuals, and queried through Victor's conversational user interface. Victor Document Understanding uses a simplified, conversation-driven interface powered by natural language understanding that runs in a character-based terminal on Windows or Linux without the need for a GUI. Victor produces line-by-line output that is easily read by screen readers and other assistive technology, and uses line-driven interactive input that can be entered via any kind of keyboard or character input device, or via desktop speech recognition for users without access to such devices.
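In Victor, matching a typed question to a knowledge-base answer is handled by Azure QnA Maker. To illustrate only the line-in, line-out interaction pattern, the sketch below swaps QnA Maker for a naive word-overlap match over a local list of question/answer pairs. This is a stand-in, not QnA Maker's algorithm; the knowledge-base entries, `best_answer`, `run`, and the `quit` command are all hypothetical.

```python
import io
import re
import sys
from typing import List, Optional, Set, TextIO, Tuple

# Hypothetical entries standing in for a QnA Maker knowledge base.
KNOWLEDGE_BASE: List[Tuple[str, str]] = [
    ("How do I submit an expense report?",
     "Attach each receipt, then submit the report to your manager for approval."),
    ("What is the deadline for expense reports?",
     "Expense reports are due within 30 days of purchase."),
]


def words(text: str) -> Set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def best_answer(question: str, kb: List[Tuple[str, str]]) -> Optional[str]:
    """Return the answer whose stored question shares the most words with the query.

    Naive stand-in for QnA Maker matching; ties go to the earlier entry,
    and a query with no shared words returns None.
    """
    query = words(question)
    best_score, best = 0, None
    for stored_question, answer in kb:
        score = len(query & words(stored_question))
        if score > best_score:
            best_score, best = score, answer
    return best


def run(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read one question per line and write exactly one answer line per question."""
    stdout.write("Ask a question, or type quit to exit.\n")
    for line in stdin:
        question = line.strip()
        if not question:
            continue
        if question.lower() == "quit":
            break
        answer = best_answer(question, KNOWLEDGE_BASE)
        stdout.write((answer or "Sorry, I don't know the answer to that.") + "\n")
        stdout.flush()


if __name__ == "__main__":
    # Scripted demo input; call run() with no arguments for an interactive terminal session.
    run(io.StringIO("When are expense reports due?\nquit\n"))
```

Because every prompt and answer is a complete line of plain text, the same loop works with a screen reader, a braille display, or speech recognition that types into the terminal.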

Victor is desktop software designed for employees who are blind or sight-impaired, or who otherwise cannot effectively use a traditional GUI with mouse or touchscreen input and must rely on assistive technologies like screen readers or braille displays. It lets these employees work effectively with either electronic PDF documents or paper documents scanned with an attached desktop scanner.

How we built it

Victor Document Understanding is a .NET console app that can run either on Windows or Linux and uses the Azure Cognitive Services C# SDK.

What's next for Victor Document Understanding

I plan to reach out to the /r/Blind community to get feedback on whether Victor is helpful to blind workers.
