Continually refresh OCR and speak new text as it appears. #11270

Closed
jcsteh wants to merge 1 commit into nvaccess:master from jcsteh:continualOcr

@jcsteh

@jcsteh jcsteh commented Jun 18, 2020

Contributor

Link to issue number:

Fixes #2797.

Summary of the issue:

Some videos include text which is graphical only with no accompanying verbalisation, thus making it inaccessible to screen reader users. OCR is very useful for this purpose. However, having to repeatedly and manually dismiss and recognise is tedious and inefficient. It would be useful if NVDA could do this automatically, reporting new text as it appears.

This could be useful for other scenarios as well such as virtual machines and games.

Description of how this pull request fixes the issue:

When the user performs content recognition (e.g. NVDA+r for Windows 10 OCR), we now periodically (every 1.5 seconds) recognize the same area of the screen again.
The result document is updated with the new recognition result, keeping the previous cursor position if possible.
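
As a rough illustration of keeping the cursor position across a refresh, the sketch below replaces the document text and clamps the old offset to the new length. `ResultDocument` and its attributes are hypothetical names for this example, not NVDA's actual classes, and the real implementation may restore the position differently.

```python
class ResultDocument:
    """Hypothetical stand-in for a recognition result document."""

    def __init__(self, text: str, cursor: int = 0):
        self.text = text
        self.cursor = cursor

    def update(self, new_text: str) -> None:
        """Replace the text, keeping the cursor offset where it still fits."""
        self.text = new_text
        # If the new text is shorter, fall back to its last character (or 0).
        last_valid = max(len(new_text) - 1, 0)
        self.cursor = min(self.cursor, last_valid)


doc = ResultDocument("First caption line", cursor=6)
doc.update("First caption line\nSecond caption line")
print(doc.cursor)  # offset is unchanged because it still fits in the new text
```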

The LiveText NVDAObject is used to report any new text that has been added.
LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken.
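
The idea of reporting only newly added text can be sketched with the standard library alone. This is not the LiveText implementation; it is a minimal line-based diff, and `new_lines`, `report_new_text`, and the `report_dynamic_changes` flag are names assumed for this example.

```python
import difflib
from typing import Callable, List


def new_lines(old_text: str, new_text: str) -> List[str]:
    """Return non-empty lines present in new_text but not matched in old_text."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both introduce lines on the new side.
        if tag in ("insert", "replace"):
            added.extend(line for line in new[j1:j2] if line.strip())
    return added


def report_new_text(
    old_text: str,
    new_text: str,
    report_dynamic_changes: bool,
    speak: Callable[[str], None],
) -> None:
    """Speak added lines unless the user has turned reporting off."""
    if not report_dynamic_changes:
        return
    for line in new_lines(old_text, new_text):
        speak(line)


report_new_text("Line A\nLine B", "Line B\nLine C", True, print)
```

A line-level diff means scrolled-away lines are ignored and only lines that were not on screen in the previous result are spoken.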

Because the result document is updated, this means braille is updated as well, allowing braille users to see changes as they occur.

While Windows 10 OCR is local and relatively fast, other recognizers might not be; e.g. they might be resource intensive or use an internet service.
Recognizers can thus specify whether they want to support auto refresh using the allowAutoRefresh attribute, which defaults to False.
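
A minimal sketch of the opt-in follows. Only the `allowAutoRefresh` attribute name, its `False` default, and the 1.5 second interval come from this pull request; the classes here are stand-ins written for this example and are not NVDA's real recognizer classes.

```python
from typing import Optional

# Interval from the PR description (1.5 seconds), in milliseconds.
REFRESH_INTERVAL_MS = 1500


class ContentRecognizer:
    """Stand-in base class for this sketch."""

    # Auto refresh is off unless a recognizer opts in.
    allowAutoRefresh = False


class LocalOcrRecognizer(ContentRecognizer):
    """Hypothetical fast, local recognizer that opts in."""

    allowAutoRefresh = True


class OnlineRecognizer(ContentRecognizer):
    """Hypothetical slow or metered recognizer that keeps the default."""


def refresh_interval(recognizer: ContentRecognizer) -> Optional[int]:
    """Return the refresh interval in ms, or None when refresh is not allowed."""
    return REFRESH_INTERVAL_MS if recognizer.allowAutoRefresh else None


print(refresh_interval(LocalOcrRecognizer()))
print(refresh_interval(OnlineRecognizer()))
```

Defaulting to `False` means a recognizer that is expensive or uses an internet service is never refreshed automatically unless its author chooses to allow it.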

Testing performed:

Tested on a video with scrolling text. After pressing NVDA+r, NVDA spoke the new text as it appeared and I could also follow it on my braille display. Unfortunately, I'm not able to share the particular video.

Known issues with pull request:

While you can disable the spoken reporting using report dynamic content changes, you can't currently disable the actual automatic OCR refresh. This could be a problem for resource usage, though I haven't seen that in practice. It's possible there may be other use cases for disabling it, though I don't know of any.

Change log entry:

New Features:
- After recognizing content with Windows 10 OCR using NVDA+r, NVDA will now continually update the recognition, speaking new text as it appears.
  - You can disable speaking of new text by turning off report dynamic content changes (pressing NVDA+5).

Changes for Developers:
- ContentRecognizers can specify whether they want to allow automatic, periodic refresh using the new allowAutoRefresh attribute.

@Adriani90

Collaborator

@jcsteh thank you so much for the work on this. Could you please also have a look at issue #2797, especially the last 4 comments?
I wonder whether it is necessary to open the OCR window for this to work. This might not be ideal in a game menu, where text should be reported when the focus changes to a menu item. Or imagine watching a video with subtitles together with a sighted person, while NVDA speaks via another sound card and you hear the subtitles through a Bluetooth in-ear headphone. In that case the screen should not show the OCR window.

@LeonarddeR

Collaborator

This is wonderful!

Another use case or next step for this could be to stop the recognition result from being dismissed when activating something in the result doesn't trigger a focus change. This happens in some very inaccessible programs where there is no a11y implementation at all, but also in DVD menus, etc.

@jcsteh

jcsteh commented Jun 18, 2020 via email

Contributor Author

@ruifontes

Contributor

It would be nice if we could set the recognition interval to accommodate the different speeds at which captions appear...

@burakyuksek

Contributor

I completely agree with @ruifontes

@Adriani90

Adriani90 commented Jun 18, 2020

Collaborator

Maybe @vortex1024, the author of the Lion add-on, can contribute here as well. I think his approach is not that bad and could bring some ideas here.

@OzancanKaratas

Collaborator

This implementation does not read anything on the screen and beeps once every half a second. What am I doing wrong?

@jcsteh

jcsteh commented Jun 21, 2020 via email

Contributor Author

@jage9

jage9 commented Jun 23, 2020

Contributor

Love the concept of this. There are lots of great ideas for improvements, but let's get this feature in and then others can iterate from there.

@Carlos-EstebanM


This is a very good idea!
I currently use the Lion add-on, but having this function in NVDA core would be good in many cases.

@burakyuksek

Contributor

Hello,
Can you please add a feature to this pull request which makes the virtual window optional? NVDA just speaking the text as it changes would make this feature very useful for reading game menus and would eliminate the use of external addons.

@OzancanKaratas

Collaborator

@jcsteh, this fixes issue #2797 if it works well. Please update your pull request description.

@bramd

bramd commented Sep 13, 2020

Contributor

Tested this and it worked fine. I was able to OCR a video stream in VLC continuously. However, it beeps on every refresh. Is this a debugging feature or the intended behavior?

@burakyuksek

Contributor

As @jcsteh said, the beeping is just for testing and will be removed before this comes out of draft status.

@codeofdusk

Contributor

Any updates on this PR?

Pressing escape dismisses the recognition result.
"""
#: How often (in ms) to perform recognition.
REFRESH_INTERVAL = 1500

Contributor

Consider making this a config option?

Contributor

If we decide that this interval should be configurable I believe we should be able to set it individually per recognition provider.
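
To make the per-provider suggestion concrete, here is one possible shape for it. Everything in this sketch is hypothetical: the override mapping, the recognizer identifiers, and the function name are invented for illustration, and only the 1500 ms default comes from the code under review.

```python
from typing import Mapping

# Default taken from REFRESH_INTERVAL in the code under review.
DEFAULT_REFRESH_INTERVAL_MS = 1500


def refresh_interval_for(recognizer_id: str, overrides: Mapping[str, int]) -> int:
    """Return the interval for one recognizer, falling back to the default."""
    return overrides.get(recognizer_id, DEFAULT_REFRESH_INTERVAL_MS)


# Hypothetical user configuration: one slower provider, everything else default.
user_overrides = {"exampleCloudOcr": 5000}
print(refresh_interval_for("exampleCloudOcr", user_overrides))
print(refresh_interval_for("exampleLocalOcr", user_overrides))
```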


def _onResult(self, result):
    import tones # jtd
    tones.beep(1660, 10) # jtd

Contributor

What does "jtd" mean?

def event_loseFocus(self):
    super().event_loseFocus()
    if self.recognizer.allowAutoRefresh:
        self.stopMonitoring()

Contributor

Maybe add a comment here similar to the one for LiveText.startMonitoring?

@LeonarddeR

Collaborator

@jcsteh are you intending to bring this forward somehow, or is it abandoned? If the latter, I'm happy to take it.

@jcsteh

jcsteh commented Jul 31, 2023

Contributor Author

I do have vague intentions to continue working on this, but I keep not getting around to it. If you have time in the immediate future, I don't want to hold you back. :)

Aside from rebasing, I think the only thing needed here is a GUI setting to enable/disable auto refresh.

@LeonarddeR

Collaborator

Superseded by #15331

@LeonarddeR LeonarddeR closed this Aug 25, 2023
seanbudd pushed a commit that referenced this pull request Sep 1, 2023
Replaces #11270, adding configuration.
Fixes #2797.

@jcsteh jcsteh deleted the continualOcr branch May 25, 2026 04:01

Development

Successfully merging this pull request may close these issues.

Support for reading subtitles in videos, i.e. continuous and refreshable OCR