Continually refresh OCR and speak new text as it appears. by jcsteh · Pull Request #11270 · nvaccess/nvda

jcsteh · 2020-06-18T02:53:07Z

Link to issue number:

Summary of the issue:

Some videos include text which is graphical only with no accompanying verbalisation, thus making it inaccessible to screen reader users. OCR is very useful for this purpose. However, having to repeatedly and manually dismiss and recognise is tedious and inefficient. It would be useful if NVDA could do this automatically, reporting new text as it appears.

This could be useful for other scenarios as well such as virtual machines and games.

Description of how this pull request fixes the issue:

When the user performs content recognition (e.g. NVDA+r for Windows 10 OCR), we now periodically (every 1.5 seconds) recognize the same area of the screen again.
The result document is updated with the new recognition result, keeping the previous cursor position if possible.
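
As a rough illustration of the refresh-and-keep-cursor behaviour described above, here is a minimal sketch. It is not the code in this PR: `ResultDocument`, `update`, `refresh` and the `recognize` callback are hypothetical names, and clamping the caret offset is only one plausible reading of "keeping the previous cursor position if possible".

```python
from dataclasses import dataclass
from typing import Callable

# Interval stated in the description; a timer would call refresh() this often.
REFRESH_INTERVAL_MS = 1500


@dataclass
class ResultDocument:
    """Hypothetical stand-in for the virtual result document."""

    text: str = ""
    cursor: int = 0  # caret offset into text

    def update(self, new_text: str) -> None:
        # Replace the text. Keep the caret offset if it still exists in the
        # new text; otherwise clamp it to the end of the new text.
        self.text = new_text
        self.cursor = min(self.cursor, len(new_text))


def refresh(doc: ResultDocument, recognize: Callable[[], str]) -> None:
    """Recognize the same screen area again and update doc in place."""
    doc.update(recognize())
```

In this sketch the caret only moves when the new result is shorter than the old caret offset.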

The LiveText NVDAObject is used to report any new text that has been added.
LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken.
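
The idea of reporting only newly added text can be sketched with a line-based diff from the standard library. This is an assumption-laden illustration rather than LiveText's actual implementation: `new_lines`, `report_new_text`, the `report_dynamic_changes` flag and the `speak` callback are all hypothetical names.

```python
import difflib
from typing import Callable, List


def new_lines(old_text: str, new_text: str) -> List[str]:
    """Return the lines that ndiff marks as added in new_text."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes added lines with "+ "; ignore additions that are blank.
    return [
        line[2:]
        for line in diff
        if line.startswith("+ ") and line[2:].strip()
    ]


def report_new_text(
    old_text: str,
    new_text: str,
    report_dynamic_changes: bool,
    speak: Callable[[str], None],
) -> None:
    """Speak added lines, unless dynamic content reporting is turned off."""
    if not report_dynamic_changes:
        return
    for line in new_lines(old_text, new_text):
        speak(line)
```

With the flag off nothing is spoken, but the caller can still update the result document, which is why braille would keep following the changes.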

Because the result document is updated, this means braille is updated as well, allowing braille users to see changes as they occur.

While Windows 10 OCR is local and relatively fast, other recognizers might not be; e.g. they might be resource intensive or use an internet service.
Recognizers can thus specify whether they want to support auto refresh using the allowAutoRefresh attribute, which defaults to False.
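
A recognizer opting in might look something like the sketch below. Only the `allowAutoRefresh` attribute name and its default of False come from this description; the class names and the `should_schedule_refresh` helper are hypothetical and stand in for NVDA's real classes.

```python
class ContentRecognizer:
    """Hypothetical base class for recognizers."""

    # Auto refresh is opt-in, so the default is False.
    allowAutoRefresh = False


class LocalOcr(ContentRecognizer):
    """A fast, local recognizer that opts in to periodic refresh."""

    allowAutoRefresh = True


class CloudOcr(ContentRecognizer):
    """A slow or metered recognizer that keeps the default and is never refreshed."""


def should_schedule_refresh(recognizer: ContentRecognizer) -> bool:
    """Decide whether the result document should start a refresh timer."""
    return bool(getattr(recognizer, "allowAutoRefresh", False))
```

Making the default False means existing third-party recognizers are never refreshed unless their authors opt in explicitly.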

Testing performed:

Tested on a video with scrolling text. After pressing NVDA+r, NVDA spoke the new text as it appeared and I could also follow it on my braille display. Unfortunately, I'm not able to share the particular video.

Known issues with pull request:

While you can disable the spoken reporting using report dynamic content changes, you can't currently disable the actual automatic OCR refresh. This could be a problem for resource usage, though I haven't seen that in practice. It's possible there may be other use cases for disabling it, though I don't know of any.

Change log entry:

New Features:
- After recognizing content with Windows 10 OCR using NVDA+r, NVDA will now continually update the recognition, speaking new text as it appears.
  - You can disable speaking of new text by turning off report dynamic content changes (pressing NVDA+5).

Changes for Developers:
- ContentRecognizers can specify whether they want to allow automatic, periodic refresh using the new allowAutoRefresh attribute.

Adriani90 · 2020-06-18T05:55:30Z

@jcsteh thank you so much for the work on this. Could you please also have a look at issue #2797, especially the last 4 comments?
I wonder whether it is necessary to open the OCR window for this to work. This might not be ideal in a game menu, where text should be reported when the focus changes to a menu item. Or imagine watching a video with subtitles together with a sighted person while NVDA speaks via another sound card and you hear the subtitles through a Bluetooth in-ear headphone. In that case the screen should not show the OCR window.

LeonarddeR · 2020-06-18T07:48:53Z

This is wonderful!

Another use case or next step for this could be to stop the recognition result from being dismissed when activating something in the result doesn't trigger a focus change. This happens in some very inaccessible programs where there is no a11y implementation at all, but also in DVD menus, etc.

jcsteh · 2020-06-18T09:22:31Z

I don't really have the cycles to implement further features here. Also, a lot of the ideas in #2797 assume that we'd get better accuracy using display hooking. In reality, most modern apps don't use GDI these days and there's no way to hook the text they draw to the screen. If you can't access it using screen review, it almost certainly doesn't use GDI. If an app does support screen review, it's already possible to get new text reported using the DisplayModelLiveText class, which is what we do in the PuTTY app module, for example. It's also worth noting that the OCR result you get with NVDA+r is not displayed on screen. It's purely virtual, so it doesn't change what's on the screen for sighted users at all.

ruifontes · 2020-06-18T10:18:58Z

It would be nice if we could set the recognition interval to accommodate the different speeds at which captions appear...

burakyuksek · 2020-06-18T12:22:55Z

I completely agree with @ruifontes

Adriani90 · 2020-06-18T22:08:19Z

Maybe @vortex1024, the author of the Lion add-on, can contribute here as well. I think his approach is not that bad and could possibly bring some ideas here.

OzancanKaratas · 2020-06-19T17:27:50Z

This implementation does not read anything on the screen and beeps once every half a second. What am I doing wrong?

jcsteh · 2020-06-21T23:03:36Z

Like the existing OCR functionality, it recognises the review/navigator object. So, if you want it to recognise the entire window, you'd need to move through containing objects with the review cursor first. Other than that, it's not going to read anything automatically unless new text is added. The beeping is just for testing and will be removed before this comes out of draft status.

jage9 · 2020-06-23T22:17:05Z

Love the concept of this. There's lots of great ideas for improvements, but let's get this feature in and then others can iterate from there.

Carlos-EstebanM · 2020-07-05T17:43:00Z

This is a very good idea!
Now I use the add-on Lion, but this function in the NVDA core is good for many cases.

burakyuksek · 2020-07-29T19:19:56Z

Hello,
Can you please add a feature to this pull request which makes the virtual window optional? Having NVDA just speak the text as it changes would make this feature very useful for reading game menus and would eliminate the need for external add-ons.

OzancanKaratas · 2020-08-05T17:50:12Z

@jcsteh, this fixes issue #2797 if it works well. Please update your pull request description.

bramd · 2020-09-13T18:01:26Z

Tested this and it worked fine. I was able to OCR a video stream in VLC continuously. However, it beeps on every refresh. Is this a debugging feature or the intended behavior?

burakyuksek · 2020-09-14T14:09:14Z

As @jcsteh said, the beeping is just for testing and will be removed before this comes out of draft status.

codeofdusk · 2020-10-08T05:19:42Z

Any updates on this PR?

codeofdusk · 2021-01-12T00:18:58Z

 	Pressing escape dismisses the recognition result.
 	"""
+	#: How often (in ms) to perform recognition.
+	REFRESH_INTERVAL = 1500


Consider making this a config option?

lukaszgo1 · 2021-01-14

If we decide that this interval should be configurable, I believe we should be able to set it individually per recognition provider.

codeofdusk · 2021-01-12T00:21:55Z

+
+	def _onResult(self, result):
+		import tones  # jtd
+		tones.beep(1660, 10)  # jtd


What does "jtd" mean?

codeofdusk · 2021-01-12T00:23:34Z

+	def event_loseFocus(self):
+		super().event_loseFocus()
+		if self.recognizer.allowAutoRefresh:
+			self.stopMonitoring()


Maybe add a comment here similar to the one for LiveText.startMonitoring?

LeonarddeR · 2023-07-31T19:12:23Z

@jcsteh are you intending to bring this forward somehow, or is it abandoned? If the latter, I'm happy to take it.

jcsteh · 2023-07-31T22:03:19Z

I do have vague intentions to continue working on this, but I keep not getting around to it. If you have time in the immediate future, I don't want to hold you back. :)

Aside from rebasing, I think the only thing needed here is a GUI setting to enable/disable auto refresh.

LeonarddeR · 2023-08-25T16:28:30Z

Superseded by #15331

Replaces #11270, adding configuration. Fixes #2797.

jcsteh force-pushed the continualOcr branch from fed1c61 to ff69e6f Compare June 18, 2020 02:57

codeofdusk suggested changes Jan 12, 2021

ABuffEr mentioned this pull request Jan 16, 2021

Warning when using OCR when screen curtain is enabled #11911

Closed

seanbudd mentioned this pull request Aug 18, 2021

Windows 10 OCR refreshable #12745

Closed

LeonarddeR mentioned this pull request Aug 25, 2023

Continually refresh OCR and speak new text as it appears #15331

Merged

LeonarddeR closed this Aug 25, 2023

Adriani90 mentioned this pull request Sep 4, 2023

Auto language detection based on script detection In t2990 review #7629

Closed

jcsteh deleted the continualOcr branch May 25, 2026 04:01
