Continually refresh OCR and speak new text as it appears. #11270
When the user performs content recognition (e.g. NVDA+r for Windows 10 OCR), we now periodically (every 1.5 seconds) recognize the same area of the screen again. The result document is updated with the new recognition result, keeping the previous cursor position if possible. The LiveText NVDAObject is used to report any new text that has been added. LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken. Because the result document is updated, this means braille is updated as well, allowing braille users to see changes as they occur. While Windows 10 OCR is local and relatively fast, other recognizers might not be; e.g. they might be resource intensive or use an internet service. Recognizers can thus specify whether they want to support auto refresh using the allowAutoRefresh attribute, which defaults to False.
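As a rough illustration of the "report only new text" step described above, here is a minimal sketch. It is not NVDA's LiveText implementation: `new_lines` and `RefreshingResult` are hypothetical names, and the standard library's `difflib` stands in for whatever diffing NVDA actually does.

```python
import difflib


def new_lines(old_text: str, new_text: str) -> list[str]:
    """Return the non-blank lines of new_text that were inserted or changed."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    added = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(line for line in new[j1:j2] if line.strip())
    return added


class RefreshingResult:
    """Hypothetical stand-in for the refreshable result document."""

    def __init__(self, recognize, speak):
        self._recognize = recognize  # callable returning the recognized text
        self._speak = speak  # callable that reports one line of text
        self.text = recognize()

    def refresh(self):
        """Re-run recognition, report additions, then update the document."""
        latest = self._recognize()
        for line in new_lines(self.text, latest):
            self._speak(line)
        self.text = latest
```

In this sketch, something like a timer would call `refresh()` every 1500 ms; the timer itself is left out so the example stays self-contained.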
|
@jcsteh thank you so much for the work on this. Could you please also have a look at issue #2797, especially the last 4 comments? |
|
This is wonderful! Another use case or next step for this could be to stop the recognition result from being dismissed when activating something in the result doesn't trigger a focus change. This happens in some very inaccessible programs where there is no a11y implementation at all, but also in DVD menus, etc. |
|
I don't really have the cycles to implement further features here. Also, a
lot of the ideas in #2797 assume that we'd get better accuracy using
display hooking. In reality, most modern apps don't use GDI these days and
there's no way to hook the text they draw to the screen. If you can't
access it using screen review, it almost certainly doesn't do GDI. If
they do support screen review, it's already possible to get new text
reported using the DisplayModelLiveText class, which is what we do in the
putty app module, for example.
It's also worth noting that the OCR result you get with NVDA+r is not
displayed on screen. It's purely virtual, so it doesn't change what's on
the screen for sighted users at all.
|
|
It would be nice if we could set the recognition interval to accommodate the different speeds at which captions appear... |
|
I completely agree with @ruifontes |
|
Maybe @vortex1024, the author of the Lion addon, can contribute here as well. I think his approach is not that bad and could possibly bring some ideas here as well. |
|
This implementation does not read anything on the screen and beeps once every half a second. What am I doing wrong? |
|
Like the existing OCR functionality, it recognises the review/navigator
object. So, if you want it to recognise the entire window, you'd need to
move through containing objects with the review cursor first. Other than
that, it's not going to read anything automatically unless new text is
added.
The beeping is just for testing and will be removed before this comes out
of draft status.
|
|
Love the concept of this. There's lots of great ideas for improvements, but let's get this feature in and then others can iterate from there. |
|
This is a very good idea! |
|
Hello, |
|
Tested this and it worked fine. I was able to OCR a video stream in VLC continuously. However, it beeps on every refresh. Is this a debugging feature or the intended behavior? |
|
As @jcsteh said, the beeping is just for testing and will be removed before this comes out of draft status. |
|
Any updates on this PR? |
    Pressing escape dismisses the recognition result.
    """
    #: How often (in ms) to perform recognition.
    REFRESH_INTERVAL = 1500
Consider making this a config option?
If we decide that this interval should be configurable I believe we should be able to set it individually per recognition provider.
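One way the per-provider idea could look is sketched below. This is purely illustrative: the override table, the recognizer name `exampleCloudOcr`, the `refresh_interval_ms` helper, and the 100 ms floor are all assumptions, not part of this PR or of NVDA's configuration.

```python
# Default taken from REFRESH_INTERVAL in the reviewed diff.
DEFAULT_REFRESH_INTERVAL_MS = 1500

# Assumed lower bound so a bad setting cannot make recognition run back to back.
MIN_REFRESH_INTERVAL_MS = 100

# Hypothetical per-recognizer overrides; the key is a made-up recognizer name.
refresh_overrides_ms = {"exampleCloudOcr": 5000}


def refresh_interval_ms(recognizer_name, overrides=refresh_overrides_ms):
    """Return the interval for a recognizer, falling back to the default."""
    interval = overrides.get(recognizer_name, DEFAULT_REFRESH_INTERVAL_MS)
    return max(interval, MIN_REFRESH_INTERVAL_MS)
```

A lookup with a fallback keeps the current behaviour for recognizers that have no override, so only slower or metered providers would need an entry.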
|
|
    def _onResult(self, result):
        import tones # jtd
        tones.beep(1660, 10) # jtd

    def event_loseFocus(self):
        super().event_loseFocus()
        if self.recognizer.allowAutoRefresh:
            self.stopMonitoring()
Maybe add a comment here similar to the one for LiveText.startMonitoring?
|
@jcsteh are you intending to bring this forward somehow, or is it abandoned? If the latter, I'm happy to take it. |
|
I do have vague intentions to continue working on this, but I haven't gotten around to it yet. If you have time in the immediate future, I don't want to hold you back. :) Aside from rebasing, I think the only thing needed here is a GUI setting to enable/disable auto refresh. |
|
Superseded by #15331 |
Replaces #11270, adding configuration. Fixes #2797.
Summary of the issue:
Some videos include text which is graphical only with no accompanying verbalization, thus making it inaccessible to screen reader users. OCR is very useful for this purpose. However, having to repeatedly and manually dismiss and recognize is tedious and inefficient. It would be useful if NVDA could do this automatically, reporting new text as it appears. This could be useful for other scenarios as well such as virtual machines and games.
Description of how this pull request fixes the issue:
When the user performs content recognition (e.g. NVDA+r for Windows 10 OCR), we now periodically (every 1.5 seconds) recognize the same area of the screen again. The result document is updated with the new recognition result, keeping the previous cursor position if possible. The LiveText NVDAObject is used to report any new text that has been added. LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken. Because the result document is updated, this means braille is updated as well, allowing braille users to see changes as they occur. While Windows 10 OCR is local and relatively fast, other recognizers might not be; e.g. they might be resource intensive or use an internet service. Recognizers can thus specify whether they want to support auto refresh using the allowAutoRefresh attribute, which defaults to False.
Link to issue number:
Fixes #2797.
Summary of the issue:
Some videos include text which is graphical only with no accompanying verbalisation, thus making it inaccessible to screen reader users. OCR is very useful for this purpose. However, having to repeatedly and manually dismiss and recognise is tedious and inefficient. It would be useful if NVDA could do this automatically, reporting new text as it appears.
This could be useful for other scenarios as well such as virtual machines and games.
Description of how this pull request fixes the issue:
When the user performs content recognition (e.g. NVDA+r for Windows 10 OCR), we now periodically (every 1.5 seconds) recognize the same area of the screen again.
The result document is updated with the new recognition result, keeping the previous cursor position if possible.
The LiveText NVDAObject is used to report any new text that has been added.
LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken.
Because the result document is updated, this means braille is updated as well, allowing braille users to see changes as they occur.
While Windows 10 OCR is local and relatively fast, other recognizers might not be; e.g. they might be resource intensive or use an internet service.
Recognizers can thus specify whether they want to support auto refresh using the allowAutoRefresh attribute, which defaults to False.
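The opt-in attribute and the "keep the cursor if possible" behaviour can be sketched as follows. The classes here are simplified stand-ins, not NVDA's real recognizer classes; `LocalOcr`, `RemoteOcr`, `should_auto_refresh` and `restore_cursor` are hypothetical names, and clamping the offset is only one plausible reading of "if possible".

```python
class ContentRecognizer:
    """Minimal stand-in for a recognizer base class (not NVDA's real class)."""

    #: Whether results from this recognizer may be refreshed automatically.
    allowAutoRefresh = False


class LocalOcr(ContentRecognizer):
    """Hypothetical fast, local recognizer that opts in to auto refresh."""

    allowAutoRefresh = True


class RemoteOcr(ContentRecognizer):
    """Hypothetical internet-backed recognizer; keeps the False default."""


def should_auto_refresh(recognizer):
    """Only refresh when the recognizer has explicitly opted in."""
    return bool(getattr(recognizer, "allowAutoRefresh", False))


def restore_cursor(offset, new_text):
    """Keep the previous cursor offset, clamped to the new text's length."""
    return max(0, min(offset, len(new_text)))
```

Defaulting to False means a slow or network-backed recognizer is never polled repeatedly unless its author deliberately enables it.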
Testing performed:
Tested on a video with scrolling text. After pressing NVDA+r, NVDA spoke the new text as it appeared and I could also follow it on my braille display. Unfortunately, I'm not able to share the particular video.
Known issues with pull request:
While you can disable the spoken reporting using report dynamic content changes, you can't currently disable the actual automatic OCR refresh. This could be a problem for resource usage, though I haven't seen that in practice. It's possible there may be other use cases for disabling it, though I don't know of any.
Change log entry: