Continually refresh OCR and speak new text as it appears #15331
When the user performs content recognition (e.g. NVDA+r for Windows 10 OCR), we now periodically (every 1.5 seconds) recognize the same area of the screen again. The result document is updated with the new recognition result, keeping the previous cursor position if possible. The LiveText NVDAObject is used to report any new text that has been added. LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken. Because the result document is updated, this means braille is updated as well, allowing braille users to see changes as they occur. While Windows 10 OCR is local and relatively fast, other recognizers might not be; e.g. they might be resource intensive or use an internet service. Recognizers can thus specify whether they want to support auto refresh using the allowAutoRefresh attribute, which defaults to False.
I'm extremely delighted to see this feature. Thanks Jamie! And Leonard, of course.

Hi, great work. I'm glad to see this PR come up. Thanks.

Thanks @jcsteh and @leonardder--many people will welcome this feature!
Two comments/questions:
we now periodically (every 1.5 seconds) recognize the same area of the screen again. The result document is updated with the new recognition result
I don't know what library you would have to use, but wouldn't an image
comparison with last snapshot, to determine if there was actually a change
before re-recognizing, be more resource efficient? It might also allow a faster
refresh rate. Something like the same way you can detect motion with video
cameras, by comparing frames.
LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken.
But will it prevent new text from appearing in the buffer? If you're doing
manual screen review of the buffer, and the text is updated: even if it isn't
spoken when it updates, the next line read will speak/braille something
different than it did just a second before.

Here is an add-on that could also serve as inspiration for this PR. It works very well with videos and has no performance issues. However, it does not work in games etc.; it is just for subtitles. But combining the work here could bring some nice advantages. cc: @maxe-hsieh, maybe you can contribute as well.

That's an interesting idea. It probably would be, though Windows 10 OCR is fast enough that I'm not sure whether it's worth it. This could be done in a follow-up if it does prove to be a problem, though. I too have no idea what you would use for this. I guess you could just compare the pixel arrays as a start.
It would, but this isn't necessarily a good thing. When I was initially working on this, I assumed faster refresh rate would always be better. It turns out that this isn't ideal for things like videos because the text is sometimes (often?) deliberately animated so that it appears slowly. If the refresh rate is too fast, you get spammed with updates. Currently, LiveText reports updates by line, so you'd get a lot of repeated text. Even if we went smaller than that (word), you'd still get a lot of tiny updates and some repeats.
No. See the known issues section of the PR description.
Yes, it will. However, this is consistent with any other kind of review: object navigation, browse mode, screen review, etc.
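
The pixel-array comparison floated in this reply could look roughly like the sketch below. It is not part of this PR: `ChangeGatedRecognizer` and the `recognize` callback are hypothetical names, and the captured area is assumed to be available as raw bytes. The idea is only to skip recognition when the frame is byte-for-byte identical to the previous one.

```python
import hashlib
from typing import Callable, Optional


class ChangeGatedRecognizer:
    """Calls a (hypothetical) recognize callback only when the frame changed."""

    def __init__(self, recognize: Callable[[bytes], str]) -> None:
        self._recognize = recognize
        self._last_digest: Optional[bytes] = None
        self._last_text = ""

    def refresh(self, pixels: bytes) -> str:
        # Hash the raw pixel buffer; identical frames give identical digests.
        digest = hashlib.sha256(pixels).digest()
        if digest != self._last_digest:
            # The frame differs from the previous one, so recognize again.
            self._last_digest = digest
            self._last_text = self._recognize(pixels)
        # Unchanged frame: reuse the previous result without running OCR.
        return self._last_text
```

An exact comparison like this treats any single differing pixel as a change, so it only saves work on truly static content; noisy or animated frames would need some tolerance-based comparison instead.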
seanbudd
left a comment
Thanks for picking this up @LeonarddeR and thanks for your original work @jcsteh
This looks almost ready
Oh. I didn't see that a setting was added to disable refreshing. So if you really want to, you can turn it off.

In fact, refreshing is disabled by default. We can change that, but I think it should be opt-in rather than opt-out.

CyrilleB79
left a comment
Thanks for this work.
Please find my review comments.
- There is an edge case where you can enable screen curtain while in an OCR result. STR:
  - Open vision settings
  - Disable the "show a warning when loading screen curtain" checkbox
  -
  - Tab back to the "Make screen black" checkbox
  - Press NVDA+R to OCR this checkbox
  - Press space to activate it
  Actual result: Screen curtain is enabled
  Expected result: A ui.message should inform that screen curtain cannot be used while OCRing, especially if auto refresh is enabled.
- Extra linting
  This PR contains extra linting related to blank spaces. This makes it harder to spot the real changes of this PR and may cause git blame to report non-useful information, especially when the change is done on a line of code (less impact when the change is on a blank line or a comment).
- Most (if not all) options in the UI have an associated toggle script in globalCommands.py. Could you add an unbound script to toggle the auto refresh config value?
Thanks!
| Pressing enter or space will activate (normally click) the text at the cursor if possible. | ||
| Pressing escape dismisses the recognition result. | ||
|
|
||
| When you want to monitor constantly changing content, such as when watching a video with sub titles, you can optionally enable automatic refresh of the recognized content in the [Windows OCR category #Win10OcrSettings] of the [NVDA Settings #NVDASettings] dialog. |
Since the option "automatic refresh of the recognized content" is in the "Windows OCR" panel, I imagine that it only applies to Windows OCR and not to any other recognizer (e.g. one implemented by an add-on). Having this sentence in the general paragraph "Content Recognition" is then misleading, since it may suggest that the checkbox applies to any recognizer.
You could move this sentence to the "Windows OCR" paragraph instead.
| When you want to monitor constantly changing content, such as when watching a video with sub titles, you can optionally enable automatic refresh of the recognized content in the [Windows OCR category #Win10OcrSettings] of the [NVDA Settings #NVDASettings] dialog. | |
| When you want to monitor constantly changing content, such as when watching a video with sub titles, you can optionally enable automatic refresh of the recognized content in the [Windows OCR category #Win10OcrSettings] of the [NVDA Settings #NVDASettings] dialog. | |
The refresh interval is now bound to the UWP provider.
Actually, that was not my point, since I had not noticed this, but it fully makes sense to have the refresh interval specifically linked to the UWP OCR provider and not to all providers.
My point was just about the documentation: to move this sentence ("When you want to monitor ... of the [NVDA Settings #NVDASettings] dialog."). This sentence is currently in the general paragraph "Content Recognition" and I suggest moving it to the paragraph "Windows OCR", just below.
Sorry for my unclear previous comment with a useless suggestion.
| [uwpOcr] | ||
| language = string(default="") | ||
| autoRefresh = boolean(default=false) | ||
| autoRefreshIntervalMs = integer(default=1500,min=500) |
It is not common to have the unit in the config param name:
| autoRefreshIntervalMs = integer(default=1500,min=500) | |
| autoRefreshInterval = integer(default=1500,min=500) | |
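
To make the constraint in this spec line concrete, here is a plain-Python sketch of what `integer(default=1500,min=500)` expresses: a missing value falls back to 1500 ms, and a value below 500 ms is rejected. The function name and the dict-based section are assumptions for illustration; NVDA's real validation goes through its config spec machinery, not through code like this.

```python
from typing import Any, Mapping

DEFAULT_INTERVAL_MS = 1500  # default from the spec line
MIN_INTERVAL_MS = 500  # minimum from the spec line


def read_auto_refresh_interval(section: Mapping[str, Any]) -> int:
    """Hypothetical helper mirroring integer(default=1500,min=500)."""
    # A missing key falls back to the default.
    raw = section.get("autoRefreshInterval", DEFAULT_INTERVAL_MS)
    value = int(raw)
    # Values below the minimum are rejected rather than silently clamped.
    if value < MIN_INTERVAL_MS:
        raise ValueError(
            "autoRefreshInterval must be at least %d ms" % MIN_INTERVAL_MS
        )
    return value
```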
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
See test results for failed build of commit c2ef5e7ad8 |

@CyrilleB79 I figured that this PR has enough complexity for now. If you really need a toggle script, feel free to add it yourself in a follow-up. I don't care much myself.
See test results for failed build of commit 03f4f92b29 |
See test results for failed build of commit ee53c6bffe |

FYI @LeonarddeR, when I have auto refresh checked and a recognition result open, I get the following error every 1.5 seconds in the log:
With add-ons disabled, the error disappears. So it's probably not an issue in this PR itself. But I just wanted to inform you. I may search for the culprit add-on, unless you already have an idea.

@CyrilleB79 I really have no idea why this add-on kicks in here, but I'd figure it might also cause errors when diffing in Command Prompt or Windows Terminal.
See test results for failed build of commit 12ac9ec32b |
Qchristensen
left a comment
User guide change looks good.
… OCR result (#17740)
Discussed in #15331 (review) and following.
Summary of the issue:
Most options in the UI have an associated toggle script (assigned or unassigned). In Windows OCR settings, "Periodically refresh recognized content" has no such toggle script.
Description of user facing changes:
An unassigned command has been added to toggle the value of "Periodically refresh recognized content". In an OCR result, using this script takes effect immediately.
Description of development approach:
Use the toggle helper function defined for global commands.
If the final state is enabled, schedule a new recognition.
If the final state is disabled, nothing more needs to be done; the next recognition will not schedule subsequent ones.
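
The toggle behaviour described in this commit message can be sketched as follows. This is an illustration only: `toggle_auto_refresh`, the dict-based config and the `schedule_recognition` callback are hypothetical stand-ins, not the helper actually used in globalCommands.py.

```python
from typing import Callable, Dict


def toggle_auto_refresh(
    config: Dict[str, bool],
    result_is_open: bool,
    schedule_recognition: Callable[[], None],
) -> bool:
    """Flip the (hypothetical) autoRefresh flag and return the new state."""
    enabled = not config.get("autoRefresh", False)
    config["autoRefresh"] = enabled
    if enabled and result_is_open:
        # Newly enabled while a result is open: start the refresh cycle now.
        schedule_recognition()
    # Newly disabled: nothing to cancel, because the next recognition
    # simply will not schedule a subsequent one.
    return enabled
```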
Many thanks to @jcsteh for his valuable work!
Link to issue number:
Replaces #11270, adding configuration.
Fixes #2797.
Summary of the issue:
Some videos include text which is graphical only with no accompanying verbalization, thus making it inaccessible to screen reader users. OCR is very useful for this purpose. However, having to repeatedly and manually dismiss and recognize is tedious and inefficient. It would be useful if NVDA could do this automatically, reporting new text as it appears.
This could be useful for other scenarios as well such as virtual machines and games.
Description of how this pull request fixes the issue:
When the user performs content recognition (e.g. NVDA+r for Windows 10 OCR), we now periodically (every 1.5 seconds) recognize the same area of the screen again. The result document is updated with the new recognition result, keeping the previous cursor position if possible.
The LiveText NVDAObject is used to report any new text that has been added. LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken.
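
As a rough illustration of reporting only new text, the sketch below diffs two recognition results line by line with the standard library's `difflib` and speaks only lines that were inserted or replaced. The function names and the `speak` callback are hypothetical, and NVDA's actual LiveText implementation may work differently; only the gating on the report dynamic content changes setting is taken from the description above.

```python
import difflib
from typing import Callable, List


def added_lines(old_text: str, new_text: str) -> List[str]:
    """Return non-blank lines of new_text that are not matched in old_text."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" are the opcodes that carry new content.
        if tag in ("insert", "replace"):
            added.extend(line for line in new_lines[j1:j2] if line.strip())
    return added


def report_new_text(
    old_text: str,
    new_text: str,
    report_dynamic_content_changes: bool,
    speak: Callable[[str], None],
) -> None:
    # With the setting off, the document may still change but nothing is spoken.
    if not report_dynamic_content_changes:
        return
    for line in added_lines(old_text, new_text):
        speak(line)
```

With a line-based diff like this, a line that grows from "Hel" to "Hello" is reported again in full, which is the repeated-text effect described in the discussion of faster refresh rates.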
Because the result document is updated, this means braille is updated as well, allowing braille users to see changes as they occur.
While Windows 10 OCR is local and relatively fast, other recognizers might not be; e.g. they might be resource intensive or use an internet service. Recognizers can thus specify whether they want to support auto refresh using the allowAutoRefresh attribute, which defaults to False.
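
The opt-in described above can be pictured with the minimal sketch below. Only the `allowAutoRefresh` attribute name, its `False` default and the 1.5 second interval come from this description; the class names and the `next_refresh_delay` helper are stand-ins invented for illustration and are not NVDA's real classes.

```python
from typing import Optional


class ContentRecognizer:
    """Stand-in base class; recognizers do not auto refresh by default."""

    allowAutoRefresh: bool = False


class LocalOcr(ContentRecognizer):
    """A fast, local recognizer that opts in to auto refresh."""

    allowAutoRefresh = True


class RemoteService(ContentRecognizer):
    """A slow or online recognizer that keeps the default (no auto refresh)."""


def next_refresh_delay(
    recognizer: ContentRecognizer, interval_seconds: float = 1.5
) -> Optional[float]:
    # Return the delay before the next recognition, or None when the
    # recognizer has not opted in and should only run once.
    return interval_seconds if recognizer.allowAutoRefresh else None
```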
Testing performed:
@jcsteh tested on a video with scrolling text. After pressing NVDA+r, NVDA spoke the new text as it appeared and he could also follow it on his braille display. Unfortunately, he wasn't able to share the particular video.
Known issues with pull request:
While you can disable the spoken reporting using report dynamic content changes, you can't currently disable the actual automatic OCR refresh. This could be a problem for resource usage, though I haven't seen that in practice. It's possible there may be other use cases for disabling it.
Change log entry: