Continually refresh OCR and speak new text as it appears #15331

Merged
seanbudd merged 16 commits into nvaccess:master from LeonarddeR:continualOcr on Sep 1, 2023

Conversation

@LeonarddeR

Collaborator

Many thanks to @jcsteh for his valuable work!

Link to issue number:

Replaces #11270, adding configuration.
Fixes #2797.

Summary of the issue:

Some videos include text which is graphical only with no accompanying verbalization, thus making it inaccessible to screen reader users. OCR is very useful for this purpose. However, having to repeatedly and manually dismiss and recognize is tedious and inefficient. It would be useful if NVDA could do this automatically, reporting new text as it appears.

This could be useful for other scenarios as well such as virtual machines and games.

Description of how this pull request fixes the issue:

When the user performs content recognition (e.g. NVDA+r for Windows 10 OCR), we now periodically (every 1.5 seconds) recognize the same area of the screen again. The result document is updated with the new recognition result, keeping the previous cursor position if possible.

The LiveText NVDAObject is used to report any new text that has been added. LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken.
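
As a rough illustration of the "report only new text" idea (not the actual LiveText code), the sketch below diffs two recognition results by line using the standard library's `difflib`. The function name `new_lines` is hypothetical and exists only for this example.

```python
import difflib


def new_lines(old_text: str, new_text: str) -> list[str]:
    """Return the non-blank lines that appear in new_text but not in old_text.

    Hypothetical helper for illustration; NVDA's own diffing lives elsewhere.
    """
    old = old_text.splitlines()
    new = new_text.splitlines()
    added = []
    for entry in difflib.ndiff(old, new):
        # ndiff marks inserted lines with a "+ " prefix; "- ", "  " and "? "
        # entries describe removed, unchanged and hint lines respectively.
        if entry.startswith("+ ") and entry[2:].strip():
            added.append(entry[2:])
    return added


previous = "Chapter 1\nThe storm begins"
current = "Chapter 1\nThe storm begins\nThe ship sets sail"
print(new_lines(previous, current))
```

Only the added lines would be handed to speech; unchanged lines stay silent, which is why a by-line diff keeps repeated announcements down.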

Because the result document is updated, this means braille is updated as well, allowing braille users to see changes as they occur.

While Windows 10 OCR is local and relatively fast, other recognizers might not be; e.g. they might be resource intensive or use an internet service. Recognizers can thus specify whether they want to support auto refresh using the allowAutoRefresh attribute, which defaults to False.
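
A minimal sketch of how the opt-in could look from a recognizer's point of view. Apart from the `allowAutoRefresh` attribute name and the 1.5 second default, everything here (class names, `recognize_with_refresh`, the `max_rounds` parameter) is an assumption made for illustration and does not mirror NVDA's real classes.

```python
import threading


class RecognizerSketch:
    """Hypothetical stand-in for a content recognizer."""

    # Opt-in flag: defaults to False, as described above.
    allowAutoRefresh = False

    def __init__(self, frames):
        self._frames = iter(frames)

    def recognize(self) -> str:
        # Pretend each call recognizes the next state of the screen.
        return next(self._frames, "")


class FastLocalRecognizer(RecognizerSketch):
    # A fast, local engine can safely opt in to periodic refresh.
    allowAutoRefresh = True


def recognize_with_refresh(recognizer, on_result, stop_event, interval_s=1.5, max_rounds=None):
    """Recognize once, then keep refreshing only if the recognizer allows it."""
    rounds = 0
    while True:
        on_result(recognizer.recognize())
        rounds += 1
        if not recognizer.allowAutoRefresh:
            return rounds
        if max_rounds is not None and rounds >= max_rounds:
            return rounds
        # Event.wait returns True if a stop was requested during the pause.
        if stop_event.wait(interval_s):
            return rounds
```

With this shape, a slow or remote recognizer that leaves the attribute at its default is recognized exactly once, while an opted-in recognizer is polled until the result is dismissed (modelled here by setting `stop_event`).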

Testing performed:

@jcsteh tested on a video with scrolling text. After pressing NVDA+r, NVDA spoke the new text as it appeared and he could also follow it on his braille display. Unfortunately, he wasn't able to share the particular video.

Known issues with pull request:

While you can disable the spoken reporting using report dynamic content changes, you can't currently disable the actual automatic OCR refresh. This could be a problem for resource usage, though I haven't seen that in practice. It's possible there may be other use cases for disabling it.

Change log entry:

New Features:
- NVDA is now able to continually update the result when performing optical character recognition (OCR), speaking new text as it appears. To enable this functionality, enable the option "Periodically refresh recognized content" in the Windows OCR category of NVDA's settings dialog. After recognizing content with Windows 10 OCR using NVDA+r, 
  - You can disable speaking of new text by turning off report dynamic content changes (pressing NVDA+5).
Changes for Developers:
- ContentRecognizers can specify whether they want to allow automatic, periodic refresh using the new allowAutoRefresh attribute.

jcsteh and others added 4 commits August 24, 2023 21:58
@ABuffEr

ABuffEr commented Aug 25, 2023

Contributor

I'm extremely delighted to see this feature. Thanks Jamie! And Leonard, of course.

@cary-rowen

Contributor

Hi,

Great work, I'm glad to see this PR come up.
Especially for third-party OCR engines, they can easily use this feature, which is great.

Thanks

@XLTechie

XLTechie commented Aug 26, 2023 via email

Collaborator

@Adriani90

Collaborator

Here is an add-on that could also serve as inspiration for this PR. It works very well with videos and has no performance issues. However, it does not work in games etc.; it is just for subtitles. Combining the two approaches could bring some nice advantages.
https://github.com/maxe-hsieh/subtitle_reader

cc: @maxe-hsieh maybe you can contribute as well.

@jcsteh

jcsteh commented Aug 28, 2023

Contributor

> wouldn't an image comparison with last snapshot, to determine if there was actually a change before re-recognizing, be more resource efficient?

That's an interesting idea. It probably would be, though Windows 10 OCR is fast enough that I'm not sure whether it's worth it. This could be done in a follow-up if it does prove to be a problem, though. I too have no idea what you would use for this. I guess you could just compare the pixel arrays as a start.
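
A minimal sketch of the "compare the pixel arrays" starting point, assuming each screen capture is available as a `bytes` buffer. The `ChangeDetector` class is hypothetical and not part of this PR.

```python
class ChangeDetector:
    """Remember the last snapshot and report whether a new one differs.

    Hypothetical helper: it assumes every capture of the recognized region
    is delivered as a raw bytes buffer of pixel data.
    """

    def __init__(self):
        self._last_pixels = None

    def has_changed(self, pixels: bytes) -> bool:
        changed = pixels != self._last_pixels
        # Keep the latest snapshot for the next comparison.
        self._last_pixels = pixels
        return changed


detector = ChangeDetector()
frame_a = bytes([0, 0, 0, 255] * 4)       # four black BGRA pixels
frame_b = bytes([255, 255, 255, 255] * 4)  # four white BGRA pixels
for frame in (frame_a, frame_a, frame_b):
    if detector.has_changed(frame):
        print("snapshot changed, would re-run OCR")
    else:
        print("snapshot identical, would skip OCR")
```

An exact byte comparison only skips work when the region is completely static; video frames with compression noise would still count as changed, so a tolerance-based comparison might be needed in practice.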

> It might also allow a faster refresh rate.

It would, but this isn't necessarily a good thing. When I was initially working on this, I assumed faster refresh rate would always be better. It turns out that this isn't ideal for things like videos because the text is sometimes (often?) deliberately animated so that it appears slowly. If the refresh rate is too fast, you get spammed with updates. Currently, LiveText reports updates by line, so you'd get a lot of repeated text. Even if we went smaller than that (word), you'd still get a lot of tiny updates and some repeats.
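
To make the spam problem concrete, here is a small simulation, assuming a caption that is animated in word by word and a simplified by-line comparison against the previous sample. The `line_reports` helper is hypothetical and only approximates how changes are reported.

```python
def line_reports(frames):
    """Return every line that would be reported when comparing consecutive samples by line."""
    reports = []
    previous = set()
    for frame in frames:
        current = frame.splitlines()
        # A line counts as new if it was not present in the previous sample.
        reports.extend(line for line in current if line not in previous)
        previous = set(current)
    return reports


# One caption that appears word by word over four video frames.
animation = [
    "The",
    "The quick",
    "The quick brown",
    "The quick brown fox",
]

fast = line_reports(animation)        # sample every frame
slow = line_reports(animation[3::4])  # sample only every fourth frame

print(len(fast), "reports when sampling fast:", fast)
print(len(slow), "report when sampling slowly:", slow)
```

Sampling every frame reports the growing line four times, while the slower sample reports the finished line once.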

> But will it prevent new text from appearing in the buffer?

No. See the known issues section of the PR description.

> If you're doing manual screen review of the buffer, and the text is updated: even if it isn't spoken when it updates, the next line read will speak/braille something different than it did just a second before.

Yes it will. However, this is consistent with any other kind of review: object navigation, browse mode, screen review, etc.

@seanbudd seanbudd added the conceptApproved label Aug 28, 2023

@seanbudd seanbudd left a comment

Member

Thanks for picking this up @LeonarddeR and thanks for your original work @jcsteh

This looks almost ready

Comment thread source/config/configSpec.py Outdated
Comment thread source/contentRecog/recogUi.py
Comment thread source/contentRecog/recogUi.py Outdated
Comment thread source/contentRecog/recogUi.py Outdated
Comment thread source/contentRecog/recogUi.py Outdated
Comment thread user_docs/en/userGuide.t2t Outdated
Comment thread user_docs/en/userGuide.t2t Outdated
@seanbudd seanbudd marked this pull request as draft August 29, 2023 05:39
@jcsteh

jcsteh commented Aug 29, 2023

Contributor

> But will it prevent new text from appearing in the buffer?

Oh. I didn't see that a setting was added to disable refreshing. So if you really want to, you can turn it off.

@LeonarddeR

LeonarddeR commented Aug 29, 2023 via email

Collaborator Author

@CyrilleB79 CyrilleB79 left a comment

Contributor

Thanks for this work.

Please find my review comments.

  1. There is an edge case where you can enable screen curtain while in an OCR result. STR:
    • Open vision settings
    • Disable the "show a warning when loading screen curtain" checkbox
    • Tab back to the "Make screen black" checkbox
    • Press NVDA+R to OCR this checkbox
    • Press space to activate it

    Actual result: Screen curtain is enabled
    Expected result: A ui.message should inform the user that screen curtain cannot be used while OCRing, especially if auto refresh is enabled.

  2. Extra linting
    This PR contains extra linting related to blank spaces. This makes it harder to spot the real changes in this PR and may cause git blame to report unhelpful information, especially when the change is on a line of code (there is less impact when the change is on a blank line or a comment).

  3. Most (if not all) options in the UI have an associated toggle script in globalCommands.py. Could you add an unbound script to toggle the auto refresh config value?

Thanks!

Comment thread user_docs/en/userGuide.t2t Outdated
Pressing enter or space will activate (normally click) the text at the cursor if possible.
Pressing escape dismisses the recognition result.

When you want to monitor constantly changing content, such as when watching a video with sub titles, you can optionally enable automatic refresh of the recognized content in the [Windows OCR category #Win10OcrSettings] of the [NVDA Settings #NVDASettings] dialog.

Contributor

Since the option "automatic refresh of the recognized content" is in the "Windows OCR" panel, I imagine that it only applies to Windows OCR and not to any other recognizer (e.g. one implemented by an add-on). Having this sentence in the general paragraph "Content Recognition" is therefore misleading, since it may suggest that the checkbox applies to any recognizer.

You may move this sentence to the "Windows OCR" paragraph instead.

Suggested change
When you want to monitor constantly changing content, such as when watching a video with sub titles, you can optionally enable automatic refresh of the recognized content in the [Windows OCR category #Win10OcrSettings] of the [NVDA Settings #NVDASettings] dialog.
When you want to monitor constantly changing content, such as when watching a video with sub titles, you can optionally enable automatic refresh of the recognized content in the [Windows OCR category #Win10OcrSettings] of the [NVDA Settings #NVDASettings] dialog.

Collaborator Author

The refresh interval is now bound to the UWP provider.

Contributor

Actually, that was not my point since I had not noticed this, but it makes full sense to have the refresh interval specifically linked to the UWP OCR provider and not to all providers.

My point was just about the documentation: this sentence ("When you want to monitor ... of the [NVDA Settings #NVDASettings] dialog.") is currently in the general paragraph "Content Recognition", and I suggest moving it to the paragraph "Windows OCR", just below.

Sorry for my unclear previous comment with a useless suggestion.

Comment thread source/config/configSpec.py Outdated
Comment thread source/globalCommands.py Outdated
Comment thread source/config/configSpec.py Outdated
[uwpOcr]
language = string(default="")
autoRefresh = boolean(default=false)
autoRefreshIntervalMs = integer(default=1500,min=500)

Contributor

It is not common to have the unit in the config param name:

Suggested change
autoRefreshIntervalMs = integer(default=1500,min=500)
autoRefreshInterval = integer(default=1500,min=500)
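
For readers unfamiliar with the spec syntax, the check `integer(default=1500,min=500)` roughly means "an integer of at least 500, falling back to 1500 when unset". The plain-Python approximation below is only a sketch of that behaviour; the function name is hypothetical, and NVDA relies on its config spec validation rather than code like this.

```python
def coerce_refresh_interval(raw, default=1500, minimum=500):
    """Return a refresh interval in milliseconds that satisfies the spec's constraints.

    Hypothetical helper approximating integer(default=1500,min=500).
    """
    if raw is None:
        # Nothing stored in the user's configuration: use the default.
        return default
    value = int(raw)  # raises ValueError for non-numeric input
    if value < minimum:
        raise ValueError(f"refresh interval {value} is below the minimum of {minimum}")
    return value


print(coerce_refresh_interval(None))
print(coerce_refresh_interval("2000"))
```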

@AppVeyorBot

See test results for failed build of commit c2ef5e7ad8

@LeonarddeR

Collaborator Author

@CyrilleB79 I figured that this pr has enough complexity for now. If you really need a toggle script, feel free to add it yourself in a follow up. I don't care much myself.

@AppVeyorBot

See test results for failed build of commit 03f4f92b29

@AppVeyorBot

See test results for failed build of commit ee53c6bffe

@LeonarddeR LeonarddeR marked this pull request as ready for review August 30, 2023 18:52
@CyrilleB79

Contributor

FYI @LeonarddeR, when I have auto refresh checked and a recog result open, I get the following error every 1.5 seconds in the log:

ERROR - diffHandler.DiffMatchPatch.diff (22:57:31.573) - RefreshableRecogResultNVDAObject._monitorThread (10000):
Exception in DMP, falling back to difflib
Traceback (most recent call last):
  File "diffHandler.py", line 72, in diff
    self._initialize()
  File "diffHandler.py", line 59, in _initialize
    stdout=subprocess.PIPE
  File "C:\Users\Cyrille\AppData\Local\Programs\Python\Python37-32\lib\subprocess.py", line 753, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "C:\Users\Cyrille\AppData\Local\Programs\Python\Python37-32\lib\subprocess.py", line 1107, in _get_handles
    errwrite = self._make_inheritable(errwrite)
  File "C:\Users\Cyrille\AppData\Local\Programs\Python\Python37-32\lib\subprocess.py", line 1119, in _make_inheritable
    _winapi.DUPLICATE_SAME_ACCESS)
OSError: [WinError 50] Cette demande n’est pas prise en charge (English: The request is not supported)

With add-ons disabled, the error disappears. So it's probably not an issue in this PR itself. But I just wanted to inform you. I may search for the culprit add-on, unless you already have an idea.

@LeonarddeR

Collaborator Author

@CyrilleB79 I have really no idea why this add-on kicks in here, but I'd figure it might also cause errors when diffing in command prompt or windows terminal.

@AppVeyorBot

See test results for failed build of commit 12ac9ec32b

@Qchristensen Qchristensen left a comment

Member

User guide change looks good.

@seanbudd seanbudd merged commit ce32058 into nvaccess:master Sep 1, 2023
@nvaccessAuto nvaccessAuto added this to the 2023.3 milestone Sep 1, 2023
seanbudd pushed a commit that referenced this pull request Feb 25, 2025
… OCR result (#17740)

Discussed in #15331 (review) and following

Summary of the issue:
Most options in the UI have an associated toggle script (assigned or unassigned). In Windows OCR settings, "Periodically refresh recognized content" has no such toggle script.

Description of user facing changes
An unassigned command has been added to toggle the value of "Periodically refresh recognized content". In an OCR result, using this script takes effect immediately.

Description of development approach
Use the toggle helper function defined for global commands.
If the final state is enabled, schedule a new recognition.
If the final state is disabled, nothing more needs to be done; the next recognition will not schedule subsequent ones.
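
The approach above can be sketched as follows. This is an illustration only: `OcrResultSketch`, `toggle_auto_refresh` and the plain dictionary used as configuration are hypothetical names, not the code from #17740.

```python
class OcrResultSketch:
    """Hypothetical stand-in for an open, refreshable OCR result."""

    def __init__(self):
        self.scheduled = 0

    def schedule_recognition(self):
        # In a real result this would queue the next recognition pass.
        self.scheduled += 1


def toggle_auto_refresh(config, active_result=None):
    """Flip the auto refresh flag and return the new state."""
    config["autoRefresh"] = not config["autoRefresh"]
    if config["autoRefresh"] and active_result is not None:
        # Newly enabled while a result is open: restart the refresh cycle.
        active_result.schedule_recognition()
    # Newly disabled: nothing else to do, because the next recognition
    # checks the flag and will not schedule a further one.
    return config["autoRefresh"]


config = {"autoRefresh": False}
result = OcrResultSketch()
print(toggle_auto_refresh(config, result), result.scheduled)
print(toggle_auto_refresh(config, result), result.scheduled)
```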
@LeonarddeR LeonarddeR deleted the continualOcr branch August 23, 2025 06:27

Labels

conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review.

Development

Successfully merging this pull request may close these issues.

Support for reading subtitles in videos, i.e. continuous and refreshable OCR