Continually refresh OCR and speak new text as it appears by LeonarddeR · Pull Request #15331 · nvaccess/nvda

LeonarddeR · 2023-08-25T16:27:55Z

Many thanks to @jcsteh for his valuable work!

Link to issue number:

Replaces #11270, adding configuration.
Fixes #2797.

Summary of the issue:

Some videos include text which is graphical only with no accompanying verbalization, thus making it inaccessible to screen reader users. OCR is very useful for this purpose. However, having to repeatedly and manually dismiss and recognize is tedious and inefficient. It would be useful if NVDA could do this automatically, reporting new text as it appears.

This could be useful for other scenarios as well such as virtual machines and games.

Description of how this pull request fixes the issue:

When the user performs content recognition (e.g. NVDA+r for Windows 10 OCR), we now periodically (every 1.5 seconds) recognize the same area of the screen again. The result document is updated with the new recognition result, keeping the previous cursor position if possible.

The LiveText NVDAObject is used to report any new text that has been added. LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken.

Because the result document is updated, this means braille is updated as well, allowing braille users to see changes as they occur.

While Windows 10 OCR is local and relatively fast, other recognizers might not be; e.g. they might be resource intensive or use an internet service. Recognizers can thus specify whether they want to support auto refresh using the allowAutoRefresh attribute, which defaults to False.
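As a rough illustration of the opt-in described above, the sketch below uses a stand-in `ContentRecognizer` base class rather than NVDA's real one. Only the `allowAutoRefresh` attribute name and its `False` default come from this description; `SlowCloudRecognizer`, `FastLocalRecognizer`, `shouldAutoRefresh`, and the `userEnabled` flag are hypothetical names used for illustration.

```python
class ContentRecognizer:
    """Stand-in for a recognizer base class (not NVDA's actual class)."""

    # Recognizers must opt in to periodic refresh; the default is off.
    allowAutoRefresh: bool = False


class SlowCloudRecognizer(ContentRecognizer):
    """Hypothetical recognizer backed by an internet service; keeps the default."""


class FastLocalRecognizer(ContentRecognizer):
    """Hypothetical local recognizer that is cheap enough to run repeatedly."""

    allowAutoRefresh = True


def shouldAutoRefresh(recognizer: ContentRecognizer, userEnabled: bool) -> bool:
    """Refresh only when the recognizer allows it and the user has enabled it."""
    return recognizer.allowAutoRefresh and userEnabled
```

With this shape, a recognizer that never mentions `allowAutoRefresh` is never refreshed automatically, which is the safe choice for slow or metered services.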

Testing performed:

@jcsteh tested on a video with scrolling text. After pressing NVDA+r, NVDA spoke the new text as it appeared and he could also follow it on his braille display. Unfortunately, he wasn't able to share the particular video.

Known issues with pull request:

While you can disable the spoken reporting using report dynamic content changes, you can't currently disable the actual automatic OCR refresh. This could be a problem for resource usage, though I haven't seen that in practice. It's possible there may be other use cases for disabling it.

Change log entry:

New Features:
- NVDA is now able to continually update the result when performing optical character recognition (OCR), speaking new text as it appears. To enable this functionality, enable the option "Periodically refresh recognized content" in the Windows OCR category of NVDA's settings dialog. After recognizing content with Windows 10 OCR using NVDA+r, 
  - You can disable speaking of new text by turning off report dynamic content changes (pressing NVDA+5).
Changes for Developers:
- ContentRecognizers can specify whether they want to allow automatic, periodic refresh using the new allowAutoRefresh attribute.

ABuffEr · 2023-08-25T17:14:43Z

I'm extremely delighted to see this feature. Thanks Jamie! And Leonard, of course.

cary-rowen · 2023-08-25T17:25:18Z

Hi,

Great work, I'm glad to see this PR come up.
Especially for third-party OCR engines, they can easily use this feature, which is great.

Thanks

XLTechie · 2023-08-26T03:19:25Z

Thanks @jcsteh and @leonardder--many people will welcome this feature! Two comments/questions:

we now periodically (every 1.5 seconds) recognize the same area of the screen again. The result document is updated with the new recognition result

I don't know what library you would have to use, but wouldn't an image comparison with last snapshot, to determine if there was actually a change before re-recognizing, be more resource efficient? It might also allow a faster refresh rate. Something like the same way you can detect motion with video cameras, by comparing frames.

LiveText honours the report dynamic content changes setting, so turning this off will prevent new text from being spoken.

But will it prevent new text from appearing in the buffer? If you're doing manual screen review of the buffer, and the text is updated: even if it isn't spoken when it updates, the next line read will speak/braille something different than it did just a second before.

Adriani90 · 2023-08-28T19:37:32Z

Here is an add-on that could also serve as inspiration for this PR. It works very well with videos and there is no performance issue. However, it does not work in games etc.; it is just for subtitles. Combining the two approaches could bring some nice advantages.
https://github.com/maxe-hsieh/subtitle_reader

cc: @maxe-hsieh maybe you can contribute as well.

jcsteh · 2023-08-28T22:55:18Z

wouldn't an image comparison with last snapshot, to determine if there was actually a change before re-recognizing, be more resource efficient?

That's an interesting idea. It probably would be, though Windows 10 OCR is fast enough that I'm not sure whether it's worth it. This could be done in a follow-up if it does prove to be a problem, though. I too have no idea what you would use for this. I guess you could just compare the pixel arrays as a start.
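A minimal sketch of that starting point, assuming the captured screen region is available as a raw `bytes` buffer. The `ChangeDetector` class and its method names are hypothetical and not part of NVDA. It hashes each frame and reports whether the pixels differ from the previous capture, so recognition could be skipped when nothing changed.

```python
import hashlib
from typing import Optional


class ChangeDetector:
    """Remembers a digest of the last frame and flags frames that differ."""

    def __init__(self) -> None:
        self._lastDigest: Optional[bytes] = None

    def hasChanged(self, pixels: bytes) -> bool:
        """Return True if this frame differs from the previous one.

        The first frame always counts as changed.
        """
        digest = hashlib.sha256(pixels).digest()
        changed = digest != self._lastDigest
        self._lastDigest = digest
        return changed
```

Storing a digest instead of the full buffer keeps memory use constant. Note that any single-pixel difference counts as a change, so animated backgrounds would still trigger recognition on every refresh.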

It might also allow a faster refresh rate.

It would, but this isn't necessarily a good thing. When I was initially working on this, I assumed faster refresh rate would always be better. It turns out that this isn't ideal for things like videos because the text is sometimes (often?) deliberately animated so that it appears slowly. If the refresh rate is too fast, you get spammed with updates. Currently, LiveText reports updates by line, so you'd get a lot of repeated text. Even if we went smaller than that (word), you'd still get a lot of tiny updates and some repeats.
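The repeated-text effect can be reproduced with a simple line-based diff. This is not LiveText's actual implementation; `newLines` is a hypothetical helper built on Python's standard `difflib`, and it returns every line that is new or modified relative to the previous recognition result.

```python
import difflib
from typing import List


def newLines(oldText: str, newText: str) -> List[str]:
    """Return lines present in newText that were not in oldText, in order."""
    old = oldText.splitlines()
    new = newText.splitlines()
    added = []
    for line in difflib.ndiff(old, new):
        # ndiff prefixes added lines with "+ "; other prefixes are ignored.
        if line.startswith("+ "):
            added.append(line[2:])
    return added
```

If a caption animates from "Hello" to "Hello wor" to "Hello world", each refresh sees a modified line and the whole line is returned again, which is why a faster refresh rate produces more repetition rather than less.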

But will it prevent new text from appearing in the buffer?

No. See the known issues section of the PR description.

If you're doing manual screen review of the buffer, and the text is updated: even if it isn't spoken when it updates, the next line read will speak/braille something different than it did just a second before.

Yes it will. However, this is consistent with any other kind of review: object navigation, browse mode, screen review, etc.

seanbudd

Thanks for picking this up @LeonarddeR and thanks for your original work @jcsteh

This looks almost ready

jcsteh · 2023-08-29T05:46:40Z

But will it prevent new text from appearing in the buffer?

Oh. I didn't see that a setting was added to disable refreshing. So if you really want to, you can turn it off.

LeonarddeR · 2023-08-29T05:58:13Z

In fact, refreshing is disabled by default. We can change that, but I think it should be opt-in rather than opt-out.

CyrilleB79

Thanks for this work.

Please find my review comments.

There is an edge case where you can enable screen curtain while in an OCR result. STR:

- Open vision settings
- Disable the "show a warning when loading screen curtain" checkbox
- Tab back to the "Make screen black" checkbox
- Press NVDA+R to OCR this checkbox
- Press space to activate it

Actual result: Screen curtain is enabled
Expected result: A ui.message should inform that screen curtain cannot be used while OCRing, especially if auto refresh is enabled.

Extra linting
This PR contains extra linting related to blank spaces. This makes it harder to spot the real changes in this PR and may cause git blame to report unhelpful information, especially when the change is on a line of code (the impact is smaller when the change is on a blank line or a comment).

Most (if not all) options in the UI have an associated toggle script in globalCommands.py. Could you add an unbound script to toggle the auto refresh config value?

Thanks!

CyrilleB79 · 2023-08-29T08:21:11Z

 Pressing enter or space will activate (normally click) the text at the cursor if possible.
 Pressing escape dismisses the recognition result.

+When you want to monitor constantly changing content, such as when watching a video with sub titles, you can optionally enable automatic refresh of the recognized content in the [Windows OCR category #Win10OcrSettings] of the [NVDA Settings #NVDASettings] dialog.


Since the option "automatic refresh of the recognized content" is in the "Windows OCR" panel, I imagine that it only applies to Windows OCR and not to any other recognizer (e.g. one implemented by an add-on). Having this sentence in the general paragraph "Content Recognition" is therefore misleading, since it may suggest that the checkbox applies to any recognizer.

You may move this sentence to the "Windows OCR" paragraph instead.

Suggested change

When you want to monitor constantly changing content, such as when watching a video with sub titles, you can optionally enable automatic refresh of the recognized content in the [Windows OCR category #Win10OcrSettings] of the [NVDA Settings #NVDASettings] dialog.

When you want to monitor constantly changing content, such as when watching a video with sub titles, you can optionally enable automatic refresh of the recognized content in the [Windows OCR category #Win10OcrSettings] of the [NVDA Settings #NVDASettings] dialog.

The refresh interval is now bound to the UWP provider.

Actually, that was not my point since I had not noticed this, but it makes full sense to have the refresh interval linked specifically to the UWP OCR provider and not to all providers.

My point was just about the documentation: to move this sentence ("When you want to monitor ... of the [NVDA Settings #NVDASettings] dialog."). The sentence is currently in the general paragraph "Content Recognition" and I suggest moving it to the paragraph "Windows OCR", just below.

Sorry for my unclear previous comment with a useless suggestion.

CyrilleB79 · 2023-08-29T08:42:22Z

 [uwpOcr]
 	language = string(default="")
+	autoRefresh = boolean(default=false)
+	autoRefreshIntervalMs = integer(default=1500,min=500)


It is not common to have the unit in the config param name:

Suggested change

autoRefreshIntervalMs = integer(default=1500,min=500)

autoRefreshInterval = integer(default=1500,min=500)
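Read as a validation rule, the spec line says: an integer, 1500 when unset, and never below 500. The value is in milliseconds, per the original `Ms` suffix. The sketch below mimics that rule in plain Python for illustration; it does not use the real config validator, and `checkAutoRefreshInterval` is a hypothetical name.

```python
from typing import Optional, Union

DEFAULT_INTERVAL_MS = 1500  # matches default=1500 in the spec line
MIN_INTERVAL_MS = 500  # matches min=500 in the spec line


def checkAutoRefreshInterval(value: Optional[Union[int, str]] = None) -> int:
    """Return a valid refresh interval in milliseconds, or raise ValueError."""
    if value is None:
        # Nothing configured: fall back to the default.
        return DEFAULT_INTERVAL_MS
    interval = int(value)  # config values may arrive as strings
    if interval < MIN_INTERVAL_MS:
        raise ValueError(
            f"autoRefreshInterval must be at least {MIN_INTERVAL_MS}, got {interval}"
        )
    return interval
```

Dropping the unit from the key name means the unit has to be documented elsewhere, which is why the constants above keep the `_MS` suffix.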

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

AppVeyorBot · 2023-08-30T16:17:06Z

PASS: Translation comments check.
PASS: Unit tests.
PASS: Lint check.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/m40u43f8gc8w91l7/artifacts/output/nvda_snapshot_pr15331-29013,c2ef5e7a.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 0.8,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 26.8,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 22.5,
FINISH_END 0.1

See test results for failed build of commit c2ef5e7ad8

LeonarddeR · 2023-08-30T16:18:42Z

@CyrilleB79 I figured that this PR has enough complexity for now. If you really need a toggle script, feel free to add it yourself in a follow-up. I don't care much myself.

AppVeyorBot · 2023-08-30T17:10:36Z

FAIL: Translation comments check. Translation comments missing or unexpectedly included. See build log for more information.
PASS: Unit tests.
PASS: Lint check.
PASS: System tests (tags: installer NVDA).
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/qhtyck5yg134fkyp/artifacts/output/nvda_snapshot_pr15331-29014,03f4f92b.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 0.9,
INSTALL_END 1.0,
BUILD_START 0.0,
BUILD_END 26.1,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 22.7,
FINISH_END 0.1

See test results for failed build of commit 03f4f92b29

AppVeyorBot · 2023-08-30T18:19:24Z

FAIL: Translation comments check. Translation comments missing or unexpectedly included. See build log for more information.
PASS: Unit tests.
PASS: Lint check.
PASS: System tests (tags: installer NVDA).
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/bepb2x4jg04srjd3/artifacts/output/nvda_snapshot_pr15331-29016,ee53c6bf.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 0.8,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 23.8,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 22.3,
FINISH_END 0.1

See test results for failed build of commit ee53c6bffe

CyrilleB79 · 2023-08-30T21:01:31Z

FYI @LeonarddeR, when I have auto refresh checked and a recog result open, I get the following error every 1.5 seconds in the log:

ERROR - diffHandler.DiffMatchPatch.diff (22:57:31.573) - RefreshableRecogResultNVDAObject._monitorThread (10000):
Exception in DMP, falling back to difflib
Traceback (most recent call last):
  File "diffHandler.py", line 72, in diff
    self._initialize()
  File "diffHandler.py", line 59, in _initialize
    stdout=subprocess.PIPE
  File "C:\Users\Cyrille\AppData\Local\Programs\Python\Python37-32\lib\subprocess.py", line 753, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "C:\Users\Cyrille\AppData\Local\Programs\Python\Python37-32\lib\subprocess.py", line 1107, in _get_handles
    errwrite = self._make_inheritable(errwrite)
  File "C:\Users\Cyrille\AppData\Local\Programs\Python\Python37-32\lib\subprocess.py", line 1119, in _make_inheritable
    _winapi.DUPLICATE_SAME_ACCESS)
OSError: [WinError 50] Cette demande n’est pas prise en charge

With add-ons disabled, the error disappears. So it's probably not an issue in this PR itself. But I just wanted to inform you. I may search for the culprit add-on, unless you already have an idea.

LeonarddeR · 2023-08-31T05:04:04Z

@CyrilleB79 I really have no idea why this add-on kicks in here, but I'd expect it to also cause errors when diffing in Command Prompt or Windows Terminal.

AppVeyorBot · 2023-08-31T07:14:28Z

PASS: Translation comments check.
PASS: Unit tests.
PASS: Lint check.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/nwtel0dwrhbyc1dv/artifacts/output/nvda_snapshot_pr15331-29022,12ac9ec3.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 0.8,
INSTALL_END 1.0,
BUILD_START 0.0,
BUILD_END 24.6,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 20.9,
FINISH_END 0.1

See test results for failed build of commit 12ac9ec32b

Qchristensen

User guide change looks good.

… OCR result (#17740)

Discussed in #15331 (review) and following.

Summary of the issue:
Most options in the UI have an associated toggle script (assigned or unassigned). In Windows OCR settings, "Periodically refresh recognized content" has no such toggle script.

Description of user facing changes:
An unassigned command has been added to toggle the value of "Periodically refresh recognized content". In an OCR result, using this script takes effect immediately.

Description of development approach:
Use the toggle helper function defined for global commands.
- If the final state is enabled, schedule a new recognition.
- If the final state is disabled, nothing more needs to be done; the next recognition will not schedule subsequent ones.
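The development approach amounts to a self-rescheduling loop controlled by one flag. The sketch below is a simplified model under that assumption, not the code from #17740: `RefreshLoop` and its members are hypothetical, and a `deque` stands in for a real timer so the behaviour can be stepped through deterministically.

```python
from collections import deque
from typing import Callable, Deque


class RefreshLoop:
    """Models a recognition that reschedules itself while refresh is enabled."""

    def __init__(self) -> None:
        self.enabled = False
        self.recognitions = 0
        # Stand-in for a timer: callbacks waiting for the next interval.
        self.pending: Deque[Callable[[], None]] = deque()

    def recognize(self) -> None:
        """Run one recognition and, if enabled, schedule the next one."""
        self.recognitions += 1
        if self.enabled:
            self.pending.append(self.recognize)

    def toggle(self) -> bool:
        """Flip the setting; when turning on, start the chain if it is idle."""
        self.enabled = not self.enabled
        if self.enabled and not self.pending:
            self.pending.append(self.recognize)
        return self.enabled

    def tick(self) -> None:
        """Simulate one timer interval elapsing."""
        if self.pending:
            self.pending.popleft()()
```

Turning the setting off does not cancel anything: the already scheduled recognition still runs once, sees the flag is off, and simply does not queue another.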

jcsteh and others added 4 commits August 24, 2023 21:58

Add config option and sanity fixes

04223ec

Update user guide

3a407b7

Disallow screen curtain when recognizing

43d9ad6

LeonarddeR requested review from a team as code owners August 25, 2023 16:27

LeonarddeR requested review from Qchristensen and seanbudd August 25, 2023 16:27

LeonarddeR mentioned this pull request Aug 25, 2023

Continually refresh OCR and speak new text as it appears. #11270

Closed

Add note about refresh interval

7301fc2

seanbudd added the conceptApproved label (similar to 'triaged' for issues: PR accepted in theory, implementation needs review) Aug 28, 2023

seanbudd reviewed Aug 29, 2023

seanbudd marked this pull request as draft August 29, 2023 05:39

CyrilleB79 reviewed Aug 29, 2023

LeonarddeR and others added 4 commits August 30, 2023 16:57

Merge remote-tracking branch 'origin/master' into continualOcr

54aa157

Apply suggestions from code review

fd3f2b6

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Many review actions

a6b4f36

Fix edge case for screen curtain

7d43fa5

LeonarddeR added 2 commits August 30, 2023 18:26

Remove debugging

07820a0

Undo whitespace changes

c1a18bb

Two remainings

d7a5a40

Fix checkPot

72ebf0c

LeonarddeR marked this pull request as ready for review August 30, 2023 18:52

Move sentence in user guide

95897a7

seanbudd approved these changes Sep 1, 2023

Qchristensen approved these changes Sep 1, 2023

seanbudd added 2 commits September 1, 2023 12:53

Merge remote-tracking branch 'origin/master' into continualOcr

e9eedb1

update changes

1078b75

seanbudd merged commit ce32058 into nvaccess:master Sep 1, 2023

nvaccessAuto added this to the 2023.3 milestone Sep 1, 2023

Adriani90 mentioned this pull request Sep 4, 2023

Auto language detection based on script detection In t2990 review #7629

Closed

CyrilleB79 mentioned this pull request Sep 4, 2023

Introduce precommit hooks #15365

Closed

Adriani90 mentioned this pull request Sep 5, 2023

NVDA does not read the subtitles or alerts in youtube while using microsoft edge #8337

Open

Qchristensen mentioned this pull request Jul 23, 2024

Add (unassigned) input gesture to toggle periodically refresh OCR #16897

Closed

CyrilleB79 mentioned this pull request Feb 25, 2025

Add an unassigned command to toggle periodical refresh of the Windows OCR result #17740

Merged

5 tasks

LeonarddeR deleted the continualOcr branch August 23, 2025 06:27

11 participants