Show dialog to enable image desc once if it is not enabled in settings panel by tianzeshi-study · Pull Request #19243 · nvaccess/nvda

tianzeshi-study · 2025-11-20T08:21:30Z

Link to issue number:

Resolves #19097

Summary of the issue:

Users need to open settings panel to enable image description

Description of user facing changes:

If image description is not enabled in settings pannel, it will show a dialog to enable it in current session when press the keyboard shortcut

Description of developer facing changes:

Nonne

Description of development approach:

None

Testing strategy:

Disable image description in settings panel, then press the keyboard shortcut to enable it in a dialog.

Known issues with pull request:

After enabling it requires another key press for recognition, but I think this is acceptable

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

…s panel

seanbudd

Thanks @tianzeshi-study

CyrilleB79 · 2025-11-21T16:15:51Z

Why this choice of a per-session validity?

It's quite strange to validate the dialog box, and then, when you open the settings, the checkbox remains unchecked. It's worth noting that the fact that this parameter is temporary (only for the session) is not mentioned in the new enabling dialog box, what creates more confusion.

The only other parameter that I know which has both a temporary value and a persistent config value (provided you save the config) is the screen curtain. I already find this quite confusing for the screen curtain. But for AI description, I find it more confusing and even useless.

I'd recommend that we drop this temporary setting and that pressing yes just enable the AI description feature in settings. This UX would be similar to Windows History clipboard for example.

If you disagree with me, could you please provide the different use cases / user stories for:

AI Image description temporarily enabled, only for the current session
AI image description enabled in config

Thanks.

tianzeshi-study · 2025-11-21T16:29:04Z

Why this choice of a per-session validity?

I think the configuration should only be modified in the settings panel.
And enabling image description may cause NVDA to start late or use high memory, so it is more appropriate to just stay in the current session.

seanbudd · 2025-11-25T01:35:25Z

@tianzeshi-study - do our keyboard gestures set the config? I would assume they would

tianzeshi-study · 2025-11-26T07:11:10Z

@tianzeshi-study - do our keyboard gestures set the config? I would assume they would

No, they don't. They are also session-only.
Just like in this PR, the changes apply only to the current session, making it convenient to quickly enable or disable image descriptions within that session.

The config can only be set in settings panel.

tianzeshi-study · 2025-11-26T07:17:07Z

My initial thought was that the setting in the Settings panel should explicitly say “Enable image descriptions on startup,” rather than the more ambiguous “Enable image descriptions.”

And “Enable image descriptions” should apply only to the current session by default, without modifying the configuration.

CyrilleB79 · 2025-11-26T08:59:09Z

With this feature, there are 2 points where an information from the user is required:

Download the model if it is not already on the user's machine. It cannot be done automatically because:
- the size of the model is important and the user may not want to have NVDA size growing
- the download time may be important so the model cannot just be downloaded when the script is used for the first time
Enable the feature (i.e. load the model in memory). It cannot be done automatically because:
- doing this the first time the script is called may delay the answer time when the script is called for the first time
- doing at NVDA startup delays the startup, even if the model is loaded in a separate thread if I am correct.

This PR was addressing the second point. IMO for this, there was no need to have a dialog which just makes the UX less easy.
Note that I had not in mind either that the script to enable the feature was to enable the feature for the session.

Given that, my suggestion for the best UX is:

Do not speak anymore of enabled / disabled AI recognition feature. The feature should be present in NVDA and considered available as soon as the model has been downloaded
Do not ask the user if they want to load the captioner in memory. Do you really know many applications which ask for your confirmation before loading something in memory?
Since loading the model (the captioner) in memory may take some time (between 1 second and many seconds), give the user the choice, through the checkbox (or better, a 2-item combo-box) in the settings panel, when the captioner should be loaded in memory:
- at NVDA startup. Pro: the recognition script works immediately. Cons: NVDA startup is delayed.
- upon first call of the script. Pro: no impact on NVDA startup, and no impact at all if the feature is note used. Cons: The first script call may be delayed; if it is very long, a "Enabling image captioner..." message could be issued.
Remove the script to enable the feature / load the captioner in memory, as well as the confirm dialog of this PR

tianzeshi-study · 2025-11-26T15:00:47Z

4. Remove the script to enable the feature / load the captioner in memory, as well as the confirm dialog of this PR

For some advanced users, it may be necessary to unload the model when memory usage becomes too high, in order to ensure that NVDA continues running smoothly without stuttering. Without a toggle script, NVDA’s memory usage would remain permanently high.

CyrilleB79 · 2025-11-26T15:37:04Z

For some advanced users, it may be necessary to unload the model when memory usage becomes too high, in order to ensure that NVDA continues running smoothly without stuttering. Without a toggle script, NVDA’s memory usage would remain permanently high.

I do not understand the use case. Moreover if it's mainly for advanced users, does it mean that non-advanced user will never bother to unload the model from memory and thus, experience NVDA stuttering?

Again, unloading software components from memory is something very unusual that I have never seen in any other application. If it's really needed, this should be at least better documented.

tianzeshi-study · 2025-11-26T15:59:11Z

I do not understand the use case. Moreover if it's mainly for advanced users, does it mean that non-advanced user will never bother to unload the model from memory and thus, experience NVDA stuttering?

Again, unloading software components from memory is something very unusual that I have never seen in any other application. If it's really needed, this should be at least better documented.

Just like closing a large Microsoft Word document or shutting down a browser tab that is causing the system to lag.

Ordinary users may also need to learn to manage memory usage manually to avoid potential lag. They generally understand that when their computer becomes slow, they need to close something.

cary-rowen · 2025-11-26T23:22:15Z

I completely agree with @CyrilleB79

The current design of the merged Pull Request is fundamentally flawed in two main areas: User Experience (UX) and Design Philosophy.

Poor Temporary UX: The "session-only" enablement via a dialog is highly confusing. Users expect any confirmed "Enable" action to be saved persistently, not just for the current session.
Anti-Pattern Memory Management: Forcing users to manually unload the AI model (by toggling it off) to free up RAM is an unacceptable, counter-intuitive design choice. Memory management should be an automated function of the application (e.g., through configurable "load on demand" settings), not a manual task for the end-user.
Invalid Analogy: The comparison to closing browser tabs to "free memory" is weak. Users primarily close tabs because the content is no longer required, not because they are consciously engaging in memory resource management.

tianzeshi-study · 2025-11-27T03:42:30Z

* **Poor Temporary UX:** The "session-only" enablement via a dialog is highly confusing. Users expect any confirmed "Enable" action to be saved persistently, not just for the current session.

* **Anti-Pattern Memory Management:** Forcing users to manually **unload the AI model** (by toggling it off) to free up RAM is an unacceptable, **counter-intuitive** design choice. Memory management should be an automated function of the application (e.g., through configurable "load on demand" settings), not a manual task for the end-user.

* **Invalid Analogy:** The comparison to closing browser tabs to "free memory" is weak. Users primarily close tabs because the content is **no longer required**, not because they are consciously engaging in memory resource management.

First, no one is being forced to unload the model manually — it’s an optional feature, and it doesn’t even have a gesture assigned.
Second, when a computer becomes slow, users naturally choose to close things they no longer need. This is not contradictory.

… settings panel (#19243)" This reverts commit 480b087.

Reverts: - #18475 - #19036 - #19024 - #19055 - #19057 - #19178 - #19243 - #19327 - Partial revert: #19342 ### Issues fixed Fixes #19298 ### Issues reopened Reopens #16281 ### Reason for revert / Can this PR be reimplemented? If so, what is required for the next attempt The current implementation of AI image descriptions yields low quality captions from a 3 year old model (see #19298). The current implementation also requires using numpy, which hogs RAM, slows initialization, and increases the weight of the installer. An attempt was made to convert this to C++ using WinML and Windows ONNX runtimes as per #18662. This would have removed numpy, and improved flexibility for using different models in the future. Unfortunately, this was not found to be feasible, as ONNX C++ fails to work via 64bit emulation on ARM (microsoft/onnxruntime#15403). This means we have the following options for image descriptions: 1. Continue to use the python onnxruntime, and accept the RAM and storage hits. Instead, improve the quality of the captioner with better models such as [git-base-coco](https://huggingface.co/microsoft/git-base-coco) or [blip2](https://huggingface.co/Salesforce/blip2-opt-2.7b-coco). 2. Wait until MS builds ARM64EC into C++ ONNX (blocked by microsoft/onnxruntime#15403) 3. Attempt to build our own fork of ONNX with ARM64EC 4. Build a separate ARM native installer of NVDA, offer as an alternative to allow for ARM devices to do image descriptions with numpy. 5. Release the feature on C++ without support for ARM devices. All of these options require a significant amount of work. As such, sadly this feature is not ready for a stable release. Instead this code will be moved to a feature branch, until ONNX C++ matures such as fixing microsoft/onnxruntime#15403. Additionally, ONNX C++ runtimes are only available through the experimental 2.0 version of the Windows App SDK, and requires you to build your own headers from it. I think this feature will be blocked until microsoft/onnxruntime#15403 is implemented and the 2.0 version of the Windows App SDK becomes stable. Future re-implementations should also consider using higher quality, more modern models.

Show dialog to enable image desc once if it is not enabled in setting…

c2c9c46

…s panel

tianzeshi-study requested a review from a team as a code owner November 20, 2025 08:21

tianzeshi-study requested a review from seanbudd November 20, 2025 08:21

Pre-commit auto-fix

ce8bd49

seanbudd approved these changes Nov 21, 2025

View reviewed changes

Comment thread source/gui/_localCaptioner/messageDialogs.py Outdated

Update source/gui/_localCaptioner/messageDialogs.py

d2a2ec7

seanbudd merged commit 480b087 into nvaccess:master Nov 21, 2025
4 of 5 checks passed

github-actions Bot added this to the 2026.1 milestone Nov 21, 2025

seanbudd added a commit that referenced this pull request Jan 9, 2026

Revert "Show dialog to enable image desc once if it is not enabled in…

f51d2e3

… settings panel (#19243)" This reverts commit 480b087.

seanbudd mentioned this pull request Jan 9, 2026

Revert AI image description work #19425

Merged

Uh oh!

Conversation

tianzeshi-study commented Nov 20, 2025

Link to issue number:

Summary of the issue:

Description of user facing changes:

Description of developer facing changes:

Description of development approach:

Testing strategy:

Known issues with pull request:

Code Review Checklist:

Uh oh!

seanbudd left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

CyrilleB79 commented Nov 21, 2025

Uh oh!

tianzeshi-study commented Nov 21, 2025

Uh oh!

seanbudd commented Nov 25, 2025

Uh oh!

tianzeshi-study commented Nov 26, 2025

Uh oh!

tianzeshi-study commented Nov 26, 2025

Uh oh!

CyrilleB79 commented Nov 26, 2025

Uh oh!

tianzeshi-study commented Nov 26, 2025

Uh oh!

CyrilleB79 commented Nov 26, 2025

Uh oh!

tianzeshi-study commented Nov 26, 2025

Uh oh!

cary-rowen commented Nov 26, 2025

Uh oh!

tianzeshi-study commented Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants