Show dialog to enable image desc once if it is not enabled in settings panel#19243
Conversation
|
Why this choice of a per-session validity? It's quite strange to validate the dialog box, and then, when you open the settings, the checkbox remains unchecked. It's worth noting that the fact that this parameter is temporary (only for the session) is not mentioned in the new enabling dialog box, what creates more confusion. The only other parameter that I know which has both a temporary value and a persistent config value (provided you save the config) is the screen curtain. I already find this quite confusing for the screen curtain. But for AI description, I find it more confusing and even useless. I'd recommend that we drop this temporary setting and that pressing yes just enable the AI description feature in settings. This UX would be similar to Windows History clipboard for example. If you disagree with me, could you please provide the different use cases / user stories for:
Thanks. |
I think the configuration should only be modified in the settings panel. |
|
@tianzeshi-study - do our keyboard gestures set the config? I would assume they would |
No, they don't. They are also session-only. The config can only be set in settings panel. |
|
My initial thought was that the setting in the Settings panel should explicitly say “Enable image descriptions on startup,” rather than the more ambiguous “Enable image descriptions.” And “Enable image descriptions” should apply only to the current session by default, without modifying the configuration. |
|
With this feature, there are 2 points where an information from the user is required:
This PR was addressing the second point. IMO for this, there was no need to have a dialog which just makes the UX less easy. Given that, my suggestion for the best UX is:
|
For some advanced users, it may be necessary to unload the model when memory usage becomes too high, in order to ensure that NVDA continues running smoothly without stuttering. Without a toggle script, NVDA’s memory usage would remain permanently high. |
I do not understand the use case. Moreover if it's mainly for advanced users, does it mean that non-advanced user will never bother to unload the model from memory and thus, experience NVDA stuttering? Again, unloading software components from memory is something very unusual that I have never seen in any other application. If it's really needed, this should be at least better documented. |
Just like closing a large Microsoft Word document or shutting down a browser tab that is causing the system to lag. Ordinary users may also need to learn to manage memory usage manually to avoid potential lag. They generally understand that when their computer becomes slow, they need to close something. |
|
I completely agree with @CyrilleB79 The current design of the merged Pull Request is fundamentally flawed in two main areas: User Experience (UX) and Design Philosophy.
|
First, no one is being forced to unload the model manually — it’s an optional feature, and it doesn’t even have a gesture assigned. |
Reverts: - #18475 - #19036 - #19024 - #19055 - #19057 - #19178 - #19243 - #19327 - Partial revert: #19342 ### Issues fixed Fixes #19298 ### Issues reopened Reopens #16281 ### Reason for revert / Can this PR be reimplemented? If so, what is required for the next attempt The current implementation of AI image descriptions yields low quality captions from a 3 year old model (see #19298). The current implementation also requires using numpy, which hogs RAM, slows initialization, and increases the weight of the installer. An attempt was made to convert this to C++ using WinML and Windows ONNX runtimes as per #18662. This would have removed numpy, and improved flexibility for using different models in the future. Unfortunately, this was not found to be feasible, as ONNX C++ fails to work via 64bit emulation on ARM (microsoft/onnxruntime#15403). This means we have the following options for image descriptions: 1. Continue to use the python onnxruntime, and accept the RAM and storage hits. Instead, improve the quality of the captioner with better models such as [git-base-coco](https://huggingface.co/microsoft/git-base-coco) or [blip2](https://huggingface.co/Salesforce/blip2-opt-2.7b-coco). 2. Wait until MS builds ARM64EC into C++ ONNX (blocked by microsoft/onnxruntime#15403) 3. Attempt to build our own fork of ONNX with ARM64EC 4. Build a separate ARM native installer of NVDA, offer as an alternative to allow for ARM devices to do image descriptions with numpy. 5. Release the feature on C++ without support for ARM devices. All of these options require a significant amount of work. As such, sadly this feature is not ready for a stable release. Instead this code will be moved to a feature branch, until ONNX C++ matures such as fixing microsoft/onnxruntime#15403. Additionally, ONNX C++ runtimes are only available through the experimental 2.0 version of the Windows App SDK, and requires you to build your own headers from it. I think this feature will be blocked until microsoft/onnxruntime#15403 is implemented and the 2.0 version of the Windows App SDK becomes stable. Future re-implementations should also consider using higher quality, more modern models.
Link to issue number:
Resolves #19097
Summary of the issue:
Users need to open settings panel to enable image description
Description of user facing changes:
If image description is not enabled in settings pannel, it will show a dialog to enable it in current session when press the keyboard shortcut
Description of developer facing changes:
Nonne
Description of development approach:
None
Testing strategy:
Disable image description in settings panel, then press the keyboard shortcut to enable it in a dialog.
Known issues with pull request:
After enabling it requires another key press for recognition, but I think this is acceptable
Code Review Checklist: