Improve image captioner by tianzeshi-study · Pull Request #19024 · nvaccess/nvda

tianzeshi-study · 2025-10-03T08:41:26Z

Link to issue number:

Follow up #18924
Fixes #19033
Fixes #19039

Summary of the issue:

Having AI image descriptions turned on causes NVDA to load more slowly.

When braille mode is set to follow cursors, the image description is not shown in braille, and an error is logged.

Description of user facing changes:

Somewhat improved NVDA startup speed when AI image descriptions are enabled.

Image descriptions are now shown in braille.

Somewhat reduced NVDA memory usage.

Description of developer facing changes:

Improved grammar and variable names.

Image description messages are now shown from the main thread.

The image captioner is now loaded in a background thread.
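A minimal sketch of the threading pattern described above, assuming a captioner that is slow to construct. The model is loaded on a daemon thread so startup does not wait for it, and the finished description is handed back to the main thread before being presented, instead of being presented from the worker thread. `BackgroundCaptioner`, `HypotheticalCaptioner`, `call_on_main_thread` and `pump_main_thread` are hypothetical names, not NVDA's API. Inside NVDA the main-thread hop would more likely go through something like `wx.CallAfter`; a plain `queue.Queue` stands in for the main loop here so the example runs on its own.

```python
import queue
import threading
import time
from typing import Callable

# Stand-in for the main (GUI) thread's work queue. In NVDA, a hop to the main
# thread would typically go through something like wx.CallAfter; a plain queue
# is used here so the sketch runs without NVDA or wxPython installed.
_main_thread_calls: queue.Queue = queue.Queue()


def call_on_main_thread(func: Callable, *args) -> None:
    """Hypothetical helper: schedule func(*args) to run on the main thread."""
    _main_thread_calls.put((func, args))


def pump_main_thread(timeout: float = 2.0) -> None:
    """Run one scheduled call, as the main loop would between events."""
    func, args = _main_thread_calls.get(timeout=timeout)
    func(*args)


class HypotheticalCaptioner:
    """Placeholder for the real model wrapper, which is slow to construct."""

    def __init__(self) -> None:
        time.sleep(0.05)  # simulate loading model weights

    def describe(self, image: bytes) -> str:
        return f"an image of {len(image)} bytes"


class BackgroundCaptioner:
    """Loads the captioner off the main thread; presents results on it."""

    def __init__(self, present: Callable[[str], None]) -> None:
        self._present = present  # speech/braille output; main thread only
        self._captioner = None
        self._loaded = threading.Event()

    def start_loading(self) -> None:
        # A daemon thread, so startup does not wait for the model to load.
        threading.Thread(target=self._load, daemon=True).start()

    def _load(self) -> None:
        self._captioner = HypotheticalCaptioner()
        self._loaded.set()

    def describe_async(self, image: bytes) -> None:
        def work() -> None:
            self._loaded.wait()  # blocks this worker, not the main thread
            text = self._captioner.describe(image)
            # Hand the result to the main thread instead of presenting it here.
            call_on_main_thread(self._present, text)

        threading.Thread(target=work, daemon=True).start()


if __name__ == "__main__":
    shown: list = []
    captioner = BackgroundCaptioner(shown.append)
    captioner.start_loading()  # returns immediately
    captioner.describe_async(b"\x00" * 16)
    pump_main_thread()  # the "main loop" runs the queued presentation call
    print(shown)  # ['an image of 16 bytes']
```

The design choice is that worker threads only ever enqueue work, so anything that touches speech or braille output runs in one place.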

Description of development approach:

None.

Testing strategy:

Enable AI image descriptions, restart NVDA, and observe the startup time and the braille output.
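To put a number on part of the startup check, the import cost of a heavy dependency such as numpy can be timed in a fresh interpreter. `measure_import_seconds` is a hypothetical helper, not part of NVDA's test suite. It measures a single module's import, not full NVDA startup, and uses a subprocess so that a module already cached in `sys.modules` does not look free.

```python
import subprocess
import sys


def measure_import_seconds(module: str, runs: int = 3) -> float:
    """Best-of-N wall-clock time to import `module` in a fresh interpreter."""
    # The child process times only the import itself and prints the result.
    code = (
        "import time, importlib\n"
        "t0 = time.perf_counter()\n"
        f"importlib.import_module({module!r})\n"
        "print(time.perf_counter() - t0)\n"
    )
    timings = []
    for _ in range(runs):
        out = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
        timings.append(float(out.strip()))
    # The minimum is the least noisy estimate of the import's intrinsic cost.
    return min(timings)


if __name__ == "__main__":
    print(f"json: {measure_import_seconds('json'):.4f} s")
```

For a per-module breakdown, `python -X importtime -c "import numpy"` prints import timings to stderr.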

Known issues with pull request:

numpy is still imported during initialization, which may cause additional memory usage.

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

seanbudd · 2025-10-06T04:48:11Z

Please fill out the PR template

beqabeqa473 · 2025-10-06T07:49:09Z

Numpy should be imported only when needed. An additional 300-400 MB of wasted memory is not acceptable.

Revert AI image description work (#19425), which reverted this pull request:

Reverts:
- #18475
- #19036
- #19024
- #19055
- #19057
- #19178
- #19243
- #19327
- Partial revert: #19342

Issues fixed:
Fixes #19298

Issues reopened:
Reopens #16281

Reason for revert / Can this PR be reimplemented? If so, what is required for the next attempt:

The current implementation of AI image descriptions yields low-quality captions from a three-year-old model (see #19298). It also requires numpy, which hogs RAM, slows initialization, and increases the size of the installer.

An attempt was made to convert this to C++ using WinML and the Windows ONNX runtimes, as per #18662. This would have removed numpy and improved flexibility for using different models in the future. Unfortunately, this was not found to be feasible, as ONNX C++ fails to work via 64-bit emulation on ARM (microsoft/onnxruntime#15403).

This leaves the following options for image descriptions:
1. Continue to use the Python onnxruntime, and accept the RAM and storage hits. Instead, improve the quality of the captioner with better models such as [git-base-coco](https://huggingface.co/microsoft/git-base-coco) or [blip2](https://huggingface.co/Salesforce/blip2-opt-2.7b-coco).
2. Wait until Microsoft builds ARM64EC into C++ ONNX (blocked by microsoft/onnxruntime#15403).
3. Attempt to build our own fork of ONNX with ARM64EC.
4. Build a separate ARM-native installer of NVDA, offered as an alternative, so that ARM devices can do image descriptions with numpy.
5. Release the feature on C++ without support for ARM devices.

All of these options require a significant amount of work. As such, sadly, this feature is not ready for a stable release. Instead, this code will be moved to a feature branch until ONNX C++ matures, for example until microsoft/onnxruntime#15403 is fixed.

Additionally, ONNX C++ runtimes are only available through the experimental 2.0 version of the Windows App SDK, and require building your own headers from it.

I think this feature will be blocked until microsoft/onnxruntime#15403 is implemented and the 2.0 version of the Windows App SDK becomes stable. Future re-implementations should also consider using higher-quality, more modern models.

tianzeshi-study added 2 commits October 3, 2025 14:54

Correct grammar And variable name

2fc65c6

show UI message in main thread

1b69ab8

tianzeshi-study requested a review from a team as a code owner October 3, 2025 08:41

tianzeshi-study requested a review from seanbudd October 3, 2025 08:41

tianzeshi-study and others added 3 commits October 4, 2025 23:02

show notice in main thread; Load image descriptioner in child thread

028f7b5

lazy import onnxruntime to reduce startup time and memory usage.

8e66af4

Pre-commit auto-fix

0c238a4

This was referenced Oct 5, 2025

Having AI Image Descriptions turned on causes NVDA to load more slowly. #19033

Closed

When braille mode is set to follow cursors, the image description is not shown in braille and an error is logged #19039

Closed

seanbudd approved these changes Oct 6, 2025


Comment thread source/_localCaptioner/captioner.py

Update source/_localCaptioner/captioner.py

164c59f

seanbudd enabled auto-merge (squash) October 6, 2025 07:29

seanbudd merged commit 121c221 into nvaccess:master Oct 6, 2025
29 checks passed

github-actions Bot added this to the 2026.1 milestone Oct 6, 2025

seanbudd added a commit that referenced this pull request Jan 9, 2026

Revert "Improve image captioner (#19024)"

3e36f5e

This reverts commit 121c221.

seanbudd mentioned this pull request Jan 9, 2026

Revert AI image description work #19425

Merged

seanbudd merged 6 commits into nvaccess:master from tianzeshi-study:improve-image-captioner
