Lazy load heavy deps for AI image descriptions by tianzeshi-study · Pull Request #19055 · nvaccess/nvda

tianzeshi-study · 2025-10-07T05:36:08Z

Link to issue number:

Fixes #19031

Summary of the issue:

Importing numpy directly may cause high memory usage even when AI image descriptions are disabled.

Description of user facing changes:

The memory usage caused by NumPy now occurs only when AI image descriptions are loaded.

Description of developer facing changes:

Move captioner.py into a captioner/ package to support lazy import of large libraries such as numpy and onnxruntime. This reduces startup time and memory usage to some extent.

Description of development approach:

None

Testing strategy:

Observe the memory usage and start time of NVDA.
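One possible way to put numbers on this observation is sketched below, using only the standard library. The `measure_import` helper is hypothetical and was not part of the testing for this PR. Note that `tracemalloc` only tracks allocations made through Python's memory allocators, so memory taken by native libraries loaded by a package will be under-reported; process-level tools give a fuller picture.

```python
import importlib
import time
import tracemalloc
from typing import Tuple


def measure_import(name: str) -> Tuple[float, int]:
	"""Return (seconds taken, peak traced bytes) for importing the named module."""
	tracemalloc.start()
	start = time.perf_counter()
	importlib.import_module(name)
	elapsed = time.perf_counter() - start
	_current, peak = tracemalloc.get_traced_memory()
	tracemalloc.stop()
	return elapsed, peak


# "colorsys" is a lightweight stand-in; substitute the module under investigation.
elapsed, peak = measure_import("colorsys")
print(f"import took {elapsed:.4f} s, peak traced memory {peak} bytes")
```

A module that has already been imported is served from the cache, so the measurement is only meaningful in a fresh interpreter. Running Python with the `-X importtime` option is another way to see per-module import times.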

Known issues with pull request:

None

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.


hwf1324 · 2025-10-07T06:21:28Z

How about when AI image description is switched from enabled to disabled? Does the memory occupied by numpy get freed, or must NVDA be restarted?

tianzeshi-study · 2025-10-07T06:35:53Z

> How about when AI image description is switched from enabled to disabled? does the memory occupied by numpy free up?

Once imported, packages remain in memory until the process restarts.

> must NVDA be restarted?

Yes, I think.
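The behaviour described in this reply can be seen with a small sketch. Python caches imported modules in `sys.modules`, so dropping a local reference does not unload anything. The `import_and_drop` helper is hypothetical, and `colorsys` stands in for a heavy package such as numpy.

```python
import importlib
import sys


def import_and_drop(name: str) -> bool:
	"""Import a module, delete the local reference, and report whether it is still cached."""
	module = importlib.import_module(name)
	del module  # removes only this local name, not the cached module
	return name in sys.modules


still_loaded = import_and_drop("colorsys")
print(still_loaded)
```

Removing the entry from `sys.modules` by hand does not help much either: other references usually keep the module alive, and native extension modules generally cannot be unloaded from a running process.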

seanbudd

Thanks @tianzeshi-study

This reverts commit c9b9d02.

Reverts:
- #18475
- #19036
- #19024
- #19055
- #19057
- #19178
- #19243
- #19327
- Partial revert: #19342

### Issues fixed

Fixes #19298

### Issues reopened

Reopens #16281

### Reason for revert / Can this PR be reimplemented? If so, what is required for the next attempt

The current implementation of AI image descriptions yields low quality captions from a 3 year old model (see #19298). The current implementation also requires using numpy, which hogs RAM, slows initialization, and increases the weight of the installer.

An attempt was made to convert this to C++ using WinML and Windows ONNX runtimes as per #18662. This would have removed numpy, and improved flexibility for using different models in the future. Unfortunately, this was not found to be feasible, as ONNX C++ fails to work via 64-bit emulation on ARM (microsoft/onnxruntime#15403).

This means we have the following options for image descriptions:

1. Continue to use the python onnxruntime, and accept the RAM and storage hits. Instead, improve the quality of the captioner with better models such as [git-base-coco](https://huggingface.co/microsoft/git-base-coco) or [blip2](https://huggingface.co/Salesforce/blip2-opt-2.7b-coco).
2. Wait until MS builds ARM64EC into C++ ONNX (blocked by microsoft/onnxruntime#15403)
3. Attempt to build our own fork of ONNX with ARM64EC
4. Build a separate ARM native installer of NVDA, offer as an alternative to allow for ARM devices to do image descriptions with numpy.
5. Release the feature on C++ without support for ARM devices.

All of these options require a significant amount of work. As such, sadly this feature is not ready for a stable release. Instead, this code will be moved to a feature branch until ONNX C++ matures, for example by fixing microsoft/onnxruntime#15403.

Additionally, ONNX C++ runtimes are only available through the experimental 2.0 version of the Windows App SDK, and require you to build your own headers from it.
I think this feature will be blocked until microsoft/onnxruntime#15403 is implemented and the 2.0 version of the Windows App SDK becomes stable. Future re-implementations should also consider using higher quality, more modern models.

refactor(captioner): split captioner.py into package for lazy loading…

64cde7d

… heavy deps

tianzeshi-study requested a review from a team as a code owner October 7, 2025 05:36

tianzeshi-study requested a review from SaschaCowley October 7, 2025 05:36

seanbudd requested review from seanbudd and removed request for SaschaCowley October 7, 2025 05:56

seanbudd approved these changes Oct 7, 2025


seanbudd merged commit c9b9d02 into nvaccess:master Oct 7, 2025
29 checks passed

github-actions Bot added this to the 2026.1 milestone Oct 7, 2025

seanbudd added a commit that referenced this pull request Jan 9, 2026

Revert "Lazy load heavy deps for AI image descriptions (#19055)"

411c667

This reverts commit c9b9d02.

seanbudd mentioned this pull request Jan 9, 2026

Revert AI image description work #19425

Merged

seanbudd merged 1 commit into nvaccess:master from tianzeshi-study:fix-numpy-import