Support image descriptions using local AI model by tianzeshi-study · Pull Request #18475 · nvaccess/nvda

tianzeshi-study · 2025-07-15T11:38:36Z

Link to issue number:

Resolves #16281

Summary of the issue:

NVDA currently lacks a built‑in, offline image captioning feature. Existing solutions require a reliable internet connection—raising privacy concerns, potential costs, and latency—and many NVDA users (especially in developing regions or on older hardware) have limited connectivity or constrained resources. There is no robust, integrated offline alternative.

Description of user facing changes:

Introduces device‑side image description directly within NVDA, requiring no cloud service.
Adds three global commands (with default shortcuts):
- --NVDA+Windows+,--: Generate a caption for the current image under focus.
- --NVDA+Windows+Shift+,--: Release the loaded model and free memory.
- --NVDA+Windows+Ctrl+,--: Open the Model Manager GUI to download or manage models.
Extends NVDA’s settings panel to enable/disable offline captioning and configure model paths.

Description of developer facing changes:

New _localCaptioner module containing:
- captioner.py: Core inference engine exposing generate_caption(image) for producing text descriptions.
- panel.py: NVDA settings integration (lazy or on‑startup model loading, custom path).
- modelDownloader.py: CLI tool to download ONNX models.
- modelManager.py: GUI for selecting download paths and managing available models.
Uses the Hugging Face Xenova/vit-gpt2-image-captioning model in ONNX format (via onnxruntime) to balance accuracy, speed, and low resource usage.
Modular design allows for future extension to additional models or formats.
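
As a rough illustration of how an encoder-decoder captioner of this kind turns an image into text, the sketch below runs a greedy decoding loop. It is not the code in captioner.py: `encode`, `decode_step`, the token ids, and `max_length` are hypothetical stand-ins for the ONNX encoder and decoder sessions, and greedy decoding is an assumption about the generation strategy.

```python
from typing import Callable, Sequence


def greedy_caption_ids(
    image: object,
    encode: Callable[[object], object],
    decode_step: Callable[[object, Sequence[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_length: int = 16,
) -> list[int]:
    """Return at most max_length generated token ids, without start/end tokens."""
    # Hypothetical encoder call: run once per image.
    features = encode(image)
    tokens = [bos_id]
    while len(tokens) <= max_length:
        # Hypothetical decoder call: one score per vocabulary entry.
        scores = decode_step(features, tokens)
        # Greedy choice: take the highest-scoring token id.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]
```

In a real captioner the returned ids would still have to be mapped back to text by the model's tokenizer, which is omitted here.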

Description of development approach:

--Modular integration--: Keeps _localCaptioner self‑contained and compatible with NVDA’s plugin architecture.
--Lightweight inference--: Leverages ONNXRuntime for fast, local inference without heavy PyTorch or TensorFlow dependencies.
--Lazy loading--: Model is only loaded when first invoked (or at startup, if configured), minimizing initial memory footprint.
--Dual interfaces--: Provides both CLI scripts (captioner.py, modelDownloader.py) for quick tests and a GUI (modelManager.py) for end‑users.
--Extensible architecture--: Configuration files (e.g., config.json) conform to Hugging Face format for easy swapping of models.
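
The lazy-loading behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: `LazyModel`, `loader`, and `load_on_startup` are hypothetical names, and the loader stands in for whatever creates the ONNX sessions.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Load a model on first use and allow it to be released later."""

    def __init__(self, loader: Callable[[], T], load_on_startup: bool = False) -> None:
        self._loader = loader
        self._model: Optional[T] = None
        if load_on_startup:
            # Optional eager path, mirroring an "on startup" setting.
            self.get()

    @property
    def is_loaded(self) -> bool:
        return self._model is not None

    def get(self) -> T:
        """Return the model, calling the loader only if nothing is loaded yet."""
        if self._model is None:
            self._model = self._loader()
        return self._model

    def release(self) -> None:
        """Drop the reference so the model's memory can be reclaimed."""
        self._model = None
```

With this shape, a "generate caption" command would call `get()` and a "release model" command would call `release()`; the next caption request then pays the load cost again.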

Testing strategy:

--Manual/CLI tests--:
- Ran python captioner.py in an activated .venv to verify caption generation.
- Ran python modelDownloader.py and python modelManager.py to validate the download and GUI workflows.
--Shortcut verification--: Ensured all three default keybindings trigger the expected actions.
--Performance--: Measured caption generation time < 5 seconds on representative hardware.
--Resource cleanup--: Confirmed that releasing the model frees allocated memory.
--Compatibility--: Tested on Windows with limited CPU/RAM configurations to mimic low‑end hardware.

Known issues with pull request:

Captions are currently generated in English only; multi‑language support may be added later.
The first‑time model download requires an active internet connection.
Because captions cannot be generated until the models have been downloaded, unit tests and system tests may be difficult to run.
In some complex UI contexts, NVDA may not correctly identify the target image for captioning.

Code Review Checklist:

@coderabbitai summary

hwf1324 · 2025-07-15T14:35:08Z

Welcome to NVDA!

We are glad to see your contribution to NVDA.

When I tried to build the launcher to compare the file size of the launcher bundled with ONNXRuntime to the previous launcher, I found that a portable version could be created from this launcher. However, I encountered an error when starting the created portable version. Below is the log.

INFO - __main__ (22:34:47.643) - MainThread (21824):
Starting NVDA version source-18475-b4a029d x86
INFO - core.main (22:34:47.730) - MainThread (21824):
Config dir: D:\NVDA\snapshot\pr\18475\userConfig
INFO - config.ConfigManager._loadConfig (22:34:47.733) - MainThread (21824):
Loading config: D:\NVDA\snapshot\pr\18475\userConfig\nvda.ini
INFO - core.main (22:34:48.223) - MainThread (21824):
Windows version: Windows 11 24H2 (10.0.26100.4652) workstation AMD64
INFO - core.main (22:34:48.223) - MainThread (21824):
Using Python version 3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:00:00) [MSC v.1938 32 bit (Intel)]
INFO - core.main (22:34:48.223) - MainThread (21824):
Using comtypes version 1.4.6
INFO - core.main (22:34:48.225) - MainThread (21824):
Using configobj version 5.1.0 with validate version 1.0.1
ERROR - braille.getDisplayDrivers (22:34:48.286) - MainThread (21824):
Error while importing braille display driver alva
Traceback (most recent call last):
  File "braille.pyc", line 3869, in getDisplayDrivers
  File "braille.pyc", line 470, in _getDisplayDriver
  File "braille.pyc", line 464, in _getDisplayDriver
  File "importlib\__init__.pyc", line 126, in import_module
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "brailleDisplayDrivers\alva.pyc", line 16, in <module>
  File "globalCommands.pyc", line 75, in <module>
  File "_localCaptioner\__init__.pyc", line 34, in <module>
  File "_localCaptioner\captioner.pyc", line 22, in <module>
ModuleNotFoundError: No module named 'numpy'
ERROR - braille.getDisplayDrivers (22:34:48.306) - MainThread (21824):
Error while importing braille display driver eurobraille
Traceback (most recent call last):
  File "braille.pyc", line 3869, in getDisplayDrivers
  File "braille.pyc", line 470, in _getDisplayDriver
  File "braille.pyc", line 464, in _getDisplayDriver
  File "importlib\__init__.pyc", line 126, in import_module
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "brailleDisplayDrivers\eurobraille\__init__.pyc", line 9, in <module>
  File "brailleDisplayDrivers\eurobraille\driver.pyc", line 20, in <module>
  File "globalCommands.pyc", line 75, in <module>
  File "_localCaptioner\__init__.pyc", line 34, in <module>
  File "_localCaptioner\captioner.pyc", line 22, in <module>
ModuleNotFoundError: No module named 'numpy'
ERROR - braille.getDisplayDrivers (22:34:48.321) - MainThread (21824):
Error while importing braille display driver handyTech
Traceback (most recent call last):
  File "braille.pyc", line 3869, in getDisplayDrivers
  File "braille.pyc", line 470, in _getDisplayDriver
  File "braille.pyc", line 464, in _getDisplayDriver
  File "importlib\__init__.pyc", line 126, in import_module
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "brailleDisplayDrivers\handyTech.pyc", line 29, in <module>
  File "globalCommands.pyc", line 75, in <module>
  File "_localCaptioner\__init__.pyc", line 34, in <module>
  File "_localCaptioner\captioner.pyc", line 22, in <module>
ModuleNotFoundError: No module named 'numpy'
INFO - NVDAHelperLocal (22:34:48.348) - MainThread (21824):
Thread 21824, build\x86\localWin10\oneCoreSpeech.cpp, ocSpeech_initialize, 215:
ocSpeech_initialize

INFO - NVDAHelperLocal (22:34:48.348) - MainThread (21824):
Thread 21824, build\x86\localWin10\oneCoreSpeech.cpp, OcSpeechState::activate, 89:
Activating

INFO - NVDAHelperLocal (22:34:48.407) - MainThread (21824):
Thread 21824, build\x86\localWin10\oneCoreSpeech.cpp, preventEndUtteranceSilence_, 443:
AppendedSilence supported

INFO - synthDriverHandler.setSynth (22:34:48.416) - MainThread (21824):
Loaded synthDriver oneCore
WARNING - mathPres.initialize (22:34:48.421) - MainThread (21824):
MathPlayer 4 not available
INFO - core._setUpWxApp (22:34:48.421) - MainThread (21824):
Using wx version 4.2.2 msw (phoenix) wxWidgets 3.2.6 with six version 1.17.0
INFO - brailleInput.initialize (22:34:48.422) - MainThread (21824):
Braille input initialized
INFO - braille.initialize (22:34:48.422) - MainThread (21824):
Using liblouis version 3.34.0
INFO - braille.initialize (22:34:48.422) - MainThread (21824):
Using pySerial version 3.5
INFO - braille.BrailleHandler._setDisplay (22:34:48.425) - MainThread (21824):
Loaded braille display driver 'noBraille', current display has 0 cells.
INFO - core.main (22:34:48.649) - MainThread (21824):
Java Access Bridge support initialized
INFO - UIAHandler.UIAHandler.MTAThreadFunc (22:34:48.882) - UIAHandler.UIAHandler.MTAThread (30568):
UIAutomation: IUIAutomation6
CRITICAL - __main__ (22:34:49.105) - MainThread (21824):
core failure
Traceback (most recent call last):
  File "nvda.pyw", line 309, in <module>
  File "core.pyc", line 911, in main
  File "_localCaptioner\__init__.pyc", line 34, in <module>
  File "_localCaptioner\captioner.pyc", line 22, in <module>
ModuleNotFoundError: No module named 'numpy'
ERROR - keyboardHandler.internal_keyDownEvent (22:34:49.111) - winInputHook (31268):
internal_keyDownEvent
Traceback (most recent call last):
  File "keyboardHandler.pyc", line 276, in internal_keyDownEvent
  File "inputCore.pyc", line 529, in executeGesture
  File "baseObject.pyc", line 59, in __get__
  File "baseObject.pyc", line 167, in _getPropertyViaCache
  File "inputCore.pyc", line 182, in _get_script
  File "scriptHandler.pyc", line 112, in findScript
  File "scriptHandler.pyc", line 125, in _findScript
  File "scriptHandler.pyc", line 183, in _yieldObjectsForFindScript
  File "globalCommands.pyc", line 75, in <module>
  File "_localCaptioner\__init__.pyc", line 34, in <module>
  File "_localCaptioner\captioner.pyc", line 22, in <module>
ModuleNotFoundError: No module named 'numpy'
ERROR - keyboardHandler.internal_keyDownEvent (22:34:49.123) - winInputHook (31268):
internal_keyDownEvent
Traceback (most recent call last):
  File "keyboardHandler.pyc", line 276, in internal_keyDownEvent
  File "inputCore.pyc", line 529, in executeGesture
  File "baseObject.pyc", line 59, in __get__
  File "baseObject.pyc", line 167, in _getPropertyViaCache
  File "inputCore.pyc", line 182, in _get_script
  File "scriptHandler.pyc", line 112, in findScript
  File "scriptHandler.pyc", line 125, in _findScript
  File "scriptHandler.pyc", line 183, in _yieldObjectsForFindScript
  File "globalCommands.pyc", line 75, in <module>
  File "_localCaptioner\__init__.pyc", line 34, in <module>
  File "_localCaptioner\captioner.pyc", line 22, in <module>
ModuleNotFoundError: No module named 'numpy'

AppVeyorBot · 2025-07-15T14:49:45Z

FAIL: Translation comments check. Translation comments missing or unexpectedly included. See build log for more information.
PASS: License check.
PASS: Unit tests.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/ilrmf24qtlap4bcp/artifacts/output/nvda_snapshot_pr18475-37348,99389e13.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.9,
INSTALL_END 1.2,
BUILD_START 0.0,
BUILD_END 28.9,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 1.3,
FINISH_END 0.1

See test results for failed build of commit 99389e1305

tianzeshi-study · 2025-07-15T15:29:08Z

It seems that the portable version cannot find numpy as a Python dependency. That's a bit confusing, because onnxruntime depends on numpy and installs it automatically as a dependency.
Maybe the portable version differs in some way; I'll try to find out what the problem is. Maybe I should add numpy directly to pyproject.toml as a project dependency.
Thank you for your reply!

hwf1324 · 2025-07-15T16:03:41Z

Maybe the portable version differs in some way; I'll try to find out what the problem is. Maybe I should add numpy directly to pyproject.toml as a project dependency.

No, this is related to the py2exe script. NVDA excludes numpy in setup.py.

tianzeshi-study · 2025-07-15T16:13:32Z

Maybe the portable version differs in some way; I'll try to find out what the problem is. Maybe I should add numpy directly to pyproject.toml as a project dependency.

No, this is related to the py2exe script. NVDA excludes numpy in setup.py.

Yes, you are right. I see it in setup.py: "# numpy is an optional dependency of comtypes but we don't require it."
I may remove this exclusion in the next commit.
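
The failure in the log comes down to a module that is importable in the development environment but absent from the frozen build. A minimal sketch of a check for that situation, using only the standard library; `missing_modules` is a hypothetical helper and is not part of NVDA or py2exe:

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be located for import."""
    # find_spec returns None when no finder can locate a top-level module.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Run inside the built portable copy with `["numpy"]`, a check like this would be expected to report `numpy` for as long as the build script excludes it, regardless of what is installed in the development environment.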

cary-rowen · 2025-07-16T03:02:33Z

NVDA+Windows+Shift+,--: Release the loaded model and free memory.

I don't think requiring the user to release the model manually is a good experience. Could the model be reclaimed automatically in the background?

NVDA+Windows+Ctrl+,--: Open the Model Manager GUI to download or manage models.

This is not a high-frequency operation, please do not assign gestures by default.

tianzeshi-study · 2025-07-16T05:34:12Z

I don't think requiring the user to release the model manually is a good experience. Could the model be reclaimed automatically in the background?

This is because keeping the model in memory shortens each recognition, since the model does not have to be loaded every time. Manual release is just a way to balance memory footprint against recognition speed; the alternative would be to maintain an internal timer that automatically releases the model after a period of inactivity. However, there is no way to know when the user will next request a recognition.

This is not a high-frequency operation, please do not assign gestures by default.

Yes, you are right. It seems a button that opens the Model Manager should be added to the settings panel instead of a keyboard shortcut.
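
The timer-based alternative mentioned in this comment could look roughly like the sketch below. It is an illustration only, not something the PR implements: `IdleReleasingModel`, `idle_seconds`, and the injectable `clock` are hypothetical, and the periodic caller (for example a GUI timer) is left out.

```python
import time
from typing import Callable, Optional


class IdleReleasingModel:
    """Keep a model loaded while in use; drop it after a period of inactivity."""

    def __init__(
        self,
        loader: Callable[[], object],
        idle_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._model: Optional[object] = None
        self._last_used = 0.0

    @property
    def is_loaded(self) -> bool:
        return self._model is not None

    def get(self) -> object:
        """Return the model, loading it if needed, and record the access time."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def release_if_idle(self) -> bool:
        """Release the model if unused for idle_seconds; meant to be called periodically."""
        if self._model is not None and self._clock() - self._last_used >= self._idle_seconds:
            self._model = None
            return True
        return False
```

The trade-off raised in the comment remains: a short `idle_seconds` saves memory but makes the next caption slower, and a real implementation would also need locking if the periodic check runs on a different thread from caption requests.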

AppVeyorBot · 2025-07-16T07:12:32Z

FAIL: Translation comments check. Translation comments missing or unexpectedly included. See build log for more information.
PASS: License check.
PASS: Unit tests.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/wh2a3cg5j26v7l67/artifacts/output/nvda_snapshot_pr18475-37378,94f89120.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.5,
INSTALL_END 1.0,
BUILD_START 0.0,
BUILD_END 21.7,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 1.0,
FINISH_END 0.1

See test results for failed build of commit 94f89120cc

AppVeyorBot · 2025-07-16T12:13:23Z

FAIL: Translation comments check. Translation comments missing or unexpectedly included. See build log for more information.
PASS: License check.
PASS: Unit tests.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/y0ot19yyxbeoail6/artifacts/output/nvda_snapshot_pr18475-37387,0200bcbd.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 2.0,
INSTALL_END 1.0,
BUILD_START 0.0,
BUILD_END 25.8,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 1.3,
FINISH_END 0.1

See test results for failed build of commit 0200bcbd56

seanbudd

Thanks and congrats!

seanbudd · 2025-09-15T01:00:25Z

Re-opening to trigger a build against the 64bit only branch

Qchristensen

Looks good and will be of great interest to many users

This reverts commit e1cef07.

Reverts:
- #18475
- #19036
- #19024
- #19055
- #19057
- #19178
- #19243
- #19327
- Partial revert: #19342

### Issues fixed

Fixes #19298

### Issues reopened

Reopens #16281

### Reason for revert / Can this PR be reimplemented? If so, what is required for the next attempt

The current implementation of AI image descriptions yields low quality captions from a 3 year old model (see #19298). The current implementation also requires using numpy, which hogs RAM, slows initialization, and increases the weight of the installer.

An attempt was made to convert this to C++ using WinML and Windows ONNX runtimes as per #18662. This would have removed numpy, and improved flexibility for using different models in the future. Unfortunately, this was not found to be feasible, as ONNX C++ fails to work via 64bit emulation on ARM (microsoft/onnxruntime#15403).

This means we have the following options for image descriptions:
1. Continue to use the python onnxruntime, and accept the RAM and storage hits. Instead, improve the quality of the captioner with better models such as [git-base-coco](https://huggingface.co/microsoft/git-base-coco) or [blip2](https://huggingface.co/Salesforce/blip2-opt-2.7b-coco).
2. Wait until MS builds ARM64EC into C++ ONNX (blocked by microsoft/onnxruntime#15403)
3. Attempt to build our own fork of ONNX with ARM64EC
4. Build a separate ARM native installer of NVDA, offer as an alternative to allow for ARM devices to do image descriptions with numpy.
5. Release the feature on C++ without support for ARM devices.

All of these options require a significant amount of work. As such, sadly this feature is not ready for a stable release. Instead this code will be moved to a feature branch, until ONNX C++ matures such as fixing microsoft/onnxruntime#15403. Additionally, ONNX C++ runtimes are only available through the experimental 2.0 version of the Windows App SDK, and requires you to build your own headers from it.

I think this feature will be blocked until microsoft/onnxruntime#15403 is implemented and the 2.0 version of the Windows App SDK becomes stable. Future re-implementations should also consider using higher quality, more modern models.

tianzeshi-study and others added 2 commits July 15, 2025 17:39

generate image caption use local AI model, supports model downloading…

9259e3e

… and triggering via keyboard shortcut

Pre-commit auto-fix

b4a029d

hwf1324 reviewed Jul 15, 2025

Comment thread pyproject.toml

seanbudd changed the title ~~generate image caption use local AI model, supports model downloadin…~~ Support image descriptions using local AI model Jul 16, 2025

seanbudd reviewed Jul 16, 2025

Comment thread source/_localCaptioner/modelManager.py Outdated

seanbudd reviewed Jul 16, 2025

Comment thread source/_localCaptioner/modelManager.py Outdated

Comment thread source/_localCaptioner/modelDownloader.py Outdated

Comment thread source/_localCaptioner/modelManager.py Outdated

josephsl reviewed Jul 16, 2025

Comment thread source/_localCaptioner/panel.py Outdated

tianzeshi-study and others added 8 commits July 16, 2025 13:48

Update source/_localCaptioner/panel.py

1caa3c3

coding Standards Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Update source/_localCaptioner/panel.py

5688f6f

coding Standards Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Pre-commit auto-fix

95d8442

Update source/_localCaptioner/modelManager.py

0afb421

No need to test on command line Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Pre-commit auto-fix

006df7f

Update source/_localCaptioner/modelDownloader.py

e3d0d32

gettext formatting Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Update source/_localCaptioner/captioner.py

26d4c20

No need to test on command line Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Pre-commit auto-fix

e94d635

tianzeshi-study and others added 3 commits July 16, 2025 16:16

correct copyright header in _localCaptioner

44e2b51

use manual test function instead of main function to test captioner a…

93d3cb9

…nd downloader; replace optional type to follow coding standards; improve some comment

Pre-commit auto-fix

7f460c8

tianzeshi-study added 2 commits July 17, 2025 23:40

integrate local captioner settings pannel into core gui

a3a0740

improve local model manager GUI text formatting and the way model dow…

3100ccc

…nloader log

seanbudd marked this pull request as draft September 12, 2025 05:21

tianzeshi-study and others added 3 commits September 12, 2025 16:07

improve type hint; add numpy as pinned dependency

aa69aad

Merge branch 'master' into captionUseLocalModel

2dc3629

Pre-commit auto-fix

c64864a

tianzeshi-study marked this pull request as ready for review September 12, 2025 08:29

seanbudd approved these changes Sep 15, 2025

Comment thread pyproject.toml Outdated

Update pyproject.toml

832b3b4

seanbudd changed the base branch from master to try-64bit September 15, 2025 00:57

seanbudd force-pushed the try-64bit branch from 2fdc220 to 58dd147 Compare September 15, 2025 01:00

seanbudd closed this Sep 15, 2025

seanbudd reopened this Sep 15, 2025

seanbudd added 2 commits September 15, 2025 11:10

Merge branch 'try-64bit' into captionUseLocalModel

978068a

Merge branch 'try-64bit' into captionUseLocalModel

1cad450

seanbudd reviewed Sep 15, 2025

Comment thread tests/system/robot/automatedImageDescriptions.py Outdated

Update tests/system/robot/automatedImageDescriptions.py

60f2ad0

seanbudd reviewed Sep 15, 2025

Comment thread tests/system/robot/automatedImageDescriptions.py Outdated

Update tests/system/robot/automatedImageDescriptions.py

137f964

seanbudd reviewed Sep 15, 2025

Comment thread tests/system/robot/automatedImageDescriptions.py Outdated

seanbudd and others added 3 commits September 15, 2025 14:14

Update tests/system/robot/automatedImageDescriptions.py

5c8eba1

minor fixes

5a9a8f7

fix up sys test

676182a

Qchristensen approved these changes Sep 15, 2025

Comment thread user_docs/en/changes.md Outdated

seanbudd merged commit e1cef07 into nvaccess:try-64bit Sep 15, 2025
23 checks passed

github-actions Bot added this to the 2026.1 milestone Sep 15, 2025

seanbudd mentioned this pull request Oct 10, 2025

Update all dependencies for 2026.1 #18681

Closed

15 tasks

wmhn1872265132 mentioned this pull request Oct 11, 2025

improve the input help message and gesture for the image descriptions #19088

Merged

5 tasks

seanbudd added a commit that referenced this pull request Jan 9, 2026

Revert "Support image descriptions using local AI model (#18475)"

3ec57b2

This reverts commit e1cef07.

seanbudd mentioned this pull request Jan 9, 2026

Revert AI image description work #19425

Merged
