Resolves #16281

Summary of the issue:
NVDA currently lacks a built-in, offline image captioning feature. Existing solutions require a reliable internet connection, which raises privacy concerns, potential costs, and latency. Many NVDA users (especially in developing regions or on older hardware) have limited connectivity or constrained resources. There is no robust, integrated offline alternative.

Description of user facing changes:
Introduces on-device image description directly within NVDA, requiring no cloud service.
Adds three global commands (with default shortcuts):
- `NVDA+Windows+,`: Generate a caption for the current image under focus.
- `NVDA+Windows+Shift+,`: Release the loaded model and free memory.
- `NVDA+Windows+Ctrl+,`: Open the Model Manager GUI to download or manage models.

Extends NVDA's settings panel to enable/disable offline captioning and configure model paths.

Description of developer facing changes:
New _localCaptioner module containing:
- captioner.py: Core inference engine exposing generate_caption(image) for producing text descriptions.
- panel.py: NVDA settings integration (lazy or on-startup model loading, custom path).
- modelDownloader.py: CLI tool to download ONNX models.
- modelManager.py: GUI for selecting download paths and managing available models.

Uses the Hugging Face Xenova/vit-gpt2-image-captioning model in ONNX format (via onnxruntime) to balance accuracy, speed, and low resource usage. Modular design allows for future extension to additional models or formats.

Description of development approach:
- **Modular integration**: Keeps _localCaptioner self-contained and compatible with NVDA's plugin architecture.
- **Lightweight inference**: Leverages ONNXRuntime for fast, local inference without heavy PyTorch or TensorFlow dependencies.
- **Lazy loading**: Model is only loaded when first invoked (or at startup, if configured), minimizing initial memory footprint.
- **Dual interfaces**: Provides both CLI scripts (captioner.py, modelDownloader.py) for quick tests and a GUI (modelManager.py) for end-users.
- **Extensible architecture**: Configuration files (e.g., config.json) conform to Hugging Face format for easy swapping of models.
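The lazy-loading approach described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the PR: `LazyCaptioner`, `fakeLoader`, and `loadCount` are hypothetical names, and the "model" is a plain callable standing in for an ONNX session.

```python
import threading
from typing import Callable, Optional

# A "model" here is any callable that maps image bytes to a caption (assumption).
Model = Callable[[bytes], str]


class LazyCaptioner:
    """Hypothetical wrapper that defers model loading until the first request."""

    def __init__(self, loader: Callable[[], Model]) -> None:
        self._loader = loader  # builds the model; assumed to be expensive
        self._model: Optional[Model] = None
        self._lock = threading.Lock()  # guards against concurrent first calls
        self.loadCount = 0  # only here so the behaviour is observable

    def _ensureLoaded(self) -> Model:
        with self._lock:
            if self._model is None:
                self._model = self._loader()
                self.loadCount += 1
            return self._model

    def generateCaption(self, image: bytes) -> str:
        return self._ensureLoaded()(image)

    def release(self) -> None:
        """Drop the model so its memory can be reclaimed."""
        with self._lock:
            self._model = None


def fakeLoader() -> Model:
    # Stand-in for loading encoder/decoder sessions from disk.
    return lambda image: f"image of {len(image)} bytes"


captioner = LazyCaptioner(fakeLoader)
```

Nothing is loaded when `captioner` is constructed; the first `generateCaption` call pays the loading cost, and `release` mirrors the "release the loaded model" command.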
…18934) Summary of the issue: Change the button shown after a successful download from 'Yes' to 'OK'. Description of user facing changes: Users will see an "OK" button, rather than a "Yes" button, confirming that AI image descriptions were downloaded successfully.
…8945) Summary of the issue: Fixed an issue where image descriptions would download successfully but not be enabled and loaded automatically. Description of user facing changes: Image descriptions are now loaded automatically after a successful download.
|
Hi, Trying to create a portable from the latest snapshot of this branch, the process fails with the following:
Running the temp copy from the installer is OK. |
|
Just realizing that it is fixed in #18927 merged in |
|
@CyrilleB79 - done |
|
Thanks @seanbudd for the new build. Please find below issues found while testing nvda_snapshot_try-64bit-52706,70587e82: The "Reports the text on the Windows clipboard" command (
|
|
Hi @CyrilleB79 - please report these as proper issues, this PR is mainly for testing the image description work, and any other code that can only go into 64bit NVDA |
Pull Request Overview
This PR migrates NVDA builds to 64-bit only and adds a new on-device AI Image Descriptions (local captioner) feature, with documentation, configuration, tests, and CI updates.
- Switch build and CI to 64-bit only; drop x86 references.
- Introduce local image captioning: ONNX-based captioner, model downloader, settings panel, global commands, docs, and comprehensive unit/system tests.
Reviewed Changes
Copilot reviewed 27 out of 30 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| user_docs/en/userGuide.md | Update OS support notes and add user docs for AI Image Descriptions and settings. |
| user_docs/en/changes.md | Add changelog for 2026.1 including AI Image Descriptions and 64-bit requirement. |
| tests/unit/test_localCaptioner/test_downloader.py | Unit tests for model downloader behavior. |
| tests/unit/test_localCaptioner/test_captioner.py | Unit tests for ONNX captioner pipeline and configuration parsing. |
| tests/system/robot/automatedImageDescriptions.robot | Robot system test for AI image descriptions. |
| tests/system/robot/automatedImageDescriptions.py | System test helper to render an image and trigger captioning. |
| tests/system/nvdaSettingsFiles/standard-doLoadMockModel.ini | Test config to enable mock model loading. |
| tests/system/libraries/SystemTestSpy/mockModels.py | Generate mock ONNX encoder/decoder and config/vocab for tests. |
| tests/system/libraries/SystemTestSpy/configManager.py | Generate mock model files into the staged NVDA profile. |
| source/setup.py | Packaging adjustments to include numpy for local captioning. |
| source/gui/settingsDialogs.py | Add AI Image Descriptions settings panel. |
| source/gui/_localCaptioner/messageDialogs.py | Dialogs for downloading models and handling outcomes. |
| `source/gui/__init__.py` | Hook settings panel into GUI. |
| source/globalCommands.py | Add gestures for captioning and opening the captioner settings. |
| source/core.py | Initialize/terminate the local captioner at startup/shutdown. |
| source/config/configSpec.py | Add automatedImageDescriptions section and defaults. |
| `source/config/__init__.py` | Include new config section in base configuration. |
| source/_remoteClient/transport.py | Minor docstring parameter style fix. |
| source/_localCaptioner/modelDownloader.py | Multi-threaded model downloader with retries and progress. |
| source/_localCaptioner/modelConfig.py | Dataclass-based model/preprocessor configuration parsing. |
| source/_localCaptioner/imageDescriber.py | Orchestration for capturing, running captioner, and messaging. |
| source/_localCaptioner/captioner.py | ONNX Runtime-based ViT+GPT2 captioner implementation. |
| `source/_localCaptioner/__init__.py` | Module lifecycle and instance management for captioner. |
| source/NVDAState.py | Add modelsDir path to user config write paths. |
| pyproject.toml | Add onnxruntime/numpy and bump sphinx; add onnx for system tests. |
| .python-versions | Remove 32-bit Python build target. |
| .github/workflows/testAndPublish.yml | Restrict arch matrix to x64 and add imageDescriptions system test suite. |

    try:
        # Use a short timeout to avoid blocking indefinitely
        ok, msg = future.result(timeout=1.0)
        if ok:
            successful.append(filePath)
            log.debug(f"successful {filePath=}")
        else:
            failed.append(filePath)
            log.debug(f"failed: {filePath} - {msg}")
    except Exception as err:
        failed.append(filePath)
        log.debug(f"failed: {filePath} – {err}")

Using future.result(timeout=1.0) will mark in-progress downloads as failed after 1 second. This can cause spurious failures for large files or slow connections. Replace this loop with concurrent.futures.as_completed(futures) or call future.result() without a timeout to wait for completion.
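A minimal sketch of the `as_completed` pattern suggested here, assuming a stand-in `downloadOne` function that returns `(ok, msg)` like the reviewed code appears to. `downloadOne` and `downloadAll` are hypothetical names; the real downloader's structure may differ.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def downloadOne(filePath: str) -> tuple[bool, str]:
    """Hypothetical stand-in for a real download: sleeps instead of doing I/O."""
    time.sleep(0.05)
    if filePath.endswith(".bad"):
        return False, "simulated failure"
    return True, "ok"


def downloadAll(filePaths: list[str], maxWorkers: int = 4) -> tuple[list[str], list[str]]:
    successful: list[str] = []
    failed: list[str] = []
    with ThreadPoolExecutor(max_workers=maxWorkers) as executor:
        # Map each future back to the file it is downloading.
        futures = {executor.submit(downloadOne, path): path for path in filePaths}
        # as_completed yields each future only once it has finished,
        # so slow downloads are never misreported as failures.
        for future in as_completed(futures):
            filePath = futures[future]
            try:
                ok, _msg = future.result()
            except Exception:
                failed.append(filePath)
            else:
                (successful if ok else failed).append(filePath)
    return successful, failed
```

Because results arrive in completion order, the order of `successful` is not guaranteed to match the input order.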

    def ensureModelsDirectory(self) -> str:
        """
        Ensure the *models* directory exists (``../../models`` relative to *basePath*).

        :return: Absolute path of the *models* directory.
        :raises OSError: When the directory cannot be created.
        """
        modelsDir = os.path.abspath(config.conf["automatedImageDescriptions"]["defaultModel"])

        try:
            Path(modelsDir).mkdir(parents=True, exist_ok=True)
        except OSError as err:
            raise OSError(f"Failed to create models directory {modelsDir}: {err}") from err
        else:
            log.debug(f"Models directory ensured: {modelsDir}")
            return modelsDir

This creates the directory relative to the current working directory and ignores the configured models root (WritePaths.modelsDir). Build the path under the user's config directory instead, e.g. modelsDir = os.path.join(WritePaths.modelsDir, config.conf['automatedImageDescriptions']['defaultModel']). Also update the docstring which refers to a removed basePath concept.
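The path construction suggested in this comment might look something like the sketch below. It deliberately takes the models root and model name as plain parameters (hypothetical `modelsRoot` and `modelName`) rather than reading `WritePaths.modelsDir` or `config.conf`, so it runs outside NVDA.

```python
import os
import tempfile
from pathlib import Path


def ensureModelDirectory(modelsRoot: str, modelName: str) -> str:
    """Create (if needed) and return the directory for one model under modelsRoot.

    Both parameters are assumptions for illustration; in NVDA they would
    presumably come from WritePaths.modelsDir and the configuration.
    """
    modelDir = Path(modelsRoot) / modelName
    try:
        modelDir.mkdir(parents=True, exist_ok=True)
    except OSError as err:
        raise OSError(f"Failed to create models directory {modelDir}: {err}") from err
    # The result no longer depends on the current working directory.
    return str(modelDir.resolve())
```

A model name containing a slash, such as `Xenova/vit-gpt2-image-captioning`, simply produces nested directories under the root.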

    obj = api.getNavigatorObject()

    # Get the object's position and size information
    x, y, width, height = obj.location

    # Create a bitmap with the same size as the object
    bmp = wx.Bitmap(width, height)

    # Create a memory device context for drawing operations on the bitmap
    mem = wx.MemoryDC(bmp)

    # Copy the specified screen region to the memory bitmap
    mem.Blit(0, 0, width, height, wx.ScreenDC(), x, y)

Some navigator objects do not expose a location; attempting to unpack obj.location will raise. Wrap this in a try/except (e.g., AttributeError/TypeError/NotImplementedError) and report a user-friendly message (e.g., 'Object has no location') instead of raising.
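One way to guard the unpacking, sketched with plain stand-in objects instead of real NVDA navigator objects. `getLocationOrNone` and the two fake classes are hypothetical; the extra `ValueError` and zero-size checks go slightly beyond what the comment asks for.

```python
from typing import Optional, Tuple


def getLocationOrNone(obj: object) -> Optional[Tuple[int, int, int, int]]:
    """Return (x, y, width, height), or None when the object has no usable location."""
    try:
        x, y, width, height = obj.location
    except (AttributeError, TypeError, NotImplementedError, ValueError):
        # Missing attribute, location of None, unimplemented property,
        # or a tuple of the wrong length.
        return None
    if width <= 0 or height <= 0:
        # A zero-sized bitmap cannot be captured either.
        return None
    return x, y, width, height


class FakeImage:
    location = (10, 20, 300, 200)


class FakeNoLocation:
    location = None
```

The caller can then report a message such as "Object has no location" when the result is `None`, instead of letting the exception propagate.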

            ui.message(pgettext("imageDesc", "Failed to generate description"))
            log.exception("Failed to generate caption")
        else:
            ui.message(description)

ui.message is called from a background thread (captionThread). UI updates should be marshaled onto the GUI thread. Use wx.CallAfter(ui.message, ...) in both the exception and success paths.

Suggested change:

    -        ui.message(pgettext("imageDesc", "Failed to generate description"))
    -        log.exception("Failed to generate caption")
    -    else:
    -        ui.message(description)
    +        wx.CallAfter(ui.message, pgettext("imageDesc", "Failed to generate description"))
    +        log.exception("Failed to generate caption")
    +    else:
    +        wx.CallAfter(ui.message, description)

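The idea behind `wx.CallAfter` can be illustrated without wxPython using a plain queue: the worker thread never calls the UI function itself, it only schedules the call, and the thread that owns the UI runs it later. Everything here (`callAfter`, `message`, `captionWorker`) is a hypothetical stand-in for illustration, not NVDA or wx code.

```python
import queue
import threading

pendingCalls: queue.Queue = queue.Queue()
spoken: list[tuple[str, int]] = []  # (text, ident of the thread that ran it)
workerIdents: list[int] = []


def callAfter(func, *args) -> None:
    """Stand-in for wx.CallAfter: schedule func(*args) for the consuming thread."""
    pendingCalls.put((func, args))


def message(text: str) -> None:
    """Stand-in for ui.message; records which thread actually executed it."""
    spoken.append((text, threading.get_ident()))


def captionWorker() -> None:
    workerIdents.append(threading.get_ident())
    try:
        description = "a cat sitting on a sofa"  # placeholder for real inference
    except Exception:
        callAfter(message, "Failed to generate description")
    else:
        callAfter(message, description)


worker = threading.Thread(target=captionWorker)
worker.start()
worker.join()

# The owning thread drains the queue, much like a GUI event loop would.
consumerIdent = threading.get_ident()
while not pendingCalls.empty():
    func, args = pendingCalls.get()
    func(*args)
```

In the real code the event loop does the draining, so only the `wx.CallAfter(...)` calls from the suggestion are needed.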

    "numpy._core._exceptions",
    "numpy._core._multiarray_umath",

These module paths are incorrect for NumPy 2.x. Use 'numpy.core._exceptions' and 'numpy.core._multiarray_umath' (without the leading underscore package). Incorrect includes will cause import errors in the frozen build.

Suggested change:

    -    "numpy._core._exceptions",
    -    "numpy._core._multiarray_umath",
    +    "numpy.core._exceptions",
    +    "numpy.core._multiarray_umath",

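Before applying this suggestion it is worth verifying it against the NumPy version actually bundled: NumPy 2.0 is understood to have renamed the private `numpy.core` package to `numpy._core`, keeping `numpy.core` only as a deprecated compatibility shim, which would make the original paths the correct ones for 2.x. A small check along these lines reports where each candidate module resolves; `describeModule` is a hypothetical helper, not part of NVDA or NumPy.

```python
import importlib.util
from typing import Optional


def describeModule(name: str) -> Optional[str]:
    """Return the file a module would be loaded from, or None if it cannot be found.

    Note that find_spec imports parent packages of a dotted name as a side effect.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Parent package missing, or a module without a usable __spec__.
        return None
    return spec.origin if spec is not None else None


candidates = [
    "numpy._core._multiarray_umath",
    "numpy.core._multiarray_umath",
]
for candidate in candidates:
    # A compiled extension shows up as .pyd/.so; a shim as a plain .py file.
    print(candidate, "->", describeModule(candidate))
```

Running this in the build environment shows which of the two names points at the compiled extension that the frozen build needs to include.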

    Test Setup    start NVDA    standard-doLoadMockModel.ini
    Test Teardown    default teardown

NVDA is started twice: once in Test Setup and again in the test's [Setup]. Remove one of these to avoid double startup interference.

    *** Test Cases ***
    automatedImageDescriptions
        [Documentation]    Ensure that local captioner work
        [Setup]    start NVDA    standard-doLoadMockModel.ini

NVDA is started twice: once in Test Setup and again in the test's [Setup]. Remove one of these to avoid double startup interference.

Suggested change:

    -    [Setup]    start NVDA    standard-doLoadMockModel.ini


    *** Keywords ***
    default teardown
        ${screenshotName}=    create_preserved_test_output_filename    failedTest.png
        Run Keyword If Test Failed    Take Screenshot    ${screenShotName}

Variable name mismatch: you set ${screenshotName} but use ${screenShotName}. Use the same variable name in both lines to ensure screenshots are captured on failure.

Suggested change:

    -    Run Keyword If Test Failed    Take Screenshot    ${screenShotName}
    +    Run Keyword If Test Failed    Take Screenshot    ${screenshotName}


    * Press `NVDA+Windows+,` to get an AI generated image description. (#18475, @tianzeshi-study)
    * This is generated locally on the device - no information is sent to the internet.
    * A new unassigned command is available for quickly opening the settings dialog for local image description. (#18475)
    * Another new unassigned command is available for toggle image captioning. (#18475)

Correct grammar: 'Another new unassigned command is available to toggle image captioning.'

Suggested change:

    -* Another new unassigned command is available for toggle image captioning. (#18475)
    +* Another new unassigned command is available to toggle image captioning. (#18475)


    if actualSize == 0:
        return False, "Downloaded file is empty"

    if total > 0 and actualSize != total:
        return False, f"File incomplete: {actualSize}/{total} bytes downloaded"

    # Final progress callback
    if progressCallback and not self.cancelRequested:
        progressCallback(fileName, actualSize, max(total, actualSize), 100.0)

Only file size is verified. Consider adding optional checksum verification (e.g., SHA-256) against a known manifest to protect against corruption/tampering when downloading model files.
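A checksum step of the kind suggested could be sketched as below, using only `hashlib`. The function names and the idea of passing in an expected digest are assumptions; where a trusted manifest of digests would come from is left open.

```python
import hashlib
import os
import tempfile


def sha256OfFile(path: str, chunkSize: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large model files are never fully read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunkSize), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verifyDownload(path: str, expectedSha256: str) -> tuple[bool, str]:
    """Return (ok, msg) in the same style as the size checks in the reviewed code."""
    actual = sha256OfFile(path)
    if actual.lower() != expectedSha256.lower():
        return False, f"Checksum mismatch: expected {expectedSha256}, got {actual}"
    return True, "ok"
```

This would run after the existing size checks, so an empty or truncated file is still rejected cheaply before any hashing happens.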
|
@tianzeshi-study congratulations! Your work is now available in NVDA alphas. Would you also mind looking into a PR to master to resolve some of Copilot's comments above?
Ok, my pleasure. |
Follow up #18924
Fixes #19033
Fixes #19039

Summary of the issue:
- Enabling AI Image Descriptions causes NVDA to load somewhat more slowly.
- The image description is not shown in braille.

Description of user facing changes:
- Somewhat improved NVDA startup speed when AI image descriptions are enabled.
- Image descriptions are now shown in braille.
- Somewhat reduced NVDA memory usage.

Description of developer facing changes:
- Improved grammar and variable names.
- The image description message is now shown from the main thread.
- The image describer is now loaded in the background.
Link to issue number:
Part of #16304
Summary of the issue:
We are migrating to 64bit NVDA in 2026.1
Description of user facing changes:
Switch alpha builds to 64bit
Description of developer facing changes:
Description of development approach:
Testing strategy:
Known issues with pull request:
Blocked by:
Code Review Checklist: