Fix issues in uniscribe code by LeonarddeR · Pull Request #18744 · nvaccess/nvda

LeonarddeR · 2025-08-18T16:34:10Z

Link to issue number:

Closes #18722
Reverts 021bcd8

Summary of the issue:

The code that split a string at character boundaries discarded characters like \u0301 when such a character was the only character in a string.

Description of user facing changes:

When unicode normalization is on, navigating by character will again correctly announce characters like acute (\u0301)

Description of developer facing changes:

NVDAHelper local calculateCharacterBoundaries now needs an offsets array that is textLength + 1 in size, rather than only the text length. This is to store the end offset of the last character.
This reverts 021bcd8, therefore we don't pass additional extraneous alphanumeric characters to uniscribe to make it happy.

Description of development approach:

While debugging this issue, I found several issues in NVDAHelper local textUtils _getLogAttrArray function that is the base of the several calculation functions:

The buffer passed to ScriptItemize was too small when a string only contained one character. The buffer should have space for at least two SCRIPT_ITEM structures in size. This is my hypothesis about the necessity of 021bcd8, namely the addition of two alphanumeric characters passed to the uniscribe methods. While there are no exactly known str to reproduce the issue behind 021bcd8, I tried several combinations of shorter and longer strings without alpha numeric characters, and I couldn't reproduce any of the behavior described in it after my changes to the c++ code. Therefore I reverted 021bcd8, as there is enough time to test this in Alpha thoroughly.
Calculation of character offsets would never set the end offset when requesting the offsets of the last character in the string, since the end offset calculation starts at offset + 1 and offset + 1 is equal to textLength.
Failed calls of ScriptItemize and ScriptBreak were never logged. ScriptItemize definitely failed when passing a string with only one character in it because the expected buffer was to small.

Testing strategy:

Unit tests
System tests
STR from Combining diacritics are not read when typing/navigating if Unicode normalization is enabled #18722

Known issues with pull request:

The call of ScriptBreak seems wrong when a string contains two or more items generated by ScriptItemize. This seems so because the iCharLength variable is never limited to the length of the item, but rather extended to the length of the string every time. However, if we reset iCharLength and call ScriptBreak for only the text that belongs to the current item, we get word navigation behavior that diverges from the one in Notepad. Therefore I left this as is and documented it in code in case someone stumbles upon this again in the future.

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

@coderabbitai summary

Copilot

Pull Request Overview

This PR fixes issues in the uniscribe code related to character boundary detection that was causing characters like \u0301 (combining acute accent) to be discarded when they were the only character in a string. The fix removes a workaround that appended "xx" to strings and corrects several bugs in the underlying NVDAHelper code.

Key changes:

Removes the "xx" padding workaround from uniscribe text processing
Fixes buffer sizing and error handling in NVDAHelper's _getLogAttrArray function
Corrects character boundary calculation to properly handle string end positions

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
source/textUtils/uniscribe.py	Removes "xx" padding workaround and adjusts buffer size for character boundaries
source/textInfos/offsets.py	Removes "xx" padding from uniscribe offset calculations
nvdaHelper/local/textUtils.cpp	Fixes ScriptItemize buffer sizing, adds error logging, and corrects character boundary detection

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.}

seanbudd · 2025-08-19T01:29:06Z

Hi @LeonarddeR - this is much too risky for a beta, and the issue is not a recent regression.

seanbudd

Thanks @LeonarddeR - generally looks good to me

LeonarddeR · 2025-08-19T06:31:26Z

The issue is not a recent regression.

Fair point.

Note that I intended to revert 021bcd8 in a follow up, but since this is now targeting master anyway, it makes sense to do it in one wave.

Link to issue number: Related to #18744 Follow up of #18348 Summary of the issue: When working on #18744, I again stumbled upon the fact that the story length reported by EditTextInfo is too high. This was discovered after system test failures caused by #18348, i.e. reporting of an extraneous blank at the end of the document when using NVDA review cursor. Note, this is not a bug in #18348 and was accounted for in #18744 after the test failure. To reproduce 1. Open notepad classic 2. type `a` 3. Position the caret at the end of the line 4. Press nvda+delete or nvda+numpad delete Observe that NVDA reports 50%, not 100% Observe that `nav.makeTextInfo("caret")._getStoryLength()` reports 2, not 1 Description of user facing changes: None Description of developer facing changes: None Description of development approach: Fixed the off by one errors by removing the `+ 1` cases. Testing strategy: Tested str from this pr. Known issues with pull request: None

) fixes #18912 ### Summary of the issue: #18744 fixed up issues with uniscribe. Since that change, NVDAObjectTextInfo properly allows moving offset past end, however NVDAObjectTextInfo is not supposed to do so as it has no insertion point anyway. ### Description of user facing changes: NVDA no longer reports blank at the end of NVDA Objext text info text review, such as for list items. ### Description of developer facing changes: Since `allowMoveToUnitOffsetPastEnd` is now implemented on NVDAObjectTextInfo, I had to swap two classes in the inheritance hiërarchie for a unit tests class. ### Description of development approach: Handle NVDAObjectTextInfo as a text info without insertion point, similar to virtual buffers. In other words, don't allow moving offset past end, as there's no point to do so. ### Testing strategy: Tested str from #18912 ### Known issues with pull request: None known

Copilot AI review requested due to automatic review settings August 18, 2025 16:34

LeonarddeR requested a review from a team as a code owner August 18, 2025 16:34

LeonarddeR requested a review from seanbudd August 18, 2025 16:34

Copilot AI reviewed Aug 18, 2025

View reviewed changes

Comment thread nvdaHelper/local/textUtils.cpp Outdated

seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Aug 18, 2025

seanbudd changed the base branch from beta to master August 19, 2025 01:29

seanbudd added the merge-early Merge Early in a developer cycle label Aug 19, 2025

seanbudd approved these changes Aug 19, 2025

View reviewed changes

Comment thread user_docs/en/changes.md Outdated

Comment thread nvdaHelper/local/textUtils.cpp

Comment thread nvdaHelper/local/textUtils.cpp Outdated

This comment was marked as off-topic.

Sign in to view

LeonarddeR added 4 commits August 19, 2025 07:52

Fix splitAtCharacterBoundaries

9f1bb22

Fix character navigation

58d66ac

Update changes

9ec1250

Fix whitespace

b3372fa

LeonarddeR force-pushed the normFix2 branch from e3530b0 to b3372fa Compare August 19, 2025 06:27

LeonarddeR added 2 commits August 19, 2025 08:49

Merge branch 'master' into normFix2

4a2f32f

Restore relEnd

9660cba

seanbudd merged commit 92f345e into nvaccess:master Aug 20, 2025
22 checks passed

github-actions Bot added this to the 2026.1 milestone Aug 20, 2025

LeonarddeR mentioned this pull request Aug 20, 2025

Fix off by one text length in EditTextInfo #18767

Merged

5 tasks

LeonarddeR mentioned this pull request Sep 14, 2025

Using text review, when navigating by word, unexpected blank item encountered at the end #18912

Closed

LeonarddeR mentioned this pull request Jan 29, 2026

No longer allow moving to offset past end for NVDAObjectTextInfo #19531

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix issues in uniscribe code#18744

Fix issues in uniscribe code#18744
seanbudd merged 6 commits into
nvaccess:masterfrom
LeonarddeR:normFix2

LeonarddeR commented Aug 18, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

seanbudd commented Aug 19, 2025

Uh oh!

seanbudd left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as off-topic.

LeonarddeR commented Aug 19, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

LeonarddeR commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Link to issue number:

Summary of the issue:

Description of user facing changes:

Description of developer facing changes:

Description of development approach:

Testing strategy:

Known issues with pull request:

Code Review Checklist:

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

seanbudd commented Aug 19, 2025

Uh oh!

seanbudd left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as off-topic.

LeonarddeR commented Aug 19, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

LeonarddeR commented Aug 18, 2025 •

edited

Loading