Skip to content

Fix issues in uniscribe code#18744

Merged
seanbudd merged 6 commits into
nvaccess:masterfrom
LeonarddeR:normFix2
Aug 20, 2025
Merged

Fix issues in uniscribe code#18744
seanbudd merged 6 commits into
nvaccess:masterfrom
LeonarddeR:normFix2

Conversation

@LeonarddeR

@LeonarddeR LeonarddeR commented Aug 18, 2025

Copy link
Copy Markdown
Collaborator

Link to issue number:

Closes #18722 
Reverts 021bcd8

Summary of the issue:

The code that split a string at character boundaries discarded characters like \u0301 when such a character was the only character in a string.

Description of user facing changes:

When unicode normalization is on, navigating by character will again correctly announce characters like acute (\u0301)

Description of developer facing changes:

  • NVDAHelper local calculateCharacterBoundaries now needs an offsets array that is textLength + 1 in size, rather than only the text length. This is to store the end offset of the last character.
  • This reverts 021bcd8, therefore we don't pass additional extraneous alphanumeric characters to uniscribe to make it happy.

Description of development approach:

While debugging this issue, I found several issues in NVDAHelper local textUtils _getLogAttrArray function that is the base of the several calculation functions:

  1. The buffer passed to ScriptItemize was too small when a string only contained one character. The buffer should have space for at least two SCRIPT_ITEM structures in size. This is my hypothesis about the necessity of 021bcd8, namely the addition of two alphanumeric characters passed to the uniscribe methods. While there are no exactly known str to reproduce the issue behind 021bcd8, I tried several combinations of shorter and longer strings without alpha numeric characters, and I couldn't reproduce any of the behavior described in it after my changes to the c++ code. Therefore I reverted 021bcd8, as there is enough time to test this in Alpha thoroughly.
  2. Calculation of character offsets would never set the end offset when requesting the offsets of the last character in the string, since the end offset calculation starts at offset + 1 and offset + 1 is equal to textLength.
  3. Failed calls of ScriptItemize and ScriptBreak were never logged. ScriptItemize definitely failed when passing a string with only one character in it because the expected buffer was to small.

Testing strategy:

Known issues with pull request:

The call of ScriptBreak seems wrong when a string contains two or more items generated by ScriptItemize. This seems so because the iCharLength variable is never limited to the length of the item, but rather extended to the length of the string every time. However, if we reset iCharLength and call ScriptBreak for only the text that belongs to the current item, we get word navigation behavior that diverges from the one in Notepad. Therefore I left this as is and documented it in code in case someone stumbles upon this again in the future.

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

@coderabbitai summary

Copilot AI review requested due to automatic review settings August 18, 2025 16:34
@LeonarddeR LeonarddeR requested a review from a team as a code owner August 18, 2025 16:34
@LeonarddeR LeonarddeR requested a review from seanbudd August 18, 2025 16:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes issues in the uniscribe code related to character boundary detection that was causing characters like \u0301 (combining acute accent) to be discarded when they were the only character in a string. The fix removes a workaround that appended "xx" to strings and corrects several bugs in the underlying NVDAHelper code.

Key changes:

  • Removes the "xx" padding workaround from uniscribe text processing
  • Fixes buffer sizing and error handling in NVDAHelper's _getLogAttrArray function
  • Corrects character boundary calculation to properly handle string end positions

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
source/textUtils/uniscribe.py Removes "xx" padding workaround and adjusts buffer size for character boundaries
source/textInfos/offsets.py Removes "xx" padding from uniscribe offset calculations
nvdaHelper/local/textUtils.cpp Fixes ScriptItemize buffer sizing, adds error logging, and corrects character boundary detection

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

Comment thread nvdaHelper/local/textUtils.cpp Outdated
@seanbudd seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Aug 18, 2025
@seanbudd

Copy link
Copy Markdown
Member

Hi @LeonarddeR - this is much too risky for a beta, and the issue is not a recent regression.

@seanbudd seanbudd changed the base branch from beta to master August 19, 2025 01:29
@seanbudd seanbudd added the merge-early Merge Early in a developer cycle label Aug 19, 2025

@seanbudd seanbudd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LeonarddeR - generally looks good to me

Comment thread user_docs/en/changes.md Outdated
Comment thread nvdaHelper/local/textUtils.cpp
Comment thread nvdaHelper/local/textUtils.cpp Outdated
@seanbudd

This comment was marked as off-topic.

@LeonarddeR

Copy link
Copy Markdown
Collaborator Author

The issue is not a recent regression.

Fair point.

Note that I intended to revert 021bcd8 in a follow up, but since this is now targeting master anyway, it makes sense to do it in one wave.

@seanbudd seanbudd merged commit 92f345e into nvaccess:master Aug 20, 2025
22 checks passed
@github-actions github-actions Bot added this to the 2026.1 milestone Aug 20, 2025
SaschaCowley pushed a commit that referenced this pull request Sep 2, 2025
Link to issue number:
Related to #18744 
Follow up of #18348 

Summary of the issue:
When working on #18744, I again stumbled upon the fact that the story
length reported by EditTextInfo is too high. This was discovered after
system test failures caused by #18348, i.e. reporting of an extraneous
blank at the end of the document when using NVDA review cursor. Note,
this is not a bug in #18348 and was accounted for in #18744 after the
test failure.

To reproduce
1. Open notepad classic
2. type `a`
3. Position the caret at the end of the line
4. Press nvda+delete or nvda+numpad delete

Observe that NVDA reports 50%, not 100%
Observe that `nav.makeTextInfo("caret")._getStoryLength()` reports 2,
not 1

Description of user facing changes:
None

Description of developer facing changes:
None

Description of development approach:
Fixed the off by one errors by removing the `+ 1` cases.

Testing strategy:
Tested str from this pr.

Known issues with pull request:
None
SaschaCowley pushed a commit that referenced this pull request Jan 30, 2026
)

fixes #18912 

### Summary of the issue:
#18744 fixed up issues with uniscribe. Since that change,
NVDAObjectTextInfo properly allows moving offset past end, however
NVDAObjectTextInfo is not supposed to do so as it has no insertion point
anyway.

### Description of user facing changes:
NVDA no longer reports blank at the end of NVDA Objext text info text
review, such as for list items.

### Description of developer facing changes:
Since `allowMoveToUnitOffsetPastEnd` is now implemented on
NVDAObjectTextInfo, I had to swap two classes in the inheritance
hiërarchie for a unit tests class.

### Description of development approach:
Handle NVDAObjectTextInfo as a text info without insertion point,
similar to virtual buffers. In other words, don't allow moving offset
past end, as there's no point to do so.

### Testing strategy:
Tested str from #18912 

### Known issues with pull request:
None known
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. merge-early Merge Early in a developer cycle

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Combining diacritics are not read when typing/navigating if Unicode normalization is enabled

3 participants