Fix issues in uniscribe code#18744
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR fixes issues in the uniscribe code related to character boundary detection that was causing characters like \u0301 (combining acute accent) to be discarded when they were the only character in a string. The fix removes a workaround that appended "xx" to strings and corrects several bugs in the underlying NVDAHelper code.
Key changes:
- Removes the "xx" padding workaround from uniscribe text processing
- Fixes buffer sizing and error handling in NVDAHelper's
_getLogAttrArrayfunction - Corrects character boundary calculation to properly handle string end positions
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| source/textUtils/uniscribe.py | Removes "xx" padding workaround and adjusts buffer size for character boundaries |
| source/textInfos/offsets.py | Removes "xx" padding from uniscribe offset calculations |
| nvdaHelper/local/textUtils.cpp | Fixes ScriptItemize buffer sizing, adds error logging, and corrects character boundary detection |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
|
Hi @LeonarddeR - this is much too risky for a beta, and the issue is not a recent regression. |
seanbudd
left a comment
There was a problem hiding this comment.
Thanks @LeonarddeR - generally looks good to me
This comment was marked as off-topic.
This comment was marked as off-topic.
Fair point. Note that I intended to revert 021bcd8 in a follow up, but since this is now targeting master anyway, it makes sense to do it in one wave. |
Link to issue number: Related to #18744 Follow up of #18348 Summary of the issue: When working on #18744, I again stumbled upon the fact that the story length reported by EditTextInfo is too high. This was discovered after system test failures caused by #18348, i.e. reporting of an extraneous blank at the end of the document when using NVDA review cursor. Note, this is not a bug in #18348 and was accounted for in #18744 after the test failure. To reproduce 1. Open notepad classic 2. type `a` 3. Position the caret at the end of the line 4. Press nvda+delete or nvda+numpad delete Observe that NVDA reports 50%, not 100% Observe that `nav.makeTextInfo("caret")._getStoryLength()` reports 2, not 1 Description of user facing changes: None Description of developer facing changes: None Description of development approach: Fixed the off by one errors by removing the `+ 1` cases. Testing strategy: Tested str from this pr. Known issues with pull request: None
) fixes #18912 ### Summary of the issue: #18744 fixed up issues with uniscribe. Since that change, NVDAObjectTextInfo properly allows moving offset past end, however NVDAObjectTextInfo is not supposed to do so as it has no insertion point anyway. ### Description of user facing changes: NVDA no longer reports blank at the end of NVDA Objext text info text review, such as for list items. ### Description of developer facing changes: Since `allowMoveToUnitOffsetPastEnd` is now implemented on NVDAObjectTextInfo, I had to swap two classes in the inheritance hiërarchie for a unit tests class. ### Description of development approach: Handle NVDAObjectTextInfo as a text info without insertion point, similar to virtual buffers. In other words, don't allow moving offset past end, as there's no point to do so. ### Testing strategy: Tested str from #18912 ### Known issues with pull request: None known
Link to issue number:
Closes #18722
Reverts 021bcd8
Summary of the issue:
The code that split a string at character boundaries discarded characters like \u0301 when such a character was the only character in a string.
Description of user facing changes:
When unicode normalization is on, navigating by character will again correctly announce characters like acute (\u0301)
Description of developer facing changes:
calculateCharacterBoundariesnow needs an offsets array that is textLength + 1 in size, rather than only the text length. This is to store the end offset of the last character.Description of development approach:
While debugging this issue, I found several issues in NVDAHelper local textUtils
_getLogAttrArrayfunction that is the base of the several calculation functions:ScriptItemizewas too small when a string only contained one character. The buffer should have space for at least two SCRIPT_ITEM structures in size. This is my hypothesis about the necessity of 021bcd8, namely the addition of two alphanumeric characters passed to the uniscribe methods. While there are no exactly known str to reproduce the issue behind 021bcd8, I tried several combinations of shorter and longer strings without alpha numeric characters, and I couldn't reproduce any of the behavior described in it after my changes to the c++ code. Therefore I reverted 021bcd8, as there is enough time to test this in Alpha thoroughly.ScriptItemizeandScriptBreakwere never logged.ScriptItemizedefinitely failed when passing a string with only one character in it because the expected buffer was to small.Testing strategy:
Known issues with pull request:
The call of
ScriptBreakseems wrong when a string contains two or more items generated byScriptItemize. This seems so because theiCharLengthvariable is never limited to the length of the item, but rather extended to the length of the string every time. However, if we resetiCharLengthand callScriptBreakfor only the text that belongs to the current item, we get word navigation behavior that diverges from the one in Notepad. Therefore I left this as is and documented it in code in case someone stumbles upon this again in the future.Code Review Checklist:
@coderabbitai summary