ANother attempt to use moveToCodePointOffset for cursor routing#16876
Conversation
WalkthroughThe code changes primarily enhance text handling and braille cursor routing within NVDA. A new method for text extraction is added to handle list bullets in Word documents, and improvements were made to braille text information retrieval. Additionally, unit tests for braille cursor routing now cover complex characters like emoji and composites, ensuring more reliable cursor positioning. Changes
Sequence Diagram(s)No sequence diagrams are necessary as the changes are too varied and involve enhancements rather than new features or control flow modifications. Assessment against linked issues
These changes enhance the handling of Unicode variation selectors and decomposed characters, ensuring correct cursor positions and symbol representation on braille displays, addressing the issues specified. Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 3
Outside diff range, codebase verification and nitpick comments (1)
tests/unit/textProvider.py (1)
Line range hint
1-82: Review ofBasicTextInfoclass after removal ofuseUniscribeattribute.The removal of the
useUniscribeattribute fromBasicTextInfocould potentially alter how text is processed within this class, especially in relation to encoding and text utility module testing. This change should be thoroughly tested to ensure that it does not introduce any regressions or unexpected behaviors in text handling, particularly sinceBasicTextInfois used for testing other components.
See test results for failed build of commit 1181beb0f6 |
|
@LeonarddeR can you mention in the PR description how this pr improves upon #16477 and #16497 , E.g. how it solves the MS Word bullet / numbering problem? |
|
@michaelDCurran Sure, just did. |
seanbudd
left a comment
There was a problem hiding this comment.
Looks good to me, just 2 minor issues
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
See test results for failed build of commit 209069f4df |
|
I'm very puzzled about why this failed linting. Update: See #16928 (comment) |
844470b to
3d83711
Compare
See test results for failed build of commit e3b89e7993 |
Link to issue number:
Fixes #10960
Replaces #16477, #16497
Summary of the issue:
There are cases where moving one character on a textInfo instance actually moves more than one unicode point offset. This is described by @mltony in the doc string for
textInfos.TextInfo.moveToCodepointOffset.This causes of by one errors when cursor routing, since we're asking the textInfo to move by 1 characters, that might be presented by two or even more characters within the liblouis mapping.
Description of user facing changes
Cursor routing should be more reliable.
Description of development approach
@mltony's creation of
moveToCodepointOffsetallows us to move x code points from the start of the reading unit. As we're using 32 bit encoding for liblouis, every character as presented by liblouis is equal to one code point. Therefore we can safely assume that this method to move is much more reliable than the previous method.Differences with #16477, #16497
To fix the issue mentioned in #10960 (comment), i.e. with bullets in Word, I did the following:
moveToCodepointOffsetin the base textInfo to use a new method_getTextForCodepointMovementto get the text to operate on. this was thetext property before._getTextForCodepointMovementdefaults toreturn self.text, however for Word UIA, it uses getTextWithFields to construct the text to operate on. This ensuresmoveToCodepointOffsetworks with text in which bullets and numbering are excluded.Testing strategy:
Known issues with pull request:
In textInfo instances where multiple codepoint emoji are treated as one character (e.g. in Word),
moveToCodepointOffsetraises an error when trying to move to the second code point. Therefore for routing, this should be handled more gracefully, that's why whenmoveToCodePointOffsetfails, we move one character back with a maximum of 10 iterations. This behavior is covered by a unit test.Code Review Checklist:
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation