Another attempt to use moveToCodePointOffset for cursor routing #16876

Merged
seanbudd merged 13 commits into nvaccess:master from LeonarddeR:codePointOffsetTake3 on Jul 30, 2024
Conversation

@LeonarddeR

@LeonarddeR LeonarddeR commented Jul 17, 2024

Collaborator

Link to issue number:

Fixes #10960
Replaces #16477, #16497

Summary of the issue:

There are cases where moving one character on a textInfo instance actually moves more than one Unicode code point. This is described by @mltony in the docstring for textInfos.TextInfo.moveToCodepointOffset.
This causes off-by-one errors when cursor routing, since we ask the textInfo to move by one character, which may be represented by two or more characters within the liblouis mapping.
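
To illustrate the mismatch, here is a minimal Python sketch. It is not NVDA code; the names `WARNING_EMOJI`, `DECOMPOSED_E_ACUTE` and `describe_code_points` are made up for this example. It shows two strings that a user perceives as a single character but that consist of two code points each: a symbol followed by a variation selector, and a decomposed accented letter.

```python
import unicodedata

# WARNING SIGN followed by VARIATION SELECTOR-16 (emoji presentation).
WARNING_EMOJI = "\u26a0\ufe0f"
# Latin small letter e followed by a combining acute accent (decomposed form).
DECOMPOSED_E_ACUTE = "e\u0301"


def describe_code_points(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for sample in (WARNING_EMOJI, DECOMPOSED_E_ACUTE):
    # len() counts code points, not user-perceived characters.
    print(len(sample), describe_code_points(sample))
```

A text provider that treats each of these as one character moves two code points when asked to move by one character, which is where the offset drifts.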

Description of user facing changes

Cursor routing should be more reliable.

Description of development approach

@mltony's creation of moveToCodepointOffset allows us to move x code points from the start of the reading unit. As we're using 32-bit encoding for liblouis, every character as presented by liblouis is equal to one code point. Therefore, moving by code point offset should be much more reliable than the previous method.
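
The property relied on here can be checked in plain Python. This is only a sketch of the encoding argument, not of how NVDA or liblouis handle text internally; the helper names are hypothetical. In a 32-bit encoding the number of code units equals the number of code points, whereas in UTF-16 a character outside the Basic Multilingual Plane takes two units.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


def utf32_code_units(text: str) -> int:
    """Number of 32-bit code units needed to encode text."""
    return len(text.encode("utf-32-le")) // 4


# 'a', GRINNING FACE (outside the Basic Multilingual Plane), 'b'
SAMPLE = "a\U0001F600b"

print(len(SAMPLE))               # code points
print(utf32_code_units(SAMPLE))  # same as the code point count
print(utf16_code_units(SAMPLE))  # one more, because of the surrogate pair
```

Under that assumption, an offset into the 32-bit text handed to liblouis can be used directly as a code point offset.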

Differences with #16477, #16497

To fix the issue mentioned in #10960 (comment), i.e. with bullets in Word, I did the following:

  1. Changed moveToCodepointOffset in the base textInfo to use a new method, _getTextForCodepointMovement, to get the text to operate on. Previously it used the text property.
  2. _getTextForCodepointMovement returns self.text by default; for Word UIA, however, it uses getTextWithFields to construct the text to operate on. This ensures moveToCodepointOffset works with text from which bullets and numbering are excluded.
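
The two steps above follow an overridable-hook pattern, which can be sketched as follows. The classes `ToyTextInfo` and `ToyListTextInfo` and the method `codepointCount` are hypothetical stand-ins invented for this example; only the method name `_getTextForCodepointMovement` comes from this PR, and the real Word UIA override builds its text via getTextWithFields rather than storing it separately as done here.

```python
class ToyTextInfo:
    """Hypothetical stand-in for a base text info class."""

    def __init__(self, text: str):
        self.text = text

    def _getTextForCodepointMovement(self) -> str:
        # Default: operate on the plain text property.
        return self.text

    def codepointCount(self) -> int:
        # Code point logic asks the hook for its text instead of
        # reading self.text directly.
        return len(self._getTextForCodepointMovement())


class ToyListTextInfo(ToyTextInfo):
    """Hypothetical subclass whose text includes a generated list bullet."""

    def __init__(self, bullet: str, content: str):
        super().__init__(bullet + content)
        self._content = content

    def _getTextForCodepointMovement(self) -> str:
        # Exclude the bullet so offsets refer to the content only.
        return self._content


item = ToyListTextInfo("\u2022 ", "first item")
print(len(item.text), item.codepointCount())
```

The point of the hook is that the offset arithmetic in the base class stays unchanged while a subclass decides which text the offsets refer to.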

Testing strategy:

Known issues with pull request:

In textInfo instances where multi-code-point emoji are treated as one character (e.g. in Word), moveToCodepointOffset raises an error when trying to move to the second code point. Routing should handle this more gracefully, so when moveToCodepointOffset fails, we move one character back and retry, with a maximum of 10 iterations. This behavior is covered by a unit test.
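
The fallback described above can be sketched roughly like this. It is a toy model under stated assumptions, not the code in source/braille.py: text is modelled as a list of "characters" that may each span several code points, `move_to_offset` is a hypothetical function that fails inside such a character, the exception type `RuntimeError` is assumed, and stepping back is modelled as decrementing the offset. Only the limit of 10 iterations comes from the description.

```python
MAX_ATTEMPTS = 10  # the limit mentioned in the PR description


def move_to_offset(characters: list[str], offset: int) -> int:
    """Hypothetical move: succeeds only on a character boundary."""
    boundaries = {0}
    position = 0
    for character in characters:
        position += len(character)  # len() counts code points
        boundaries.add(position)
    if offset not in boundaries:
        raise RuntimeError(f"offset {offset} is inside a character")
    return offset


def route_with_fallback(characters: list[str], offset: int):
    """Try the requested offset, stepping back on failure, at most MAX_ATTEMPTS times."""
    for _ in range(MAX_ATTEMPTS):
        try:
            return move_to_offset(characters, offset)
        except RuntimeError:
            if offset <= 0:
                break
            offset -= 1  # step back and try again
    return None  # give up; the caller decides what to do


# 'a', then WARNING SIGN + VARIATION SELECTOR-16 treated as one character, then 'b'
SAMPLE = ["a", "\u26a0\ufe0f", "b"]
print(route_with_fallback(SAMPLE, 2))  # offset 2 is inside the emoji
```

Routing to the second code point of the emoji lands on the start of the emoji instead of failing outright.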

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

Summary by CodeRabbit

  • New Features

    • Enhanced text extraction in Word documents by stripping list bullets during character movement.
    • Improved braille cursor routing for Unicode variation selectors and decomposed characters.
  • Bug Fixes

    • Enhanced error handling and logging for braille text information retrieval.
  • Tests

    • Added new unit tests for braille cursor routing, including tests for emoji and composite characters.
  • Documentation

    • Updated user documentation to highlight improvements in braille cursor routing reliability.

@LeonarddeR LeonarddeR marked this pull request as ready for review July 17, 2024 19:01
@LeonarddeR LeonarddeR requested a review from a team as a code owner July 17, 2024 19:01
@LeonarddeR LeonarddeR requested a review from seanbudd July 17, 2024 19:01
@coderabbitai

coderabbitai Bot commented Jul 17, 2024

Contributor

Walkthrough

The code changes primarily enhance text handling and braille cursor routing within NVDA. A new method for text extraction is added to handle list bullets in Word documents, and improvements were made to braille text information retrieval. Additionally, unit tests for braille cursor routing now cover complex characters like emoji and composites, ensuring more reliable cursor positioning.

Changes

| File Path | Change Summary |
|---|---|
| source/NVDAObjects/UIA/wordDocument.py | Added _getTextForCodepointMovement method to handle text extraction with considerations for list bullets. |
| source/braille.py | Enhanced getTextInfoForBraillePos method to include error handling and logging. |
| source/textInfos/__init__.py | Added _getTextForCodepointMovement method and modified moveToCodepointOffset to use this method. |
| tests/unit/test_braille/test_routing.py | Added tests for braille cursor routing with emoji and composite characters. |
| tests/unit/textProvider.py | Removed useUniscribe attribute from BasicTextInfo class. |
| user_docs/en/changes.md | Documented improvement in braille cursor routing reliability for Unicode variation selectors and decomposed characters. |

Sequence Diagram(s)

No sequence diagrams are necessary as the changes are too varied and involve enhancements rather than new features or control flow modifications.

Assessment against linked issues

Objective (Issue #10960) Addressed Explanation
Ensure cursor moves to the correct position with variation selectors
Ensure ⚠️ symbol takes one cell on braille display

These changes enhance the handling of Unicode variation selectors and decomposed characters, ensuring correct cursor positions and symbol representation on braille displays, addressing the issues specified.


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

Outside diff range, codebase verification and nitpick comments (1)
tests/unit/textProvider.py (1)

Line range hint 1-82: Review of BasicTextInfo class after removal of useUniscribe attribute.

The removal of the useUniscribe attribute from BasicTextInfo could potentially alter how text is processed within this class, especially in relation to encoding and text utility module testing. This change should be thoroughly tested to ensure that it does not introduce any regressions or unexpected behaviors in text handling, particularly since BasicTextInfo is used for testing other components.

Comment thread tests/unit/test_braille/test_routing.py
Comment thread source/braille.py
Comment thread source/textInfos/__init__.py
@LeonarddeR LeonarddeR marked this pull request as draft July 18, 2024 06:22
@AppVeyorBot

See test results for failed build of commit 1181beb0f6

@LeonarddeR LeonarddeR marked this pull request as ready for review July 19, 2024 05:47
@michaelDCurran

Member

@LeonarddeR can you mention in the PR description how this PR improves upon #16477 and #16497, e.g. how it solves the MS Word bullet / numbering problem?

@LeonarddeR

Collaborator Author

@michaelDCurran Sure, just did.

@seanbudd seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Jul 30, 2024

@seanbudd seanbudd left a comment

Member

Looks good to me, just 2 minor issues

Comment thread source/braille.py Outdated
Comment thread tests/unit/test_braille/test_routing.py Outdated
Comment thread source/braille.py Outdated
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
@AppVeyorBot

See test results for failed build of commit 209069f4df

@LeonarddeR

LeonarddeR commented Jul 30, 2024

Collaborator Author

I'm very puzzled about why this failed linting.

Update: See #16928 (comment)

@LeonarddeR LeonarddeR force-pushed the codePointOffsetTake3 branch from 844470b to 3d83711 Compare July 30, 2024 12:06
@AppVeyorBot

See test results for failed build of commit e3b89e7993

Labels

conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Braille: Variation Selectors break cursor positions

4 participants