Skip to content

Several fixups for unicode normalization#16584

Merged
seanbudd merged 6 commits into
nvaccess:masterfrom
LeonarddeR:fixupNormalizationBraille
May 27, 2024
Merged

Several fixups for unicode normalization#16584
seanbudd merged 6 commits into
nvaccess:masterfrom
LeonarddeR:fixupNormalizationBraille

Conversation

@LeonarddeR

@LeonarddeR LeonarddeR commented May 21, 2024

Copy link
Copy Markdown
Collaborator

Link to issue number:

Fixup of #16521
Fixes #11570
Partial fix for #4631

Summary of the issue:

  1. It turns out that rawTextTypeforms on a region may be None, this was an oversight on my end.
  2. cursorPos may also be None.
  3. @burmancomp reported a zero division error in case a string ended with a non breaking space and a space.

Description of user facing changes

No longer errors in the log when getting flash messages in Thunderbird and/or reading messages in WhatsApp UWP.

Description of development approach

  1. Explicitly check for None typeforms and cursorPos, thereby improving readability as well.
  2. Improve the calculateOffsets method in textUtils to ensure it can handle the case as reported by @burmancomp

Testing strategy:

From a python console
braille.TextRegion("ij").update()
No longer results in an error.
Same for braille.TextRegion("\xa0 ").update()

Known issues with pull request:

None known

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

@LeonarddeR LeonarddeR requested a review from a team as a code owner May 21, 2024 11:09
@LeonarddeR LeonarddeR requested a review from seanbudd May 21, 2024 11:09
@LeonarddeR LeonarddeR changed the title Respect cases where no typeforms are provided Braille unicode normalization: Respect cases where no typeforms or cursor position provided May 21, 2024
@AppVeyorBot

Copy link
Copy Markdown
  • Build execution time has reached the maximum allowed time for your plan (60 minutes).

See test results for failed build of commit 8147d2e76b

@AppVeyorBot

Copy link
Copy Markdown
  • Build execution time has reached the maximum allowed time for your plan (60 minutes).

See test results for failed build of commit 740467ce53

@AppVeyorBot

Copy link
Copy Markdown
  • Build execution time has reached the maximum allowed time for your plan (60 minutes).

See test results for failed build of commit b9ed18e76b

@LeonarddeR LeonarddeR marked this pull request as draft May 22, 2024 12:29
@LeonarddeR

Copy link
Copy Markdown
Collaborator Author

@burmancomp reported a zero division error when doing:
textUtils.UnicodeNormalizationOffsetConverter("removed original text\xa0 ")

@LeonarddeR LeonarddeR changed the title Braille unicode normalization: Respect cases where no typeforms or cursor position provided Several fixups for unicode normalization May 22, 2024
@AppVeyorBot

Copy link
Copy Markdown
  • Build execution time has reached the maximum allowed time for your plan (60 minutes).

See test results for failed build of commit e687938887

@LeonarddeR LeonarddeR force-pushed the fixupNormalizationBraille branch from 3185531 to 5f3f0f8 Compare May 23, 2024 12:30
@LeonarddeR LeonarddeR marked this pull request as ready for review May 25, 2024 06:49
@LeonarddeR LeonarddeR requested a review from a team as a code owner May 25, 2024 06:49
@LeonarddeR LeonarddeR requested a review from Qchristensen May 25, 2024 06:49
@LeonarddeR

Copy link
Copy Markdown
Collaborator Author

I think for now, we could probably best leave this as is. Note that there are still some open questions, but these can be handled in a follow up:

  1. Whether unicode normalization should be disabled by default (I think it still should),. That said, we should be able to change the default without breaking explicit configurations, hence the feature flag with combo box approach.
  2. Whether character navigation should also normalize. It currently doesn't, and I think there's a real good reason for it. An option to consider is to report the normalized character preceded by the word normalized, but in other languages (e.g. in Dutch) this will probably be very weird and confusing. Other options hook into the character descriptions mechanism.

@seanbudd seanbudd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LeonarddeR

Comment thread user_docs/en/userGuide.md Outdated
Comment thread user_docs/en/changes.md Outdated

@Qchristensen Qchristensen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@seanbudd

Copy link
Copy Markdown
Member

@LeonarddeR - are these 2 things tracked in a separate issue or discussion? I think they'd be good to discuss for 2024.4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support mathematical alphanumeric symbols

4 participants