Skip to content

Use uniscribe to calculate character offsets where allowed#10550

Merged
michaelDCurran merged 6 commits into
masterfrom
uniscribeCharacterOffsets
Nov 28, 2019
Merged

Use uniscribe to calculate character offsets where allowed#10550
michaelDCurran merged 6 commits into
masterfrom
uniscribeCharacterOffsets

Conversation

@michaelDCurran

@michaelDCurran michaelDCurran commented Nov 26, 2019

Copy link
Copy Markdown
Member

Link to issue number:

None.

Summary of the issue:

When moving by character in an NVDA virtualBuffer, each and every unicode code point is treated as its own character, even if is is visually combined with another code point to create one composit character. Examples are:

  • é: e with an acute
  • ❤️: red heart with a variation selector
  • 8⃣: number 8 inside a keycap
    Similarly, when moving by character in Notepad with the arrow keys, NVDA only reads the first code point in composit characters.

Description of how this pull request fixes the issue:

Just like how we use the Windows uniscribe library to calculate word offsets in some places, use it to also calculate character offsets.
This involved:

  • Abstracted code from nvdaHelperLocal's calculateWordOffsets into _calculateUniscribeOffsets which takes a unit argument of either character or word. For word the code is as it was. For character, it widens the offsets until hitting a code point marked with fCharStop. calculateCharacterOffsets has been added which just calls _calculateUniscribeOffsets with a unit of character.
  • Abstract out all the uniscribe code from OffsetsTextInfo._getWordOffsets into a new _getUniscribeOffsets method, and use this also in _getCharacterOffsets.

Testing performed:

  • Using the example characters above in a virtualBuffer, moved throw them with the arrow keys making sure that they are all treated as single composit characters.
  • Using the example characters above in a notepad document, move through them with left and right arrow and make sure that NVDA announces the entire composit character.

Known issues with pull request:

Although NVDA now matches the behaviour of notepad and other standard edit controls, which includes treating acutes, variation selectors and some other modifiers as being a part of the previous symbol, complex tied emojies that use multiple unicode characters connected with a tie u+200d symbol, are still not treated as one single composit character. But if we did, this would differ from Windows' own standard edit control behaviour.

Change log entry:

Bug fixes:

  • NVDA will now treat certain composit unicode characters such as e-acute as one single character when moving through text.

… calculate the bounds for a character. This allows us to treat something like e-acute as one character.
…cterOffsets, allowing unit tests to pass again.
@LeonarddeR

LeonarddeR commented Nov 27, 2019 via email

Copy link
Copy Markdown
Collaborator

@michaelDCurran

michaelDCurran commented Nov 27, 2019 via email

Copy link
Copy Markdown
Member Author

@LeonarddeR

LeonarddeR commented Nov 27, 2019 via email

Copy link
Copy Markdown
Collaborator

@michaelDCurran

michaelDCurran commented Nov 27, 2019 via email

Copy link
Copy Markdown
Member Author

Comment thread nvdaHelper/local/textUtils.cpp Outdated
Comment thread source/textInfos/offsets.py Outdated
Comment thread source/textInfos/offsets.py
Comment thread source/textInfos/offsets.py Outdated
Comment thread source/textInfos/offsets.py Outdated
Comment thread source/textInfos/offsets.py
@michaelDCurran

Copy link
Copy Markdown
Member Author

@LeonarddeR I have addressed all your review actions I believe. When abstracting _calculateUniscribeOffsets in textUtils.cpp, I still copied the two basic for loops that walk the offsets, otherwise it would have become very complex to read with fWordStop changed to fCharStop dynamically changed based on the unit somehow.
Also, that comment about uniscribe being broken without the two extra chars added: I'm not sure if this is still true, but as there are many Operating System variations to test on, I'd prefer not to address that at the moment. Moving by word logic should not be changed, just code moved a bit.

@LeonarddeR LeonarddeR left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks awesome now!

@michaelDCurran michaelDCurran merged commit 1045d2d into master Nov 28, 2019
@nvaccessAuto nvaccessAuto added this to the 2019.3 milestone Nov 28, 2019
michaelDCurran added a commit that referenced this pull request Nov 28, 2019
@LeonarddeR

Copy link
Copy Markdown
Collaborator

I'm afraid that this pull request introduces off by n errors in braille.TextInfoRegion.getTextInfoForBraillePos. However, it is pretty difficult to avoid this. We should somehow be able to decouple braille positions from characters.

@michaelDCurran

michaelDCurran commented Dec 3, 2019 via email

Copy link
Copy Markdown
Member Author

@LeonarddeR

Copy link
Copy Markdown
Collaborator

No, no need to revert this. At least for uniscribe, we can disable it at the TextInfo object level before moving by character.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants