Speak Symbols When moving by word by feerrenrut · Pull Request #12710 · nvaccess/nvda

feerrenrut · 2021-08-04T11:45:01Z

Closed reason.

This requires an API breaking change, time ran out prior to the 2022.1 release while resolving usage and design concerns.

There are many translated strings in NVDA (approx. 2300). It will take a while to consider them, and wrap with a type.
The full set of types required is not known, what about strings with placeholders. The placeholder text may need to be considered for symbol substitution, but not the surrounding text. Consider the contrived example: _("The column's name is %s"). The apostrophe should not be announced as tick, but if the the column name should be processed eg for column name don't: "The column's name is don tick t"
The commands inherit from str to ensure that they can be used in the same way is prior items in the speech sequence, however this also means it is easy for a logic error to drop the extra information provided by a class.

Link to issue number:

#12695
Fixes: #10855
Fixes #11779

Summary of the issue:

NV Access took open this abandoned PR:
Recent history: PR #12695 reverted PR #11856 fixing a regression described in issue #12653.

Navigation by word is inconsistent among different editors.
This might lead to different symbol announcements, or opening and closing symbols not announced consistently.

Description of how this pull request fixes the issue:

New config "symbolLevelWordAll" enabling symbol level ALL when navigating by word.
When navigating by character, use symbol level all.
When describing a symbol a new speech command is used to wrap the replacement.
This propagates the initial intention and semantics of the speech sequence item.

Intentionally changes behavior for symbol text sent to synth when moving by word. This now matches the behavior for moving by line. The hyphen character in symbols like "t-shirt" and "right-pointing arrow" are sent to the synthesizer.

Note, commits have been carefully authored to show the progression of changes, however there are two commits that can be ignored, which only tidy related code:

Clarify symbol level default
add type information

The notepad tests have been confirmed to pass for all commits, looking at the changes to these tests will demonstrate the changes in behavior with that commit.

Testing strategy:

System tests
Manual testing

Known issues with pull request:

None

Change log entries:

Already included in PR, see diff.

Code Review Checklist:

Pull Request description is up to date.
Unit tests.
System (end to end) tests.
Manual testing.
User Documentation.
Change log entry.
Context sensitive help for GUI changes.
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers

feerrenrut · 2021-08-09T03:19:39Z

From the known issues with this PR:

The final commit "Restore initial behavior for dash containing symbols" is potentially unnecessary.
However, a decision must be made about the preference for "t shirt" or "t-shirt" sent to the synth.
In the latter case a user's synth may speak this as "t dash shirt".

The changes in commit "Restore initial behavior for dash containing symbols":
16da2f0#diff-a29456c11e438847c67bfe5c503ef7ca66bd0ffad420b23f7cf064505622cd45L1337

Restores the removed hyphen behavior.

We believe this code historically was required to provide the expected behavior for replacing whitespace characters (via the symbols pronunciation mechanism at level "character").

To resolve the "known issue" tests have been updated to include white space characters . The tests are added to the first commit of this PR to make it easier to assess the changes in behavior with each commit.

Reviewing the changes in notepadTests.py demonstrates how the behavior is changed.
I think we should drop the final commit, allowing the right pointing arrow and t shirt symbols to be spoken with hyphens in the text sent to the synthesizer.

amirsol81 · 2021-08-10T06:47:01Z

Does this also apply to selecting text when SHIFT is added to navigation shortcuts?

CyrilleB79 · 2021-08-11T23:11:05Z

I thought I had already commented in this PR but it seems not. So here I go.

I did not read it all carefully. However I can affirm that with French OneCore (Hortense), passing "T shirt" (without dash) to the synth instead of "T-shirt" (with dash) leads to an erroneous mispronunciation. Actually, in the latter case, the letter T is pronounced "tee" in English whereas in the first case, it is pronounced "teh" (in French) what is wrong since in the word "T-shirt" the "T" is pronounced in English even in the French word.

feerrenrut · 2021-08-13T07:39:10Z

Thanks for the information @CyrilleB79. The systems tests for this PR show that before any changes, when moving by word in notepad, the synth is sent "t shirt" (no dash). Can you confirm that recent releases of NVDA using French OneCore this problem exists?

At this stage I expect we'll remove the code that results in the dashes being removed, but since it is a change in behavior I'm being cautious.

seanbudd

Great work here, the general approach and testing looks good to me.

seanbudd · 2021-08-19T04:59:22Z

+	"""This method prepares a list, which contains character and its description for all characters the text is
+	made up of, by checking the presence of character descriptions in characterDescriptions.dic of that locale
+	for all possible combination of consecutive characters in the text.


I found this hard to parse, is this correct?

Suggested change

"""This method prepares a list, which contains character and its description for all characters the text is

made up of, by checking the presence of character descriptions in characterDescriptions.dic of that locale

for all possible combination of consecutive characters in the text.

"""This method prepares a list, where each item is a tuple of characters and a list of the description for all

characters the text is made up of.

This is generated by checking the presence of character descriptions in characterDescriptions.dic of the given

locale for all possible combination of consecutive characters in the text.

To be honest I'm not sure. I just broke it up into shorter lines and added return type information.
I don't think I could reword this quickly.

feerrenrut · 2021-08-20T10:33:04Z

I'll squash these latest commits into the branch once this is approved.

lukaszgo1 · 2021-08-20T11:40:23Z

@feerrenrut What I'm missing from the description of this PR is a reassurance for add-on authors that introduction of these new speech commands is not going to cause breakage for them. Could you please describe any testing that you've done with add-ons (particularly with ones which were causing troubles with introduction of the new speech commands such as NVDA Remote)?

seanbudd

Thanks @feerrenrut, LGTM pending confirmation that add-ons are not broken

feerrenrut · 2021-08-23T03:19:28Z

@lukaszgo1 No, this hasn't been looked at, good catch.

It certainly is a tricky one. Our intention certainly is not to break add-ons, but the cost could be a more complicated codebase (full of workarounds), or being unable to take up new features / fix issues (such as this PR) until a point one release. Since these ARE speech commands, they should be handled (and likely ignored) by addons. Granted, for addons (such as NVDA remote) that need to process speech, loosing this speech would be a problem.

I'll look into whether this affects NVDA remote, and also consider how we can alter the approach.

feerrenrut · 2021-08-25T04:21:12Z

After some investigation into different approaches, it will be very difficult to introduce this feature without introducing a lot of complexity, or potentially degrading addons.

Instead, we'll postpone this feature until the 2022.1 release.

CyrilleB79 · 2021-08-25T09:00:15Z

Thanks for the information @CyrilleB79. The systems tests for this PR show that before any changes, when moving by word in notepad, the synth is sent "t shirt" (no dash). Can you confirm that recent releases of NVDA using French OneCore this problem exists?

@feerrenrut, no, surprisingly, this problem does not exist. I have been using nvda_snapshot_alpha-23630,0f481d8a.exe and I have tested with French OneCore (Hortense) in Notepad. The results are:

the '👕' symbol is read correctly either by line, by character or by word
a line containing "T-shirt" (with dash) is read correctly
a line containing "T shirt" (without dash) is NOT read correctly

I have activated the log of SpeechManager debug messages and saw something very strange:

With a French voice, the symbol is spelt
- "T-shirt" (with dash) at IO log level
- "T-shirt " (with dash and with two trailing spaces) at SpeechManager debug log level.
With an English voice, the symbol is spelt:
- "t-shirt" (with dash) at IO log level
- "t shirt " (without dash and with two trailing spaces) at SpeechManager debug log level.

I have no explanation why the dash is removed in English but not in French. But this makes the T-shirt read correctly by chance...

seanbudd · 2021-08-26T01:08:32Z

Is it possible that the difference between symbol usage between English and French is due to differences in symbols.dic?

For French, the line is: - tiret. For English: - dash most

CyrilleB79 · 2021-08-26T12:52:53Z

Is it possible that the difference between symbol usage between English and French is due to differences in symbols.dic?

For French, the line is: - tiret. For English: - dash most

This should not make a difference, no? I guess that if the level is not specified in a locale .dic file, the level is inherited from English.

Qchristensen

I don't have an answer to the French pronunciation of T-Shirt either - does it vary when testing with different synthesizers? (which would likely indicate a pronunciation issue with one synth). Just trying to think of other similar things... T-Rex, the dinosaur, or T-Ball, the kids sport.

Otherwise, updates to changes and user guide look fine.

feerrenrut · 2021-08-27T06:44:48Z

@CyrilleB79 It would be great to introduce French variations of the system tests. Hopefully NVDA isn't affected too heavily by the Windows language. Though system tests are a good way to find that.

lukaszgo1 · 2021-09-01T11:14:34Z

It would be great to introduce French variations of the system tests. Hopefully NVDA isn't affected too heavily by the Windows language. Though system tests are a good way to find that.

Is there any interest in general in making system tests working on non English versions of Windows - for now they don't work at all.

feerrenrut · 2021-09-09T02:27:04Z

Rather than specifically a French version of windows, I would start with NVDA set to French language for certain system tests that exercise aspects that may be broken more easily (EG symbol processing, decimal, perhaps dates)?. Ideally this should work on any machine, we should invert the dependency on Windows API calls so that it can be controlled during a system test.

CyrilleB79 · 2022-02-03T10:24:46Z

Hi @feerrenrut

Thanks for this summary.

The core is to fix Move the cursor one word to the right, symbol "|" not reported #10855
(Pipe symbol not reported when moving by word) and Inconsistencies in symbol pronunciation when reading by word and encountering a "word" which only contains multiple symbols. #11779 (Inconsistencies in symbol pronunciation... )

Actually #10855 is already fixed in 2020.3 as explained in #10855 (comment). So this PR just fixes #11779.

Additionally, based on the concern that not everyone wants "symbol level all" for move by word, an option was added to fallback to the prior intended behavior of observing the symbol level setting.

Yes, this should be enough to mitigate the risk of reverting this PR (on the contrary to what I have written before).

We have also fixed some of the concerns demonstrated in add tests for symbol levels #13305. Specifically, double processing of symbols (EG t-shirt and right-pointing-arrow), and the core mechanism to resolve symbol substitution within the speech UI messages.

From here, I'll wrap (certain obvious) speech messages with a UI speech command, and try to get this merged. Rather than aim for a complete fix for this pre-existing issue, I'll improve the obvious case, leave a mechanism for improving the rest. I don't want this PR to get held up further fixing pre-existing issues that weren't in the initial scope.

Yes this makes sense. This PR should be merged as soon as possible in order to identify if some more messages should be wrapped in UI speech command during alpha or beta testing phase.

Just one last point: you haven't answered to #12710 (comment), i.e. symbols are not reported when selecting by word.
It should be clarified:

if you plan to fix it in this PR
if you consider it an issue but do not plan to fix it in this PR
if you do not consider it an issue

lukaszgo1 · 2022-02-03T11:12:05Z

I still stand by my opinion that this should not be merged as is.
To avoid getting out of scope I'm not going to discuss what symbol level should be used when selecting and I'll focus on the regressions this PR introduces mainly based on this comment

* Additionally, based on the concern that not everyone wants "symbol level all" for move by word, an option was added to fallback to the prior intended behavior of observing the symbol level setting.

While this works for word navigation this PR also changes the behavior when navigating by character. The entire sequence is spoken at a level all which causes punctuation in names of the table header and in style names to be wrongly reported and this cannot be controlled by the user.

and the core mechanism to resolve symbol substitution within the speech UI messages.

As demonstrated by the comment to which I've linked above this mechanism is not good enough. While you would be able to fix style names not being reported with the level all by enclosing them in the right speech command the same cannot be done for names of table headers.

What worries me is that I see no way to improve the situation without introducing additional backwards incompatible changes meaning that any fixes would have to wait until 2023.1.
Since we have identified certain regressions which would be introduced when this PR is going to be merged, the proposed mechanism for excluding strings from being spoken at level all is clearly not flexible enough, we do not have a clear path forward for resolving issues with the entire sequences being spoken at the wrong level and there is no high demand for #11779 to be implemented soon I believe a right thing to do is not to include this half baked in 2022.1 and rework the implementation for 2023.1

CyrilleB79 · 2022-02-03T14:01:35Z

To avoid getting out of scope I'm not going to discuss what symbol level should be used when selecting

Sorry, but IMO it is not out of scope of this PR.
Indeed, currently, when moving by character, symbols are reported whatever the symbol level is. And selecting by character, the symbol is also reported whatever the symbol level is.
With this PR, moving by word, symbols will be reported whatever the symbol level is. It would seem coherent to me that with this PR, selecting by word, symbol be reported whatever the symbol level is.

While this works for word navigation this PR also changes the behavior when navigating by character. The entire sequence is spoken at a level all which causes punctuation in names of the table header and in style names to be wrongly reported and this cannot be controlled by the user.

Sorry, I had missed this point. I completely agree with you. The use case moving by character in a table with headers containing symbols is a real-life use case and merging this PR as is would introduce a visible regression.

lukaszgo1 · 2022-02-03T14:17:38Z

Sorry, but IMO it is not out of scope of this PR. Indeed, currently, when moving by character, symbols are reported whatever the symbol level is. And selecting by character, the symbol is also reported whatever the symbol level is. With this PR, moving by word, symbols will be reported whatever the symbol level is. It would seem coherent to me that with this PR, selecting by word, symbol be reported whatever the symbol level is.

I agree that currently this is inconsistent, but there is no visible change in the behavior when selecting text, this can be worked on in a follow up PR and it is not necessary to fix #11779.

This reverts commit 619e5f7. Additionally re-enable and update tests so expectation matches behaviour.

Update tests to show change in behaviour

Update tests to show inital behaviour restored for right pointing arrow and t shirt symbols.

feerrenrut · 2022-02-04T12:08:29Z

I have tested table headers with a codepen sample and didn't find any situation that had been made worse. Ideally, add an automatic test case to demonstrate the "before" behavior on master.

We have exhausted the amount of time we can dedicate to this now. I'll spend a little time on on Monday to write up where this stands, document the approach, and clean up the open issues to clarify goals. I'll close this PR, anyone wishing to pick this up should open a new one.

lukaszgo1 · 2022-02-04T18:55:37Z

I have tested table headers with a codepen sample and didn't find any situation that had been made worse.

You can reproduce the regression as follows:

Position yourself in the cell with the text "Blue" inside
Move with the right arrow to the cell containing "True" and notice that the column header "don't" is announced as "don tick t"

We have exhausted the amount of time we can dedicate to this now. I'll spend a little time on on Monday to write up where this stands, document the approach, and clean up the open issues to clarify goals. I'll close this PR, anyone wishing to pick this up should open a new one.

While dropping this work after so much time has been spend on it must have been difficult I really applaud you for doing a right thing here! The fact that we can merge code when it is working as best as possible rather that on some arbitrary deadline is one of the main reasons why I find working on Open Source stuff so rewarding personally.

feerrenrut · 2022-02-07T09:58:28Z

Let's be clear about the behavior with the table:
In 2021.3.1:

with symbol level all, don tick t reported
with symbol level none, don't reported
With this PR:
with symbol level None and "symbol level all for move by word" disabled, don tick t reported.

This case is questionable, some thoughts on the matter:

The intention with this PR is to speak at symbol level all consistently, it's doing that.
However, in this case it is not primary content being navigated over, it is secondary (IE context for the cell).
It wouldn't be particularly difficult to work around in a similar way to the speech UI, a new command for "secondary" or "context" content.

Note, with the approach demonstrated here, and with #12779 the goal would be to have no un-annotated strings in speech sequences anymore. This would require introducing a "Content" type (in addition to UserInterface, and Symbol). There are few different approaches that we can consider:

We could attach metadata to the these types, like symbolLevel. This results in the creation code (rather than the type) having control. One Content instance may have a different result from another Content instance.
We could add more types for the different behaviors eg (Context could be a type of Content which doesn't expand symbols). The calling code still has control (based on which type is created), but once created it has consistent behavior.

feerrenrut · 2022-02-15T08:19:47Z

If this gets picked up again, I'd recommend the first step be to rebase onto #13305 and confirm the output with those tests.

Related to issue #11779, #10855, and PR #12710 Summary: There a number of pre-existing problems with symbol pronunciation in NVDA. They are hard to reason about due to the variety of settings that can affect behavior. Examples with Symbol level all: - '👕' symbol ("t-shirt") spoken as 't dash shirt' when moving by word or character - Text "don't" spoken as "don tick t" when moving by word - Symbols in NVDA speech UI (EG "spelling error" in French "Faute d'orthographe") are processed as symbols. The example "Faute d'orthographe" becomes "Faute d apostrophe orthographe", equivalent to "spelling e tick rror" in English, except that "apostrophe" is a longer word than "tick". Description of change: Does not introduce user visible changes, intended to help developers reason about the behavior of NVDA. In tests Symbol level All and None are used as the boundary cases for behavior. The tests are introduced to demonstrate behavior with: - Moving by word, line, character. - Selecting by word, line, character. - Symbol Level All when speech UI contains a symbol. For this, the System Test Spy global plugin had to be modified to alter translation strings of NVDA at runtime. - Secondary (contextual) content containing symbols. EG column headers spoken when moving between cell boundaries.

Adriani90 · 2024-11-24T19:54:22Z

Adding the abandoned label here so people can find this great valuable work by filtering that label.

cc: @raducoravu, @dmitrii-drobotov in case you are interested in the mechanics of how NVDA treats symbols when navigating word by word, maybe you'd like to contribute. This PR would improve word navigation alot, still it needs to be rebased onto the current master and would ofcourse need some further testing by the community.
In my view behavior introduced by this PR could be an option in NVDA document naviogation settings called
"Speak symbols as single words when navigating word by word checkbox not checked"
The checkbox should be disabled by default so the caret moves according to the applications API settings.

This comment has been minimized.

Sign in to view

feerrenrut force-pushed the speakSymsMoveByWord branch from 208a268 to 16da2f0 Compare August 9, 2021 03:00

feerrenrut force-pushed the speakSymsMoveByWord branch 2 times, most recently from 3bf1423 to ac6a775 Compare August 18, 2021 08:23

feerrenrut marked this pull request as ready for review August 18, 2021 08:48

feerrenrut requested review from a team as code owners August 18, 2021 08:48

feerrenrut requested review from Qchristensen and seanbudd August 18, 2021 08:48

seanbudd requested changes Aug 19, 2021

View reviewed changes

seanbudd approved these changes Aug 23, 2021

View reviewed changes

feerrenrut added this to the 2022.1 milestone Aug 25, 2021

feerrenrut added the Addon/API Changes to NVDA's API for addons to support addon development. label Aug 25, 2021

This was referenced Aug 25, 2021

Split speech processing commands and commands for synth #12778

Open

No longer use str in speech sequences #12779

Open

Qchristensen approved these changes Aug 27, 2021

View reviewed changes

Merge branch addMoveByWordSymLevelTests

dfce762

feerrenrut added 10 commits February 4, 2022 15:27

Reintroduce code from PR #11856

a06e345

This reverts commit 619e5f7. Additionally re-enable and update tests so expectation matches behaviour.

Clarify symbol level default

38e1614

add type information

06a487c

Introduce atomic, UI, and Symbol speech commands.

442089d

Update tests to show change in behaviour

fix symbol substitution within speech UI

0f350f0

Restore initial behavior for dash containing symbols

c01d4a0

Update tests to show inital behaviour restored for right pointing arrow and t shirt symbols.

remove getSpellingSpeech for move by word

da9bcc2

move changes file updates to 2022.1

cd1934f

spelling error is part of speech UI

98d63fc

fix missing CHUNK_SEPARATOR after speech UI

d1603a7

feerrenrut force-pushed the speakSymsMoveByWord branch from 0e3f2c2 to d1603a7 Compare February 4, 2022 12:05

feerrenrut closed this Feb 4, 2022

lukaszgo1 mentioned this pull request Feb 18, 2022

When running tests locally, NVDA depends on the system language to be English #13362

Closed

feerrenrut mentioned this pull request Mar 17, 2022

Return "blank" if text empty due to processing #13483

Merged

5 tasks

seanbudd mentioned this pull request Jun 10, 2022

add pronunciation for raw Unicode braille cells #13777

Closed

5 tasks

lukaszgo1 mentioned this pull request Sep 14, 2022

Make system tests pass when run locally on non-English systems #14148

Merged

9 tasks

seanbudd mentioned this pull request Oct 24, 2022

Add speakSymsMoveByWord as a feature branch #14287

Merged

feerrenrut mentioned this pull request Oct 25, 2022

Revert "Add speakSymsMoveByWord as a feature branch" #14290

Merged

Adriani90 added the Abandoned requested reports or updates are missing since more than 1 year, author or users are not available. label Nov 24, 2024

-	"""This method prepares a list, which contains character and its description for all characters the text is
-	made up of, by checking the presence of character descriptions in characterDescriptions.dic of that locale
-	for all possible combination of consecutive characters in the text.
+	"""This method prepares a list, where each item is a tuple of characters and a list of the description for all
+	characters the text is made up of.
+	This is generated by checking the presence of character descriptions in characterDescriptions.dic of the given
+	locale for all possible combination of consecutive characters in the text.

Uh oh!

Conversation

feerrenrut commented Aug 4, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Closed reason.

Link to issue number:

Summary of the issue:

Description of how this pull request fixes the issue:

Testing strategy:

Known issues with pull request:

Change log entries:

Code Review Checklist:

Uh oh!

This comment has been minimized.

feerrenrut commented Aug 9, 2021

Uh oh!

amirsol81 commented Aug 10, 2021

Uh oh!

CyrilleB79 commented Aug 11, 2021

Uh oh!

feerrenrut commented Aug 13, 2021

Uh oh!

seanbudd left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

seanbudd Aug 19, 2021

Choose a reason for hiding this comment

Uh oh!

feerrenrut Aug 20, 2021

Choose a reason for hiding this comment

Uh oh!

feerrenrut commented Aug 20, 2021

Uh oh!

lukaszgo1 commented Aug 20, 2021

Uh oh!

seanbudd left a comment

Choose a reason for hiding this comment

Uh oh!

feerrenrut commented Aug 23, 2021

Uh oh!

feerrenrut commented Aug 25, 2021

Uh oh!

CyrilleB79 commented Aug 25, 2021

Uh oh!

seanbudd commented Aug 26, 2021

Uh oh!

CyrilleB79 commented Aug 26, 2021

Uh oh!

Qchristensen left a comment

Choose a reason for hiding this comment

Uh oh!

feerrenrut commented Aug 27, 2021

Uh oh!

lukaszgo1 commented Sep 1, 2021

Uh oh!

feerrenrut commented Sep 9, 2021

Uh oh!

CyrilleB79 commented Feb 3, 2022

Uh oh!

lukaszgo1 commented Feb 3, 2022

Uh oh!

CyrilleB79 commented Feb 3, 2022

Uh oh!

lukaszgo1 commented Feb 3, 2022

Uh oh!

feerrenrut commented Feb 4, 2022

Uh oh!

lukaszgo1 commented Feb 4, 2022

Uh oh!

feerrenrut commented Feb 7, 2022

Uh oh!

feerrenrut commented Feb 15, 2022

Uh oh!

Adriani90 commented Nov 24, 2024

Uh oh!

feerrenrut commented Aug 4, 2021 •

edited

Loading