OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume and rate lengths by jcsteh · Pull Request #8934 · nvaccess/nvda

jcsteh · 2018-11-12T00:27:44Z

Link to issue number:

Fixes #7498.

Summary of the issue:

For NVDA's OneCore voices support:

The rate setting is affected by the rate setting in Windows Speech Settings.
The pitch range is very limited (compared with Narrator).

Description of how this pull request fixes the issue:

This uses new SpeechSynthesizerOptions properties introduced in recent updates of Windows 10 to set these values.

Changes by @LeonarddeR since @jcsteh worked on this:

Added rate boost to the synthesizer settings ring. I found this most helpful to test things, and imagined that users might also like this.
On older versions of Windows 10, this driver now behaves the same as current master.
Added rate boost to the OneCore driver. This is disabled by default, so with this PR speech shouldn't become unintelligible. The only case where the rate will differ between master and this PR is when someone has changed the rate in the Windows 10 speech settings. See the second point in the known issues section.

Testing performed:

Set the rate to 0 in Windows Speech Settings, tested with NVDA, set the rate to 100 in Windows Speech Settings, tested with NVDA. Observed that NVDA's rate was not affected.
To simulate older versions of Windows 10, built the changes with lower values for each of the Windows.Foundation.UniversalApiContract minimum version checks. Confirmed that the driver works and that volume, rate and pitch settings behave as in the old situation on unsupported versions (as expected). (I don't have systems with older versions of Windows, so I can't confirm this in a real world setting.)
By ear, compared master with this pull request with regard to rate range, and made sure that the rate range hasn't changed significantly.

Known issues with pull request:

Even though we're fairly certain this should work on older versions of Windows 10, this hasn't been tested on actual systems with older versions of Windows 10.
@michaelDCurran wrote in OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume and rate lengths #8934 (comment)

There is a chance that if the user had slowed down their speech using the Windows speech settings, and then sped it up again in NVDA's speech settings, their speech is now going to be a bit faster. However, as we know they will not have rate boost on, 100% even with the new code (without rate boost) is not so fast that they wouldn't be able to slow it down again. It may not be entirely comfortable, but I believe it won't be completely unintelligible for anyone. Plus, I'd say it is a rather small subset of users who slowed down in Windows speech settings and then sped back up again with NVDA speech settings. If we are really worried, we should refuse to load the user's previously configured rate the first time. But my feeling is we could get this as far as a beta and see if anyone at all is affected.

Change log entry:

New features:

- For the Windows OneCore voices and eSpeak NG speech synthesizers, you can now enable rate boost using the synthesizer settings ring.

Bug fixes:

- For Windows OneCore voices, the rate set in NVDA is no longer affected by the rate set in Windows 10 Speech Settings.

Changes:

- Greater rate and pitch ranges are now supported for Windows OneCore voices. To increase the rate range, you can enable the new rate boost setting.

You may wish to move some of these into the New Features section. I'll leave that up to you.

josephsl · 2018-11-12T01:05:46Z

Hi, which contract are you looking for? I can test a try build on all Windows 10 releases (10240 included). Thanks.

michaelDCurran · 2018-11-12T07:14:30Z

I'm totally in support of turning off appendSilence. This makes general navigation around Windows much, much better with OneCore.

However, turning off punctuation silence seems to leave too small a gap in some cases (such as after a comma), though this seems to differ for each voice. From quickly testing this, I am finding it hard to get used to with Microsoft David, for instance. There's a lot of "Let's eat Grandma" rather than "Let's eat, Grandma" going on.
Perhaps we could keep punctuation silence on, and allow the user to turn it off in NVDA settings. This would be nicer if Microsoft had let punctuationSilence actually take a duration.

Secondly, I am worried that some users may end up with an unusable NVDA as the rate will now suddenly be too fast. For instance, if their Windows rate was set to a slow value, and they had boosted their NVDA rate up high to compensate, their NVDA rate will now be super fast. True, for an expert user the rate would not be that fast, but for a beginner it could be too much.

And finally, there is of course the issue of the new rate code not being supported on older versions of Windows 10. I wasn't too worried about this before as people should keep their Windows up to date. However, added to the above points, I think it worth noting for future conversation on this PR.

I'd really love to take the appendSilence code for 2018.4 at very least.

LeonarddeR · 2018-11-12T07:52:59Z

> And finally, there is of course the issue of the new rate code not being supported on older versions of Windows 10. I wasn't too worried about this before as people should keep their Windows up to date.

Note that Windows Server 2016 is stuck at a particular equivalent build of Windows 10, so Server 2016 won't support the new rate code. Server 2019 probably will.

josephsl · 2018-11-12T16:30:19Z

In theory, because Server 2019 is based on the now-disappeared Version 1809 codebase, the new rate code will work in that edition. Ultimately, it depends on which UWP API contract introduced this, because then we can declare which release is compatible (or the one before that). Thanks.

jcsteh · 2018-11-12T20:52:23Z

The PunctuationSilence option is really weird. It seems to remove silence for commas, but doesn't shorten the silence for full stops, ellipses, etc. That's totally the inverse of what I'd ideally want: as it is, the full stops are too long, but the comma pauses aren't so bad. So, I agree on that one. I think I'd prefer to just disable it rather than having a setting, though.

Regarding rate, the only thing I could suggest is that we reset the rate to default if a rate is set. We'd use a flag to determine whether we've already done this and avoid doing it again. That's pretty ugly though. I'm also not sure when I'll be able to work on this, as it'll require a bit of fiddly testing.
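The one-time reset suggested above could look roughly like the following. This is only a sketch under assumptions: the config is modelled as a plain dict, and the `rateResetDone` key and the default of 50 are hypothetical names and values chosen for illustration, not taken from NVDA's configuration code.

```python
DEFAULT_RATE = 50  # assumed default; illustrative only


def migrate_rate_once(conf: dict) -> dict:
    """Reset a previously configured rate to the default, exactly once.

    "rateResetDone" is a hypothetical flag recording that the reset has
    already happened, so later user changes are left alone.
    """
    if conf.get("rateResetDone"):
        return conf
    migrated = dict(conf)
    if "rate" in migrated:
        migrated["rate"] = DEFAULT_RATE
    migrated["rateResetDone"] = True
    return migrated


if __name__ == "__main__":
    conf = migrate_rate_once({"rate": 100})
    print(conf["rate"], conf["rateResetDone"])
    conf["rate"] = 80  # the user picks a new rate afterwards
    print(migrate_rate_once(conf)["rate"])
```

The flag has to be persisted with the rest of the config; otherwise the reset would run again on every start, which is the "pretty ugly" part of the idea.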

LeonarddeR · 2018-12-12T18:34:45Z

@jcsteh: Would you be able to resolve the merge conflicts?

lukaszgo1 · 2018-12-15T18:59:50Z

I've tested this PR on Windows 10 builds 10240, 14393 and 17134. On Version 1803 everything worked as expected, but on the two earlier versions the only setting which could be changed for OneCore was the voice. Is this expected? I do not believe that regressing things in this way is a good move. The versions of Windows 10 which I've chosen are LTSB builds, so they will be used by enterprise customers for some time. Furthermore, the blindness-specific market is usually slow at updating, not only due to fear of something breaking, but also because people often use older versions of commercial screen readers and magnifiers which wouldn't work with recent Windows 10 updates. Given this comment by @jcsteh, it should be possible to use the proper code depending on the Windows 10 version in use. Ah, and one more thing: the readme should be updated to mention that, when installing Visual Studio, the required version of the SDK is now 17763.

josephsl · 2018-12-15T19:04:41Z

Hi, yes, the results you're seeing are consistent with Microsoft's documentation: the new settings were introduced in Version 1709 (build 16299). The LTS version should not be used by consumers, but there are cases where this is happening. Thanks.
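As a rough illustration of why the three tested builds behaved differently, the gate can be approximated by a build-number comparison. This is a simplification: the PR description says the driver checks the Windows.Foundation.UniversalApiContract version rather than the build number, and the function name below is hypothetical.

```python
# Build 16299 (Version 1709) is the release named in this thread as the
# first one with the new settings.
FIRST_BUILD_WITH_PROSODY_OPTIONS = 16299


def supports_prosody_options(build: int) -> bool:
    """Return True if this Windows 10 build should expose the new options."""
    return build >= FIRST_BUILD_WITH_PROSODY_OPTIONS


if __name__ == "__main__":
    # The builds tested in this thread, plus the first supporting build.
    for build in (10240, 14393, 16299, 17134):
        print(build, supports_prosody_options(build))
```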

zstanecic · 2018-12-15T19:07:58Z

@josephsl Consumers in some countries love the idea of the LTS branch, as it's stable and doesn't introduce nasty bugs.

LeonarddeR · 2019-01-05T15:10:11Z

Also note that Windows Server 2016 is based on an older build of Windows 10 which won't have this functionality and thus would also suffer from this.

josephsl · 2019-01-05T16:06:55Z

Hi, might be safe to assume Windows Server 2019 will have support for it. Thanks.

LeonarddeR · 2019-04-25T15:22:06Z

@jcsteh: Is this subject to conflict with the new speech framework? In that case, I guess it is up to @michaelDCurran or @feerrenrut to decide whether we want either the framework or this to be merged first.

I'm happy to take the merge conflicts and review actions if that helps you, as long as you could possibly follow the progress and review where needed.

michaelDCurran · 2019-04-25T21:38:28Z

We have already taken some of this code, specifically the turning off of the long gap between phrases. However, when we tested turning off the punctuation gap, it caused ambiguities in speech. Really, we want to be able to configure the individual punctuation gaps and their lengths separately, which is currently impossible with that API. Finally, the move to the new rate system to allow for faster rates meant that running this code on older Windows 10 builds left NVDA unable to configure rate, pitch, etc. at all. At the very least, there are still questions to be answered here.

jcsteh · 2019-04-25T21:47:48Z

I think turning off the punctuation gap is out of the question until Microsoft improve this. So, it should be scrapped from this PR. I think the remaining work here is to implement the old rate/pitch/volume change code alongside the new code so the old code can be used on older versions of Windows. That will be tricky and ugly, but not impossible. Either that or just wait until there aren't any impacted builds of Windows still officially supported by Microsoft (LTSB moves on, etc.), but I'm not sure when those builds are totally EOL.

LeonarddeR · 2019-04-26T04:12:40Z

Is there a rough idea about how to convert old rates to the new rates? Imagine someone has his rate set to 100%, which isn't quite imaginary. If he updated to a version of NVDA with this patch, the new rate would overwhelm that person and he might not even be able to control his machine due to the enormous increase in speech rate.
Probably the easiest fix would be enforcing a rate of 50% upon everyone who once changed the rate manually, giving a warning in the release notes about why this happens.

LeonarddeR · 2019-04-26T04:15:47Z

Ouch, I wanted to be kind enough to at least merge master for now, but GitHub decided that I had pushed 33 commits instead. The only thing I did was a merge of master, fixing some merge conflicts. I therefore reverted the merge of master, so everything should be back in its old state now.

ruifontes · 2019-04-26T10:38:00Z

Hello! I agree with fixing a comfortable rate... Rui Fontes

…e base pitch, volume and rate. Previously, we used SSML in every utterance to set the base value of parameters, since there was no other way. However, Windows 10 Fall Creators Update introduced new properties in the SpeechSynthesizerOptions class to set these parameters.

LeonarddeR · 2019-04-26T17:58:04Z

I just rebased on master and pushed four additional commits which add support for the old and new OneCore behaviour side by side, i.e. for older Windows versions which don't support prosody options, the old behaviour of the pitch, volume and rate controls will apply.
There is still a major difference between the two implementations. I guess there's not much we could do about this.
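A minimal sketch of the side-by-side idea, assuming a hypothetical `query_support` callable that stands in for the call into NVDAHelper. The class name, method names and the prosody attribute values are illustrative placeholders, not the driver's actual code or scaling. The support check is cached for the lifetime of the object, and SSML prosody is only emitted when the new options are unavailable.

```python
from functools import cached_property
from xml.sax.saxutils import escape


class OneCoreSketch:
    """Illustrative stand-in for a synth driver; not NVDA's actual class."""

    def __init__(self, query_support, rate=50, pitch=50, volume=100):
        # query_support is a hypothetical callable that reports whether the
        # prosody options exist on this system.
        self._query_support = query_support
        self.rate, self.pitch, self.volume = rate, pitch, volume

    @cached_property
    def supports_prosody_options(self) -> bool:
        # Evaluated on first access, then stored on the instance.
        return bool(self._query_support())

    def build_ssml(self, text: str) -> str:
        body = escape(text)
        if not self.supports_prosody_options:
            # Old behaviour: carry the base values in every utterance.
            # The attribute values are placeholders, not the driver's scaling.
            body = (
                f'<prosody rate="{self.rate}%" pitch="{self.pitch}%" '
                f'volume="{self.volume}">{body}</prosody>'
            )
        return (
            '<speak version="1.0" '
            'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
            f"{body}</speak>"
        )


if __name__ == "__main__":
    print(OneCoreSketch(lambda: False).build_ssml("Let's eat, Grandma"))
    print(OneCoreSketch(lambda: True).build_ssml("Let's eat, Grandma"))
```

On a system with the new options, the base values would instead be set once on the synthesizer's options object, so the utterance markup stays free of prosody tags.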

@jcsteh: Is it actually true that we can extend the rate range even further by using both prosody options and SSML?

jcsteh · 2019-04-30T06:00:04Z

Interestingly, I've never come across documentation saying that you can combine the API and SSML to go beyond the API's maximum. I wonder if that documentation is new or if I just missed it? Regardless, that really doesn't make a huge amount of sense; min should be min and max should be max. This entire API is a total damned mess. I wish Microsoft would sort their stuff out. So much for a nice shiny clean new API. So you're certain that Narrator can go faster than we can if we only use the new API? (I actually wonder whether Narrator is still using the old private API like they were before instead of the new public API.) If so, then I guess merging is the only way and I'm okay with that, plus it deals with the backwards compat problem. On the other hand, if Narrator can't go faster than we can using new API max only, that's more controversial, since it suggests that Microsoft intended this to be the max and going faster with SSML is perhaps considered a bug that might get "fixed" at any time in the future.

LeonarddeR · 2019-04-30T06:32:47Z

> Interestingly, I've never come across documentation saying that you can combine the API and SSML to go beyond the API's maximum. I wonder if that documentation is new or if I just missed it? Regardless, that really doesn't make a huge amount of sense; min should be min and max should be max.

From the remarks section on https://docs.microsoft.com/en-us/uwp/api/windows.media.speechsynthesis.speechsynthesizeroptions.speakingrate#Windows_Media_SpeechSynthesis_SpeechSynthesizerOptions_SpeakingRate:

> If Speech Synthesis Markup Language (SSML) is used, SpeakingRate is combined with any prosody tags in the markup.

I agree with your last point.

> So you're certain that Narrator can go faster than we can if we only use the new API?

I ran another test and it seems I was wrong about that. Narrator's pitch and rate ranges are equal to those of the new implementation, without SSML. So you might be correct that the higher ranges that can be reached with SSML are unintentional.

> I guess we could use the new API to specify the default value (1.0?) as a base, then use higher values for the boost setting.

I like this idea and will play with it a bit. I think it is best to make rate boost a check box in this case, just as with eSpeak. Just to make sure, I will again abandon SSML altogether for the new implementation.

LeonarddeR · 2019-05-02T17:49:10Z

I think this is ready for review. @jcsteh: Maybe you could have a quick look at what I changed since you worked on this?

feerrenrut

Can you make sure that the description on this PR is correct please.

LeonarddeR · 2019-05-09T15:07:09Z

@feerrenrut: your comment is addressed. I was actually pretty sure that I had properly updated the description. Is there something you're missing in it?

feerrenrut · 2019-05-09T17:07:33Z

> Is there something you're missing in it?

Not necessarily, I just wanted you to check to make sure it is still accurate. Though now I'm thinking about it, I would like it to be explicit about whether users will need to adjust their config after this change, and if so, in what cases.

I'm also holding off on merging it because I would like @michaelDCurran to have a look at it too.

michaelDCurran · 2019-05-10T01:38:02Z

There is a chance that if the user had slowed down their speech using the Windows speech settings, and then sped it up again in NVDA's speech settings, their speech is now going to be a bit faster. However, as we know they will not have rate boost on, 100% even with the new code (without rate boost) is not so fast that they wouldn't be able to slow it down again. It may not be entirely comfortable, but I believe it won't be completely unintelligible for anyone. Plus, I'd say it is a rather small subset of users who slowed down in Windows speech settings and then sped back up again with NVDA speech settings. If we are really worried, we should refuse to load the user's previously configured rate the first time. But my feeling is we could get this as far as a beta and see if anyone at all is affected.

LeonarddeR · 2019-05-10T03:50:28Z

I agree that without rate boost, the speech is still not too fast to be understood by people.

LeonarddeR · 2019-05-10T04:59:15Z

> > Is there something you're missing in it?
>
> Not necessarily, I just wanted you to check to make sure it is still accurate. Though now I'm thinking about it, I would like it to be explicit about whether users will need to adjust their config after this change, and if so, in what cases.

I've tried to address these things in the initial description.

feerrenrut

Thanks @LeonarddeR

LeonarddeR · 2019-05-10T09:11:17Z

I just realised that when the rate is set to 0, rate boost makes no difference because it only changes the max rate, not the min rate. I don't consider this a problem myself, but we could change the min rate for rate boost to be equal to the max rate when rate boost is off. This is pretty trivial.
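The observation can be made concrete with a linear mapping from NVDA's 0 to 100 rate to a synthesizer speaking rate. This is a sketch only: the three constants are illustrative assumptions rather than values taken from the PR, and `raise_min` is a hypothetical switch modelling the suggested change where the boosted minimum equals the unboosted maximum.

```python
# Illustrative values only; the real driver's constants may differ.
MIN_RATE = 0.5
DEFAULT_MAX_RATE = 1.5
BOOSTED_MAX_RATE = 6.0


def percent_to_speaking_rate(percent, boost=False, raise_min=False):
    """Map a 0-100 rate setting linearly onto a speaking rate."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    max_rate = BOOSTED_MAX_RATE if boost else DEFAULT_MAX_RATE
    # Optional variant: with boost on, start where the normal range ends.
    min_rate = DEFAULT_MAX_RATE if (boost and raise_min) else MIN_RATE
    return min_rate + (max_rate - min_rate) * percent / 100


if __name__ == "__main__":
    for percent in (0, 50, 100):
        print(
            percent,
            percent_to_speaking_rate(percent),
            percent_to_speaking_rate(percent, boost=True),
            percent_to_speaking_rate(percent, boost=True, raise_min=True),
        )
```

With only the maximum changed, both ranges share the same starting point, so a setting of 0 sounds identical with and without boost. Raising the minimum makes the boosted range continue where the normal one stops, at the cost of no overlap between the two.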

michaelDCurran · 2019-05-11T02:23:48Z

Got this comment directly to me on Twitter:
"newest changes for Windows one core voices implemented in the NVDA alfa releases do not stay saved as soon as you switch to another synth, and go back to the One core voices, the synth loads but with its default values for pitch and rate."
I have not confirmed this yet.

LeonarddeR · 2019-05-11T09:31:31Z

I can't reproduce this on the Windows 10 May 2019 Update.

jcsteh requested a review from michaelDCurran November 12, 2018 00:27

michaelDCurran mentioned this pull request Nov 27, 2018

OneCore speech: do not append silence at the end of every speech utterance #8985

Merged

LeonarddeR force-pushed the i7498OcSpeechOptions branch from 255a91b to 967d182 Compare April 26, 2019 04:15

jcsteh and others added 6 commits April 26, 2019 17:01

OneCore voices: Gracefully handle older versions of Windows 10 which …

23c432f

…don't support these new features.

Create a supportsProsodyOptions property on the synthesizer class tha…

b4930fe

…t once fetches the support from NVDAHelper, and then saves it for the lifetime of the object

Reverted changes to supportedSettings on the synth class

df647dc

Add _PreAPI5OcSsmlConverter class

372079a

Restore ability to use SSML to set prosody

88ec65e

LeonarddeR force-pushed the i7498OcSpeechOptions branch from 967d182 to 23c432f Compare April 26, 2019 17:54

Use private properties for rate, volume and pitch when building SSML

972695c

Leonard de Ruijter added 4 commits April 30, 2019 19:38

Revert "Revert back to only one SSML converter that allows providing …

7b50d13

…rate, volume and pitch as kwargs" This reverts commit 2048175.

Move rate boost to synthDriverHandler, include it in the settings ring

bb64024

Add rate boost to one core

44bc8b3

Make sure initial values are set when using ssml

01de6e9

Adriani90 mentioned this pull request May 3, 2019

When changing between SAPI 5 voices the selected voice rate is lost and reset to 50. #2320

Closed

feerrenrut approved these changes May 9, 2019

Comment thread source/synthDriverHandler.py Outdated

Leonard de Ruijter added 2 commits May 9, 2019 17:05

Merge remote-tracking branch 'origin/master' into i7498OcSpeechOptions

6b851ba

Review action

8cd7cf3

feerrenrut approved these changes May 9, 2019

Merge branch 'master' into HEAD

9590cd4

michaelDCurran approved these changes May 10, 2019

michaelDCurran changed the title ~~OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume, rate and silence lengths~~ OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume and rate lengths May 10, 2019

Update what's new.

8ebbbf5

Updated copyright header for changed files

48a2663

feerrenrut approved these changes May 10, 2019

feerrenrut merged commit 9ebb356 into nvaccess:master May 10, 2019

nvaccessAuto added this to the 2019.2 milestone May 10, 2019

LeonarddeR mentioned this pull request May 11, 2019

Fix OneCore rateboost always initialized at 50% rate and SSML converter issues #9560

Merged

jcsteh deleted the i7498OcSpeechOptions branch May 25, 2026 04:01
