OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume and rate lengths#8934
Conversation
|
Hi, which contract are you looking for? I can test a try build on all Windows 10 releases (10240 included). Thanks.
|
|
I'm totally in support of turning off appendSilence. This makes general navigation around Windows much much better with Onecore. However, turning off punctuation silence in some cases (such as after a comma) seems to not leave enough gap. Though this is different for each voice it seems. Though quickly testing this, I am finding it hard to get used to with Microsoft David for instance. There's a lot of: "Let's eat Grandma" rather than "Let's eat, Grandma" going on. Secondly, I am worried that some users may end up with an unusable NVDA as the rate will now suddenly be too fast. For instance, if their Windows rate was set to a slow value, and they had boosted their NVDA rate up high to compensate, now their NVDA rate will be super fast. True for an expert user the rate would not be that fast, but for a beginner user, it could be too much. And finally, there is of course the issue of the new rate code not being supported on older versions of Windows 10. I wasn't too worried about this before as people should keep their Windows up to date. However, added to the above points, I think it worth noting for future conversation on this PR. I'd really love to take the appendSilence code for 2018.4 at very least. |
Note that Windows server 2016 is stuck at a particular equivalent build of Windows 10, so server 2016 won't support the new rate code. Server 2019 probably will. |
|
In theory, because Server 2019 is based on now disappeared Version 1809 codebase, the new rate code will work in that edition. Ultimately, it depends on which UWP API contract introduced this, because then we can declare which release is compatible (or the one before that). Thanks.
|
|
The PunctuationSilence option is really weird. It seems to remove silence for commas, but doesn't shorten the silence for full stops, ellipses, etc. That's totally the inverse of what I'd ideally want: as it is, the full stops are too long, but the comma pauses aren't so bad. So, I agree on that one. I think I'd prefer to just disable it rather than having a setting, though.
Regarding rate, the only thing I could suggest is that we reset the rate to default if a rate is set. We'd use a flag to determine whether we've already done this and avoid doing it again. That's pretty ugly though. I'm also not sure when I'll be able to work on this, as it'll require a bit of fiddly testing.
|
|
@jcsteh: Would you be able to resolve the merge conflicts? |
|
I've tested this pr on Windows 10 builds 10240, 14393 and 17134.On version 1803 everything worked as expected, but on those two earlier versions the only setting which could be changed for OneCore was voice. Is it expected? I do not believe, that regressing things in this way is a good move. The versions of Win 10 which I've chosen are LTSB builds, so they would be used by enterprise customers for some time. Furthermore blindness specific market is usually slow at updating not only due to fear of something breaking, but also, because people are often using older versions of commercial screen readers and magnifiers, which wouldn'n work with the recent Windows 10 updates. Given this commend by @jcsteh it should be possible to use proper code depending of Windows 10 version in use. Ah, and one more thing, the read me should be updated to mention, that when installing Visual Studio the required version of SDK is now 17763. |
|
Hi, yes, the results you’re seeing is consistent with Microsoft’s documentation – new settings were introduced in Version 1709 (build 16299). The LTS version should not be used by consumers, but there are cases where this is happening. Thanks.
|
|
@josephsl
The consumers in some countries love the ide of lTS branch, as it’s stable, and doesn’t introduce some nasti bugs.
|
|
Also note that Windows Server 2016 is based on an older build of Windows 10 which won't have this functionality and thus would also suffer from this. |
|
Hi, might be safe to assume Windows Server 2019 will have support for it. Thanks.
|
|
@jcsteh: Is this subject to conflict with the new speech framework? In that case, I guess it is up to @michaelDCurran or @feerrenrut to decide whether we want either the framework or this to be merged first. I'm happy to take the merge conflicts and review actions if that helps you, as long as you could possibly follow the progress and review where needed. |
|
We have already taken some of this code, specifically the turning off of
the long gap between phraises.
However, after testing the turning off of the punctuation gap, this
caused ambiguities in speech. Really we want to be able to configure
separate punctuation gaps, and their lengths, separately, which is
impossible with that API currently.
finally, the move to the new rate system to allow for the faster rates
meant that running this code on older Windows 10 builds caused NVDA not
to be able to configure rate, pitch etc at all.
At very least there are still questions to be answered here.
|
|
I think turning off the punctuation gap is out of the question until
Microsoft improve this. So, it should be scrapped from this PR.
I think the remaining work here is to implement the old rate/pitch/volume
change code alongside the new code so the old code can be used on older
versions of Windows. That will be tricky and ugly, but not impossible.
Either that or just wait until there aren't any impacted builds of Windows
still officially supported by Microsoft (LTSB moves on, etc.), but I'm not
sure when those builds are totally EOL.
|
This comment has been minimized.
This comment has been minimized.
|
Is there a rough idea about how to convert old rates to the new rates? Imagine someone has his rate set to 100%, which isn't quite imaginary. If he'd update to a version of NVDA with this patch, the new rate would overwhelm that person and he might not even been able to control his machine due to the enormous increase of the speech rate. |
255a91b to
967d182
Compare
|
Ough, I wanted to be so kind to at least merge master for now, but github decided that I pushed 33 commits instead. The only thing I did was a merge of master, fixing some merge conflicts. I therefore reverted the merge of master, so everything should be back into its old state now. |
|
Hello!
I agree with the fixing of a confortable rate...
Rui Fontes
Às 05:12 de 26/04/2019, Leonard de Ruijter escreveu:
… Is there a rough idea about how to convert old rates to the new rates?
Imagine someone has his rate set to 100%, which isn't quite imaginary.
If he'd update to a version of NVDA with this patch, the new rate would
overwhelm that person and he might not even been able to control his
machine due to the enormous increase of the speech rate.
Probably the easiest fix would be enforcing a rate of 50% upon everyone
who once changed the rate manually, giving a warning in the release
notes about why this happens.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#8934 (comment)>, or
mute the thread
<https://github.com/notifications/unsubscribe-auth/ADZAPRTCC5CUOBZWAYMJEFDPSJ6L3ANCNFSM4GDDDNMQ>.
|
…e base pitch, volume and rate. Previously, we used SSML in every utterance to set the base value of parameters, since there was no other way. However, Windows 10 Fall Creators Update introduced new properties in the SpeechSynthesizerOptions class to set these parameters.
…don't support these new features.
…t once fetches the support from NVDAHelper, and then saves it for the lifetime of the object
967d182 to
23c432f
Compare
|
I just rebased on master and pushed four additional commits which will add support for the old and new OneCore behavior side by side, i.e. for older windows versions which don't support prosody options, the old behaviour of pitch, volume and rate controls will apply. @jcsteh: Is it actually ttrue that we can extend the rate range even fore by using both prosody options and SSML? |
|
Interestingly, I've never come across documentation saying that you can
combine the API and SSML to go beyond the API's maximum. I wonder if that
documentation is new or if I just missed it? Regardless, that really
doesn't make a huge amount of sense; min should be min and max should be
max. This entire API is a total damned mess. I wish Microsoft would sort
their stuff out. So much for a nice shiny clean new API.
So you're certain that Narrator can go faster than we can if we only use
the new API? (I actually wonder whether Narrator is still using the old
private API like they were before instead of the new public API.) If so,
then I guess merging is the only way and I'm okay with that, plus it deals
with the backwards compat problem.
On the other hand, if Narrator can't go faster than we can using new API
max only, that's more controversial, since it suggests that Microsoft
intended this to be the max and going faster with SSML is perhaps
considered a bug that might get "fixed" at any time in the future.
|
From the remarks section on https://docs.microsoft.com/en-us/uwp/api/windows.media.speechsynthesis.speechsynthesizeroptions.speakingrate#Windows_Media_SpeechSynthesis_SpeechSynthesizerOptions_SpeakingRate:
I agree with your last point.
I had another test and it seems I was wrong in that. Narrators pitch and rate range are equal to the new implementation, without ssml. So you might be correct that the higher ranges that can be accomplished with SSML are unintentional.
I like this idea and will play with it a bit. I think it is best to make rate bost a check box in this case, just as with espeak. Just to make sure, I will again abandon SSML altogether for the new implementation. |
…rate, volume and pitch as kwargs" This reverts commit 2048175.
|
I think this is ready for review. @jcsteh: May be you could have a quick look at what I changed since you worked on this? |
feerrenrut
left a comment
There was a problem hiding this comment.
Can you make sure that the description on this PR is correct please.
|
@feerrenrut: your commen tis addressed. I was actually pretty sure that I properly updated the description. Is there something you're missing in it? |
Not necessarily, I just wanted you to check to make sure it is still accurate. Though now I'm thinking about it, I would like it to be explicit about whether users will need to adjust their config after this change, and if so, in what cases. I'm also holding off on merging it because I would @michaelDCurran to have a look at it too. |
|
There is a chance that if the user had Slowed down their speech using the Windows speech settings, and then sped it up again in NVDA's speech settings, that now their speech is going to be a bit faster. However, as we know they will not have rate boost on, 100% even with the new code (without rate boost) is not so fast that they wouldn't be able to slow it down again. It may not be entirely comforttable, but it won't be completely mis-understandable for anyone I believe. Plus, I'd say it is a rather small subset of user who slowed down in windows Speech settings and then sped back up again with NvDA speech settings. If we are really worried we should refuse to load the user's previously configured rate the first time. But my feeling is we could get this as far as a beta and see if anyone at all is affected. |
|
I agree that without rate boost, the speech is still not too fast to be understood by people.
|
I've tried to address these things in the initial description. |
|
I just realised that when the rate is set to 0, rate boost makes no difference because it only changes the max rate, not the min rate. I don't consider this a problem myself, but we could change the min rate for rate boost to be equal to the max rate when rate boost is off. This is pretty trivial. |
|
Got this comment directly to me on Twitter: |
|
I can't reproduce this on Windows 10 may 2019 update. |
Link to issue number:
Fixes #7498.
Summary of the issue:
For NVDA's OneCore voices support:
Description of how this pull request fixes the issue:
This uses new SpeechSynthesizerOptions properties introduced in recent updates of Windows 10 to set these values.
Changes by @LeonarddeR since @jcsteh worked on this:
Testing performed:
Known issues with pull request:
Even though we;re fairly certain this should work on older versions of Windows 10, this hasn't been tested on actual systems with older versions of Windows 10.
@michaelDCurran wrote in OneCore voices: Use new SpeechSynthesizerOptions properties to set pitch, volume and rate lengths #8934 (comment)
Change log entry:
New features:
Bug fixes:
Changes:
You may wish to move some of these into the New Features section. I'll leave that up to you.