Preserve voice parameters when changing between SAPI4/SAPI5 voices with synth settings ring #17700
|
The problem with SAPI4 voices is that there is no fixed range for the parameters: different voices may have different minimum, maximum, and default values for rate, volume, pitch, etc. This makes it difficult to apply the same parameter value across different voices. When switching between voices, what should we preserve: the absolute value, or the relative percentage as in SAPI5? For volume, we may want to preserve the relative percentage, so that 80% volume is always 80%, whatever the minimum and maximum volume. For speed, the unit is words per minute. Should we preserve the words per minute, or the relative percentage, given that different voices may have different default speeds? For pitch, the unit is hertz. We may want to preserve the relative percentage, because male and female voices have different default pitches, and we don't want male voices to use the same high pitch as female voices, or vice versa. Another problem is that using a relative percentage changes the meaning of the parameter values. For a male voice, a pitch of 40 might previously have been too high, but now 50 would mean the default pitch, which may confuse users and cause problems when upgrading. I would like to hear your thoughts about this. As SAPI4 is being deprecated, we may also choose not to fix this for SAPI4. |
|
For SAPI4, I decided to preserve the percentage for volume, and to preserve the difference between the current value and the default value for rate and pitch, without changing the meaning of the parameter values. |
|
Personally, I'm not convinced we should preserve the values at all, precisely because different voices can apply different values very differently. With a synth like eSpeak or even OneCore or Vocalizer, it's okay because the voices respond to parameter values fairly consistently; all the voices effectively use the same underlying engine by the same vendor. With SAPI4 and SAPI5, this is not true at all: two different voices could have two entirely different synthesis engines. Rate 100 with one voice could be twice as fast as some other voice. I realise users find the current behaviour surprising, but my feeling is that we're just going to replace one surprising behaviour with another and we'll get bugs like "the voice is unintelligibly fast when I switch voices with SAPI5". |
|
#17693 reports a problem that the rate value may become out of sync with the actual rate when switching to another SAPI5 voice. That's because when implementing rate boost for SAPI5, the rate value is changed to be stored inside a variable, which isn't reset when calling If we should reset all parameters when switching between SAPI4/SAPI5 voices, I will make another PR that resets the rate value properly and turns off rate boost consistently when switching voices. Thanks for your insight. I will wait to see what others think about this. |
Link to issue number:
Fixes #17693. Fixes #2320.
Summary of the issue:
When using the synth settings ring to switch to another SAPI4/SAPI5 voice, parameters such as rate and volume are reset to their default values.
This is because the synth driver destroys and re-creates the SAPI object when changing the voice, which resets all its parameters. Then the synth driver selects the correct voice, but other parameters are not preserved.
Changing the voice with the settings dialog works, but only as a side effect. To display the current settings, the dialog reads the property values and assigns them to its slider controls. That assignment triggers the sliders' change events, which write the values just read from the synth back to the synth. This refreshes the property values and therefore "fixes" the problem.
However, when using the synth settings ring, only the voice property is changed. Other properties are not refreshed, which makes the issue appear.
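The failure mode above, and the kind of restore step this PR describes for SAPI5, can be illustrated with a toy model. This is only a sketch: `FakeEngine`, `ToyDriver`, and the `restore` flag are hypothetical names invented for illustration, and none of this is the actual NVDA driver code or the real SAPI COM interface.

```python
class FakeEngine:
    """Hypothetical stand-in for a speech engine object (not the real SAPI API)."""

    DEFAULT_RATE = 0
    DEFAULT_VOLUME = 100

    def __init__(self, voice):
        # A freshly created engine always starts from its defaults.
        self.voice = voice
        self.rate = self.DEFAULT_RATE
        self.volume = self.DEFAULT_VOLUME


class ToyDriver:
    """Hypothetical driver that caches the user's settings."""

    def __init__(self, voice):
        self._rate = FakeEngine.DEFAULT_RATE
        self._volume = FakeEngine.DEFAULT_VOLUME
        self._engine = FakeEngine(voice)

    def set_rate(self, rate):
        self._rate = rate
        self._engine.rate = rate

    def set_volume(self, volume):
        self._volume = volume
        self._engine.volume = volume

    def set_voice(self, voice, restore=True):
        # Re-creating the engine discards every parameter set on the old one.
        self._engine = FakeEngine(voice)
        if restore:
            # Push the cached values onto the new engine object.
            self._engine.rate = self._rate
            self._engine.volume = self._volume


driver = ToyDriver("voice A")
driver.set_rate(5)
driver.set_volume(60)

driver.set_voice("voice B", restore=False)  # old behaviour: settings are lost
lost = (driver._engine.rate, driver._engine.volume)

driver.set_voice("voice C")  # with the restore step: settings survive
kept = (driver._engine.rate, driver._engine.volume)
```

With `restore=False` the new engine reports its defaults even though the driver still remembers the user's values, which is the mismatch seen when only the voice property is changed.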
Description of user facing changes
The SAPI4/SAPI5 synthesizer will be able to preserve the settings when switching between voices.
Description of development approach
SAPI5:
In _set_voice, after using _initTts to re-create the SAPI5 object and select the voice, restore the rate and volume parameters to the value stored previously. Pitch does not need to be restored, as it is not a parameter of the SAPI5 object.
SAPI4:
For volume, the percentage is preserved, so that 80% volume is always 80% of the voice's volume range.
For rate and pitch, the difference (delta) between the current value and the default value is preserved. For example, if the rate is set 50 WPM faster than the current voice's default, then after switching to another voice it will be 50 WPM faster than that voice's default.
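The two SAPI4 rules can be sketched as plain arithmetic. This is a minimal illustration, not the code from this PR: `ParamRange`, `carry_percentage`, and `carry_delta` are hypothetical names, and clamping the result to the new voice's range is an assumption rather than something stated in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamRange:
    """Hypothetical min/max/default of one parameter for one voice."""

    minimum: int
    maximum: int
    default: int


def clamp(value, rng):
    # Assumption: out-of-range results are clamped to the new voice's limits.
    return max(rng.minimum, min(rng.maximum, value))


def carry_percentage(value, old, new):
    """Volume rule: keep the same relative position within the range."""
    span = old.maximum - old.minimum
    fraction = (value - old.minimum) / span if span else 0.0
    scaled = new.minimum + fraction * (new.maximum - new.minimum)
    return clamp(round(scaled), new)


def carry_delta(value, old, new):
    """Rate/pitch rule: keep the same offset from the voice's default."""
    return clamp(new.default + (value - old.default), new)


# 80% of a 0..100 volume range stays 80% of a 0..200 range.
volume = carry_percentage(80, ParamRange(0, 100, 100), ParamRange(0, 200, 150))

# 50 WPM above a default of 150 becomes 50 WPM above a default of 170.
rate = carry_delta(200, ParamRange(50, 400, 150), ParamRange(50, 400, 170))

# If the new voice cannot go that fast, the value stops at its maximum.
capped = carry_delta(200, ParamRange(50, 400, 150), ParamRange(50, 210, 170))
```

Keeping the delta rather than a percentage means the numbers users see keep their existing meaning (WPM, hertz), which matches the concern raised in the conversation about not changing what the parameter values mean.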
Testing strategy:
Tested manually.
Known issues with pull request:
None.
Code Review Checklist:
@coderabbitai summary