Fix eSpeak language switching by checking for fallback dialects#13739
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
|
Any suggestions of important and uncontroversial default mappings are welcome on this PR, or if they don't make it in, a new PR. |
This comment was marked as outdated.
This comment was marked as outdated.
lukaszgo1
left a comment
There was a problem hiding this comment.
Assuming this PR indeed works around eSpeak regression it would be good to mention this in the PR description, preferably with the reference to the eSpeak issue / commit which broke the previous behavior.
This comment was marked as outdated.
This comment was marked as outdated.
| _defaultLangToLocaleMappings = { | ||
| "en": "en-gb", | ||
| } |
There was a problem hiding this comment.
The PR description indicates that multiple languages are handled. Does this not affect other languages?
There was a problem hiding this comment.
Other languages are handled by _anyLocaleMatchingLang.
I think this mapping should be expanding as well.
There was a problem hiding this comment.
I think that we should make this an explicit mapping. IE for every unique two letter language code, have a full locale mapping here. Just to make it clear and easy to determine what the expected behavior should be. This will also protect us against changing behaviour due to order changes in available languages.
There was a problem hiding this comment.
I don't think this possible as there are not necessarily clear default mappings. It's also not obvious which language codes need a mapping to an eSpeak language.
There was a problem hiding this comment.
A unit test - 097e434, has been added to ensure the mappings are comprehensive
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
feerrenrut
left a comment
There was a problem hiding this comment.
As discussed earlier today, we might want to look at how this integrates with the onecore language specific code in speechManager.
| textList.append("</voice>") | ||
| textList.append("<voice xml:lang=\"%s\">"%(item.lang if item.lang else defaultLanguage).replace('_','-')) | ||
| langChanged=True | ||
| self._handleLangChangeCommand(item, textList, langChanged) |
There was a problem hiding this comment.
Since the return value isn't used and textList is only ever appended to, this could be made more functional. Return a string that is appended to the textList here.
| lang = self._determineLangFromCommand(langChangeCommand) | ||
| if langChanged: | ||
| textList.append("</voice>") | ||
| if lang is not None: | ||
| textList.append(f'<voice xml:lang="{lang}">') | ||
| else: | ||
| textList.append(f'<voice>') |
There was a problem hiding this comment.
In two cases the result of self._determineLangFromCommand isn't used. I think this logic would be clearer in the outer scope.
There was a problem hiding this comment.
This seems to contradict with #13739 (comment).
Let me know if this isn't resolved by the fix for that review comment in 5c3388b
|
|
||
| def speak(self,speechSequence): | ||
| defaultLanguage=self._language | ||
| def _determineLangFromCommand(self, command: LangChangeCommand) -> Optional[str]: |
There was a problem hiding this comment.
Perhaps rename to normalizeLangChangeCommand? Though then consider returning / modifying the langChangeCommand in place.
| langWithoutLocale: Optional[str] = langWithLocale.split('-')[0] | ||
|
|
||
| # Check for any language where the language code matches, regardless of dialect: e.g. ru-ru to ru | ||
| matchingLanguages = filter(lambda l: l.split('-')[0] == langWithoutLocale, self.availableLanguages) |
There was a problem hiding this comment.
Please don't use single character variable names, especially ones like l which can be easily confused with I or even 1.
| matchingLanguages = filter(lambda l: l.split('-')[0] == langWithoutLocale, self.availableLanguages) | |
| matchingLanguages = filter( | |
| lambda lang: lang.split('-')[0] == langWithoutLocale, self.availableLanguages | |
| ) | |
| class FakeESpeakSynthDriver: | ||
| _language = "default" | ||
| _defaultLangToLocale = {"default": "en-gb"} | ||
| availableLanguages = {"fr", "fr-fr", "en-gb", "ta-ta"} |
There was a problem hiding this comment.
- What does
frmean when there isfr-fr? - Is this representative of the
availableLanguagesthat come out of espeak?
There was a problem hiding this comment.
These are fictional examples used for testing. In eSpeak "fr" doesn't exist. A comparable example, supported in the version of eSpeak we use for 2022.1, would be "pt" and "pt-br".
| availableLanguages = {"fr", "fr-fr", "en-gb", "ta-ta"} | ||
|
|
||
|
|
||
| class TestSynthDriver(unittest.TestCase): |
There was a problem hiding this comment.
Perhaps a docstring to explain, this is a test of the logic in our internal espeak driver, rather than an integration test with the espeak-ng binaries.
There was a problem hiding this comment.
This testing class contains both integration and internal testing. I will split these up and add appropriate docstrings.
edit: Done in 5c3388b
| def test_defaultMappingAvailableLanguage(self): | ||
| """Confirms language codes remapped by default are supported by eSpeak via integration testing""" | ||
| mappedDefaultLanguages = set(self._driver._defaultLangToLocale.values()) | ||
| unexpectedUnsupportedDefaultLanguages = mappedDefaultLanguages.difference(self._driver.availableLanguages) |
There was a problem hiding this comment.
Could you add the full output of self._driver.availableLanguages on this PR (in a collapsed section comment) I think it will help to add context.
There was a problem hiding this comment.
Where should this be added? This is a 100 line comment.
I was thinking it should be added to synthDriver.espeak.SynthDriver.availableLanguages.
Without a pretty print:
# For eSpeak commit 7e5457f91e10 (NVDA 2022.1), availableLanguages resolves to:
{'ia', 'ru', 'cy', 'ms', 'af', 'fi', 'fr-fr', 'nog', 'gu', 'hu', 'eu', 'om', 'en-029', 'de', 'es', 'kk', 'an', 'nci', 'uk', 'vi-vn-x-south', 'grc', 'it', 'vi-vn-x-central', 'bg', 'piqd', 'ug', 'ar', 'da', 'mi', 'mr', 'pt-br', 'fr-ch', 'py', 'uz', 'en-gb', 'sw', 'as', 'shn', 'vi', 'nl', 'bs', 'ga', 'pap', 'sv', 'kn', 'gn', 'th', 'tr', 'pa', 'mt', 'chr-US-Qaaa-x-west', 'eo', 'kok', 'ky', 'lfn', 'is', 'pt', 'en-gb-x-gbcwmd', 'en-gb-x-rp', 'ht', 'bpy', 'fr-be', 'nb', 'lt', 'ja', 'te', 'tn', 'es-419', 'gd', 'sjn', 'he', 'hyw', 'et', 'ro', 'ru-lv', 'sq', 'quc', 'am', 'hr', 'qya', 'ka', 'el', 'tt', 'or', 'pl', 'qu', 'ba', 'ta', 'cmn', 'io', 'en-us', 'ur', 'hi', 'en-gb-scotland', 'fa', 'kl', 'tk', 'ku', 'si', 'cv', 'ca', 'qdb', 'hak', 'fa-latn', 'lv', 'en-gb-x-gbclan', 'ltg', 'ne', 'sl', 'az', 'yue', 'sk', 'hy', 'my', 'ko', 'mk', 'smj', 'ml', 'cmn-latn-pinyin', 'id', 'la', 'sr', 'bn', 'sd', 'cs', 'jbo', 'haw'}
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Co-authored-by: Bill Dengler <codeofdusk@gmail.com>
cd3765e to
f84c724
Compare
feerrenrut
left a comment
There was a problem hiding this comment.
Approach looks good. Just a few minor ideas.
| def __eq__(self, __o: object) -> bool: | ||
| if isinstance(__o, LangChangeCommand): | ||
| return self.lang == __o.lang | ||
| return super().__eq__(__o) |
There was a problem hiding this comment.
I'd expect this to start with the reference based check, for performance reasons.
| Confirms that eSpeak can manually be switched to all of its supported languages, with the locale removed. | ||
| This doesn't test automatic language switching. | ||
| """ | ||
| availableLanguagesWithoutLocale = set(lang.split("-")[0] for lang in self._driver.availableLanguages) |
There was a problem hiding this comment.
This lang.split("-")[0] is repeated many times. Can you move it to a helper function.
def _getLangCodeOnly(langWithOptionalLocale: str) -> str:
""" Get the lang code eg "en" for "en-au" or "chr" for "chr-US-Qaaa-x-west"
@param langWithOptionalLocale: may already be language only, or include locale specifier. EG "en" or "en-au"
@return The language only part, before the first dash.
"""
return langWithOptionalLocale.split("-")[0]There was a problem hiding this comment.
I've done this in languageHandler
| _setVoiceByLanguage(langWithoutLocale) | ||
| self.assertEqual( | ||
| langWithoutLocale, | ||
| self._driver.voice.split("\\")[-1], # Language code is the last item |
There was a problem hiding this comment.
Please add a comment with an example voice string.
Link to issue number:
Closes #13727
Summary of the issue:
eSpeak fails to perform automatic language switching for a common language codes "en", "fr",
Reported to eSpeak project: espeak-ng/espeak-ng#1200
A PR has been opened against eSpeak to fix this: espeak-ng/espeak-ng#1211
Description of how this pull request fixes the issue:
Creates a dictionary of aliases for language codes without locale, which map to appropriate supported languages with locale.
This may need to be added to over time as appropriate.
Integration tests have been added to ensure this list is comprehensive when we update eSpeak.
When eSpeak is unaware of a language code, the new behaviour is to:
Testing strategy:
Manual testing with STR in #13727
data:text/html,<p>Hello</p><p lang="en">world</p>data:text/html,<p>Hello</p><p lang="de-foobar">world</p>data:text/html,<p>Hello</p><p lang="foobar">world</p>Unit testing has been added to test scenarios for determining a supported language to switch to.
Integration testing has also been added to confirm the default mappings for eSpeak continue to be supported and comprehensive.
Known issues with pull request:
None
Change log entries:
Bug fixes
- Fix eSpeak automatic language switching to English and French by falling back to a similar dialect if possible. (#13727)Code Review Checklist: