Skip to content

Fix eSpeak language switching by checking for fallback dialects#13739

Merged
michaelDCurran merged 18 commits into
masterfrom
fix-espeak-lang-switching
Jun 10, 2022
Merged

Fix eSpeak language switching by checking for fallback dialects#13739
michaelDCurran merged 18 commits into
masterfrom
fix-espeak-lang-switching

Conversation

@seanbudd

@seanbudd seanbudd commented May 26, 2022

Copy link
Copy Markdown
Member

Link to issue number:

Closes #13727

Summary of the issue:

eSpeak fails to perform automatic language switching for a common language codes "en", "fr",
Reported to eSpeak project: espeak-ng/espeak-ng#1200
A PR has been opened against eSpeak to fix this: espeak-ng/espeak-ng#1211

Description of how this pull request fixes the issue:

Creates a dictionary of aliases for language codes without locale, which map to appropriate supported languages with locale.
This may need to be added to over time as appropriate.
Integration tests have been added to ensure this list is comprehensive when we update eSpeak.

When eSpeak is unaware of a language code, the new behaviour is to:

  • check for a default, if not then
  • check for the language stripped of the dialect locale, if not then
  • check for any matching language (regardless of dialect, e.g 'en-au' to 'en-us'), if not then
  • ignore the language change

Testing strategy:

Manual testing with STR in #13727

  1. Set NVDA language and eSpeak language to French
  2. Open the following page in Chrome: data:text/html,<p>Hello</p><p lang="en">world</p>
  3. Confirm "world" is spoken in an US English accent
  4. Open the following page in Chrome: data:text/html,<p>Hello</p><p lang="de-foobar">world</p>
  5. Confirm "world" is spoken in a German accent
  6. Open the following page in Chrome: data:text/html,<p>Hello</p><p lang="foobar">world</p>
  7. Confirm "world" is spoken in the French accent

Unit testing has been added to test scenarios for determining a supported language to switch to.

Integration testing has also been added to confirm the default mappings for eSpeak continue to be supported and comprehensive.

Known issues with pull request:

None

Change log entries:

Bug fixes
- Fix eSpeak automatic language switching to English and French by falling back to a similar dialect if possible. (#13727)

Code Review Checklist:

  • Pull Request description:
    • description is up to date
    • change log entries
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • API is compatible with existing add-ons.
  • Documentation:
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English

@seanbudd seanbudd requested a review from a team as a code owner May 26, 2022 06:40
@seanbudd seanbudd requested a review from michaelDCurran May 26, 2022 06:40
@seanbudd seanbudd changed the title Fix espeak language switching by checking for fallback dialects Fix eSpeak language switching by checking for fallback dialects May 26, 2022
@seanbudd seanbudd changed the base branch from master to rc May 26, 2022 06:40
Comment thread source/synthDrivers/espeak.py Outdated
@Brian1Gaff

This comment was marked as resolved.

codeofdusk
codeofdusk previously approved these changes May 26, 2022

@codeofdusk codeofdusk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@seanbudd

Copy link
Copy Markdown
Member Author

Any suggestions of important and uncontroversial default mappings are welcome on this PR, or if they don't make it in, a new PR.

@AppVeyorBot

This comment was marked as outdated.

@lukaszgo1 lukaszgo1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this PR indeed works around eSpeak regression it would be good to mention this in the PR description, preferably with the reference to the eSpeak issue / commit which broke the previous behavior.

Comment thread source/synthDrivers/espeak.py Outdated
Comment thread source/synthDrivers/espeak.py Outdated
@AppVeyorBot

This comment was marked as outdated.

Comment thread source/synthDrivers/espeak.py Outdated
Comment thread source/synthDrivers/espeak.py Outdated
Comment on lines +30 to +32
_defaultLangToLocaleMappings = {
"en": "en-gb",
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description indicates that multiple languages are handled. Does this not affect other languages?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other languages are handled by _anyLocaleMatchingLang.
I think this mapping should be expanding as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should make this an explicit mapping. IE for every unique two letter language code, have a full locale mapping here. Just to make it clear and easy to determine what the expected behavior should be. This will also protect us against changing behaviour due to order changes in available languages.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this possible as there are not necessarily clear default mappings. It's also not obvious which language codes need a mapping to an eSpeak language.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A unit test - 097e434, has been added to ensure the mappings are comprehensive

@feerrenrut

This comment was marked as resolved.

@seanbudd

This comment was marked as resolved.

@seanbudd

This comment was marked as resolved.

@AppVeyorBot

This comment was marked as resolved.

@seanbudd seanbudd requested a review from feerrenrut June 1, 2022 04:04
@seanbudd seanbudd marked this pull request as draft June 1, 2022 05:10
@seanbudd seanbudd marked this pull request as ready for review June 1, 2022 05:53
@AppVeyorBot

This comment was marked as resolved.

Comment thread tests/unit/test_synthDrivers/test_espeak.py Outdated
@seanbudd seanbudd marked this pull request as draft June 3, 2022 05:06

@feerrenrut feerrenrut left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed earlier today, we might want to look at how this integrates with the onecore language specific code in speechManager.

Comment thread source/synthDrivers/espeak.py Outdated
textList.append("</voice>")
textList.append("<voice xml:lang=\"%s\">"%(item.lang if item.lang else defaultLanguage).replace('_','-'))
langChanged=True
self._handleLangChangeCommand(item, textList, langChanged)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the return value isn't used and textList is only ever appended to, this could be made more functional. Return a string that is appended to the textList here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5c3388b

Comment thread source/synthDrivers/espeak.py Outdated
Comment on lines +153 to +159
lang = self._determineLangFromCommand(langChangeCommand)
if langChanged:
textList.append("</voice>")
if lang is not None:
textList.append(f'<voice xml:lang="{lang}">')
else:
textList.append(f'<voice>')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In two cases the result of self._determineLangFromCommand isn't used. I think this logic would be clearer in the outer scope.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to contradict with #13739 (comment).
Let me know if this isn't resolved by the fix for that review comment in 5c3388b

Comment thread source/synthDrivers/espeak.py Outdated

def speak(self,speechSequence):
defaultLanguage=self._language
def _determineLangFromCommand(self, command: LangChangeCommand) -> Optional[str]:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps rename to normalizeLangChangeCommand? Though then consider returning / modifying the langChangeCommand in place.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been changed in 5c3388b

Comment thread source/synthDrivers/espeak.py Outdated
langWithoutLocale: Optional[str] = langWithLocale.split('-')[0]

# Check for any language where the language code matches, regardless of dialect: e.g. ru-ru to ru
matchingLanguages = filter(lambda l: l.split('-')[0] == langWithoutLocale, self.availableLanguages)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use single character variable names, especially ones like l which can be easily confused with I or even 1.

Suggested change
matchingLanguages = filter(lambda l: l.split('-')[0] == langWithoutLocale, self.availableLanguages)
matchingLanguages = filter(
lambda lang: lang.split('-')[0] == langWithoutLocale, self.availableLanguages
)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5c3388b

class FakeESpeakSynthDriver:
_language = "default"
_defaultLangToLocale = {"default": "en-gb"}
availableLanguages = {"fr", "fr-fr", "en-gb", "ta-ta"}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • What does fr mean when there is fr-fr?
  • Is this representative of the availableLanguages that come out of espeak?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are fictional examples used for testing. In eSpeak "fr" doesn't exist. A comparable example, supported in the version of eSpeak we use for 2022.1, would be "pt" and "pt-br".

availableLanguages = {"fr", "fr-fr", "en-gb", "ta-ta"}


class TestSynthDriver(unittest.TestCase):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a docstring to explain, this is a test of the logic in our internal espeak driver, rather than an integration test with the espeak-ng binaries.

@seanbudd seanbudd Jun 6, 2022

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This testing class contains both integration and internal testing. I will split these up and add appropriate docstrings.

edit: Done in 5c3388b

def test_defaultMappingAvailableLanguage(self):
"""Confirms language codes remapped by default are supported by eSpeak via integration testing"""
mappedDefaultLanguages = set(self._driver._defaultLangToLocale.values())
unexpectedUnsupportedDefaultLanguages = mappedDefaultLanguages.difference(self._driver.availableLanguages)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the full output of self._driver.availableLanguages on this PR (in a collapsed section comment) I think it will help to add context.

@seanbudd seanbudd Jun 6, 2022

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should this be added? This is a 100 line comment.
I was thinking it should be added to synthDriver.espeak.SynthDriver.availableLanguages.

Without a pretty print:

# For eSpeak commit 7e5457f91e10 (NVDA 2022.1), availableLanguages resolves to:
{'ia', 'ru', 'cy', 'ms', 'af', 'fi', 'fr-fr', 'nog', 'gu', 'hu', 'eu', 'om', 'en-029', 'de', 'es', 'kk', 'an', 'nci', 'uk', 'vi-vn-x-south', 'grc', 'it', 'vi-vn-x-central', 'bg', 'piqd', 'ug', 'ar', 'da', 'mi', 'mr', 'pt-br', 'fr-ch', 'py', 'uz', 'en-gb', 'sw', 'as', 'shn', 'vi', 'nl', 'bs', 'ga', 'pap', 'sv', 'kn', 'gn', 'th', 'tr', 'pa', 'mt', 'chr-US-Qaaa-x-west', 'eo', 'kok', 'ky', 'lfn', 'is', 'pt', 'en-gb-x-gbcwmd', 'en-gb-x-rp', 'ht', 'bpy', 'fr-be', 'nb', 'lt', 'ja', 'te', 'tn', 'es-419', 'gd', 'sjn', 'he', 'hyw', 'et', 'ro', 'ru-lv', 'sq', 'quc', 'am', 'hr', 'qya', 'ka', 'el', 'tt', 'or', 'pl', 'qu', 'ba', 'ta', 'cmn', 'io', 'en-us', 'ur', 'hi', 'en-gb-scotland', 'fa', 'kl', 'tk', 'ku', 'si', 'cv', 'ca', 'qdb', 'hak', 'fa-latn', 'lv', 'en-gb-x-gbclan', 'ltg', 'ne', 'sl', 'az', 'yue', 'sk', 'hy', 'my', 'ko', 'mk', 'smj', 'ml', 'cmn-latn-pinyin', 'id', 'la', 'sr', 'bn', 'sd', 'cs', 'jbo', 'haw'}

@AppVeyorBot

This comment was marked as resolved.

@AppVeyorBot

This comment was marked as resolved.

Comment thread tests/unit/test_synthDrivers/test_espeak.py Outdated
Comment thread source/synthDrivers/espeak.py
@seanbudd seanbudd force-pushed the fix-espeak-lang-switching branch from cd3765e to f84c724 Compare June 8, 2022 05:32
@seanbudd seanbudd changed the base branch from rc to master June 8, 2022 05:34
Comment thread tests/unit/test_synthDrivers/test_espeak.py Outdated
Comment thread tests/unit/test_synthDrivers/test_espeak.py Outdated
@seanbudd seanbudd marked this pull request as ready for review June 9, 2022 01:20
@seanbudd seanbudd requested a review from feerrenrut June 9, 2022 01:31

@feerrenrut feerrenrut left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach looks good. Just a few minor ideas.

Comment thread source/speech/commands.py
Comment on lines +147 to +150
def __eq__(self, __o: object) -> bool:
if isinstance(__o, LangChangeCommand):
return self.lang == __o.lang
return super().__eq__(__o)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this to start with the reference based check, for performance reasons.

Confirms that eSpeak can manually be switched to all of its supported languages, with the locale removed.
This doesn't test automatic language switching.
"""
availableLanguagesWithoutLocale = set(lang.split("-")[0] for lang in self._driver.availableLanguages)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lang.split("-")[0] is repeated many times. Can you move it to a helper function.

def _getLangCodeOnly(langWithOptionalLocale: str) -> str:
	""" Get the lang code eg "en" for "en-au" or "chr" for "chr-US-Qaaa-x-west"
	@param langWithOptionalLocale: may already be language only, or include locale specifier. EG "en" or "en-au"
	@return The language only part, before the first dash. 
	"""
	return langWithOptionalLocale.split("-")[0]

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done this in languageHandler

_setVoiceByLanguage(langWithoutLocale)
self.assertEqual(
langWithoutLocale,
self._driver.voice.split("\\")[-1], # Language code is the last item

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment with an example voice string.

@seanbudd seanbudd requested a review from feerrenrut June 10, 2022 02:19
@seanbudd seanbudd mentioned this pull request Jun 10, 2022
3 tasks
@michaelDCurran michaelDCurran merged commit 91805bd into master Jun 10, 2022
@michaelDCurran michaelDCurran deleted the fix-espeak-lang-switching branch June 10, 2022 06:50
@nvaccessAuto nvaccessAuto added this to the 2022.2 milestone Jun 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automatic language switching fails with eSpeak

8 participants