Fix eSpeak language switching by checking for fallback dialects by seanbudd · Pull Request #13739 · nvaccess/nvda

seanbudd · 2022-05-26T06:40:24Z

Link to issue number:

Summary of the issue:

eSpeak fails to perform automatic language switching for a common language codes "en", "fr",
Reported to eSpeak project: espeak-ng/espeak-ng#1200
A PR has been opened against eSpeak to fix this: espeak-ng/espeak-ng#1211

Description of how this pull request fixes the issue:

Creates a dictionary of aliases for language codes without locale, which map to appropriate supported languages with locale.
This may need to be added to over time as appropriate.
Integration tests have been added to ensure this list is comprehensive when we update eSpeak.

When eSpeak is unaware of a language code, the new behaviour is to:

check for a default, if not then
check for the language stripped of the dialect locale, if not then
check for any matching language (regardless of dialect, e.g 'en-au' to 'en-us'), if not then
ignore the language change

Testing strategy:

Manual testing with STR in #13727

Set NVDA language and eSpeak language to French
Open the following page in Chrome: data:text/html,Helloworld
Confirm "world" is spoken in an US English accent
Open the following page in Chrome: data:text/html,Helloworld
Confirm "world" is spoken in a German accent
Open the following page in Chrome: data:text/html,Helloworld
Confirm "world" is spoken in the French accent

Unit testing has been added to test scenarios for determining a supported language to switch to.

Integration testing has also been added to confirm the default mappings for eSpeak continue to be supported and comprehensive.

Known issues with pull request:

None

Change log entries:

Bug fixes
- Fix eSpeak automatic language switching to English and French by falling back to a similar dialect if possible. (#13727)

Code Review Checklist:

Pull Request description:
- description is up to date
- change log entries
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
API is compatible with existing add-ons.
Documentation:
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English

codeofdusk

LGTM

seanbudd · 2022-05-26T07:55:55Z

Any suggestions of important and uncontroversial default mappings are welcome on this PR, or if they don't make it in, a new PR.

lukaszgo1

Assuming this PR indeed works around eSpeak regression it would be good to mention this in the PR description, preferably with the reference to the eSpeak issue / commit which broke the previous behavior.

feerrenrut · 2022-05-27T02:59:49Z

+_defaultLangToLocaleMappings = {
+	"en": "en-gb",
+}


The PR description indicates that multiple languages are handled. Does this not affect other languages?

Other languages are handled by _anyLocaleMatchingLang.
I think this mapping should be expanding as well.

I think that we should make this an explicit mapping. IE for every unique two letter language code, have a full locale mapping here. Just to make it clear and easy to determine what the expected behavior should be. This will also protect us against changing behaviour due to order changes in available languages.

I don't think this possible as there are not necessarily clear default mappings. It's also not obvious which language codes need a mapping to an eSpeak language.

A unit test - 097e434, has been added to ensure the mappings are comprehensive

feerrenrut

As discussed earlier today, we might want to look at how this integrates with the onecore language specific code in speechManager.

feerrenrut · 2022-06-03T07:34:08Z

-					textList.append("</voice>")
-				textList.append("<voice xml:lang=\"%s\">"%(item.lang if item.lang else defaultLanguage).replace('_','-'))
-				langChanged=True
+				self._handleLangChangeCommand(item, textList, langChanged)


Since the return value isn't used and textList is only ever appended to, this could be made more functional. Return a string that is appended to the textList here.

Fixed in 5c3388b

feerrenrut · 2022-06-03T07:35:24Z

+		lang = self._determineLangFromCommand(langChangeCommand)
+		if langChanged:
+			textList.append("</voice>")
+		if lang is not None:
+			textList.append(f'<voice xml:lang="{lang}">')
+		else:
+			textList.append(f'<voice>')


In two cases the result of self._determineLangFromCommand isn't used. I think this logic would be clearer in the outer scope.

This seems to contradict with #13739 (comment).
Let me know if this isn't resolved by the fix for that review comment in 5c3388b

feerrenrut · 2022-06-03T07:38:39Z


-	def speak(self,speechSequence):
-		defaultLanguage=self._language
+	def _determineLangFromCommand(self, command: LangChangeCommand) -> Optional[str]:


Perhaps rename to normalizeLangChangeCommand? Though then consider returning / modifying the langChangeCommand in place.

This has been changed in 5c3388b

feerrenrut · 2022-06-03T07:44:21Z

+		langWithoutLocale: Optional[str] = langWithLocale.split('-')[0]
+
+		# Check for any language where the language code matches, regardless of dialect: e.g. ru-ru to ru
+		matchingLanguages = filter(lambda l: l.split('-')[0] == langWithoutLocale, self.availableLanguages)


Please don't use single character variable names, especially ones like l which can be easily confused with I or even 1.

Suggested change

matchingLanguages = filter(lambda l: l.split('-')[0] == langWithoutLocale, self.availableLanguages)

matchingLanguages = filter(

lambda lang: lang.split('-')[0] == langWithoutLocale, self.availableLanguages

)

Fixed in 5c3388b

feerrenrut · 2022-06-03T07:47:52Z

+class FakeESpeakSynthDriver:
+	_language = "default"
+	_defaultLangToLocale = {"default": "en-gb"}
+	availableLanguages = {"fr", "fr-fr", "en-gb", "ta-ta"}


What does fr mean when there is fr-fr?

Is this representative of the availableLanguages that come out of espeak?

These are fictional examples used for testing. In eSpeak "fr" doesn't exist. A comparable example, supported in the version of eSpeak we use for 2022.1, would be "pt" and "pt-br".

feerrenrut · 2022-06-03T07:49:17Z

+	availableLanguages = {"fr", "fr-fr", "en-gb", "ta-ta"}
+
+
+class TestSynthDriver(unittest.TestCase):


Perhaps a docstring to explain, this is a test of the logic in our internal espeak driver, rather than an integration test with the espeak-ng binaries.

This testing class contains both integration and internal testing. I will split these up and add appropriate docstrings.

edit: Done in 5c3388b

feerrenrut · 2022-06-03T07:54:54Z

+	def test_defaultMappingAvailableLanguage(self):
+		"""Confirms language codes remapped by default are supported by eSpeak via integration testing"""
+		mappedDefaultLanguages = set(self._driver._defaultLangToLocale.values())
+		unexpectedUnsupportedDefaultLanguages = mappedDefaultLanguages.difference(self._driver.availableLanguages)


Could you add the full output of self._driver.availableLanguages on this PR (in a collapsed section comment) I think it will help to add context.

Where should this be added? This is a 100 line comment.
I was thinking it should be added to synthDriver.espeak.SynthDriver.availableLanguages.

Without a pretty print:

# For eSpeak commit 7e5457f91e10 (NVDA 2022.1), availableLanguages resolves to: {'ia', 'ru', 'cy', 'ms', 'af', 'fi', 'fr-fr', 'nog', 'gu', 'hu', 'eu', 'om', 'en-029', 'de', 'es', 'kk', 'an', 'nci', 'uk', 'vi-vn-x-south', 'grc', 'it', 'vi-vn-x-central', 'bg', 'piqd', 'ug', 'ar', 'da', 'mi', 'mr', 'pt-br', 'fr-ch', 'py', 'uz', 'en-gb', 'sw', 'as', 'shn', 'vi', 'nl', 'bs', 'ga', 'pap', 'sv', 'kn', 'gn', 'th', 'tr', 'pa', 'mt', 'chr-US-Qaaa-x-west', 'eo', 'kok', 'ky', 'lfn', 'is', 'pt', 'en-gb-x-gbcwmd', 'en-gb-x-rp', 'ht', 'bpy', 'fr-be', 'nb', 'lt', 'ja', 'te', 'tn', 'es-419', 'gd', 'sjn', 'he', 'hyw', 'et', 'ro', 'ru-lv', 'sq', 'quc', 'am', 'hr', 'qya', 'ka', 'el', 'tt', 'or', 'pl', 'qu', 'ba', 'ta', 'cmn', 'io', 'en-us', 'ur', 'hi', 'en-gb-scotland', 'fa', 'kl', 'tk', 'ku', 'si', 'cv', 'ca', 'qdb', 'hak', 'fa-latn', 'lv', 'en-gb-x-gbclan', 'ltg', 'ne', 'sl', 'az', 'yue', 'sk', 'hy', 'my', 'ko', 'mk', 'smj', 'ml', 'cmn-latn-pinyin', 'id', 'la', 'sr', 'bn', 'sd', 'cs', 'jbo', 'haw'}

Co-authored-by: Bill Dengler <codeofdusk@gmail.com>

feerrenrut

Approach looks good. Just a few minor ideas.

feerrenrut · 2022-06-09T08:13:27Z

+	def __eq__(self, __o: object) -> bool:
+		if isinstance(__o, LangChangeCommand):
+			return self.lang == __o.lang
+		return super().__eq__(__o)


I'd expect this to start with the reference based check, for performance reasons.

feerrenrut · 2022-06-09T10:38:00Z

+		Confirms that eSpeak can manually be switched to all of its supported languages, with the locale removed.
+		This doesn't test automatic language switching.
+		"""
+		availableLanguagesWithoutLocale = set(lang.split("-")[0] for lang in self._driver.availableLanguages)


This lang.split("-")[0] is repeated many times. Can you move it to a helper function.

def _getLangCodeOnly(langWithOptionalLocale: str) -> str: """ Get the lang code eg "en" for "en-au" or "chr" for "chr-US-Qaaa-x-west" @param langWithOptionalLocale: may already be language only, or include locale specifier. EG "en" or "en-au" @return The language only part, before the first dash. """ return langWithOptionalLocale.split("-")[0]

I've done this in languageHandler

feerrenrut · 2022-06-09T10:39:06Z

+			_setVoiceByLanguage(langWithoutLocale)
+			self.assertEqual(
+				langWithoutLocale,
+				self._driver.voice.split("\\")[-1],  # Language code is the last item


Please add a comment with an example voice string.

seanbudd requested a review from a team as a code owner May 26, 2022 06:40

seanbudd requested a review from michaelDCurran May 26, 2022 06:40

seanbudd changed the title ~~Fix espeak language switching by checking for fallback dialects~~ Fix eSpeak language switching by checking for fallback dialects May 26, 2022

seanbudd changed the base branch from master to rc May 26, 2022 06:40

codeofdusk reviewed May 26, 2022

View reviewed changes

Comment thread source/synthDrivers/espeak.py Outdated

This comment was marked as resolved.

Sign in to view

codeofdusk previously approved these changes May 26, 2022

View reviewed changes

This comment was marked as outdated.

Sign in to view

lukaszgo1 reviewed May 26, 2022

View reviewed changes

Comment thread source/synthDrivers/espeak.py Outdated

Comment thread source/synthDrivers/espeak.py Outdated

seanbudd dismissed codeofdusk’s stale review via d1f65ee May 27, 2022 00:56

This comment was marked as outdated.

Sign in to view

feerrenrut suggested changes May 27, 2022

View reviewed changes

This comment was marked as resolved.

Sign in to view

seanbudd requested a review from feerrenrut June 1, 2022 04:04

seanbudd marked this pull request as draft June 1, 2022 05:10

seanbudd marked this pull request as ready for review June 1, 2022 05:53

This comment was marked as resolved.

Sign in to view

seanbudd commented Jun 2, 2022

View reviewed changes

Comment thread tests/unit/test_synthDrivers/test_espeak.py Outdated

seanbudd marked this pull request as draft June 3, 2022 05:06

feerrenrut suggested changes Jun 3, 2022

View reviewed changes

This comment was marked as resolved.

Sign in to view

seanbudd commented Jun 7, 2022

View reviewed changes

Comment thread tests/unit/test_synthDrivers/test_espeak.py Outdated

seanbudd commented Jun 7, 2022

View reviewed changes

Comment thread source/synthDrivers/espeak.py

seanbudd added 2 commits June 8, 2022 14:35

Fix espeak language switching by checking for fallback dialects

2e2304e

Improve docs

d3c59ee

seanbudd and others added 10 commits June 8, 2022 14:35

change default mapping

18a0da9

Co-authored-by: Bill Dengler <codeofdusk@gmail.com>

add unit tests, improve logic

580a82e

add integration test

c5fc81f

normalize LangChangecommand

0d589ea

Fix building for rc

b7774a0

fix unit tests

f7fb53b

fix lint

5da2334

add list of lang codes

7d54827

add mapping for chr and fr

493be90

add manual lang switching integration test

f84c724

seanbudd force-pushed the fix-espeak-lang-switching branch from cd3765e to f84c724 Compare June 8, 2022 05:32

revert to newer appveyor os

aff0a6d

seanbudd changed the base branch from rc to master June 8, 2022 05:34

add integration test to ensure _defaultLangToLocale is comprehensive

097e434

seanbudd commented Jun 8, 2022

View reviewed changes

Comment thread tests/unit/test_synthDrivers/test_espeak.py Outdated

seanbudd commented Jun 8, 2022

View reviewed changes

Comment thread tests/unit/test_synthDrivers/test_espeak.py Outdated

crlf to lf

d57a6c9

seanbudd marked this pull request as ready for review June 9, 2022 01:20

seanbudd requested a review from feerrenrut June 9, 2022 01:31

feerrenrut approved these changes Jun 9, 2022

View reviewed changes

address review actions

ee6e577

seanbudd requested a review from feerrenrut June 10, 2022 02:19

feerrenrut approved these changes Jun 10, 2022

View reviewed changes

seanbudd added 2 commits June 10, 2022 15:01

update changes

a2e1f0e

Merge branch 'master' into fix-espeak-lang-switching

7a0da23

seanbudd mentioned this pull request Jun 10, 2022

Start 2022.3 dev cycle #13785

Merged

3 tasks

michaelDCurran merged commit 91805bd into master Jun 10, 2022

michaelDCurran deleted the fix-espeak-lang-switching branch June 10, 2022 06:50

nvaccessAuto added this to the 2022.2 milestone Jun 10, 2022

		availableLanguages = {"fr", "fr-fr", "en-gb", "ta-ta"}


		class TestSynthDriver(unittest.TestCase):

Uh oh!

Conversation

seanbudd commented May 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Link to issue number:

Summary of the issue:

Description of how this pull request fixes the issue:

Testing strategy:

Known issues with pull request:

Change log entries:

Code Review Checklist:

Uh oh!

Uh oh!

This comment was marked as resolved.

codeofdusk left a comment

Choose a reason for hiding this comment

Uh oh!

seanbudd commented May 26, 2022

Uh oh!

This comment was marked as outdated.

lukaszgo1 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

Uh oh!

feerrenrut left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

seanbudd Jun 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

seanbudd Jun 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment was marked as resolved.

This comment was marked as resolved.

seanbudd commented May 26, 2022 •

edited

Loading

seanbudd Jun 6, 2022 •

edited

Loading

seanbudd Jun 6, 2022 •

edited

Loading