
fix(speech): set speechConfig.languageCode so Spanish TTS isn't gringo-accented (tulgey #251)#18

Merged
matin merged 1 commit into main from issue-251-tts-languagecode
Jun 9, 2026



@matin matin commented Jun 9, 2026


Symptom (prod, membrane)

After the native-audio-output deploy (#17 / tulgey openclaw#247), membrane's voice notes speak Latin-script Spanish with an English ("gringo") accent. Farsi (Arabic script) correctly gets a native Persian voice. Observed 2026-06-09.

Root cause

The native generateContent AUDIO path set only voiceName in speechConfig, with no languageCode, so gemini-3.1-flash-tts-preview infers the spoken language from the text's script: Arabic → Persian (Farsi correct), but Latin → English, so Latin-script Spanish is read with an English accent. ADR 0024 (tulgey) anticipated this language non-determinism; explicit languageCode is the deterministic lever.

Provider-level (not membrane-specific): any keyless-Vertex deployment doing native Google TTS for Latin-script non-English hits it. It affects both the direct-reply path and the outbox voice-note path, since both synthesize through this provider.

Verify-first

The whole approach hinges on Vertex actually honoring speechConfig.languageCode. Probed gemini-3.1-flash-tts-preview on Vertex global (casita-mb ADC) with the same Spanish line three ways:

| variant | HTTP | clip length |
|---|---|---|
| no languageCode | 200 | ~8.5s |
| es-MX | 200 | ~8.4s |
| en-US | 200 | ~8.1s |

Vertex accepts the field (rules out the "rejects it" failure mode) and en-US measurably reshapes delivery.

Accent quality was then confirmed by feeding clips of the same Spanish sentence (rendered through the real provider) to Gemini's native audio understanding, blind, in a side-by-side A/B with the order flipped between runs. The verdict tracked the es-MX clip regardless of position (ruling out positional bias): the fix renders unaspirated dental /t/ /d/, monophthong vowels, and an alveolar-tap r (native Mexican), where the no-languageCode clip has English aspiration, diphthongized vowels, and a retroflex r (gringo). A third comparison confirmed es-MX → Mexican (laminal /s/) vs es-ES → Castilian (apical /s/). Farsi stayed native Persian; English stayed English.
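The order-flipping check described above can be stated as a small consistency rule. The sketch below is illustrative only: the `AbRun` shape and both function names are hypothetical (nothing like them appears in the PR), and it assumes the judge answers with a position label rather than a clip name.

```typescript
// Hypothetical record of one blind A/B run: which clip played in which slot,
// and which slot the judge said sounded native.
type Position = "first" | "second";

interface AbRun {
  order: [string, string];
  judgedNative: Position;
}

// Translate the judge's positional answer back into a clip label.
function pickedClip(run: AbRun): string {
  return run.judgedNative === "first" ? run.order[0] : run.order[1];
}

// A verdict only counts if the same clip wins with the order flipped.
// A judge that always answers "first" yields undefined here.
function positionIndependentWinner(runA: AbRun, runB: AbRun): string | undefined {
  const a = pickedClip(runA);
  const b = pickedClip(runB);
  return a === b ? a : undefined;
}
```

If the judge's choice follows the clip across both orderings, the winner is reported; if it follows the slot, the function abstains, which is the positional-bias case the flipped runs are meant to rule out.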

Fix

Add an optional, omit-when-absent languageCode to the shared generateContent body. Omit-when-absent preserves Gemini's script-based auto-selection (the Arabic→Persian path Farsi relies on — Farsi cannot regress). Resolve the code per turn by precedence:

  1. directive / talk override: [[tts:language=es-MX]], normalized BCP-47
  2. operator config: messages.tts.providers.google.languageCode
  3. Spanish auto-detection — stopword scorer + inverted punctuation; abstains on ambiguity, never touches non-Latin scripts, strips the audio-profile wrapper before scoring so it sees the reply not the English scaffolding. Emits es-MX by default.
  4. unset → omit → current behavior
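As a rough illustration of the precedence list above (not the provider's actual `resolveGoogleTtsLanguageCode()`, whose signature is not shown here), the resolution can be sketched as a first-match chain. The `LanguageInputs` shape and its field names are assumptions made for this example.

```typescript
// Hypothetical per-turn inputs; field names are illustrative only.
interface LanguageInputs {
  directive?: string;       // from [[tts:language=...]], assumed already normalized
  configured?: string;      // messages.tts.providers.google.languageCode
  detectLanguage?: boolean; // operator switch; detection runs unless this is false
  detected?: string;        // output of the Spanish auto-detector, if it was confident
}

function resolveLanguageCode(inputs: LanguageInputs): string | undefined {
  if (inputs.directive) return inputs.directive;   // 1. directive / talk override
  if (inputs.configured) return inputs.configured; // 2. operator config
  if (inputs.detectLanguage !== false && inputs.detected) {
    return inputs.detected;                        // 3. Spanish auto-detection
  }
  return undefined;                                // 4. unset: caller omits the field
}
```

Returning `undefined` rather than a fallback locale is what keeps step 4 equivalent to the pre-fix behavior.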

Self-contained in the provider, so it fixes both the direct-reply and outbox voice-note paths with no membrane or outbox-processor change. Operators can pin a different locale (e.g. es-ES) or disable detection with detectLanguage: false.
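A minimal sketch of the omit-when-absent wiring, assuming the usual generateContent nesting (`generationConfig.speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName`). The function name `buildSpeechBody` and the interfaces are hypothetical stand-ins, not the provider's `buildGoogleSpeechGenerateContentBody()`.

```typescript
// Assumed request shape; only voiceName and languageCode are named in this PR.
interface SpeechConfig {
  voiceConfig: { prebuiltVoiceConfig: { voiceName: string } };
  languageCode?: string;
}

interface GenerateContentBody {
  contents: { role: "user"; parts: { text: string }[] }[];
  generationConfig: { responseModalities: ["AUDIO"]; speechConfig: SpeechConfig };
}

function buildSpeechBody(
  text: string,
  voiceName: string,
  languageCode?: string,
): GenerateContentBody {
  const speechConfig: SpeechConfig = {
    voiceConfig: { prebuiltVoiceConfig: { voiceName } },
  };
  // Omit-when-absent: the key is only added when a code was resolved, so an
  // unresolved turn sends the same body as before the fix.
  if (languageCode) speechConfig.languageCode = languageCode;
  return {
    contents: [{ role: "user", parts: [{ text }] }],
    generationConfig: { responseModalities: ["AUDIO"], speechConfig },
  };
}
```

The point of leaving the key off entirely, rather than sending an empty value, is that Farsi turns produce a body with no languageCode at all and so keep the script-based voice selection.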

Testing

  • vitest run extensions/google/speech-provider.test.ts: 34 passed (21 existing + 13 new: detector Spanish/English/Farsi/loanword/empty/wrapped-transcript, BCP-47 normalization, precedence, body-wiring for auto-detect / Farsi-omit / English-omit / config-override / detect-disabled, directive parse).
  • oxlint clean on both files.
  • speech-provider.ts typechecks clean (the remaining extensions/google tsc errors are pre-existing fork↔upstream plugin-sdk drift in unrelated files, same as feat(speech): native audio output via Vertex ADC route (tulgey #247) #17).

Deploy (after ear confirmation)

/opt/openclaw: git reset --hard origin/main && pnpm build && sudo systemctl restart openclaw-gateway, then a Spanish voice note to confirm the accent live.

Refs tulgey openclaw#251, tulgey openclaw#247 / ADR 0024, #17.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added BCP-47 language code support for Google Text-to-Speech synthesis with validation and normalization
    • Implemented automatic Spanish language detection when no explicit language is configured
    • Extended configuration options to support language overrides via directives and provider settings
  • Tests

    • Added comprehensive test coverage for language detection logic, code normalization, and request generation

…o-accented (tulgey openclaw#251)

Without speechConfig.languageCode, Gemini TTS infers the spoken language
from the text's script: Arabic -> Persian (Farsi correct), but Latin
defaults to English, so Latin-script Spanish was read with an English
accent. Add an optional, omit-when-absent languageCode to the shared
generateContent body and resolve it per turn by precedence:
directive/talk override > operator config > Spanish auto-detection > omit.

Omit-when-absent preserves Gemini's script-based auto-selection (the
Arabic-script -> Persian path Farsi relies on). The auto-detector abstains
unless it is confident the text is Spanish (stopword scorer + inverted
punctuation; never touches non-Latin scripts and strips the audio-profile
wrapper before scoring), so Farsi keeps its Persian voice and English stays
English. Default detected locale es-MX; operators can pin a locale via
config/`[[tts:language=...]]` or disable detection with detectLanguage:false.

Refs tulgey openclaw#247 / ADR 0024.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jun 9, 2026


📝 Walkthrough


This PR adds BCP-47 language code support to Google TTS synthesis with Spanish auto-detection. It introduces language validation, configuration overrides, and intelligent Spanish detection using stopword scoring, then wires languageCode through all synthesis routes and request builders.

Changes

Google TTS Language Code Implementation

| Layer / File(s) | Summary |
|---|---|
| Language code types and constants (extensions/google/speech-provider.ts) | Adds DEFAULT_GOOGLE_TTS_DETECTED_SPANISH_LANGUAGE constant and extends GoogleTtsProviderConfig and GoogleTtsProviderOverrides with optional languageCode?: string and detectLanguage?: boolean fields. |
| Language code validation and normalization (extensions/google/speech-provider.ts) | Implements normalizeGoogleTtsLanguageCode() for BCP-47 validation and casing canonicalization (e.g., es-mx → es-MX), asOptionalBoolean() for parsing boolean flags, and integrates both into config normalization and runtime config reading. |
| Spanish language auto-detection (extensions/google/speech-provider.ts) | Implements detectGoogleTtsLanguageCode() using Spanish/English stopword sets, script detection to abstain on non-Latin text, transcript extraction from wrapped prompts, and confidence scoring to return Spanish locale or undefined. |
| Directive parsing and precedence resolution (extensions/google/speech-provider.ts) | Adds language directive parsing with policy-controlled normalization and implements resolveGoogleTtsLanguageCode() applying precedence: directive/talk override > config > auto-detection > omit (when detectLanguage: false is set). |
| Request body generation with language code (extensions/google/speech-provider.ts) | Extends buildGoogleSpeechGenerateContentBody() signature with optional languageCode parameter and conditionally includes speechConfig.languageCode in the request when present, preserving Gemini's script-based selection when absent. |
| Synthesis route wiring with language code propagation (extensions/google/speech-provider.ts) | Threads languageCode through all TTS synthesis routes: AI-Studio and Vertex PCM synthesis functions (once and wrapper), route selection logic via resolveGoogleTtsPcm(), and main synthesize()/synthesizeTelephony() entry points that compute final languageCode via resolveGoogleTtsLanguageCode(). |
| Testing exports and helpers (extensions/google/speech-provider.ts) | Updates testing/__testing export interface to expose Spanish-default constant and language-code resolution/normalization utilities for testing. |
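The casing canonicalization mentioned above (es-mx → es-MX) could look roughly like the following. This is a simplified sketch, not the body of `normalizeGoogleTtsLanguageCode()`: it assumes only the language, optional script, and optional region subtags need handling, and the name `normalizeLanguageCode` is made up for the example.

```typescript
// Simplified tag shape: language[-Script][-REGION]. Variants, extensions, and
// private-use subtags from full BCP-47 are deliberately not handled here.
const SIMPLE_TAG = /^([a-z]{2,3})(?:-([a-z]{4}))?(?:-([a-z]{2}|\d{3}))?$/i;

function normalizeLanguageCode(raw: string): string | undefined {
  const match = SIMPLE_TAG.exec(raw.trim());
  if (!match) return undefined; // malformed input is rejected, not guessed at
  const language = match[1];
  const script = match[2];
  const region = match[3];
  if (!language) return undefined;
  const parts = [language.toLowerCase()];
  if (script) {
    // Script subtags are title-cased, e.g. "hant" becomes "Hant".
    parts.push(script.charAt(0).toUpperCase() + script.slice(1).toLowerCase());
  }
  if (region) parts.push(region.toUpperCase());
  return parts.join("-");
}
```

Rejecting malformed tags with `undefined` lets the caller fall through to the next precedence step instead of sending an invalid code upstream.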

Google TTS Language Code Test Coverage

| Layer / File(s) | Summary |
|---|---|
| Language detection and normalization tests (extensions/google/speech-provider.test.ts) | Validates detectGoogleTtsLanguageCode() returns es-MX for confident Spanish (including inverted punctuation), abstains for English and non-Latin scripts (Farsi), and correctly extracts transcript from wrapped audio-profile prompts. Tests normalizeGoogleTtsLanguageCode() casing normalization and malformed input rejection, plus resolveGoogleTtsLanguageCode() precedence with config, directive, and detectLanguage: false flag. |
| Request speechConfig.languageCode wiring tests (extensions/google/speech-provider.test.ts) | Verifies synthesized requests include generationConfig.speechConfig.languageCode when auto-detecting Spanish, omit it for Farsi and English to preserve voice selection, validate explicit providerConfig.languageCode override casing, suppress via detectLanguage: false, and parse [[tts:language=...]] directives with policy-controlled normalization. |
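The detector behavior these tests describe (confident Spanish → es-MX, abstain on English, Farsi, and ambiguous text) can be approximated with a small stopword scorer. Everything in the sketch below is an assumption made for illustration: the word lists, the weight given to inverted punctuation, the thresholds, and the name `detectSpanish`. It also skips the audio-profile wrapper stripping, since the wrapper format is not shown in this PR.

```typescript
// Illustrative word lists; the real stopword sets are not shown in the PR.
const SPANISH_STOPWORDS = new Set([
  "el", "la", "los", "las", "de", "que", "y", "en", "un", "una",
  "es", "por", "para", "con", "se", "lo", "pero", "muy", "hoy",
]);
const ENGLISH_STOPWORDS = new Set([
  "the", "and", "is", "of", "to", "in", "that", "it", "for",
  "with", "you", "this", "was", "are", "on",
]);

// Crude non-Latin check: Greek through most other BMP scripts (Arabic included),
// plus the Arabic presentation-form blocks.
const NON_LATIN = /[\u0370-\u1DFF\u2E80-\uD7FF\uFB50-\uFDFF\uFE70-\uFEFC]/;

function detectSpanish(text: string): string | undefined {
  if (NON_LATIN.test(text)) return undefined; // never touch non-Latin scripts
  const words = text
    .toLowerCase()
    .split(/[^a-záéíóúüñ]+/)
    .filter((word) => word.length > 0);
  // Assumed weighting: each inverted mark counts as two stopword hits.
  let spanish = 2 * (text.match(/[¿¡]/g) ?? []).length;
  let english = 0;
  for (const word of words) {
    if (SPANISH_STOPWORDS.has(word)) spanish += 1;
    if (ENGLISH_STOPWORDS.has(word)) english += 1;
  }
  // Abstain unless Spanish evidence is present and clearly dominant.
  return spanish >= 2 && spanish > 2 * english ? "es-MX" : undefined;
}
```

With these assumed thresholds a single Spanish loanword in an English sentence scores below the minimum, so the function abstains and the request goes out without a languageCode.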

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • matin/openclaw#17: Introduces the shared buildGoogleSpeechGenerateContentBody() request-body builder that this PR extends with languageCode detection and conditional wiring.

Poem

🐰 A Spanish tongue now hops through Gemini's ears,
with confidence scoring that cheers—
no loanword tricks, no script confusion here,
just precedent rules made crystal clear! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly identifies the specific fix (setting speechConfig.languageCode to resolve Spanish TTS accent) and references the tracking issue (tulgey #251), directly matching the changeset's main objective. |
| Description check | ✅ Passed | The description comprehensively covers all template sections: root cause analysis, verification proof with measurable data, fix approach with precedence rules, testing results, and deployment steps. All required sections are present and substantively filled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
extensions/google/speech-provider.test.ts (1)

792-905: ⚡ Quick win

Consider adding a test for synthesizeTelephony with Spanish to verify languageCode wiring.

The current wiring tests comprehensively cover synthesize() for both audio-file and voice-note targets, but there's no test verifying that synthesizeTelephony() also emits speechConfig.languageCode when detecting Spanish. The stack context confirms telephony synthesis was wired, but an explicit test would increase confidence that the telephony-specific code path correctly propagates languageCode.

📋 Suggested telephony test

Add a test similar to the existing ones but for the telephony path:

+  it("emits languageCode for auto-detected Spanish in telephony synthesis", async () => {
+    const requestMock = installGoogleTtsRequestMock();
+    const provider = buildGoogleSpeechProvider();
+
+    await provider.synthesizeTelephony?.({
+      text: "Hola, hoy es un día perfecto para correr en la montaña.",
+      cfg: {},
+      providerConfig: { apiKey: "google-test-key" },
+      timeoutMs: 10_000,
+    });
+
+    expect(speechConfigFromFirstRequest(requestMock).languageCode).toBe("es-MX");
+  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@extensions/google/speech-provider.test.ts` around lines 792 - 905, Add a new
unit test that mirrors the existing synthesize Spanish test but calls
provider.synthesizeTelephony(...) instead of synthesize(...); install the same
request mock with installGoogleTtsRequestMock(), build the provider via
buildGoogleSpeechProvider(), call synthesizeTelephony with Spanish text and
providerConfig.apiKey, then assert via
speechConfigFromFirstRequest(requestMock).languageCode === "es-MX" to verify
languageCode is emitted on the telephony code path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5a325db8-16be-4266-93ea-a01a72fac7c3

📥 Commits

Reviewing files that changed from the base of the PR and between 48a00b0 and a0a184f.

📒 Files selected for processing (2)
  • extensions/google/speech-provider.test.ts
  • extensions/google/speech-provider.ts

@matin matin merged commit 85ccf4c into main Jun 9, 2026
134 of 145 checks passed