fix(speech): set speechConfig.languageCode so Spanish TTS isn't gringo-accented (tulgey #251) by matin · Pull Request #18 · matin/openclaw

matin · 2026-06-09T20:45:15Z

Symptom (prod, membrane)

After the native-audio-output deploy (#17 / tulgey openclaw#247), membrane's voice notes speak Latin-script Spanish with an English ("gringo") accent. Farsi (Arabic script) correctly gets a native Persian voice. Observed 2026-06-09.

Root cause

The native generateContent AUDIO path set only voiceName in speechConfig — no languageCode — so gemini-3.1-flash-tts-preview infers the spoken language from the text's script: Arabic → Persian (Farsi correct), but Latin → English, so Latin-script Spanish is read with an English accent. ADR 0024 (tulgey) anticipated this language non-determinism; explicit languageCode is the deterministic lever.

Provider-level (not membrane-specific): any keyless-Vertex deployment doing native Google TTS for Latin-script non-English hits it. It affects both the direct-reply path and the outbox voice-note path, since both synthesize through this provider.

Verify-first

The whole approach hinges on Vertex actually honoring speechConfig.languageCode. Probed gemini-3.1-flash-tts-preview on Vertex global (casita-mb ADC) with the same Spanish line three ways:

variant	HTTP	clip length
no `languageCode`	200	~8.5s
`es-MX`	200	~8.4s
`en-US`	200	~8.1s

Vertex accepts the field (ruling out the "rejects it" failure mode), and en-US measurably reshapes delivery: the clip shortens to ~8.1s from ~8.5s with no languageCode.

Accent quality was then confirmed by feeding clips of the same Spanish sentence (rendered through the real provider) to Gemini's native audio understanding, blind, in a side-by-side A/B with the order flipped between runs. The verdict tracked the es-MX clip regardless of position (ruling out positional bias): the fix renders unaspirated dental /t/ /d/, monophthong vowels, and an alveolar-tap r (native Mexican), where the no-languageCode clip has English aspiration, diphthongized vowels, and a retroflex r (gringo). A third comparison confirmed es-MX → Mexican (laminal /s/) vs es-ES → Castilian (apical /s/). Farsi stayed native Persian; English stayed English.

Fix

Add an optional, omit-when-absent languageCode to the shared generateContent body. Omit-when-absent preserves Gemini's script-based auto-selection (the Arabic→Persian path Farsi relies on — Farsi cannot regress). Resolve the code per turn by precedence:

1. Directive / talk override: [[tts:language=es-MX]], normalized BCP-47.
2. Operator config: messages.tts.providers.google.languageCode.
3. Spanish auto-detection: stopword scorer + inverted punctuation; abstains on ambiguity, never touches non-Latin scripts, and strips the audio-profile wrapper before scoring so it sees the reply rather than the English scaffolding. Emits es-MX by default.
4. Unset → omit → current behavior.
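The precedence above can be sketched as a small resolver. This is a minimal illustration, not the provider's code: `resolveLanguageCodeSketch`, `LanguageSources`, and the injected `detect` callback are hypothetical names, and only the ordering (directive, then config, then detection, then omit) is taken from the description.

```typescript
// Hypothetical names throughout; only the precedence order comes from the PR description.
type LanguageSources = {
  directiveLanguageCode?: string; // from [[tts:language=...]] or a talk override
  configLanguageCode?: string; // messages.tts.providers.google.languageCode
  detectLanguage?: boolean; // operator switch; detection runs unless this is false
};

function resolveLanguageCodeSketch(
  text: string,
  sources: LanguageSources,
  detect: (text: string) => string | undefined,
): string | undefined {
  // 1. A per-turn directive or talk override wins outright.
  if (sources.directiveLanguageCode) return sources.directiveLanguageCode;
  // 2. Otherwise use the operator-pinned locale, if any.
  if (sources.configLanguageCode) return sources.configLanguageCode;
  // 3. Otherwise ask the detector, unless the operator turned it off.
  //    The detector itself may abstain by returning undefined.
  if (sources.detectLanguage !== false) return detect(text);
  // 4. Nothing resolved: undefined tells the caller to omit the field.
  return undefined;
}

// Stand-in detector for the example: only inverted question marks count as Spanish.
const stubDetector = (text: string): string | undefined =>
  text.includes("¿") ? "es-MX" : undefined;

const pinned = resolveLanguageCodeSketch("¿Qué tal?", { configLanguageCode: "es-ES" }, stubDetector);
const disabled = resolveLanguageCodeSketch("¿Qué tal?", { detectLanguage: false }, stubDetector);
```

Returning `undefined` rather than a default locale is what keeps the last step equivalent to the pre-fix behavior.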

Self-contained in the provider, so it fixes both the direct-reply and outbox voice-note paths with no membrane or outbox-processor change. Operators can pin a different locale (e.g. es-ES) or disable detection with detectLanguage: false.
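A minimal sketch of the omit-when-absent wiring follows. The request shape is an assumption based on the public Gemini generateContent API, `buildSpeechBodySketch` is a hypothetical stand-in for the shared builder, and the voice name is only an example value.

```typescript
// Assumed request shape; the provider's real builder may carry more fields.
type SpeechConfig = {
  voiceConfig: { prebuiltVoiceConfig: { voiceName: string } };
  languageCode?: string;
};

type SpeechGenerateContentBody = {
  contents: { role: "user"; parts: { text: string }[] }[];
  generationConfig: { responseModalities: ["AUDIO"]; speechConfig: SpeechConfig };
};

function buildSpeechBodySketch(params: {
  text: string;
  voiceName: string;
  languageCode?: string;
}): SpeechGenerateContentBody {
  const speechConfig: SpeechConfig = {
    voiceConfig: { prebuiltVoiceConfig: { voiceName: params.voiceName } },
    // Spread an empty object when unset so the key is absent, not undefined or "".
    ...(params.languageCode ? { languageCode: params.languageCode } : {}),
  };
  return {
    contents: [{ role: "user", parts: [{ text: params.text }] }],
    generationConfig: { responseModalities: ["AUDIO"], speechConfig },
  };
}

// Spanish turn: the resolved code is sent explicitly.
const spanishBody = buildSpeechBodySketch({ text: "Hola", voiceName: "Kore", languageCode: "es-MX" });
// Farsi turn: no code resolved, so the body matches the pre-fix request.
const farsiBody = buildSpeechBodySketch({ text: "سلام", voiceName: "Kore" });
```

Leaving the key out entirely, rather than sending an empty value, is what lets the model fall back to its script-based selection for the Farsi case.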

Testing

- vitest run extensions/google/speech-provider.test.ts: 34 passed (21 existing + 13 new: detector Spanish/English/Farsi/loanword/empty/wrapped-transcript, BCP-47 normalization, precedence, body-wiring for auto-detect / Farsi-omit / English-omit / config-override / detect-disabled, directive parse).
- oxlint clean on both files.
- speech-provider.ts typechecks clean (the remaining extensions/google tsc errors are pre-existing fork↔upstream plugin-sdk drift in unrelated files, same as feat(speech): native audio output via Vertex ADC route (tulgey #247) #17).

Deploy (after ear confirmation)

/opt/openclaw: git reset --hard origin/main && pnpm build && sudo systemctl restart openclaw-gateway, then a Spanish voice note to confirm the accent live.

Refs tulgey openclaw#251, tulgey openclaw#247 / ADR 0024, #17.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added BCP-47 language code support for Google Text-to-Speech synthesis with validation and normalization
- Implemented automatic Spanish language detection when no explicit language is configured
- Extended configuration options to support language overrides via directives and provider settings
Tests
- Added comprehensive test coverage for language detection logic, code normalization, and request generation

…o-accented (tulgey openclaw#251)

Without speechConfig.languageCode, Gemini TTS infers the spoken language from the text's script: Arabic -> Persian (Farsi correct), but Latin defaults to English, so Latin-script Spanish was read with an English accent.

Add an optional, omit-when-absent languageCode to the shared generateContent body and resolve it per turn by precedence: directive/talk override > operator config > Spanish auto-detection > omit. Omit-when-absent preserves Gemini's script-based auto-selection (the Arabic-script -> Persian path Farsi relies on).

The auto-detector abstains unless it is confident the text is Spanish (stopword scorer + inverted punctuation; never touches non-Latin scripts and strips the audio-profile wrapper before scoring), so Farsi keeps its Persian voice and English stays English. Default detected locale es-MX; operators can pin a locale via config/`[[tts:language=...]]` or disable detection with detectLanguage:false.

Refs tulgey openclaw#247 / ADR 0024.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-09T20:46:27Z

📝 Walkthrough

This PR adds BCP-47 language code support to Google TTS synthesis with Spanish auto-detection. It introduces language validation, configuration overrides, and intelligent Spanish detection using stopword scoring, then wires languageCode through all synthesis routes and request builders.
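As a rough illustration of stopword-based detection that abstains when unsure, consider the sketch below. The word lists, the scoring weights, and the name `detectSpanishSketch` are all illustrative assumptions; the provider's actual sets and thresholds are not shown in this PR, and the audio-profile wrapper stripping is left out because its format is not described here.

```typescript
// Illustrative lists only; the provider's real stopword sets are not shown in the PR.
const SPANISH_STOPWORDS = new Set([
  "el", "la", "los", "las", "es", "un", "una", "que", "para", "en", "y", "por", "con", "hoy", "del", "de",
]);
const ENGLISH_STOPWORDS = new Set([
  "the", "is", "a", "an", "and", "for", "to", "of", "in", "with", "today", "it", "this", "that",
]);

function detectSpanishSketch(text: string): string | undefined {
  // Abstain if any character falls outside the Latin blocks and general punctuation,
  // so script-based selection (for example Farsi) is left untouched.
  if (/[^\u0000-\u024F\u2000-\u206F]/.test(text)) return undefined;

  // Split into lowercase words, keeping accented Latin-1 letters such as "í" and "ñ".
  const words = text.toLowerCase().match(/[a-z\u00e0-\u00ff]+/g) ?? [];
  let spanish = 0;
  let english = 0;
  for (const word of words) {
    if (SPANISH_STOPWORDS.has(word)) spanish += 1;
    if (ENGLISH_STOPWORDS.has(word)) english += 1;
  }
  // Inverted punctuation is treated as a strong Spanish signal (assumed weight of 2).
  if (/[¿¡]/.test(text)) spanish += 2;

  // Require a clear margin over English; otherwise abstain so the caller omits languageCode.
  return spanish >= 2 && spanish >= english * 2 ? "es-MX" : undefined;
}
```

Under these assumed lists, a sentence such as "I ordered a taco and a burrito for the fiesta." scores zero Spanish stopwords, so loanwords alone do not flip the result.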

Changes

Google TTS Language Code Implementation

Layer / File(s)	Summary
Language code types and constants `extensions/google/speech-provider.ts`	Adds `DEFAULT_GOOGLE_TTS_DETECTED_SPANISH_LANGUAGE` constant and extends `GoogleTtsProviderConfig` and `GoogleTtsProviderOverrides` with optional `languageCode?: string` and `detectLanguage?: boolean` fields.
Language code validation and normalization `extensions/google/speech-provider.ts`	Implements `normalizeGoogleTtsLanguageCode()` for BCP-47 validation and casing canonicalization (e.g., `es-mx` → `es-MX`), `asOptionalBoolean()` for parsing boolean flags, and integrates both into config normalization and runtime config reading.
Spanish language auto-detection `extensions/google/speech-provider.ts`	Implements `detectGoogleTtsLanguageCode()` using Spanish/English stopword sets, script detection to abstain on non-Latin text, transcript extraction from wrapped prompts, and confidence scoring to return Spanish locale or undefined.
Directive parsing and precedence resolution `extensions/google/speech-provider.ts`	Adds language directive parsing with policy-controlled normalization and implements `resolveGoogleTtsLanguageCode()` applying precedence: directive/talk override > config > auto-detection > omit (when `detectLanguage: false` is set).
Request body generation with language code `extensions/google/speech-provider.ts`	Extends `buildGoogleSpeechGenerateContentBody()` signature with optional `languageCode` parameter and conditionally includes `speechConfig.languageCode` in the request when present, preserving Gemini's script-based selection when absent.
Synthesis route wiring with language code propagation `extensions/google/speech-provider.ts`	Threads `languageCode` through all TTS synthesis routes: AI-Studio and Vertex PCM synthesis functions (once and wrapper), route selection logic via `resolveGoogleTtsPcm()`, and main `synthesize()`/`synthesizeTelephony()` entry points that compute final languageCode via `resolveGoogleTtsLanguageCode()`.
Testing exports and helpers `extensions/google/speech-provider.ts`	Updates `testing`/`__testing` export interface to expose Spanish-default constant and language-code resolution/normalization utilities for testing.
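The casing canonicalization mentioned in the table (`es-mx` → `es-MX`) could look roughly like the sketch below. It is a simplified assumption that handles only language, optional script, and optional region subtags; `normalizeLanguageCodeSketch` is a hypothetical name, and the real `normalizeGoogleTtsLanguageCode()` may accept or reject a different set of tags.

```typescript
// Simplified BCP-47 handling: language[-Script][-REGION] only (an assumption).
function normalizeLanguageCodeSketch(value: unknown): string | undefined {
  if (typeof value !== "string") return undefined;
  const match = /^([a-z]{2,3})(?:-([a-z]{4}))?(?:-([a-z]{2}|\d{3}))?$/i.exec(value.trim());
  // Anything that does not fit the simplified grammar is rejected, not guessed at.
  if (!match) return undefined;

  const language = match[1] ?? "";
  const script = match[2];
  const region = match[3];

  // Conventional casing: lowercase language, Titlecase script, UPPERCASE region.
  const parts = [language.toLowerCase()];
  if (script) parts.push(script.charAt(0).toUpperCase() + script.slice(1).toLowerCase());
  if (region) parts.push(region.toUpperCase());
  return parts.join("-");
}

const canonical = normalizeLanguageCodeSketch("es-mx");
const rejected = normalizeLanguageCodeSketch("not a locale");
```

Returning `undefined` for malformed input lets the caller fall through to the next precedence level instead of sending a value the API might reject.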

Google TTS Language Code Test Coverage

Layer / File(s)	Summary
Language detection and normalization tests `extensions/google/speech-provider.test.ts`	Validates `detectGoogleTtsLanguageCode()` returns `es-MX` for confident Spanish (including inverted punctuation), abstains for English and non-Latin scripts (Farsi), and correctly extracts transcript from wrapped audio-profile prompts. Tests `normalizeGoogleTtsLanguageCode()` casing normalization and malformed input rejection, plus `resolveGoogleTtsLanguageCode()` precedence with config, directive, and `detectLanguage: false` flag.
Request speechConfig.languageCode wiring tests `extensions/google/speech-provider.test.ts`	Verifies synthesized requests include `generationConfig.speechConfig.languageCode` when auto-detecting Spanish, omit it for Farsi and English to preserve voice selection, validate explicit `providerConfig.languageCode` override casing, suppress via `detectLanguage: false`, and parse `[[tts:language=...]]` directives with policy-controlled normalization.
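For orientation, parsing a `[[tts:language=...]]` directive out of a reply might be sketched as follows. `parseLanguageDirectiveSketch` is a hypothetical helper, and stripping the directive from the spoken text is an assumption; the PR does not show how the real directive parser is wired or what its policy controls look like.

```typescript
// Hypothetical helper: extracts the first [[tts:language=...]] directive, if present.
function parseLanguageDirectiveSketch(text: string): { text: string; languageCode?: string } {
  const pattern = /\[\[tts:language=([^\]\s]+)\]\]/i;
  const match = pattern.exec(text);
  if (!match) return { text };
  return {
    // Assumed behavior: drop the directive so it is not read aloud, then tidy whitespace.
    text: text.replace(pattern, "").replace(/\s{2,}/g, " ").trim(),
    // The raw value would still need BCP-47 normalization before use.
    languageCode: match[1],
  };
}

const parsed = parseLanguageDirectiveSketch("[[tts:language=es-MX]] Hola a todos");
const untouched = parseLanguageDirectiveSketch("Hola a todos");
```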

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

matin/openclaw#17: Introduces the shared buildGoogleSpeechGenerateContentBody() request-body builder that this PR extends with languageCode detection and conditional wiring.

Poem

🐰 A Spanish tongue now hops through Gemini's ears,
with confidence scoring that cheers—
no loanword tricks, no script confusion here,
just precedent rules made crystal clear! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the specific fix (setting speechConfig.languageCode to resolve Spanish TTS accent) and references the tracking issue (tulgey `#251`), directly matching the changeset's main objective.
Description check	✅ Passed	The description comprehensively covers all template sections: root cause analysis, verification proof with measurable data, fix approach with precedence rules, testing results, and deployment steps. All required sections are present and substantively filled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

🧹 Nitpick comments (1)

extensions/google/speech-provider.test.ts (1)
792-905: ⚡ Quick win

Consider adding a test for synthesizeTelephony with Spanish to verify languageCode wiring.

The current wiring tests comprehensively cover synthesize() for both audio-file and voice-note targets, but there's no test verifying that synthesizeTelephony() also emits speechConfig.languageCode when detecting Spanish. The stack context confirms telephony synthesis was wired, but an explicit test would increase confidence that the telephony-specific code path correctly propagates languageCode.
📋 Suggested telephony test

Add a test similar to the existing ones but for the telephony path:
+  it("emits languageCode for auto-detected Spanish in telephony synthesis", async () => {
+    const requestMock = installGoogleTtsRequestMock();
+    const provider = buildGoogleSpeechProvider();
+
+    await provider.synthesizeTelephony?.({
+      text: "Hola, hoy es un día perfecto para correr en la montaña.",
+      cfg: {},
+      providerConfig: { apiKey: "google-test-key" },
+      timeoutMs: 10_000,
+    });
+
+    expect(speechConfigFromFirstRequest(requestMock).languageCode).toBe("es-MX");
+  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@extensions/google/speech-provider.test.ts` around lines 792 - 905, Add a new
unit test that mirrors the existing synthesize Spanish test but calls
provider.synthesizeTelephony(...) instead of synthesize(...); install the same
request mock with installGoogleTtsRequestMock(), build the provider via
buildGoogleSpeechProvider(), call synthesizeTelephony with Spanish text and
providerConfig.apiKey, then assert via
speechConfigFromFirstRequest(requestMock).languageCode === "es-MX" to verify
languageCode is emitted on the telephony code path.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5a325db8-16be-4266-93ea-a01a72fac7c3

📥 Commits

Reviewing files that changed from the base of the PR and between 48a00b0 and a0a184f.

📒 Files selected for processing (2)

extensions/google/speech-provider.test.ts
extensions/google/speech-provider.ts

coderabbitai Bot reviewed Jun 9, 2026


matin merged commit 85ccf4c into main Jun 9, 2026
134 of 145 checks passed

matin merged 1 commit into main from issue-251-tts-languagecode
