fix(speech): set speechConfig.languageCode so Spanish TTS isn't gringo-accented (tulgey #251) #18
…o-accented (tulgey openclaw#251)
Without speechConfig.languageCode, Gemini TTS infers the spoken language from the text's script: Arabic -> Persian (Farsi correct), but Latin defaults to English, so Latin-script Spanish was read with an English accent.
Add an optional, omit-when-absent languageCode to the shared generateContent body and resolve it per turn by precedence: directive/talk override > operator config > Spanish auto-detection > omit. Omit-when-absent preserves Gemini's script-based auto-selection (the Arabic-script -> Persian path Farsi relies on).
The auto-detector abstains unless it is confident the text is Spanish (stopword scorer + inverted punctuation; never touches non-Latin scripts and strips the audio-profile wrapper before scoring), so Farsi keeps its Persian voice and English stays English. Default detected locale es-MX; operators can pin a locale via config/`[[tts:language=...]]` or disable detection with detectLanguage:false.
Refs tulgey openclaw#247 / ADR 0024.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
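The precedence and abstain-unless-confident behaviour described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the provider's code: `detectSpanish`, `resolveLanguageCode`, the stopword list, the script ranges, and the confidence thresholds are all assumptions made for the example, and the audio-profile wrapper stripping and BCP-47 normalization are left out.

```typescript
// Illustrative sketch only: every name and threshold here is hypothetical.
const SPANISH_STOPWORDS = new Set([
  "el", "la", "los", "las", "un", "una", "de", "del", "que", "y", "en",
  "es", "por", "para", "con", "se", "su", "al", "lo", "pero", "hoy", "muy",
]);

// Rough ranges for non-Latin scripts (Greek through Arabic, CJK, Arabic presentation forms).
const NON_LATIN = /[\u0370-\u1DFF\u2C00-\uD7FF\uF900-\uFDFF\uFE70-\uFEFC]/;

/** Returns a locale only when confident the text is Spanish; otherwise abstains. */
function detectSpanish(text: string, defaultLocale: string = "es-MX"): string | undefined {
  // Never touch non-Latin scripts, so Arabic-script Farsi keeps script-based selection.
  if (NON_LATIN.test(text)) return undefined;
  const words = text.toLowerCase().match(/[a-z\u00C0-\u024F]+/g) ?? [];
  if (words.length === 0) return undefined;
  const hits = words.filter((w) => SPANISH_STOPWORDS.has(w)).length;
  const hasInverted = /[¿¡]/.test(text);
  // Assumed rule: inverted punctuation, or at least two stopwords making up 20% of the words.
  const confident = hasInverted || (hits >= 2 && hits / words.length >= 0.2);
  return confident ? defaultLocale : undefined;
}

interface LanguageInputs {
  text: string;
  directiveLanguage?: string;  // e.g. parsed from a [[tts:language=...]] directive
  configLanguageCode?: string; // operator config
  detectLanguage?: boolean;    // treated as enabled unless explicitly false
}

/** Precedence: directive/talk override > operator config > Spanish auto-detection > omit. */
function resolveLanguageCode(inputs: LanguageInputs): string | undefined {
  if (inputs.directiveLanguage) return inputs.directiveLanguage;
  if (inputs.configLanguageCode) return inputs.configLanguageCode;
  if (inputs.detectLanguage !== false) return detectSpanish(inputs.text);
  return undefined; // omit the field so script-based auto-selection still applies
}
```

Returning `undefined` rather than a fallback such as `en-US` is the point of the design: the caller can then leave the field out of the request entirely.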
📝 Walkthrough
This PR adds BCP-47 language code support to Google TTS synthesis with Spanish auto-detection. It introduces language validation, configuration overrides, and intelligent Spanish detection using stopword scoring, then wires languageCode through all synthesis routes and request builders.
Changes
Google TTS Language Code Implementation
Google TTS Language Code Test Coverage
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🧹 Nitpick comments (1)
extensions/google/speech-provider.test.ts (1)
792-905: ⚡ Quick win
Consider adding a test for `synthesizeTelephony` with Spanish to verify languageCode wiring.
The current wiring tests comprehensively cover `synthesize()` for both audio-file and voice-note targets, but there's no test verifying that `synthesizeTelephony()` also emits `speechConfig.languageCode` when detecting Spanish. The stack context confirms telephony synthesis was wired, but an explicit test would increase confidence that the telephony-specific code path correctly propagates languageCode.
📋 Suggested telephony test
Add a test similar to the existing ones but for the telephony path:

    +  it("emits languageCode for auto-detected Spanish in telephony synthesis", async () => {
    +    const requestMock = installGoogleTtsRequestMock();
    +    const provider = buildGoogleSpeechProvider();
    +
    +    await provider.synthesizeTelephony?.({
    +      text: "Hola, hoy es un día perfecto para correr en la montaña.",
    +      cfg: {},
    +      providerConfig: { apiKey: "google-test-key" },
    +      timeoutMs: 10_000,
    +    });
    +
    +    expect(speechConfigFromFirstRequest(requestMock).languageCode).toBe("es-MX");
    +  });

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@extensions/google/speech-provider.test.ts` around lines 792 - 905, Add a new unit test that mirrors the existing synthesize Spanish test but calls provider.synthesizeTelephony(...) instead of synthesize(...); install the same request mock with installGoogleTtsRequestMock(), build the provider via buildGoogleSpeechProvider(), call synthesizeTelephony with Spanish text and providerConfig.apiKey, then assert via speechConfigFromFirstRequest(requestMock).languageCode === "es-MX" to verify languageCode is emitted on the telephony code path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5a325db8-16be-4266-93ea-a01a72fac7c3
📒 Files selected for processing (2)
extensions/google/speech-provider.test.ts
extensions/google/speech-provider.ts
Symptom (prod, membrane)
After the native-audio-output deploy (#17 / tulgey openclaw#247), membrane's voice notes speak Latin-script Spanish with an English ("gringo") accent. Farsi (Arabic script) correctly gets a native Persian voice. Observed 2026-06-09.
Root cause
The native `generateContent` AUDIO path set only `voiceName` in `speechConfig`, with no `languageCode`, so `gemini-3.1-flash-tts-preview` infers the spoken language from the text's script: Arabic → Persian (Farsi correct), but Latin → English, so Latin-script Spanish is read with an English accent. ADR 0024 (tulgey) anticipated this language non-determinism; explicit `languageCode` is the deterministic lever.
Provider-level (not membrane-specific): any keyless-Vertex deployment doing native Google TTS for Latin-script non-English hits it. It affects both the direct-reply path and the outbox voice-note path, since both synthesize through this provider.
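As a rough illustration of the request shape at issue, the sketch below builds a `speechConfig` that always carries `voiceName` and carries `languageCode` only when one was resolved. The helper name `buildSpeechConfig` and the `voiceConfig.prebuiltVoiceConfig` nesting are assumptions based on the public Gemini API shape, not a copy of the provider's request builder.

```typescript
// Hypothetical helper; the provider's actual builder may be structured differently.
interface SpeechConfig {
  voiceConfig: { prebuiltVoiceConfig: { voiceName: string } };
  languageCode?: string;
}

function buildSpeechConfig(voiceName: string, languageCode?: string): SpeechConfig {
  return {
    voiceConfig: { prebuiltVoiceConfig: { voiceName } },
    // Omit-when-absent: the key is left out entirely rather than sent as null or "",
    // so the model's script-based language selection still applies.
    ...(languageCode ? { languageCode } : {}),
  };
}

// With a resolved locale the field is emitted; without one it is absent from the JSON.
const pinned = JSON.stringify(buildSpeechConfig("Kore", "es-MX"));
const omitted = JSON.stringify(buildSpeechConfig("Kore"));
```

Spreading a conditional object keeps the key out of the serialized body altogether, which is what distinguishes "no `languageCode`" from "an empty `languageCode`" on the wire.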
Verify-first
The whole approach hinges on Vertex actually honoring `speechConfig.languageCode`. Probed `gemini-3.1-flash-tts-preview` on Vertex `global` (casita-mb ADC) with the same Spanish line three ways, varying `languageCode` (values included `es-MX` and `en-US`).
Vertex accepts the field (rules out the "rejects it" failure mode) and `en-US` measurably reshapes delivery.
Accent quality was then confirmed by feeding clips of the same Spanish sentence (rendered through the real provider) to Gemini's native audio understanding, blind, in a side-by-side A/B with the order flipped between runs. The verdict tracked the `es-MX` clip regardless of position (ruling out positional bias): the fix renders unaspirated dental /t/ and /d/, monophthong vowels, and an alveolar-tap `r` (native Mexican), where the no-`languageCode` clip has English aspiration, diphthongized vowels, and a retroflex `r` (gringo). A third comparison confirmed `es-MX` → Mexican (laminal /s/) vs `es-ES` → Castilian (apical /s/). Farsi stayed native Persian; English stayed English.
Fix
Add an optional, omit-when-absent `languageCode` to the shared `generateContent` body. Omit-when-absent preserves Gemini's script-based auto-selection (the Arabic→Persian path Farsi relies on, so Farsi cannot regress). Resolve the code per turn by precedence:
1. Directive/talk override: `[[tts:language=es-MX]]`, normalized BCP-47
2. Operator config: `messages.tts.providers.google.languageCode`
3. Spanish auto-detection: `es-MX` by default
4. Otherwise, omit the field

Self-contained in the provider, so it fixes both the direct-reply and outbox voice-note paths with no membrane or outbox-processor change. Operators can pin a different locale (e.g. `es-ES`) or disable detection with `detectLanguage: false`.
Testing
- `vitest run extensions/google/speech-provider.test.ts`: 34 passed (21 existing + 13 new: detector Spanish/English/Farsi/loanword/empty/wrapped-transcript, BCP-47 normalization, precedence, body-wiring for auto-detect / Farsi-omit / English-omit / config-override / detect-disabled, directive parse).
- `oxlint` clean on both files.
- `speech-provider.ts` typechecks clean (the remaining `extensions/google` tsc errors are pre-existing fork↔upstream `plugin-sdk` drift in unrelated files, same as feat(speech): native audio output via Vertex ADC route (tulgey #247) #17).

Deploy (after ear confirmation)
`/opt/openclaw`: `git reset --hard origin/main && pnpm build && sudo systemctl restart openclaw-gateway`, then a Spanish voice note to confirm the accent live.
Refs tulgey openclaw#251, tulgey openclaw#247 / ADR 0024, #17.
🤖 Generated with Claude Code