feat(speech): native audio output via Vertex ADC route (tulgey #247) #17
…aw#247)

The Google speech provider already emits native generateContent AUDIO (gemini-3.1-flash-tts-preview, responseModalities:['AUDIO'] + speechConfig) and already transcodes to opus-in-ogg for voice-note delivery. The only gap was auth: it knew the AI-Studio key route only and threw "Google API key missing" on a keyless Vertex deployment (tulgey #10). This adds the Vertex ADC route so native output is the primary path on the deployment.

- Add a Vertex ADC synthesis route (synthesizeGoogleVertexTtsPcm) that rides resolveGoogleVertexAuthorizedUserHeaders (the same ADC bearer the Google chat/Veo paths use), POSTing to aiplatform.googleapis.com/v1/projects/{P}/locations/{global}/publishers/google/models/{model}:generateContent. Body, PCM extraction, WAV-wrap, and opus transcode are shared verbatim with the AI-Studio route.
- Route selection (resolveGoogleTtsPcm): AI-Studio key route stays primary; fall to the Vertex ADC route when no key but ADC is present; throw with neither so the speech provider-order fallback (Cloud TTS -> text) trips on a detected failure, never a silent degrade (ADR 0024 clause 2).
- isConfigured is now ADC-aware so the provider is selected keyless.
- Extract buildGoogleSpeechGenerateContentBody (shared by both routes).
- Test: Vertex generateContent URL shape (global + regional).

Implements the membrane row of tulgey#247 / ADR 0024. Existing AI-Studio tests unaffected (real keys take the unchanged route).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 Walkthrough

The PR adds keyless Google Vertex ADC text-to-speech as a fallback to the existing AI-Studio API key route. It introduces a shared request-body builder for both backends, implements Vertex-specific URL construction and ADC-authenticated synthesis with timeout and retry handling, adds a route selector function, and updates configuration detection and handler wiring to use the new routing logic.
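A minimal sketch of what a shared request-body builder could look like, assuming the public Gemini `generateContent` speech request shape. The name `sketchSpeechBody`, its option names, and the `role: "user"` field are illustrative assumptions; the PR's actual `buildGoogleSpeechGenerateContentBody` is not shown on this page.

```typescript
// Hypothetical options type; not taken from the PR.
interface SketchSpeechOptions {
  text: string;      // text to synthesize
  voiceName: string; // prebuilt voice, e.g. "Charon"
}

// Builds one JSON body that both the AI-Studio and Vertex routes could POST.
function sketchSpeechBody(opts: SketchSpeechOptions) {
  return {
    // role is an assumption; it is included so the same body is valid on both routes
    contents: [{ role: "user", parts: [{ text: opts.text }] }],
    generationConfig: {
      // ask the model for audio instead of text
      responseModalities: ["AUDIO"],
      speechConfig: {
        voiceConfig: {
          prebuiltVoiceConfig: { voiceName: opts.voiceName },
        },
      },
    },
  };
}
```

Because the body carries no credentials or host, sharing it between routes leaves auth headers and URL as the only per-route differences.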
Sequence Diagram
sequenceDiagram
participant App
participant GoogleProvider
participant RouteResolver
participant AIStudioRoute
participant VertexADCRoute
participant GoogleAPIs
App->>GoogleProvider: synthesize(text, config)
GoogleProvider->>RouteResolver: resolveGoogleTtsPcm(text, config)
alt API Key Present and Not Vertex Marker
RouteResolver->>AIStudioRoute: use AI-Studio key route
AIStudioRoute->>GoogleAPIs: POST generateContent with API key
GoogleAPIs-->>AIStudioRoute: PCM audio
AIStudioRoute-->>RouteResolver: PCM bytes
else Vertex ADC Available
RouteResolver->>VertexADCRoute: use Vertex ADC route
VertexADCRoute->>GoogleAPIs: POST generateContent with ADC headers
GoogleAPIs-->>VertexADCRoute: PCM audio
VertexADCRoute-->>RouteResolver: PCM bytes
else Neither Available
RouteResolver-->>GoogleProvider: throw error for fallback
end
RouteResolver-->>GoogleProvider: PCM audio
GoogleProvider-->>App: synthesized audio
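The selection order in the diagram above can be sketched as a small pure function. This is an illustration only: `selectGoogleTtsRoute`, `RouteInputs`, and the `apiKeyIsVertexMarker` flag are hypothetical names, and the real `resolveGoogleTtsPcm` also performs the synthesis call, which is omitted here.

```typescript
type GoogleTtsRoute = "ai-studio-key" | "vertex-adc";

// Hypothetical inputs; the PR's real resolver reads these from provider config.
interface RouteInputs {
  apiKey?: string;               // AI-Studio key, if any
  apiKeyIsVertexMarker: boolean; // true when the "key" is only a placeholder for Vertex
  hasVertexAdc: boolean;         // true when ADC credentials were detected
}

function selectGoogleTtsRoute(inputs: RouteInputs): GoogleTtsRoute {
  const key = inputs.apiKey?.trim();
  // A real key keeps the existing AI-Studio route primary.
  if (key && !inputs.apiKeyIsVertexMarker) {
    return "ai-studio-key";
  }
  // No usable key: fall to the keyless Vertex ADC route when ADC is present.
  if (inputs.hasVertexAdc) {
    return "vertex-adc";
  }
  // Neither credential: throw so the caller's provider-order fallback
  // sees a detected failure instead of a silent degrade.
  throw new Error("Google speech: no API key and no Vertex ADC credentials");
}
```

Throwing in the last branch, rather than returning an empty result, is what lets the provider-order fallback trip on a detected failure.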
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed, ❌ 1 failed (1 warning)
lint:extensions:bundled lints the whole extensions/google package, so these errors (introduced with the Veo REST fallback in #5, never linted since no later PR touched the package) block any PR that touches the extension. Surfaced by the native-audio-output change.

- resolveVertexOAuthToken: brace the metadata-token if, type res.json() as { access_token?: string } (drops the unnecessary `as any`), and omit the unused catch binding.
- Brace the "Force rest fallback for Vertex" guard.

No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The new Vertex ADC route used a raw fetch(), which trips the no-raw-channel-fetch boundary guard. Route it through postJsonRequest (the same guarded helper the AI-Studio route uses) so SSRF/dispatcher policy and timeout handling apply uniformly; drop the manual AbortController.

Also allowlist the pre-existing Veo metadata-server fetch (video-generation-provider.ts:44, http://metadata.google.internal, which is link-local and must be raw; the SSRF guard intentionally blocks it). It predates this work and was surfaced when the PR first touched the package.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Makes native audio output the primary voice-note path on the keyless-Vertex deployment, implementing the membrane row of imperfect-co/tulgey#247 / ADR 0024.
Why
The voice-note output relay is broken in prod (tulgey#233, "Media failed") and the Google speech provider had no Vertex ADC path: it only knew the AI-Studio-key route and threw "Google API key missing" on a keyless deployment (tulgey#10).

The useful discovery: `extensions/google/speech-provider.ts` already emits native `generateContent` AUDIO (`gemini-3.1-flash-tts-preview`, `responseModalities:['AUDIO']` + `speechConfig`, `Charon` in the voice list) and already transcodes PCM → opus-in-ogg for `target: "voice-note"`. The only missing piece was auth. So this is a small, additive route, not a rewrite of the synthesis path.

What changed
- Vertex ADC synthesis route (`synthesizeGoogleVertexTtsPcm`): POSTs to `aiplatform.googleapis.com/v1/projects/{project}/locations/{global}/publishers/google/models/{model}:generateContent` using `resolveGoogleVertexAuthorizedUserHeaders` (the same ADC bearer the Google chat / Veo paths already use). Request body, PCM extraction, WAV-wrap, and the opus transcode are shared verbatim with the existing AI-Studio route via a new `buildGoogleSpeechGenerateContentBody` helper.
- Route selection (`resolveGoogleTtsPcm`): AI-Studio key route stays primary; fall to the Vertex ADC route when no key but ADC is present; throw with neither, so the speech provider-order fallback (Cloud TTS → text) trips on a detected failure rather than a silent degrade (ADR 0024 §2).
- `isConfigured` is now ADC-aware so the provider is selected on a keyless deployment.

Verification
- `vitest run extensions/google/speech-provider.test.ts` → 19/19 pass: 17 existing (the AI-Studio path is byte-for-byte unchanged for a real key) plus 2 new asserting the Vertex `generateContent` URL shape, global + regional.
- `speech-provider.ts` is type-clean. An isolated `tsc -b extensions/google` reports 11 errors, all pre-existing sdk-version-skew in sibling files (`vertex-adc.ts`, `transport-stream.ts`, `realtime-voice-provider.ts`, etc.), none in this diff's files; CI's full `build-all` builds the sdk from source and is authoritative.

Deploy (follow-up, not in this PR)
`/opt/openclaw` to this fork build (the "Consolidate membrane VM hot-patches onto fork-main (Veo + companions)" #5 / tulgey#218 consolidation pattern, survives `openclaw update`).

Refs imperfect-co/tulgey#247, tulgey#10, tulgey#233.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
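For reference, the Vertex `generateContent` URL shape that the two new tests assert (global + regional) can be sketched as below. The helper name `sketchVertexGenerateContentUrl` and its parameter object are hypothetical, and the regional host form (`{location}-aiplatform.googleapis.com`) follows the general Vertex AI endpoint convention rather than anything shown in this PR.

```typescript
// Hypothetical helper; the PR's actual URL builder is not shown on this page.
function sketchVertexGenerateContentUrl(params: {
  project: string;
  location: string; // "global" or a region such as "us-central1"
  model: string;
}): string {
  const { project, location, model } = params;
  // Assumption: the global location uses the bare host, while a
  // regional location uses a "<region>-" prefixed host.
  const host =
    location === "global"
      ? "aiplatform.googleapis.com"
      : `${location}-aiplatform.googleapis.com`;
  const path = [
    "v1",
    "projects", encodeURIComponent(project),
    "locations", encodeURIComponent(location),
    "publishers", "google",
    "models", encodeURIComponent(model),
  ].join("/");
  return `https://${host}/${path}:generateContent`;
}

// Usage: one global and one regional URL for the same model.
const globalUrl = sketchVertexGenerateContentUrl({
  project: "my-project",
  location: "global",
  model: "gemini-3.1-flash-tts-preview",
});
const regionalUrl = sketchVertexGenerateContentUrl({
  project: "my-project",
  location: "us-central1",
  model: "gemini-3.1-flash-tts-preview",
});
```

Keeping URL construction in a pure function like this is what makes the shape testable without any network access or credentials.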