
feat(speech): native audio output via Vertex ADC route (tulgey #247) #17

Merged
matin merged 3 commits into main from native-audio-output on Jun 9, 2026

matin (Owner) commented Jun 9, 2026

What

Makes native audio output the primary voice-note path on the keyless-Vertex deployment, implementing the membrane row of imperfect-co/tulgey#247 / ADR 0024.

Why

The voice-note output relay is broken in prod (tulgey#233, "Media failed") and the Google speech provider had no Vertex ADC path — it only knew the AI-Studio-key route and threw "Google API key missing" on a keyless deployment (tulgey#10).

The useful discovery: extensions/google/speech-provider.ts already emits native generateContent AUDIO (gemini-3.1-flash-tts-preview, responseModalities:['AUDIO'] + speechConfig, Charon in the voice list) and already transcodes PCM → opus-in-ogg for target: "voice-note". The only missing piece was auth. So this is a small, additive route — not a rewrite of the synthesis path.
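For orientation, a native-audio generateContent request of the kind described above carries roughly the payload below. This is a hedged sketch: the field names follow the public Gemini speech-generation request format (contents, generationConfig.responseModalities, speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName), while buildSpeechBody and its parameters are hypothetical and are not the signature of the PR's buildGoogleSpeechGenerateContentBody.

```typescript
// Hypothetical sketch: shape of a native-audio generateContent request body.
// Field names follow the public Gemini TTS request format; the helper name
// and its parameters are illustrative, not the PR's real signature.
interface SpeechBody {
  contents: { role: "user"; parts: { text: string }[] }[];
  generationConfig: {
    responseModalities: ["AUDIO"];
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: string } };
    };
  };
}

function buildSpeechBody(text: string, voiceName = "Charon"): SpeechBody {
  return {
    contents: [{ role: "user", parts: [{ text }] }],
    generationConfig: {
      // Ask the model for audio output instead of text.
      responseModalities: ["AUDIO"],
      speechConfig: {
        voiceConfig: { prebuiltVoiceConfig: { voiceName } },
      },
    },
  };
}

const body = buildSpeechBody("Your table is booked for 8pm.");
console.log(JSON.stringify(body));
```

Nothing in this body depends on how the request is authenticated, which is why the AI-Studio and Vertex routes can share it verbatim; only the URL and auth headers change.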

What changed

  • Vertex ADC route (synthesizeGoogleVertexTtsPcm) — POSTs to aiplatform.googleapis.com/v1/projects/{project}/locations/{global}/publishers/google/models/{model}:generateContent using resolveGoogleVertexAuthorizedUserHeaders (the same ADC bearer the Google chat / Veo paths already use). Request body, PCM extraction, WAV-wrap, and the opus transcode are shared verbatim with the existing AI-Studio route via a new buildGoogleSpeechGenerateContentBody helper.
  • Route selection (resolveGoogleTtsPcm): the AI-Studio key route stays primary; fall back to the Vertex ADC route when there is no key but ADC is present; throw when neither is available, so the speech provider-order fallback (Cloud TTS → text) trips on a detected failure rather than degrading silently (ADR 0024 §2).
  • isConfigured is now ADC-aware so the provider is selected on a keyless deployment.
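The selection order in the list above (key first, then ADC, otherwise throw) can be sketched as follows. This is a minimal illustration under stated assumptions: the two synthesis routes and the ADC probe are injected, and resolveTtsPcm, TtsRoutes, and the isVertexMarker predicate (standing in for a placeholder key value that means "use Vertex") are hypothetical names, not the actual resolveGoogleTtsPcm signature.

```typescript
// Hypothetical sketch of key-first, ADC-second TTS route selection.
// Names and signatures are illustrative; they are not the PR's actual API.
type PcmSynth = (text: string) => Promise<Uint8Array>;

interface TtsRoutes {
  apiKey?: string; // AI-Studio key, if configured
  isVertexMarker: (key: string) => boolean; // placeholder value, not a real key
  adcAvailable: () => boolean; // can ADC credentials be found?
  viaAiStudio: (key: string, text: string) => Promise<Uint8Array>;
  viaVertexAdc: PcmSynth;
}

async function resolveTtsPcm(text: string, r: TtsRoutes): Promise<Uint8Array> {
  // 1. A real AI-Studio key keeps the existing, unchanged route primary.
  if (r.apiKey && !r.isVertexMarker(r.apiKey)) {
    return r.viaAiStudio(r.apiKey, text);
  }
  // 2. No usable key, but ADC is present: take the Vertex route.
  if (r.adcAvailable()) {
    return r.viaVertexAdc(text);
  }
  // 3. Neither: fail loudly so the provider-order fallback engages,
  //    rather than silently returning degraded output.
  throw new Error("Google TTS: no API key and no Vertex ADC credentials");
}
```

Throwing in the last branch, instead of returning empty audio, is what lets the caller's provider-order fallback notice the failure.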

Verification

  • vitest run extensions/google/speech-provider.test.ts: 19/19 pass (the 17 existing tests, which confirm the AI-Studio path is byte-for-byte unchanged for a real key, plus 2 new tests asserting the Vertex generateContent URL shape for global and regional locations).
  • speech-provider.ts is type-clean. (An isolated tsc -b extensions/google reports 11 errors, all pre-existing sdk-version-skew in sibling files — vertex-adc.ts, transport-stream.ts, realtime-voice-provider.ts, etc. — none in this diff's files; CI's full build-all builds the sdk from source and is authoritative.)
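The two URL shapes the new tests cover (global and regional) can be illustrated with a small builder. This sketch assumes the standard Vertex AI endpoint convention, in which the global location uses the bare aiplatform.googleapis.com host and a regional location such as us-central1 uses a region-prefixed host; buildVertexGenerateContentUrl and the example project ID are hypothetical, and the PR's buildGoogleVertexTtsUrl may differ.

```typescript
// Hypothetical sketch of Vertex generateContent URL construction.
// Assumes the standard Vertex AI convention: "global" uses the bare host,
// any other location uses a region-prefixed host.
function buildVertexGenerateContentUrl(
  project: string,
  location: string,
  model: string,
): string {
  const host =
    location === "global"
      ? "aiplatform.googleapis.com"
      : `${location}-aiplatform.googleapis.com`;
  const path =
    `/v1/projects/${encodeURIComponent(project)}` +
    `/locations/${encodeURIComponent(location)}` +
    `/publishers/google/models/${encodeURIComponent(model)}:generateContent`;
  return `https://${host}${path}`;
}

const model = "gemini-3.1-flash-tts-preview";
console.log(buildVertexGenerateContentUrl("my-project", "global", model));
console.log(buildVertexGenerateContentUrl("my-project", "us-central1", model));
```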

Deploy (follow-up, not in this PR)

  1. Cut /opt/openclaw over to this fork build, following the consolidation pattern from #5 ("Consolidate membrane VM hot-patches onto fork-main (Veo + companions)") / tulgey#218, which survives openclaw update.
  2. Live-verify a WhatsApp voice turn produces native opus audio.
  3. tulgey#234 (native ingestion not engaging) is the symmetric input-side fix needed for the full audio-in → audio-out loop.

Refs imperfect-co/tulgey#247, tulgey#10, tulgey#233.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Google Vertex Application Default Credentials as an alternative authentication method for text-to-speech.
    • Improved authentication handling with automatic fallback between API key and credential-based authentication methods.
  • Tests

    • Expanded test coverage for Google Vertex TTS URL routing.
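The speech provider-order fallback mentioned under What changed (Cloud TTS, then text) is what the route selector's throw is meant to trip. A generic sketch of such an ordered fallback follows; SpeechProvider, Reply, and synthesizeWithFallback are hypothetical names, not the repository's provider interface.

```typescript
// Hypothetical sketch of an ordered provider fallback. A provider signals
// failure by throwing; the chain then moves on and ends in a text reply.
interface SpeechProvider {
  id: string;
  synthesize: (text: string) => Promise<Uint8Array>;
}

type Reply =
  | { kind: "audio"; providerId: string; audio: Uint8Array }
  | { kind: "text"; text: string; errors: string[] };

async function synthesizeWithFallback(
  text: string,
  providers: SpeechProvider[],
): Promise<Reply> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      const audio = await p.synthesize(text);
      return { kind: "audio", providerId: p.id, audio };
    } catch (err) {
      // A detected failure is recorded, never swallowed silently.
      errors.push(`${p.id}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  // Every provider failed: degrade to a plain text reply.
  return { kind: "text", text, errors };
}
```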

…aw#247)

The Google speech provider already emits native generateContent AUDIO
(gemini-3.1-flash-tts-preview, responseModalities:['AUDIO'] + speechConfig)
and already transcodes to opus-in-ogg for voice-note delivery. The only
gap was auth: it knew the AI-Studio key route only and threw "Google API
key missing" on a keyless Vertex deployment (tulgey #10). This adds the
Vertex ADC route so native output is the primary path on the deployment.

- Add a Vertex ADC synthesis route (synthesizeGoogleVertexTtsPcm) that
  rides resolveGoogleVertexAuthorizedUserHeaders (the same ADC bearer the
  Google chat/Veo paths use), POSTing to
  aiplatform.googleapis.com/v1/projects/{P}/locations/{global}/publishers/
  google/models/{model}:generateContent. Body, PCM extraction, WAV-wrap,
  and opus transcode are shared verbatim with the AI-Studio route.
- Route selection (resolveGoogleTtsPcm): AI-Studio key route stays primary;
  fall to the Vertex ADC route when no key but ADC is present; throw with
  neither so the speech provider-order fallback (Cloud TTS -> text) trips
  on a detected failure, never a silent degrade (ADR 0024 clause 2).
- isConfigured is now ADC-aware so the provider is selected keyless.
- Extract buildGoogleSpeechGenerateContentBody (shared by both routes).
- Test: Vertex generateContent URL shape (global + regional).

Implements the membrane row of tulgey#247 / ADR 0024. Existing AI-Studio
tests unaffected (real keys take the unchanged route).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
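The shared post-processing this commit mentions (PCM extraction and WAV-wrap, ahead of the opus transcode) might look like the sketch below. It assumes the response carries base64 audio in candidates[0].content.parts[].inlineData.data and that the PCM is 16-bit little-endian mono at 24 kHz, which is typical for Gemini TTS output but should be confirmed against inlineData.mimeType; extractPcm and wrapPcmAsWav are hypothetical helper names.

```typescript
// Hypothetical sketch: pull base64 PCM out of a generateContent response and
// wrap it in a 44-byte RIFF/WAVE header. Assumes 16-bit little-endian PCM.
interface GenerateContentResponse {
  candidates?: {
    content?: { parts?: { inlineData?: { mimeType?: string; data?: string } }[] };
  }[];
}

function extractPcm(res: GenerateContentResponse): Buffer {
  const parts = res.candidates?.[0]?.content?.parts ?? [];
  const audio = parts.find((p) => p.inlineData?.data);
  if (!audio?.inlineData?.data) {
    throw new Error("generateContent response contained no audio part");
  }
  return Buffer.from(audio.inlineData.data, "base64");
}

function wrapPcmAsWav(pcm: Buffer, sampleRate = 24000, channels = 1): Buffer {
  const bitsPerSample = 16;
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // total file size minus 8
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16); // fmt chunk size for PCM
  header.writeUInt16LE(1, 20); // audio format 1 = linear PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40); // size of the PCM payload
  return Buffer.concat([header, pcm]);
}
```

The opus-in-ogg transcode for voice notes would then consume this WAV; that step needs an external encoder and is left out of the sketch.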
coderabbitai (bot) commented Jun 9, 2026

Warning

Review limit reached

@matin, we couldn't start this review because you've reached your PR review rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41ba5306-7fd6-4535-a83f-9a8e7443bcc4

📥 Commits

Reviewing files that changed from the base of the PR and between 4910740 and b5a447e.

📒 Files selected for processing (3)
  • extensions/google/speech-provider.ts
  • extensions/google/video-generation-provider.ts
  • scripts/check-no-raw-channel-fetch.mjs
📝 Walkthrough

The PR adds keyless Google Vertex ADC text-to-speech as a fallback alternative to the existing AI-Studio API key route. It introduces a shared request-body builder for both backends, implements Vertex-specific URL construction and ADC-authenticated synthesis with timeout and retry handling, adds a route selector function, and updates configuration detection and handler wiring to use the new routing logic.

Changes

Google Vertex ADC TTS Support

| Layer | File(s) | Summary |
|---|---|---|
| Shared generateContent request builder | extensions/google/speech-provider.ts | Introduces buildGoogleSpeechGenerateContentBody to centralize request payload construction for text composition and audio config, used by both AI-Studio and Vertex routes. |
| Vertex ADC synthesis with route selection | extensions/google/speech-provider.ts | Adds Vertex ADC detection imports, builds the regional/global generateContent endpoint URL, implements ADC-authenticated POST synthesis with abort/timeout and retry classification, and introduces resolveGoogleTtsPcm to select between AI-Studio-key and Vertex-ADC routes. |
| Handler and configuration updates | extensions/google/speech-provider.ts | Updates isConfigured to report configured when either a valid AI-Studio key (non-Vertex-marker) or detectable Vertex ADC is available. Refactors synthesize and synthesizeTelephony to call resolveGoogleTtsPcm instead of checking keys directly. |
| Testing exports and test suite | extensions/google/speech-provider.ts, extensions/google/speech-provider.test.ts | Extends __testing export with buildGoogleVertexTtsUrl and googleVertexTtsAdcAvailable helpers. Adds test cases verifying URL construction for global and regional locations. |
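A cheap, synchronous ADC probe of the kind an ADC-aware isConfigured needs could look like this. The sketch assumes only the file-based part of the ADC lookup order (the GOOGLE_APPLICATION_CREDENTIALS environment variable, then gcloud's well-known application_default_credentials.json) and does not probe the GCE metadata server; adcCredentialsFileExists and isTtsConfigured are hypothetical names, and the real check's exclusion of a Vertex marker key is omitted.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: file-based ADC availability probe. Mirrors only the
// env-var and gcloud well-known-file steps of the ADC lookup order; it does
// not contact the GCE metadata server.
function adcCredentialsFileExists(env: NodeJS.ProcessEnv = process.env): boolean {
  const explicit = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (explicit) {
    // An explicit path wins; if it points nowhere, treat ADC as absent.
    return existsSync(explicit);
  }
  const wellKnown =
    process.platform === "win32"
      ? join(env.APPDATA ?? "", "gcloud", "application_default_credentials.json")
      : join(homedir(), ".config", "gcloud", "application_default_credentials.json");
  return existsSync(wellKnown);
}

// Configured when there is a key, or ADC credentials can be found.
function isTtsConfigured(
  apiKey: string | undefined,
  env: NodeJS.ProcessEnv = process.env,
): boolean {
  return Boolean(apiKey) || adcCredentialsFileExists(env);
}
```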

Sequence Diagram

```mermaid
sequenceDiagram
  participant App
  participant GoogleProvider
  participant RouteResolver
  participant AIStudioRoute
  participant VertexADCRoute
  participant GoogleAPIs
  
  App->>GoogleProvider: synthesize(text, config)
  GoogleProvider->>RouteResolver: resolveGoogleTtsPcm(text, config)
  
  alt API Key Present and Not Vertex Marker
    RouteResolver->>AIStudioRoute: use AI-Studio key route
    AIStudioRoute->>GoogleAPIs: POST generateContent with API key
    GoogleAPIs-->>AIStudioRoute: PCM audio
    AIStudioRoute-->>RouteResolver: PCM bytes
  else Vertex ADC Available
    RouteResolver->>VertexADCRoute: use Vertex ADC route
    VertexADCRoute->>GoogleAPIs: POST generateContent with ADC headers
    GoogleAPIs-->>VertexADCRoute: PCM audio
    VertexADCRoute-->>RouteResolver: PCM bytes
  else Neither Available
    RouteResolver-->>GoogleProvider: throw error for fallback
  end
  
  RouteResolver-->>GoogleProvider: PCM audio
  GoogleProvider-->>App: synthesized audio
```

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit hops through Vertex gates,
No API keys to hesitate,
ADC credentials light the way,
Two routes now serve the TTS day! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.77% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: adding native audio output via Vertex ADC route for voice-note synthesis on keyless deployments, with a direct reference to the GitHub issue. |
| Description check | ✅ Passed | The PR description covers most required sections including summary, context, changes, and verification, but lacks formal completion of the template sections like risk checklist and current review state. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


matin and others added 2 commits June 8, 2026 19:43
lint:extensions:bundled lints the whole extensions/google package, so
these errors (introduced with the Veo REST fallback in #5, never linted
since no later PR touched the package) block any PR that touches the
extension. Surfaced by the native-audio-output change.

- resolveVertexOAuthToken: brace the metadata-token if, type res.json()
  as { access_token?: string } (drops the unnecessary `as any`), and
  omit the unused catch binding.
- brace the "Force rest fallback for Vertex" guard.

No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
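The lint-clean shape this commit describes (a braced if, a typed res.json() in place of as any, and a catch with no binding) is sketched below for a metadata-server token fetch. The token URL and the Metadata-Flavor: Google header are the documented GCE metadata conventions; fetchMetadataAccessToken and its injectable fetchImpl parameter are hypothetical, not the actual resolveVertexOAuthToken.

```typescript
// Hypothetical sketch of the lint-clean pattern: braced `if`, typed
// res.json() instead of `as any`, and an optional catch binding.
const METADATA_TOKEN_URL =
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token";

async function fetchMetadataAccessToken(
  fetchImpl: typeof fetch = fetch,
): Promise<string | undefined> {
  try {
    const res = await fetchImpl(METADATA_TOKEN_URL, {
      headers: { "Metadata-Flavor": "Google" },
    });
    if (!res.ok) {
      return undefined;
    }
    const json = (await res.json()) as { access_token?: string };
    return json.access_token;
  } catch {
    // Not on GCE, or the metadata server is unreachable: no token.
    return undefined;
  }
}
```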
The new Vertex ADC route used a raw fetch(), which trips the
no-raw-channel-fetch boundary guard. Route it through postJsonRequest
(the same guarded helper the AI-Studio route uses) so SSRF/dispatcher
policy and timeout handling apply uniformly; drop the manual
AbortController.

Also allowlist the pre-existing Veo metadata-server fetch
(video-generation-provider.ts:44, http://metadata.google.internal —
link-local, must be raw; the SSRF guard intentionally blocks it). It
predates this work and was surfaced when the PR first touched the
package.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
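To see why the Veo metadata fetch has to stay raw and be allowlisted, consider the kind of host check a guarded helper applies. The sketch below is a hypothetical, deliberately simplified guard (isBlockedTarget is an invented name); it is not postJsonRequest's actual SSRF policy, which is not shown in this PR, and a real guard would also check DNS-resolved addresses.

```typescript
// Hypothetical, simplified SSRF host check of the kind a guarded fetch helper
// might apply. It only inspects the literal URL, not resolved addresses.
const BLOCKED_HOSTNAMES = new Set(["metadata.google.internal", "localhost"]);

function isBlockedTarget(rawUrl: string): boolean {
  const { hostname, protocol } = new URL(rawUrl);
  if (protocol !== "https:") {
    return true; // plain http is refused outright in this sketch
  }
  if (BLOCKED_HOSTNAMES.has(hostname)) {
    return true;
  }
  // IPv4 link-local range 169.254.0.0/16 (includes 169.254.169.254).
  return /^169\.254\.\d{1,3}\.\d{1,3}$/.test(hostname);
}
```

A request to http://metadata.google.internal fails both the scheme and the hostname checks here, so a guarded helper of this kind could never reach it; allowlisting that single raw call, as the commit does, is the narrower exception.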
@matin matin merged commit 48a00b0 into main Jun 9, 2026
128 of 138 checks passed
@matin matin deleted the native-audio-output branch June 9, 2026 03:17