
Improve Gemini realtime voice parity for Twilio Meet joins #77064

Merged
steipete merged 6 commits into openclaw:main from scoootscooob:codex/gmeet-gemini-paradigm-parity on May 4, 2026



@scoootscooob
Contributor

What bug / behavior this fixes

Google Meet joins over Twilio were noticeably laggier than the Paradigm-style Gemini Live path. The main gaps were: outbound model audio was dumped into Twilio faster than telephony plays it back; barge-in waited on the provider's interruption signal instead of immediately clearing locally queued audio; and Gemini Live sessions did not use the newer session-resumption and context-compression controls by default.
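The barge-in gap comes down to who notices the caller first. A local detector on the inbound mu-law stream can react before any provider round trip. The following is a minimal sketch of an energy-threshold speech-start detector over 20 ms G.711 mu-law frames, assuming one 160-byte frame per call; `muLawToLinear`, `frameRms`, `SpeechStartDetector`, and the threshold values are hypothetical illustrations, not the PR's detector.

```typescript
// Hypothetical sketch: energy-threshold speech-start detection on G.711 mu-law audio.
// Names and thresholds are illustrative assumptions, not the PR's implementation.

/** Decode one G.711 mu-law byte to a 16-bit linear PCM sample (standard G.711 expansion). */
function muLawToLinear(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

/** Root-mean-square level of one frame of mu-law bytes. */
function frameRms(frame: Uint8Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const s = muLawToLinear(frame[i]);
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / frame.length);
}

/**
 * Reports a speech start once `startFrames` consecutive frames reach `rmsThreshold`,
 * then stays silent until `endFrames` consecutive quiet frames re-arm it.
 */
class SpeechStartDetector {
  private loudFrames = 0;
  private quietFrames = 0;
  private speaking = false;

  constructor(
    private readonly rmsThreshold = 1000, // assumed level; tune against real call audio
    private readonly startFrames = 3, // 3 x 20 ms = 60 ms of sustained energy
    private readonly endFrames = 25, // 25 x 20 ms = 500 ms of quiet re-arms the detector
  ) {}

  /** Feed one inbound frame; returns true only on the frame where speech starts. */
  push(frame: Uint8Array): boolean {
    if (frameRms(frame) >= this.rmsThreshold) {
      this.quietFrames = 0;
      if (!this.speaking && ++this.loudFrames >= this.startFrames) {
        this.speaking = true;
        return true;
      }
    } else {
      this.loudFrames = 0;
      if (this.speaking && ++this.quietFrames >= this.endFrames) {
        this.speaking = false;
        this.quietFrames = 0;
      }
    }
    return false;
  }
}
```

On a `true` result, the handler would clear locally queued playback first and only then forward the caller's audio to the realtime provider, which matches the ordering the PR description gives for barge-in. An RMS gate is the simplest possible choice; it trades accuracy on noisy lines for zero dependencies and per-frame constant cost.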

What changed

  • Pace outbound Twilio realtime audio as 20 ms G.711 frames and send marks only after queued audio has been flushed.
  • Add a lightweight inbound mu-law speech-start detector so caller barge-in clears the local Twilio playback queue immediately before forwarding audio to the realtime provider.
  • Default Google Gemini Live calls to faster silence endpointing, session resumption, and sliding-window context compression, with opt-out config covered by tests.
  • Preserve voicecall.start conversation mode for GMeet dial-in flows.
  • Update Google Meet and Voice Call docs with the plugin allow/config shape and Gemini realtime tuning knobs.
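The 20 ms pacing rule follows from G.711's fixed rate: at 8000 samples/s and one byte per sample, a 20 ms frame is 160 bytes, so sending one frame every 20 ms matches playback speed instead of filling Twilio's buffer ahead of the caller. A minimal sketch of such a pacer, assuming a `send` callback that writes to the Twilio Media Streams WebSocket and an injected base64 encoder; `TwilioAudioPacerSketch` and its methods are hypothetical and are not the PR's `realtime-audio-pacer.ts`. The `media`, `mark`, and `clear` message shapes follow Twilio's documented Media Streams format.

```typescript
// Hypothetical sketch of 20 ms outbound pacing for Twilio Media Streams.
// Not the PR's realtime-audio-pacer.ts; names and structure are illustrative.

const SAMPLE_RATE_HZ = 8000; // Twilio Media Streams audio: audio/x-mulaw at 8 kHz
const FRAME_MS = 20;
const BYTES_PER_FRAME = (SAMPLE_RATE_HZ * FRAME_MS) / 1000; // 160: mu-law is 1 byte per sample

type QueueItem =
  | { kind: "audio"; frame: Uint8Array }
  | { kind: "mark"; name: string };

class TwilioAudioPacerSketch {
  private queue: QueueItem[] = [];
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(
    private readonly streamSid: string,
    private readonly send: (json: string) => void, // assumed: writes to the Twilio WebSocket
    private readonly encodeBase64: (bytes: Uint8Array) => string,
  ) {}

  /** Split provider audio into 20 ms frames; the last frame may be shorter. */
  enqueueAudio(audio: Uint8Array): void {
    for (let i = 0; i < audio.length; i += BYTES_PER_FRAME) {
      this.queue.push({ kind: "audio", frame: audio.subarray(i, i + BYTES_PER_FRAME) });
    }
  }

  /** Queue a mark behind all audio queued so far, so it goes out only after that audio. */
  enqueueMark(name: string): void {
    this.queue.push({ kind: "mark", name });
  }

  /** Send at most one audio frame; marks at the head of the queue take no playback time. */
  tick(): void {
    while (this.queue.length > 0) {
      const item = this.queue.shift()!;
      if (item.kind === "mark") {
        this.sendEvent({ event: "mark", streamSid: this.streamSid, mark: { name: item.name } });
        continue;
      }
      this.sendEvent({
        event: "media",
        streamSid: this.streamSid,
        media: { payload: this.encodeBase64(item.frame) },
      });
      return;
    }
  }

  /** Barge-in: drop everything not yet sent and tell Twilio to empty its own buffer. */
  clear(): void {
    this.queue = []; // pending local marks are dropped too; treat them as cancelled
    this.sendEvent({ event: "clear", streamSid: this.streamSid });
  }

  start(): void {
    if (this.timer === undefined) this.timer = setInterval(() => this.tick(), FRAME_MS);
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }

  get pendingItems(): number {
    return this.queue.length;
  }

  private sendEvent(message: object): void {
    this.send(JSON.stringify(message));
  }
}
```

In Node, `encodeBase64` could be `(b) => Buffer.from(b).toString("base64")`. Because marks wait behind the audio queued before them, a mark acknowledgement from Twilio is a usable "playback reached here" signal. `setInterval` drifts under load, so a production pacer would more likely schedule frames against a monotonic clock; the interval only keeps the sketch short.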

Evidence

  • pnpm test extensions/google/realtime-voice-provider.test.ts
  • pnpm test extensions/voice-call/src/webhook/realtime-audio-pacer.test.ts extensions/voice-call/src/webhook/realtime-handler.test.ts extensions/voice-call/index.test.ts
  • pnpm check:changed passed extension typecheck and extension test typecheck, then failed in lint:extensions on a pre-existing, unrelated lint finding in extensions/qa-lab/src/mantis/slack-desktop-smoke.runtime.test.ts:105; that file is not in this branch's diff against origin/main.

Notes

  • I did not run a live Twilio/Google Meet smoke because no live dial-in target/PIN was provided in this thread.

@openclaw-barnacle (bot) added the docs, channel: voice-call, plugin: google-meet, size: M, and maintainer labels on May 4, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 4, 2026

Codex review: needs changes before merge.

Summary
The PR paces Twilio realtime outbound audio, clears queued playback on local mu-law speech starts, adds Gemini Live resumption/compression defaults with opt-outs, scopes voice-call gateway methods, and updates Google Meet and Voice Call docs/tests.

Reproducibility: no. No high-confidence live reproduction was established. Source inspection does show that current main sends Twilio realtime media/clear/mark messages directly and lacks the Gemini resumption/compression defaults that the PR addresses.

Next step before merge
The only repairable blocker is a missing changelog entry; runtime validation and maintainer approval should stay on the normal PR review path.

Security
Cleared: The diff touches plugin runtime code, docs, and tests without adding dependencies, workflows, lockfile changes, package scripts, or broader secret-handling paths.

Review findings

  • [P3] Add the required changelog entry — docs/plugins/voice-call.md:250-253
Review details

Best possible solution:

Add a focused changelog entry, keep the runtime changes within the voice-call and Google provider plugins, then land after maintainer review and preferably a live Twilio Google Meet smoke.

Do we have a high-confidence way to reproduce the issue?

No high-confidence live reproduction was established. Source inspection does show current main directly sends Twilio realtime media/clear/mark messages and lacks the Gemini resumption/compression defaults that the PR addresses.

Is this the best way to solve the issue?

Yes, the proposed direction is the narrow maintainable split: Twilio transport pacing stays in the voice-call plugin, and Gemini Live controls stay in the Google provider. It is not merge-ready until the changelog entry is added and maintainer/live validation accepts the behavior.

Full review comments:

  • [P3] Add the required changelog entry — docs/plugins/voice-call.md:250-253
    This PR changes user-facing Google Meet/Voice Call realtime behavior and docs, but the branch does not add a CHANGELOG.md entry under the active release section. Repo policy requires one for user-facing fix/perf work before merge.
    Confidence: 0.94

Overall correctness: patch is correct
Overall confidence: 0.82

Acceptance criteria:

  • git diff --check

What I checked:

  • Current main lacks Twilio pacing: Current main sends realtime provider audio, clear, and mark messages directly over the Twilio WebSocket, with no audio pacer or local speech-start detector in the realtime handler. (extensions/voice-call/src/webhook/realtime-handler.ts:281, 2949171fcc15)
  • Current main lacks Gemini resumption/compression defaults: Current main has a 700 ms Gemini Live silence threshold and no sessionResumption/contextWindowCompression provider config fields or defaults in buildGoogleLiveConnectConfig. (extensions/google/realtime-voice-provider.ts:50, 2949171fcc15)
  • PR diff remains user-facing without changelog: The fetched PR diff changes Voice Call/Google Meet docs and runtime behavior, but contains no CHANGELOG.md diff. Public docs: docs/plugins/voice-call.md. (docs/plugins/voice-call.md:250, 0e393cea4a27)
  • Changelog active section exists: CHANGELOG.md has active Unreleased Changes/Fixes sections, but no entry for this PR's Google Meet/Voice Call realtime behavior. (CHANGELOG.md:5, 2949171fcc15)
  • Google dependency contract supports the config direction: @google/genai 1.51.0 types expose LiveConnectConfig.sessionResumption, contextWindowCompression, and LiveServerMessage.sessionResumptionUpdate. (pnpm-lock.yaml:2387, 2949171fcc15)
  • Twilio Media Streams contract supports buffering, marks, and clear: Twilio documents outbound media as audio/x-mulaw at 8000 Hz, buffered in order, with mark messages for playback completion and clear messages to empty buffered audio.
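On the Gemini side, the checks above name the `LiveConnectConfig` fields involved (`sessionResumption`, `contextWindowCompression`) alongside the existing silence threshold. A minimal sketch of "on by default, with opt-outs", under stated assumptions: the local interfaces mirror only a small subset of `@google/genai`'s types and should be checked against the installed version; `buildLiveConfigSketch`, `GeminiTuningOptions`, `ResumptionHandleTracker`, and the 500 ms silence value are hypothetical placeholders, not the PR's `buildGoogleLiveConnectConfig` or its defaults.

```typescript
// Hypothetical sketch: Gemini Live defaults-on tuning with opt-outs.
// The interfaces mirror a small subset of @google/genai's LiveConnectConfig and
// LiveServerMessage types; check field names against the installed SDK version.

interface LiveConfigSubset {
  realtimeInputConfig?: { automaticActivityDetection?: { silenceDurationMs?: number } };
  sessionResumption?: { handle?: string };
  contextWindowCompression?: { slidingWindow?: Record<string, unknown> };
}

/** Hypothetical options, not the plugin's real config keys. */
interface GeminiTuningOptions {
  silenceDurationMs?: number;
  sessionResumption?: boolean; // undefined = on; false opts out
  contextWindowCompression?: boolean; // undefined = on; false opts out
}

// Placeholder for illustration only; NOT the PR's default (main used 700 ms).
const PLACEHOLDER_SILENCE_MS = 500;

function buildLiveConfigSketch(opts: GeminiTuningOptions, resumeHandle?: string): LiveConfigSubset {
  const config: LiveConfigSubset = {
    realtimeInputConfig: {
      automaticActivityDetection: {
        silenceDurationMs: opts.silenceDurationMs ?? PLACEHOLDER_SILENCE_MS,
      },
    },
  };
  if (opts.sessionResumption !== false) {
    // No handle: ask the server to issue resumption handles; with a handle: resume that session.
    config.sessionResumption = resumeHandle ? { handle: resumeHandle } : {};
  }
  if (opts.contextWindowCompression !== false) {
    // Sliding-window compression using the server's default token targets.
    config.contextWindowCompression = { slidingWindow: {} };
  }
  return config;
}

/** Keeps the most recent resumable handle from sessionResumptionUpdate messages. */
class ResumptionHandleTracker {
  private handle: string | undefined;

  onUpdate(update: { newHandle?: string; resumable?: boolean }): void {
    if (update.resumable && update.newHandle) this.handle = update.newHandle;
  }

  get latest(): string | undefined {
    return this.handle;
  }
}
```

Feeding `tracker.latest` into the next connect is what would let a dropped Live session resume instead of starting cold, while the sliding window is intended to keep long calls from ending when the context fills. Keeping both behind explicit `false` opt-outs matches the "defaults with opt-out config" shape the PR describes.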

Likely related people:

  • steipete: Local history shows recent Google Meet realtime/provider behavior commits by Peter Steinberger, including the provider split, audio-buffer clamp, realtime alias handling, and default talk-back mode touched by this PR's Google Meet path. (role: recent maintainer and feature-history owner; confidence: high; commits: 11c600cf1993, c956946b263d, 30b201eff0ff; files: extensions/google-meet/src/voice-call-gateway.ts, docs/plugins/google-meet.md)
  • VACInc: The prior ClawSweeper review context identifies recent merged work on Google Live consult responses touching the Google realtime provider and voice-call realtime handler tests near this PR's bridge behavior. (role: adjacent Google Live and voice-call realtime contributor; confidence: medium; commits: 614a2846a257; files: extensions/google/realtime-voice-provider.ts, extensions/google/realtime-voice-provider.test.ts, extensions/voice-call/src/webhook/realtime-handler.ts)
  • vincentkoc: The current checkout's available blame/log data attributes the shallow-root versions of the voice-call and Google provider files to Vincent Koc, and recent changelog history shows repeated plugin maintenance contributions; this is useful but weaker routing evidence than the Google Meet-specific commits. (role: adjacent plugin/docs maintainer; confidence: low; commits: 90c0edcb61cd; files: extensions/voice-call/src/webhook/realtime-handler.ts, extensions/google/realtime-voice-provider.ts, docs/plugins/voice-call.md)

Remaining risk / open question:

  • No live Twilio Google Meet smoke is attached, so the latency and barge-in improvements are source-supported but not end-to-end proven.
  • The missing changelog entry is a repo-policy blocker for this user-facing fix/perf/docs change.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 2949171fcc15.

@steipete steipete force-pushed the codex/gmeet-gemini-paradigm-parity branch from 0b95bd3 to d339439 on May 4, 2026 04:37
@steipete steipete merged commit 8d6db59 into openclaw:main May 4, 2026
86 checks passed
steipete added a commit that referenced this pull request May 4, 2026
@steipete
Contributor

steipete commented May 4, 2026

Landed via rebase onto main.

  • Gate: local git diff --check, pnpm docs:list, and targeted pnpm test extensions/google/realtime-voice-provider.test.ts extensions/voice-call/src/webhook/realtime-audio-pacer.test.ts extensions/voice-call/src/webhook/realtime-handler.test.ts extensions/voice-call/index.test.ts extensions/google-meet/src/voice-call-gateway.test.ts; GitHub CI passed on exact head a922a1989ae06c1847597b224c458fdcbaae9ae3.
  • Source head before landing: a922a19
  • Landed commits: 7fc9a82, 309ff6b, 0c1df35, b2f2185, 7d98e7f, 8d6db59
  • Merge commit: 8d6db59

Thanks @scoootscooob!

arieldiego73 pushed a commit to arieldiego73/openclaw that referenced this pull request May 5, 2026
arieldiego73 pushed a commit to arieldiego73/openclaw that referenced this pull request May 5, 2026
lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026

Labels

channel: voice-call, docs, maintainer, plugin: google-meet, size: L
