fix(voice-call): await STT readiness before initial greeting (#75197) by PfanP · Pull Request #75257 · openclaw/openclaw

PfanP · 2026-04-30T20:36:37Z

The Twilio media-stream startup raced TTS playback against the OpenAI realtime transcription WebSocket handshake: handleStart called onConnect (which fires manager.speakInitialMessage immediately) and then started sttSession.connect() without awaiting it. Under event-loop contention from TTS startup, the STT WebSocket handshake timed out at 10s. That left the call half-functional: the greeting played, but caller speech never reached the agent. A direct OpenAI realtime WebSocket probe from the same host succeeded in ~1.1s.

Establish STT readiness before firing onConnect so TTS startup cannot starve the STT handshake. When the STT connect rejects, close the STT session, end the Twilio media stream with a 1011 close code, and fire onDisconnect so the voice-call manager hangs up the call on the existing grace path instead of silently leaving the caller on a deaf stream.
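The ordering described above can be sketched in TypeScript. This is a minimal sketch, not the PR's code: the SttSession, MediaSocket, and StreamCallbacks interfaces and the startStream helper are hypothetical simplifications. It awaits the STT connect before calling onConnect. If the connect rejects, it closes the STT session, closes the media socket with 1011, and calls onDisconnect.

```typescript
// Hypothetical, simplified interfaces; not the real media-stream.ts types.
interface SttSession {
  connect(): Promise<void>;
  close(): void;
}

interface MediaSocket {
  close(code: number, reason: string): void;
}

interface StreamCallbacks {
  onConnect(): void; // in the webhook, this triggers the initial greeting
  onDisconnect(): void; // lets the call manager hang up via its grace path
}

// WebSocket close code 1011: the server hit an unexpected condition (RFC 6455).
const CLOSE_INTERNAL_ERROR = 1011;

// Hypothetical helper; resolves to true if transcription was ready before onConnect.
async function startStream(
  stt: SttSession,
  socket: MediaSocket,
  callbacks: StreamCallbacks,
): Promise<boolean> {
  try {
    // Wait for the STT handshake before any TTS work can compete with it.
    await stt.connect();
  } catch {
    // Do not leave the caller on a stream nobody is listening to.
    stt.close();
    socket.close(CLOSE_INTERNAL_ERROR, "transcription unavailable");
    callbacks.onDisconnect();
    return false;
  }
  callbacks.onConnect();
  return true;
}
```

The review below notes that the merged version differs. It registers the stream immediately and defers only the initial greeting until transcription is ready.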

Fixes #75197.


clawsweeper · 2026-04-30T20:40:20Z

Codex review: needs maintainer review before merge.

What this changes:

This PR updates Voice Call's Twilio media-stream startup to register accepted streams immediately, defer the initial greeting until realtime transcription is ready, close failed STT startups, and add docs, changelog, and regression tests.

Maintainer follow-up before merge:

No repair lane is needed: I found no discrete automated blocker in the current head, and the remaining action is normal maintainer review plus required CI completion.

Security review:

Security review cleared: The diff changes Voice Call startup sequencing, tests, docs, and changelog only; it does not add dependencies, CI execution, package resolution, permissions, secrets handling, or new endpoints.

Review details

Best possible solution:

Land this PR, or an equivalent narrow replacement, after normal maintainer review and required CI complete; keep the linked bug open until the fixing PR merges.

Do we have a high-confidence way to reproduce the issue?

Yes. The linked bug includes concrete Twilio/OpenAI setup, redacted config, exact timeout logs, and persisted call-state evidence, and current main still shows the same startup order in source.

Is this the best way to solve the issue?

Yes. The PR keeps Twilio stream registration immediate for routing and disconnect grace, delays only the initial greeting until STT readiness, relies on the existing audio queue, and closes failed STT startups instead of leaving a deaf stream.
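The lifecycle this answer describes can be sketched as follows. The sequence is: register on start, connect STT in the background, greet only once STT is ready, and on failure close the stream and notify once. This is a hedged sketch, not the code in media-stream.ts. Only handleStart, onConnect, onTranscriptionReady, onDisconnect, and connectTranscriptionAndNotify are named in the review. MediaStreamHandler, handleClose, the Stt/Socket/Callbacks interfaces, and the exact form of the stale-session check are assumptions.

```typescript
// Hypothetical shapes. Callback and method names come from the review;
// everything else here is invented for illustration.
interface Stt {
  connect(): Promise<void>;
  close(): void;
}

interface Socket {
  readonly open: boolean;
  close(code: number): void;
}

interface Callbacks {
  onConnect(sid: string): void; // register the stream only
  onTranscriptionReady(sid: string): void; // speak the initial greeting
  onDisconnect(sid: string): void;
}

class MediaStreamHandler {
  private currentSid: string | null = null;
  private readonly disconnected = new Set<string>();
  private readonly callbacks: Callbacks;

  constructor(callbacks: Callbacks) {
    this.callbacks = callbacks;
  }

  handleStart(sid: string, stt: Stt, socket: Socket): Promise<void> {
    this.currentSid = sid;
    // Register immediately so routing and disconnect grace work right away.
    this.callbacks.onConnect(sid);
    // STT connects in the background; only the greeting waits for it.
    return this.connectTranscriptionAndNotify(sid, stt, socket);
  }

  // Socket close events funnel through the same once-only notification.
  handleClose(sid: string): void {
    this.notifyDisconnect(sid);
  }

  private async connectTranscriptionAndNotify(
    sid: string,
    stt: Stt,
    socket: Socket,
  ): Promise<void> {
    try {
      await stt.connect();
    } catch {
      stt.close();
      socket.close(1011);
      this.notifyDisconnect(sid);
      return;
    }
    // Ignore readiness for a superseded session or an already-closed socket.
    if (this.currentSid !== sid || !socket.open) return;
    this.callbacks.onTranscriptionReady(sid);
  }

  private notifyDisconnect(sid: string): void {
    if (this.disconnected.has(sid)) return; // at most one notification per stream
    this.disconnected.add(sid);
    this.callbacks.onDisconnect(sid);
  }
}
```

In this sketch, only the greeting depends on STT readiness. Everything that needs the stream registered happens synchronously in handleStart, so a slow handshake delays the greeting rather than call setup.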

What I checked:

- Current main starts greeting before STT readiness: On current main, handleStart stores the session, calls onConnect, then starts sttSession.connect() asynchronously, matching the reported race. (extensions/voice-call/src/media-stream.ts:319, 4ea0556f6428)
- Current main webhook speaks from onConnect: The webhook onConnect callback registers the Twilio stream and immediately calls manager.speakInitialMessage, so TTS can start while STT is still pending. (extensions/voice-call/src/webhook.ts:378, 4ea0556f6428)
- PR head delays greeting until transcription readiness: At PR head, onConnect only registers the stream, while the new onTranscriptionReady callback calls speakInitialMessage. (extensions/voice-call/src/webhook.ts:378, aaae4665632b)
- PR head starts STT asynchronously but notifies readiness after connect: At PR head, accepted streams clear pending state immediately, connectTranscriptionAndNotify awaits STT connection, closes the media stream on STT failure, and calls onTranscriptionReady only after the session is still current and the socket is open. (extensions/voice-call/src/media-stream.ts:320, aaae4665632b)
- Regression tests cover the race and failure path: The PR adds tests for slow STT readiness beyond the pre-start timeout, early media before readiness, and STT startup failure closing the stream with one disconnect notification. (extensions/voice-call/src/media-stream.test.ts:519, aaae4665632b)
- Early audio has an existing queueing contract: The shared realtime transcription WebSocket session queues sendAudio before the socket is open and ready, then flushes queued audio when connection readiness completes. (src/realtime-transcription/websocket-session.ts:103, 4ea0556f6428)
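The queue-until-ready contract in the last item can be sketched as follows. This is a hypothetical illustration. QueuedAudioSession, markReady, and the transmit callback are invented names, not the API of src/realtime-transcription/websocket-session.ts. Real chunks would use whatever audio encoding the transcription provider expects.

```typescript
// Hypothetical sketch of "queue before ready, flush on ready".
// `transmit` stands in for the real socket send.
class QueuedAudioSession {
  private ready = false;
  private queue: Uint8Array[] = [];
  private readonly transmit: (chunk: Uint8Array) => void;

  constructor(transmit: (chunk: Uint8Array) => void) {
    this.transmit = transmit;
  }

  sendAudio(chunk: Uint8Array): void {
    if (!this.ready) {
      this.queue.push(chunk); // early caller audio is held, not dropped
      return;
    }
    this.transmit(chunk);
  }

  // Called once the socket is open and the session reports readiness.
  markReady(): void {
    if (this.ready) return;
    this.ready = true;
    const pending = this.queue;
    this.queue = [];
    for (const chunk of pending) this.transmit(chunk); // flush in arrival order
  }
}
```

In this sketch, held chunks are flushed in arrival order before any later audio is sent. Caller speech that arrives during the STT handshake is therefore transcribed rather than lost, which is consistent with the PR relying on the existing queue instead of gating inbound media.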

Likely related people:

- steipete: Local history and API metadata show Peter Steinberger restored and recently maintained the central Voice Call media stream, webhook lifecycle, and related PR head changes. (role: recent maintainer and likely follow-up owner; confidence: high; commits: 42c17adb5e4d, 1d8968c8a821, 9f691099dbd9; files: extensions/voice-call/src/media-stream.ts, extensions/voice-call/src/webhook.ts, docs/plugins/voice-call.md)
- joshavant: Commit metadata shows Josh Avant authored a broad Voice Call spoken-output and stream TTS regression fix touching the same media-stream, webhook, and initial spoken-output area. (role: adjacent owner; confidence: medium; commits: 3f7f2c8dc96e; files: extensions/voice-call/src/media-stream.ts, extensions/voice-call/src/webhook.ts, extensions/voice-call/src/manager/outbound.ts)
- eleqtrizit: Commit metadata shows Agustin Rivera recently tightened voice stream ingress guards in the media stream and webhook paths involved in this startup lifecycle. (role: recent adjacent maintainer; confidence: medium; commits: 692438cbb22e; files: extensions/voice-call/src/media-stream.ts, extensions/voice-call/src/webhook.ts)
- dguido: Commit metadata shows Dan Guido worked on the Voice Call TTS queue path adjacent to the greeting and stream playback behavior affected here. (role: adjacent media-stream contributor; confidence: medium; commits: 101d0f451f23; files: extensions/voice-call/src/media-stream.ts, extensions/voice-call/src/webhook.ts, extensions/voice-call/src/providers/twilio.ts)

Remaining risk / open question:

Some broader PR checks were still in progress at review time, so merge should still wait for the required CI set to finish.
This read-only review did not rerun a live Twilio/OpenAI/Tailscale call; the reproduction confidence comes from the linked live logs, current-main source shape, and added focused tests.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 4ea0556f6428.

steipete · 2026-05-01T05:25:45Z

Landed via squash merge onto main.

Gate: targeted voice-call/STT tests, docs/changelog checks, Testbox OPENCLAW_TESTBOX=1 pnpm check:changed, and GitHub CI on the final head SHA.
Final PR head: aaae466
Squash commit: e8f9c3e

Thanks @PfanP!

@PfanP

Fix Twilio voice-call startup so accepted media streams register immediately, realtime transcription readiness gates only the initial greeting, and early inbound media is preserved while STT connects. Fixes openclaw#75197. Thanks @PfanP and @donkeykong91.


openclaw-barnacle Bot added the channel: voice-call (Channel integration: voice-call), size: S, and triage: blank-template (Candidate: PR template appears mostly untouched) labels on Apr 30, 2026

clawsweeper Bot mentioned this pull request Apr 30, 2026

[Bug]: voice-call OpenAI realtime transcription times out during Twilio media stream while direct WebSocket succeeds #75197

Closed

fix: stabilize Twilio STT startup (openclaw#75257) (thanks @PfanP)

b97b780

steipete force-pushed the fix/voice-call-stt-startup-readiness branch from f5f778a to b97b780 on May 1, 2026

openclaw-barnacle Bot added the docs (Improvements or additions to documentation) and size: M labels and removed the size: S label on May 1, 2026

docs: credit Twilio STT PR author (openclaw#75257) (thanks @PfanP)

aaae466

steipete merged commit e8f9c3e into openclaw:main May 1, 2026
79 checks passed

clawsweeper Bot mentioned this pull request May 1, 2026

fix: One actionable issue found #75453

Merged

github-actions Bot mentioned this pull request May 1, 2026

📡 Upstream Digest — 2026-05-01 06:31 UTC curtismercier/openclaw-mods#732

Open

steipete merged 2 commits into openclaw:main from PfanP:fix/voice-call-stt-startup-readiness