fix(voice-call): await STT readiness before initial greeting (#75197) #75257
steipete merged 2 commits into openclaw:main
Codex review: needs maintainer review before merge.
What this changes: This PR updates Voice Call's Twilio media-stream startup to register accepted streams immediately, defer the initial greeting until realtime transcription is ready, close failed STT startups, and add docs, changelog, and regression tests.
Maintainer follow-up before merge: No repair lane is needed: I found no discrete automated blocker in the current head, and the remaining action is normal maintainer review plus required CI completion.
Security review: Security review cleared: The diff changes Voice Call startup sequencing, tests, docs, and changelog only; it does not add dependencies, CI execution, package resolution, permissions, secrets handling, or new endpoints.
Review details
Best possible solution: Land this PR, or an equivalent narrow replacement, after normal maintainer review and required CI complete; keep the linked bug open until the fixing PR merges.
Do we have a high-confidence way to reproduce the issue? Yes. The linked bug includes concrete Twilio/OpenAI setup, redacted config, exact timeout logs, and persisted call-state evidence, and current main still shows the same startup order in source.
Is this the best way to solve the issue? Yes. The PR keeps Twilio stream registration immediate for routing and disconnect grace, delays only the initial greeting until STT readiness, relies on the existing audio queue, and closes failed STT startups instead of leaving a deaf stream.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 4ea0556f6428.
f5f778a to b97b780
Fix Twilio voice-call startup so accepted media streams register immediately, realtime transcription readiness gates only the initial greeting, and early inbound media is preserved while STT connects. Fixes openclaw#75197. Thanks @PfanP and @donkeykong91.
The Twilio media-stream startup raced TTS playback against the OpenAI realtime transcription WebSocket handshake: handleStart called onConnect (which fires manager.speakInitialMessage immediately) and only then started sttSession.connect() as a fire-and-forget call. Under event-loop contention from TTS startup, the STT WebSocket handshake timed out after 10 s, leaving the call half-functional: the greeting played, but caller speech never reached the agent. A direct OpenAI realtime WebSocket probe from the same host succeeded in ~1.1 s.
This PR establishes STT readiness before firing onConnect, so TTS startup cannot starve the STT handshake. When the STT connect rejects, it closes the STT session, ends the Twilio media stream with a 1011 close code, and fires onDisconnect so the voice-call manager hangs up the call on the existing grace path instead of silently leaving the caller on a deaf stream.
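The ordering described above can be sketched roughly as follows. This is an illustrative sketch, not the code from the diff: only handleStart, onConnect, onDisconnect, and sttSession.connect() are names taken from the description, while the SttSession, MediaSocket, and StreamCallbacks interfaces, the activeStreams map, and the boolean return value are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real plugin types may differ.
interface SttSession {
  connect(): Promise<void>; // resolves once realtime transcription is ready
  close(): void;
}

interface MediaSocket {
  close(code: number, reason: string): void;
}

interface StreamCallbacks {
  onConnect(streamSid: string): void;    // triggers the initial greeting
  onDisconnect(streamSid: string): void; // lets the manager hang up the call
}

// Assumed registry of accepted streams, keyed by Twilio stream SID.
const activeStreams = new Map<string, SttSession>();

async function handleStart(
  streamSid: string,
  socket: MediaSocket,
  sttSession: SttSession,
  callbacks: StreamCallbacks,
): Promise<boolean> {
  // Register right away so routing and disconnect handling can see the stream.
  activeStreams.set(streamSid, sttSession);

  try {
    // Await STT readiness instead of starting it fire-and-forget.
    await sttSession.connect();
  } catch {
    // Failed STT startup: tear everything down rather than leave a deaf stream.
    activeStreams.delete(streamSid);
    sttSession.close();
    socket.close(1011, "STT connection failed");
    callbacks.onDisconnect(streamSid);
    return false;
  }

  // Only now is it safe to play the greeting.
  callbacks.onConnect(streamSid);
  return true;
}
```

In this sketch the greeting is the only step gated on STT readiness; registration happens before the await, which matches the description's intent that accepted streams are visible immediately.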
Fixes #75197.
Summary
Describe the problem and fix in 2–5 bullets:
If this PR fixes a plugin beta-release blocker, title it fix(<plugin-id>): beta blocker - <summary> and link the matching Beta blocker: <plugin-name> - <summary> issue labeled beta-blocker. Contributors cannot label PRs, so the title is the PR-side signal for maintainers and automation.
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.
Regression Test Plan (if applicable)
For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.
User-visible / Behavior Changes
List user-visible changes (including defaults/config).
If none, write None.
Diagram (if applicable)
For UI changes or non-trivial logic flows, include a small ASCII diagram reviewers can scan quickly. Otherwise write N/A.
Security Impact (required)
Repro + Verification
Environment
Steps
Expected
Actual
Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write None.