
[Bug]: voice-call OpenAI realtime transcription times out during Twilio media stream while direct WebSocket succeeds #75197

@donkeykong91

Description


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Twilio inbound voice-call media streams connect and initial TTS plays, but OpenAI realtime transcription times out during the live call, so caller speech is never transcribed or routed to the agent.

Steps to reproduce


  1. Start OpenClaw 2026.4.27 on Ubuntu 24.04 with voice-call enabled.
  2. Configure voice-call with Twilio, streaming.enabled: true, streaming.provider: "openai", and streaming.providers.openai.model: "gpt-4o-transcribe".
  3. Configure TTS with OpenAI gpt-4o-mini-tts.
  4. Call the configured Twilio number from an allowlisted caller.
  5. Observe the Twilio media stream connect.
  6. Speak during or after the initial greeting.

Expected behavior

After Twilio media stream connects, OpenAI realtime transcription should connect successfully, caller speech should be transcribed, and the transcript should be routed to the voice-call agent response path.

Actual behavior

The Twilio media stream connects and the initial greeting eventually plays, but STT fails with "OpenAI realtime transcription connection timeout". No user transcript is recorded, and the call remains effectively deaf until the stream disconnects and the call ends.

OpenClaw version

2026.4.27

Operating system

Ubuntu 24.04.4 LTS / Linux 6.8.0-110-generic x86_64

Install method

npm global

Model

Voice-call streaming STT: openai/gpt-4o-transcribe, Voice-call TTS: openai/gpt-4o-mini-tts, agent model codex-5.5

Provider / routing chain

Twilio inbound call -> Tailscale Funnel HTTPS/WSS -> OpenClaw voice-call webhook/media stream -> OpenAI Realtime transcription API

Additional provider/model setup details

Relevant redacted voice-call config:

{
  "provider": "twilio",
  "publicUrl": "https://<tailscale-host>/voice/webhook",
  "serve": {
    "port": 3334,
    "bind": "127.0.0.1",
    "path": "/voice/webhook"
  },
  "inboundPolicy": "allowlist",
  "streaming": {
    "enabled": true,
    "provider": "openai",
    "streamPath": "/voice/stream",
    "providers": {
      "openai": {
        "apiKey": "***",
        "model": "gpt-4o-transcribe",
        "silenceDurationMs": 800,
        "vadThreshold": 0.5
      }
    }
  },
  "realtime": {
    "enabled": false
  },
  "tts": {
    "provider": "openai",
    "providers": {
      "openai": {
        "apiKey": "***",
        "model": "gpt-4o-mini-tts",
        "voice": "alloy"
      }
    },
    "timeoutMs": 30000
  }
}

Direct probes from the same machine succeeded:

  1. An OpenAI gpt-4o-mini-tts request returned 200 in about 1.2s.
  2. A direct OpenAI realtime transcription WebSocket opened and returned transcription_session.created in about 1.1s.
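For reproducibility, the direct WebSocket probe can be sketched roughly as below. This is an assumption-laden sketch, not the exact probe used: the endpoint URL, `intent=transcription` query parameter, and `OpenAI-Beta: realtime=v1` header reflect OpenAI's published realtime API shape, and the third-party `websockets` package is assumed.

```python
# Sketch of a direct OpenAI realtime transcription WebSocket probe.
# Endpoint URL and headers are assumptions based on OpenAI's public
# realtime API docs; they may differ from what OpenClaw uses internally.
import asyncio
import json
import os

REALTIME_URL = "wss://api.openai.com/v1/realtime?intent=transcription"


def build_probe_headers(api_key: str) -> dict:
    """Headers for the realtime transcription WebSocket (assumed API shape)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "OpenAI-Beta": "realtime=v1",
    }


async def probe(api_key: str, timeout: float = 10.0) -> str:
    """Open the WebSocket and return the first server event type.

    Expected first event in the working case: "transcription_session.created".
    """
    import websockets  # third-party; header kwarg name varies by version

    async with websockets.connect(
        REALTIME_URL, additional_headers=build_probe_headers(api_key)
    ) as ws:
        first = json.loads(await asyncio.wait_for(ws.recv(), timeout))
        return first.get("type", "")
```

Running `asyncio.run(probe(os.environ["OPENAI_API_KEY"]))` from the affected host returned `transcription_session.created` in about a second, which is what rules out basic network reachability.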

Logs, screenshots, and evidence

07:55:06 [voice-call] Inbound call accepted: +<PHONE_NUMBER_REDACTED> is in allowlist
07:55:06 [voice-call] Created inbound call record: 41be546b-d1db-4f1a-b613-b4155a8821db from +<PHONE_NUMBER_REDACTED>
07:55:07 [MediaStream] Twilio connected
07:55:07 [MediaStream] Stream started: MZd0ddb4a2aa6561e185e88e481c1523b0 (call: CA0c67464cb2ddbccd522404560efbe0e5)
07:55:07 [voice-call] Media stream connected: CA0c67464cb2ddbccd522404560efbe0e5 -> MZd0ddb4a2aa6561e185e88e481c1523b0
07:55:07 [voice-call] Speaking initial message for call 41be546b-d1db-4f1a-b613-b4155a8821db (mode: conversation)
07:55:19 [MediaStream] Transcription session error: OpenAI realtime transcription connection timeout
07:55:19 [MediaStream] STT connection failed (TTS still works): OpenAI realtime transcription connection timeout
07:57:04 [MediaStream] Stream stopped: MZd0ddb4a2aa6561e185e88e481c1523b0
07:57:04 [voice-call] Media stream disconnected: CA0c67464cb2ddbccd522404560efbe0e5 (MZd0ddb4a2aa6561e185e88e481c1523b0)
07:57:05 [MediaStream] WebSocket closed (code: 1005, reason: none)
07:57:06 [voice-call] Auto-ending call 41be546b-d1db-4f1a-b613-b4155a8821db after stream disconnect grace

The persisted call record shows only the bot greeting transcript, with no user transcript:

{
  "callId": "41be546b-d1db-4f1a-b613-b4155a8821db",
  "state": "speaking",
  "transcript": [
    {
      "speaker": "bot",
      "text": "Hello! How can I help you today?",
      "isFinal": true
    }
  ]
}

Impact and severity

Affected: voice-call plugin users using Twilio inbound calls with OpenAI realtime transcription.
Severity: High; inbound conversation mode is unusable because caller speech is not transcribed.
Frequency: Observed repeatedly across multiple inbound call attempts in this setup.
Consequence: Calls connect and may play the greeting, but the assistant cannot hear/respond to the caller.

Additional information

A direct OpenAI realtime transcription WebSocket probe from the same host succeeds quickly, so this does not appear to be basic OpenAI network reachability. The failure appears specific to the live voice-call media stream runtime path.

Potentially relevant observation: the initial greeting begins immediately after media stream connect, while STT connection is still pending. In observed calls, STT times out and user speech is never captured.
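If the race described above is the cause, one possible mitigation shape (a hypothetical sketch, not OpenClaw's actual code; `connect_stt` and `play_greeting` are placeholder callables) is to gate the greeting on the STT session becoming ready, and degrade explicitly on timeout:

```python
# Hypothetical mitigation sketch: wait for the STT session before speaking,
# instead of starting greeting playback while the STT connect is pending.
import asyncio

STT_CONNECT_TIMEOUT_S = 12.0  # roughly matches the 07:55:07 -> 07:55:19 gap in the logs


async def handle_stream(connect_stt, play_greeting):
    """connect_stt/play_greeting stand in for the real media-stream handlers."""
    try:
        # Ensure the transcription session exists before the bot speaks, so
        # the caller's first utterance after the greeting can be captured.
        stt = await asyncio.wait_for(connect_stt(), STT_CONNECT_TIMEOUT_S)
    except asyncio.TimeoutError:
        stt = None  # degraded TTS-only call, as observed in this report
    await play_greeting()
    return stt
```

Whether serializing these steps is acceptable (it delays the greeting by the STT connect time) is a design call for the maintainers; the sketch only illustrates the suspected ordering issue.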

Metadata

Assignees: none
Labels: bug (Something isn't working)
Milestone: none