bug(voice-call): embedded runs expose tts tool, model always calls it instead of returning plain text #27025

@carl-jeffrolc

Bug Description

The voice-call plugin's embedded agent runs include the gateway's built-in tts tool in the available toolset. The LLM consistently calls tts instead of returning plain text, even when the responseSystemPrompt explicitly says "Never call the tts tool" and "Always answer with plain spoken text."

This results in completely silent voice calls — the caller hears the initial greeting (static TTS) but never gets an AI response. The embedded run completes successfully (isError=false) but the voice-call plugin has no plain text to route through ElevenLabs telephony TTS.

Reproduction

  1. Configure voice-call plugin with responseModel (Sonnet or Haiku)
  2. Set agents.defaults.thinkingDefault: "high" (common for coding agents; the bug also occurs at other thinking levels)
  3. Call the Twilio number
  4. Speak after the greeting
  5. Wait — no response, call eventually times out

Logs

Every call shows the same pattern:

embedded run start: provider=anthropic model=claude-sonnet-4-6 thinking=high messageChannel=voice
embedded run tool start: tool=tts    ← model calls tts instead of text
embedded run tool end: tool=tts      ← completes in ~200ms
embedded run agent end: isError=false ← "succeeds" with no spoken output

The "[voice-call] AI response:" log line never appears, because the run ends with a tool call rather than text.
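To confirm the pattern across many calls, the log can be scanned for runs that call tts and never produce an AI response line. The following is a minimal sketch, not part of OpenClaw: the marker strings are copied from the excerpt above, while the function name and the choice to split the log on "embedded run start" are assumptions made for illustration.

```typescript
// Hypothetical helper: counts embedded runs that called the tts tool but
// never logged an "[voice-call] AI response:" line.
// Assumption: each run's lines follow its "embedded run start" line.
function countSilentRuns(logLines: string[]): number {
  const runs: string[][] = [];
  for (const line of logLines) {
    if (line.includes("embedded run start")) {
      runs.push([]); // a new run begins here
    }
    if (runs.length > 0) {
      runs[runs.length - 1].push(line); // lines before the first run are ignored
    }
  }
  return runs.filter(
    (run) =>
      run.some((l) => l.includes("embedded run tool start: tool=tts")) &&
      !run.some((l) => l.includes("[voice-call] AI response:")),
  ).length;
}
```

A non-zero count on a log from a silent call would match the behavior described here.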

What doesn't work

  • tools.deny: ["tts"] on the voice agent — accepted by schema, dynamically reloaded, but the voice-call plugin's embedded runs bypass the agent's tool config
  • Switching models (Haiku, Sonnet) — both call tts
  • Resetting/purging voice session — happens on fresh sessions too
  • Changing thinkingDefault — model calls tts at every thinking level
  • System prompt already says "Never call the tts tool" — model ignores it

Expected Behavior

The voice-call plugin should either:

  1. Strip the tts tool from embedded run toolsets (preferred — voice calls should never use gateway TTS)
  2. Intercept tts tool calls and route them through the telephony TTS pipeline (ElevenLabs/etc)
  3. Respect per-agent tools.deny in embedded runs
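As a rough illustration of the three options, the sketch below filters the toolset before an embedded run starts and, as a fallback, recovers text from a tts call. Every type and function name here (ToolDef, EmbeddedRunOptions, resolveToolset, spokenTextFor) is hypothetical and not taken from the OpenClaw codebase; the shape of a tts tool call, including its text field, is likewise assumed.

```typescript
// Hypothetical shapes, invented for illustration only.
interface ToolDef {
  name: string;
}

interface EmbeddedRunOptions {
  messageChannel: string;
  tools: ToolDef[];
  deny?: string[]; // per-agent tools.deny, if configured
}

// Tools that should never be offered to the model on a voice channel.
const VOICE_BLOCKED_TOOLS = new Set(["tts"]);

// Options 1 and 3: drop tts on voice runs and honor the agent's deny list.
function resolveToolset(opts: EmbeddedRunOptions): ToolDef[] {
  const denied = new Set(opts.deny ?? []);
  return opts.tools.filter((tool) => {
    if (denied.has(tool.name)) return false;
    if (opts.messageChannel === "voice" && VOICE_BLOCKED_TOOLS.has(tool.name)) {
      return false;
    }
    return true;
  });
}

// Assumed result shape: plain text plus any tool calls the model made.
interface ToolCall {
  name: string;
  input: { text?: string };
}

interface RunResult {
  text: string;
  toolCalls: ToolCall[];
}

// Option 2: prefer plain text; otherwise fall back to the text passed to tts
// so it can be routed through the telephony TTS pipeline.
function spokenTextFor(result: RunResult): string | undefined {
  const plain = result.text.trim();
  if (plain) return plain;
  const ttsCall = result.toolCalls.find(
    (call) => call.name === "tts" && Boolean(call.input.text?.trim()),
  );
  return ttsCall?.input.text?.trim();
}
```

Filtering at toolset construction (options 1 and 3) removes the failure at its source, since the model cannot call a tool it was never offered. The fallback in option 2 only helps if the tts call carries the full response text.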

Environment

  • OpenClaw v2026.2.24 (stable)
  • macOS, LaunchAgent-managed gateway
  • Voice pipeline: Twilio → OpenAI Realtime STT → LLM → ElevenLabs TTS
  • Tested with both anthropic/claude-sonnet-4-6 and anthropic/claude-haiku-4-5
