Feature: auto-echo audio transcription before agent processing #32102

@bigideagames

Problem

When a user sends a voice message, OpenClaw transcribes it (via whisper or configured provider) and includes the transcript in the [Audio] User text: ... Transcript: prefix. However, there's no way to automatically echo the transcription back to the chat before the agent processes and responds.

This means the user has to trust that the agent will echo the transcript, but that is prompt-based behavior, and agents follow it inconsistently. The user wants to confirm what was heard before the agent acts on it.

Proposed Solution

A config option under tools.media.audio (or agents.defaults):

{
  "audio": {
    "echoTranscript": true,
    "echoFormat": "📝 *\"{transcript}\"*"
  }
}

When echoTranscript is true, the gateway automatically sends the transcription text back to the originating chat as a reply/quote before forwarding the message to the agent session. This is infrastructure-level — zero agent involvement, 100% reliable.
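To make the intended ordering concrete, here is a rough sketch of what the gateway step could look like. Every name in it (AudioEchoConfig, formatEcho, handleVoiceMessage, sendReply, forwardToAgent) is hypothetical and chosen for illustration; none of them are existing OpenClaw APIs, and the default format is simply the one from the config example above.

```typescript
// Hypothetical sketch: these types and functions are illustrative,
// not part of OpenClaw.
interface AudioEchoConfig {
  echoTranscript: boolean;
  echoFormat?: string; // template containing the {transcript} placeholder
}

const DEFAULT_ECHO_FORMAT = '📝 *"{transcript}"*';

// Substitute the transcript into the template. A replacer function is used so
// that "$" sequences inside the transcript are inserted literally instead of
// being interpreted as replacement patterns.
function formatEcho(
  transcript: string,
  format: string = DEFAULT_ECHO_FORMAT,
): string {
  return format.replace("{transcript}", () => transcript);
}

// Stand-ins for whatever the gateway uses to reply to the originating chat
// and to hand the message to the agent session.
type SendReply = (text: string) => Promise<void>;
type ForwardToAgent = (transcript: string) => Promise<void>;

async function handleVoiceMessage(
  transcript: string,
  config: AudioEchoConfig,
  sendReply: SendReply,
  forwardToAgent: ForwardToAgent,
): Promise<void> {
  if (config.echoTranscript) {
    // The echo is awaited first, so the user sees it before the agent acts.
    await sendReply(formatEcho(transcript, config.echoFormat));
  }
  await forwardToAgent(transcript);
}
```

In this sketch a failed echo would also stop the message from reaching the agent. Whether the real feature should instead log the failure and continue is an open design question.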

Why This Matters

  • Voice messages are inherently ambiguous — the user needs to verify what was transcribed
  • Agent prompt instructions ("always echo transcription first") are unreliable — agents forget
  • This is a common workflow: voice → confirm transcript → agent processes
  • Making it a gateway-level feature means it works for all agents, all channels, automatically

Alternatives Considered

  • Prompt instructions in AGENTS.md — unreliable, agents forget
  • Skills/hooks — OpenClaw doesn't have message preprocessing middleware
  • Custom commands — don't help with organic voice messages
