Description
Problem
When a user sends a voice message, OpenClaw transcribes it (via whisper or configured provider) and includes the transcript in the [Audio] User text: ... Transcript: prefix. However, there's no way to automatically echo the transcription back to the chat before the agent processes and responds.
This means the user has to trust that the agent will echo the transcript, but that is a prompt-based behavior which agents follow inconsistently. The user wants to confirm what was heard before the agent acts on it.
Proposed Solution
A config option under tools.media.audio (or agents.defaults):
{
  "audio": {
    "echoTranscript": true,
    "echoFormat": "📝 *\"{transcript}\"*"
  }
}
When echoTranscript is true, the gateway automatically sends the transcription text back to the originating chat as a reply/quote before forwarding the message to the agent session. This is infrastructure-level: zero agent involvement, 100% reliable.
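To make the intended ordering concrete, here is a rough sketch of how the gateway step could look. Everything in it is hypothetical: `AudioEchoConfig`, `VoiceMessage`, `GatewayDeps`, `formatEcho`, and `handleVoiceMessage` are illustrative names, not existing OpenClaw APIs. The default format and the choice to keep forwarding when the echo fails are also assumptions.

```typescript
// Hypothetical shapes for illustration; none of these names exist in OpenClaw today.
interface AudioEchoConfig {
  echoTranscript: boolean;
  echoFormat?: string; // template containing the {transcript} placeholder
}

interface VoiceMessage {
  chatId: string;
  messageId: string;
  transcript: string;
}

interface GatewayDeps {
  // Sends `text` to the chat as a reply to the original voice message.
  sendReply(chatId: string, replyToMessageId: string, text: string): Promise<void>;
  // Hands the message over to the agent session.
  forwardToAgent(message: VoiceMessage): Promise<void>;
}

// Assumed default, taken from the example config in this proposal.
const DEFAULT_ECHO_FORMAT = '📝 *"{transcript}"*';

function formatEcho(format: string, transcript: string): string {
  // split/join replaces every occurrence and avoids the special `$` patterns
  // that String.prototype.replace applies to replacement strings.
  return format.split("{transcript}").join(transcript);
}

async function handleVoiceMessage(
  config: AudioEchoConfig,
  message: VoiceMessage,
  deps: GatewayDeps,
): Promise<void> {
  if (config.echoTranscript && message.transcript.trim() !== "") {
    const text = formatEcho(config.echoFormat ?? DEFAULT_ECHO_FORMAT, message.transcript);
    try {
      // Echo first, so the user sees what was heard before the agent acts.
      await deps.sendReply(message.chatId, message.messageId, text);
    } catch (err) {
      // Assumed policy: a failed echo should not block the agent.
      console.warn("transcript echo failed:", err);
    }
  }
  await deps.forwardToAgent(message);
}
```

Awaiting the reply before forwarding is what ensures the echo is sent before the agent starts working. Whether a failed echo should block forwarding is an open design question; the sketch logs and continues.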
Why This Matters
- Voice messages are inherently ambiguous — the user needs to verify what was transcribed
- Agent prompt instructions ("always echo transcription first") are unreliable — agents forget
- This is a common workflow: voice → confirm transcript → agent processes
- Making it a gateway-level feature means it works for all agents, all channels, automatically
Alternatives Considered
- Prompt instructions in AGENTS.md — unreliable, agents forget
- Skills/hooks — OpenClaw doesn't have message preprocessing middleware
- Custom commands — doesn't help with organic voice messages