feat(audio): auto-echo transcription to chat before agent processing #32150

Merged
steipete merged 2 commits into openclaw:main from AytuncYildizli:feat/audio-echo-transcript
Mar 2, 2026
@AytuncYildizli
Contributor

Summary

When a user sends a voice message, the transcription is now echoed back to the chat before the agent processes it — so users can immediately verify what was heard.

What changed

Config (tools.media.audio)

Two new optional fields on MediaUnderstandingConfig:

{
  "tools": {
    "media": {
      "audio": {
        "echoTranscript": true,          // default: false
        "echoFormat": "🎙️ Heard: {transcript}"  // default: '📝 "{transcript}"'
      }
    }
  }
}
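As a rough illustration of how these two fields might be interpreted, here is a minimal sketch. Only the field names and their defaults come from this PR; the `AudioEchoConfig` interface, the `formatEcho` helper, and the choice to replace every `{transcript}` occurrence are assumptions made for the example, not the actual implementation.

```typescript
// Hypothetical shape: only the two field names and their defaults come from the PR.
interface AudioEchoConfig {
  echoTranscript?: boolean; // default: false
  echoFormat?: string; // default: '📝 "{transcript}"'
}

const DEFAULT_ECHO_FORMAT = '📝 "{transcript}"';

// Returns the text to echo, or undefined when echoing is disabled.
function formatEcho(cfg: AudioEchoConfig, transcript: string): string | undefined {
  if (cfg.echoTranscript !== true) return undefined;
  const template = cfg.echoFormat ?? DEFAULT_ECHO_FORMAT;
  // split/join replaces every placeholder and treats the transcript as
  // literal text (no special handling of "$" as in String.replace).
  return template.split("{transcript}").join(transcript);
}
```

With `echoTranscript` unset the helper returns `undefined`, which matches the opt-in default described above.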

Echo logic (src/media-understanding/apply.ts)

After transcription succeeds and ctx.Transcript is set, if echoTranscript: true:

  1. Resolves channel and recipient from ctx.Provider/ctx.Surface and ctx.OriginatingTo/ctx.From
  2. Guards on isDeliverableMessageChannel and silently skips internal/non-deliverable channels
  3. Formats the message via the configurable echoFormat template ({transcript} placeholder)
  4. Calls deliverOutboundPayloads with bestEffort: true, so a delivery failure never blocks transcription
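The flow above can be sketched roughly as follows. This is not the code from apply.ts: the `EchoContext` and `EchoDeps` types are simplified stand-ins, the `isDeliverable` and `deliver` callbacks replace isDeliverableMessageChannel and deliverOutboundPayloads (whose real signatures are not shown in this PR), and the fallback order (Provider before Surface, OriginatingTo before From) is an assumption.

```typescript
// Hypothetical, simplified context; the real ctx type has more fields.
interface EchoContext {
  Provider?: string;
  Surface?: string;
  OriginatingTo?: string;
  From?: string;
  Transcript?: string;
}

// Stand-ins for isDeliverableMessageChannel / deliverOutboundPayloads.
interface EchoDeps {
  isDeliverable: (channel: string) => boolean;
  deliver: (channel: string, to: string, text: string) => Promise<void>;
}

// Resolves to true if an echo was sent, false if it was skipped or failed.
async function sendTranscriptEchoSketch(
  ctx: EchoContext,
  format: string,
  deps: EchoDeps,
): Promise<boolean> {
  const transcript = ctx.Transcript;
  if (!transcript) return false; // nothing was transcribed

  // Assumed precedence: Provider over Surface, OriginatingTo over From.
  const channel = ctx.Provider ?? ctx.Surface;
  const to = ctx.OriginatingTo ?? ctx.From;
  if (!channel || !to) return false; // cannot address the echo

  if (!deps.isDeliverable(channel)) return false; // internal channel: skip silently

  const text = format.split("{transcript}").join(transcript);
  try {
    await deps.deliver(channel, to, text);
    return true;
  } catch {
    // Best effort: a failed echo must not break transcription.
    return false;
  }
}
```

Swallowing the delivery error inside the helper is what keeps the transcription path unaffected when the echo cannot be sent.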

Schema docs

  • src/config/schema.help.ts — help text for both new fields
  • src/config/schema.labels.ts — UI labels for both new fields

Tests (src/media-understanding/apply.echo-transcript.test.ts)

10 test cases:

| Case | Expected |
| --- | --- |
| echoTranscript: false (explicit) | no echo |
| echoTranscript absent (default) | no echo |
| echoTranscript: true | echo sent with default format |
| custom echoFormat | echo uses custom template |
| no audio attachment | no echo |
| transcription fails | no echo |
| channel not deliverable | no echo |
| ctx has no From/OriginatingTo | no echo |
| OriginatingTo present, From absent | echo uses OriginatingTo |
| echo delivery throws | transcription still succeeds, no throw |

Checklist

  • TypeScript strict — no any
  • ESLint clean (0 errors)
  • Tests pass (bun test)
  • Follows existing code style
  • Import paths use .js extension
  • Changes minimal and focused

Closes #32102

When echoTranscript is enabled in tools.media.audio config, the
transcription text is sent back to the originating chat immediately
after successful audio transcription — before the agent processes it.
This lets users verify what was heard from their voice note.

Changes:
- config/types.tools.ts: add echoTranscript (bool) and echoFormat
  (string template) to MediaUnderstandingConfig
- media-understanding/apply.ts: sendTranscriptEcho() helper that
  resolves channel/to from ctx, guards on isDeliverableMessageChannel,
  and calls deliverOutboundPayloads best-effort
- config/schema.help.ts: help text for both new fields
- config/schema.labels.ts: labels for both new fields
- media-understanding/apply.echo-transcript.test.ts: 10 vitest cases
  covering disabled/enabled/custom-format/no-audio/failed-transcription/
  non-deliverable-channel/missing-from/OriginatingTo/delivery-failure

Default echoFormat: '📝 "{transcript}"'

Closes openclaw#32102
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ad765098d9

Comment on lines +99 to +103
echoTranscript?: boolean;
/**
* Format string for the echoed transcript. Use `{transcript}` as placeholder.
* Default: '📝 "{transcript}"'
*/

P1: Add audio echo options to validated config schema

The new tools.media.audio.echoTranscript / echoFormat settings are declared and documented here, but they were not added to the runtime Zod schema (src/config/zod-schema.core.ts, ToolsMediaUnderstandingSchema at lines 674-685), which is .strict(). As a result, any config that follows this commit’s docs and sets these keys will fail validation as unrecognized fields, so the feature cannot be enabled through normal config loading.
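To make the failure mode concrete, here is a dependency-free sketch of what strict validation does with an undeclared key. It is a hand-rolled stand-in for a strict object schema, not Zod itself; `validateStrict`, the `enabled` key, and the error wording are all hypothetical and are not taken from zod-schema.core.ts.

```typescript
// Hand-rolled stand-in for a strict object schema: any key that is not
// explicitly allowed is reported as an error instead of being ignored.
function validateStrict(
  input: Record<string, unknown>,
  allowedKeys: readonly string[],
): string[] {
  return Object.keys(input)
    .filter((key) => allowedKeys.indexOf(key) === -1)
    .map((key) => `Unrecognized key: "${key}"`);
}

// "enabled" is a hypothetical pre-existing key used only for illustration.
const keysBeforeFix = ["enabled"];
const keysAfterFix = [...keysBeforeFix, "echoTranscript", "echoFormat"];

const userConfig = { enabled: true, echoTranscript: true };

// Before the schema knows the new fields, the documented config is rejected.
const errorsBefore = validateStrict(userConfig, keysBeforeFix);
// Once the fields are declared, the same config validates cleanly.
const errorsAfter = validateStrict(userConfig, keysAfterFix);
```

The point is that declaring the fields in the TypeScript type alone is not enough when the runtime schema rejects unknown keys; both need to list them.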

@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

This PR adds a new optional feature to echo audio transcriptions back to the chat before agent processing, allowing users to verify what was heard from their voice messages.

Key changes:

  • Added two new optional config fields: echoTranscript (default: false) and echoFormat (default: '📝 "{transcript}"')
  • Implemented echo logic in src/media-understanding/apply.ts that sends the transcript via deliverOutboundPayloads with bestEffort: true
  • Echo is only sent when transcription succeeds, the channel is deliverable, and recipient info is available
  • Comprehensive test coverage with 10 test cases covering all edge cases including delivery failures, missing context, and non-deliverable channels

Implementation quality:

  • Follows existing codebase patterns and style
  • Properly defensive with guards for missing channel/recipient and non-deliverable channels
  • Best-effort delivery ensures echo failures never block transcription
  • Dynamic import of deliverOutboundPayloads avoids potential circular dependencies
  • Config schema documentation is consistent and complete

The implementation is clean, well-tested, and handles edge cases appropriately. No issues found.

Confidence Score: 5/5

  • This PR is safe to merge with no identified risks
  • Score reflects excellent code quality with comprehensive test coverage (10 test cases), proper error handling, defensive programming patterns, and minimal scope of changes. The feature is opt-in by default, uses best-effort delivery to avoid breaking existing functionality, and all edge cases are properly handled and tested.
  • No files require special attention

Last reviewed commit: ad76509

@steipete steipete merged commit 8f995df into openclaw:main Mar 2, 2026
24 of 26 checks passed
@steipete
Contributor

steipete commented Mar 2, 2026

Landed via temp rebase onto main.

  • Gate: pnpm vitest src/media-understanding/apply.echo-transcript.test.ts src/media-understanding/apply.test.ts
  • Land commit: 327de97da1a2ea68dd60e90b4bc7f8cec8e24214
  • Merge commit: 8f995df

Thanks @AytuncYildizli!

Linked issue: Feature: auto-echo audio transcription before agent processing