fix(tts): propagate audioAsVoice from tool results to delivery payload by azade-c · Pull Request #46535 · openclaw/openclaw

azade-c · 2026-03-14T19:42:27Z

Problem

When the TTS tool returns [[audio_as_voice]] alongside a MEDIA: path in its tool result, the voice-bubble flag is lost during tool result extraction. This causes TTS audio to be sent as a file attachment instead of a voice bubble on Telegram (and other voice-capable channels like WhatsApp/Feishu).

Root cause

extractToolResultMediaPaths() only extracts file paths from tool results, discarding the audioAsVoice field that splitMediaFromOutput() correctly parses from [[audio_as_voice]] tags.

The extracted media is then passed to onToolResult({ mediaUrls }) without the voice flag, so Telegram's sendVoice path is never triggered — it falls through to sendAudio (file attachment).
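To make the failure mode concrete, here is a minimal sketch of the delivery decision, assuming a payload that carries only the two fields named above. `sendVoice` and `sendAudio` are the Telegram Bot API methods referenced in this description; `ToolResultPayload`, `pickTelegramAudioMethod`, and the `/tmp/reply.mp3` path are hypothetical names used for illustration, not the project's actual code.

```typescript
// Hypothetical payload shape; only mediaUrls and audioAsVoice are named in this PR.
interface ToolResultPayload {
  mediaUrls: string[];
  audioAsVoice?: boolean;
}

// Telegram Bot API method names for voice bubbles and audio file attachments.
type TelegramAudioMethod = "sendVoice" | "sendAudio";

// Hypothetical helper: without the flag, delivery falls through to sendAudio.
function pickTelegramAudioMethod(payload: ToolResultPayload): TelegramAudioMethod {
  return payload.audioAsVoice === true ? "sendVoice" : "sendAudio";
}

const withoutFlag = pickTelegramAudioMethod({ mediaUrls: ["/tmp/reply.mp3"] });
const withFlag = pickTelegramAudioMethod({
  mediaUrls: ["/tmp/reply.mp3"],
  audioAsVoice: true,
});
console.log(withoutFlag, withFlag); // prints "sendAudio sendVoice"
```

If the flag never reaches the payload, the first case is the only one the channel ever sees, which matches the file-attachment behavior reported here.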

Two code paths are affected:

Non-verbose path (emitToolResultOutput in handlers.tools.ts): uses extractToolResultMediaPaths → loses audioAsVoice
Verbose path (emitToolResultMessage in pi-embedded-subscribe.ts): calls parseReplyDirectives which returns audioAsVoice, but the field was not propagated to the onToolResult payload

Fix

Add extractToolResultMedia() that returns { paths: string[], audioAsVoice?: boolean }
Keep extractToolResultMediaPaths() as a backward-compatible wrapper
Propagate audioAsVoice to the onToolResult payload in both code paths
Add tests covering [[audio_as_voice]] detection in tool results
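The shape of the fix can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in the diff: `ToolResultLike` and the line-by-line `MEDIA:` scan are stand-ins for the real tool result type and for `splitMediaFromOutput()`, whose implementation is not shown in this thread.

```typescript
// Return shape named in the PR description.
interface ExtractedToolResultMedia {
  paths: string[];
  audioAsVoice?: boolean;
}

// Assumed, simplified tool result shape (the real type is not shown here).
interface ToolResultLike {
  content?: Array<{ type: string; text?: string }>;
}

const VOICE_TAG = "[[audio_as_voice]]";
const MEDIA_PREFIX = "MEDIA:";

function extractToolResultMedia(
  result: ToolResultLike | null | undefined,
): ExtractedToolResultMedia {
  const paths: string[] = [];
  let audioAsVoice = false;
  for (const block of result?.content ?? []) {
    if (block.type !== "text" || typeof block.text !== "string") continue;
    // The tag may arrive in a different text block than the MEDIA: line.
    if (block.text.includes(VOICE_TAG)) audioAsVoice = true;
    for (const line of block.text.split("\n")) {
      const trimmed = line.trim();
      if (!trimmed.startsWith(MEDIA_PREFIX)) continue;
      const path = trimmed.slice(MEDIA_PREFIX.length).trim();
      if (path) paths.push(path);
    }
  }
  // Attach the flag only when it was seen, so the default shape stays minimal.
  return { paths, ...(audioAsVoice ? { audioAsVoice: true } : {}) };
}

// Backward-compatible wrapper: same name and return type as before the change.
function extractToolResultMediaPaths(
  result: ToolResultLike | null | undefined,
): string[] {
  return extractToolResultMedia(result).paths;
}
```

Keeping the wrapper means existing callers that only need `string[]` do not have to change, while the two emit paths can switch to the richer return value.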

Testing

All 29 existing + 4 new tests pass in pi-embedded-subscribe.tools.media.test.ts
Manually verified: before the fix, a TTS tool result containing [[audio_as_voice]]\nMEDIA:path.mp3 arrived as a file attachment on Telegram; after the fix, audioAsVoice: true is correctly set on the delivery payload

When the TTS tool returns [[audio_as_voice]] alongside a MEDIA: path, the voice-bubble flag was lost during tool result extraction. The extractToolResultMediaPaths function only returned file paths, discarding the audioAsVoice directive parsed by splitMediaFromOutput. This caused TTS audio to be sent as a file attachment instead of a voice bubble on Telegram (and other voice-capable channels).

Changes:

- Add extractToolResultMedia() returning { paths, audioAsVoice? }
- Keep extractToolResultMediaPaths() as backward-compatible wrapper
- Propagate audioAsVoice in emitToolResultOutput (non-verbose path)
- Propagate audioAsVoice in emitToolResultMessage (verbose path)
- Add tests for extractToolResultMedia audioAsVoice detection

greptile-apps · 2026-03-14T19:47:17Z

Greptile Summary

This PR fixes a bug where the audioAsVoice flag (emitted by the TTS tool via [[audio_as_voice]] tags) was lost during tool result processing, causing TTS audio to be delivered as a file attachment rather than a voice bubble on Telegram and other voice-capable channels.

Changes:

Introduces extractToolResultMedia() in pi-embedded-subscribe.tools.ts that returns both paths and audioAsVoice, with extractToolResultMediaPaths() retained as a backward-compatible wrapper — clean, minimal API surface change.
Non-verbose path (handlers.tools.ts): switches from extractToolResultMediaPaths to extractToolResultMedia and propagates audioAsVoice to the onToolResult payload.
Verbose path (pi-embedded-subscribe.ts): adds audioAsVoice to the parseReplyDirectives destructuring and propagates it to onToolResult.
Adds 4 targeted unit tests covering tag detection, tag absence, multi-block detection, and null input.

Minor observations:

The old JSDoc that described the extraction strategy for extractToolResultMediaPaths is now positioned above the ExtractedToolResultMedia interface after the refactor, making it read as interface documentation when it was originally function documentation. The new extractToolResultMedia function has its own correct JSDoc, so the block above the interface is redundant.
The audioAsVoice flag is silently dropped in the image details.path fallback branch when no MEDIA: paths are found. This is not a real issue for TTS (which always emits a MEDIA: path alongside the tag), but it is a latent inconsistency for future callers.

Confidence Score: 4/5

Safe to merge — the fix is well-scoped, both affected code paths are addressed, and existing tests pass alongside 4 new targeted tests.
The core logic change is correct and minimal. The non-verbose and verbose paths are both fixed consistently. The only concerns are a misplaced JSDoc comment (cosmetic) and a missing audioAsVoice propagation in an unlikely image-fallback edge case, neither of which affects the primary TTS use case described in the PR.
src/agents/pi-embedded-subscribe.tools.ts — minor: orphaned JSDoc above ExtractedToolResultMedia and audioAsVoice not carried through the details.path fallback branch.

Comments Outside Diff (1)

src/agents/pi-embedded-subscribe.tools.ts, line 185-199 (link)

Orphaned JSDoc comment above the interface

The docblock that was originally written for extractToolResultMediaPaths (describing the extraction strategy and the "returns an empty array" note) now sits above ExtractedToolResultMedia rather than above either function. The new extractToolResultMedia function already has its own correct JSDoc, so the block above the interface is now redundant and misleading — it reads as documentation for the interface when it was really describing function behavior.

Consider removing the old comment from above the interface (or relocating it), leaving only the JSDoc that's already above extractToolResultMedia.

```suggestion
export interface ExtractedToolResultMedia {
  paths: string[];
  audioAsVoice?: boolean;
}
```

Last reviewed commit: 5133827

greptile-apps · 2026-03-14T19:47:24Z

  if (paths.length > 0) {
-    return paths;
+    return { paths, ...(audioAsVoice ? { audioAsVoice: true } : {}) };
  }

  // Fall back to details.path when image content exists but no MEDIA: text.
  if (hasImageContent) {
    const details = record.details as Record<string, unknown> | undefined;
    const p = typeof details?.path === "string" ? details.path.trim() : "";
    if (p) {
-      return [p];
+      return { paths: [p] };
    }
  }

-  return [];
+  return { paths: [] };
 }


audioAsVoice silently dropped in the image-fallback path

When paths.length === 0 but audioAsVoice was detected (e.g. a content block contains [[audio_as_voice]] but no valid MEDIA: token), the code falls through to the details.path image-fallback branch. Both return { paths: [p] } and the final return { paths: [] } discard the audioAsVoice flag.

The TTS tool always emits a MEDIA: path alongside the tag, so this won't bite in today's usage. But as an invariant it's fragile — future tools that emit [[audio_as_voice]] without a MEDIA: line (or where the MEDIA: token fails validation) would silently lose the flag. A defensive approach would be to carry audioAsVoice through:

  if (paths.length > 0) {
    return { paths, ...(audioAsVoice ? { audioAsVoice: true } : {}) };
  }

  // Fall back to details.path when image content exists but no MEDIA: text.
  if (hasImageContent) {
    const details = record.details as Record<string, unknown> | undefined;
    const p = typeof details?.path === "string" ? details.path.trim() : "";
    if (p) {
      return { paths: [p], ...(audioAsVoice ? { audioAsVoice: true } : {}) };
    }
  }

  return { paths: [], ...(audioAsVoice ? { audioAsVoice: true } : {}) };

src/agents/pi-embedded-subscribe.tools.ts, line 250-264

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5133827a52

chatgpt-codex-connector · 2026-03-14T19:48:43Z

-    void ctx.params.onToolResult({ mediaUrls: mediaPaths });
+    void ctx.params.onToolResult({
+      mediaUrls: mediaPaths,
+      ...(extracted.audioAsVoice ? { audioAsVoice: true } : {}),


Gate audioAsVoice on surviving media after URL filtering

audioAsVoice is propagated even though media URLs are filtered separately, so the flag can survive while its intended audio file is removed. In a mixed tool result (for example [[audio_as_voice]] with both a filtered local path and a remaining remote image URL), the remaining non-audio media is still marked as voice and downstream delivery can take the wrong path; Discord unconditionally routes audioAsVoice payloads through its voice API (extensions/discord/src/monitor/reply-delivery.ts:352), which can fail for non-audio media. Please clear/recompute audioAsVoice after filtering so only retained audio media keep the voice flag.
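One possible reading of this suggestion is sketched below: recompute the flag after filtering and keep it only when at least one retained URL still looks like audio. Everything here is an assumption for illustration, including the `buildPayloadAfterFiltering` and `looksLikeAudio` helpers, the extension list, and the `isAllowed` callback; the project's real filtering and media-type detection are not shown in this thread.

```typescript
// Hypothetical payload shape; field names follow the diff above.
interface ToolResultPayload {
  mediaUrls: string[];
  audioAsVoice?: boolean;
}

// Assumed extension list, for illustration only.
const AUDIO_EXTENSIONS = [".mp3", ".opus", ".ogg", ".m4a", ".wav"];

function looksLikeAudio(url: string): boolean {
  // Ignore any query string or fragment before checking the extension.
  const path = (url.split(/[?#]/)[0] ?? "").toLowerCase();
  return AUDIO_EXTENSIONS.some((ext) => path.endsWith(ext));
}

function buildPayloadAfterFiltering(
  candidates: string[],
  audioAsVoice: boolean,
  isAllowed: (url: string) => boolean,
): ToolResultPayload {
  const mediaUrls = candidates.filter(isAllowed);
  // Keep the voice flag only if some audio media survived the filter.
  const keepVoice = audioAsVoice && mediaUrls.some(looksLikeAudio);
  return { mediaUrls, ...(keepVoice ? { audioAsVoice: true } : {}) };
}

// Mixed result: the local audio path is filtered out, the remote image remains.
const payload = buildPayloadAfterFiltering(
  ["/tmp/voice.mp3", "https://example.com/pic.png"],
  true,
  (url) => url.startsWith("https://"),
);
console.log(JSON.stringify(payload)); // prints {"mediaUrls":["https://example.com/pic.png"]}
```

In this sketch the remaining image is delivered without the voice flag, which avoids routing non-audio media through a voice-only API.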

steipete · 2026-04-25T06:43:54Z

Maintainer deep-review update from current main:

This is mostly superseded for the built-in tts tool, but not exactly identical to the original generic legacy-path fix.

Current code now returns structured media from the TTS tool: src/agents/tools/tts-tool.ts puts the audio path under details.media.mediaUrl, with trustedLocalMedia: true and audioAsVoice: true when the synthesized result is voice-compatible. src/agents/pi-embedded-subscribe.tools.ts reads that structured media in extractToolResultMediaArtifact(), including audioAsVoice. The focused agent tests pass on current main:

pnpm test src/agents/pi-embedded-subscribe.handlers.tools.media.test.ts src/agents/pi-embedded-subscribe.handlers.tools.test.ts

Result: 2 files / 46 tests passed.
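For readers unfamiliar with the structured path, the shape described above might look roughly like this. The field names `details.media.mediaUrl`, `trustedLocalMedia`, and `audioAsVoice` come from this comment; the surrounding types and the `readStructuredMedia` helper are hypothetical and are not the actual `extractToolResultMediaArtifact()` implementation.

```typescript
// Field names taken from the maintainer comment; everything else is assumed.
interface StructuredMedia {
  mediaUrl: string;
  trustedLocalMedia?: boolean;
  audioAsVoice?: boolean;
}

interface ToolResultWithDetails {
  details?: { media?: StructuredMedia };
}

interface MediaArtifact {
  mediaUrls: string[];
  trustedLocalMedia?: boolean;
  audioAsVoice?: boolean;
}

// Hypothetical reader: returns undefined when no structured media is present.
function readStructuredMedia(result: ToolResultWithDetails): MediaArtifact | undefined {
  const media = result.details?.media;
  if (!media || typeof media.mediaUrl !== "string" || media.mediaUrl.trim() === "") {
    return undefined;
  }
  return {
    mediaUrls: [media.mediaUrl.trim()],
    ...(media.trustedLocalMedia ? { trustedLocalMedia: true } : {}),
    ...(media.audioAsVoice ? { audioAsVoice: true } : {}),
  };
}

// Example with a made-up path: a voice-compatible TTS result.
const artifact = readStructuredMedia({
  details: {
    media: { mediaUrl: "/tmp/tts.opus", trustedLocalMedia: true, audioAsVoice: true },
  },
});
console.log(artifact?.audioAsVoice === true);
```

Because the flag travels as a typed field rather than as a text tag, this path does not depend on parsing `[[audio_as_voice]]` out of tool output.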

Caveat: the legacy text-only path still calls splitMediaFromOutput(entry.text) but currently keeps only parsed.mediaUrls; it does not propagate parsed.audioAsVoice. So if this PR is meant to support arbitrary legacy tool-result text like [[audio_as_voice]]\nMEDIA:/tmp/file.opus, that part is still not fully covered by main.

Recommended next step: either close this as superseded by structured TTS media if the original production issue was only the built-in tts tool, or narrow/rebase it to add the missing legacy parsed.audioAsVoice propagation plus one focused regression test.

steipete · 2026-04-25T16:58:26Z

Thanks @azade-c. I deep-reviewed this against current main and landed the remaining real gap directly:

60f9358348

What landed:

extractToolResultMediaArtifact() now preserves [[audio_as_voice]] from legacy trusted tool-result text that also emits MEDIA:.
The flag is also preserved when the voice hint and media path arrive in separate text blocks.
The delivery handler now has regression coverage proving trusted tts legacy MEDIA: output queues pendingToolAudioAsVoice=true.
Rich output protocol docs and changelog were updated.

Why I closed instead of merging this PR: current main already had the structured details.media.audioAsVoice path, and the only remaining gap was the legacy text parser path. The landed patch keeps the fix scoped to the existing parser contract instead of adding a parallel ad hoc path.

Validation:

pnpm test src/agents/pi-embedded-subscribe.tools.media.test.ts src/agents/pi-embedded-subscribe.handlers.tools.media.test.ts -> 2 files / 63 tests passed
pnpm test src/agents/pi-embedded-subscribe.tools.media.test.ts src/agents/pi-embedded-subscribe.handlers.tools.media.test.ts src/agents/pi-embedded-subscribe.handlers.messages.test.ts -> 3 files / 86 tests passed
pnpm tsgo:core and pnpm tsgo:core:test completed in pnpm check:changed before the known local oxlint lock wedge
pnpm lint:core
pnpm check:docs
git diff --check
live .profile OpenAI TTS smoke: pnpm openclaw infer tts convert --local --json --model openai/gpt-4o-mini-tts --voice alloy ... succeeded and wrote a 58K MP3

Note: pnpm check:changed still self-deadlocked at lint:core because its oxlint child waited on the heavy-check lock held by the wrapper. I decomposed and ran the same relevant lanes directly.

steipete · 2026-04-25T16:58:31Z

Superseded by current main in 60f9358348.

openclaw-barnacle (bot) added the agents (Agent runtime and tooling) and size: S labels on Mar 14, 2026

greptile-apps Bot reviewed Mar 14, 2026

chatgpt-codex-connector Bot reviewed Mar 14, 2026

jetd1 mentioned this pull request Apr 9, 2026

fix(tts): allow OpenClaw temp directory paths in reply media normalizer #63511

Merged

steipete closed this Apr 25, 2026

azade-c wants to merge 1 commit into openclaw:main from azade-c:fix/tts-tool-audio-as-voice
