fix(tts): propagate audioAsVoice from tool results to delivery payload #46535
azade-c wants to merge 1 commit into openclaw:main
When the TTS tool returns [[audio_as_voice]] alongside a MEDIA: path,
the voice-bubble flag was lost during tool result extraction. The
extractToolResultMediaPaths function only returned file paths, discarding
the audioAsVoice directive parsed by splitMediaFromOutput.
This caused TTS audio to be sent as a file attachment instead of a voice
bubble on Telegram (and other voice-capable channels).
Changes:
- Add extractToolResultMedia() returning { paths, audioAsVoice? }
- Keep extractToolResultMediaPaths() as backward-compatible wrapper
- Propagate audioAsVoice in emitToolResultOutput (non-verbose path)
- Propagate audioAsVoice in emitToolResultMessage (verbose path)
- Add tests for extractToolResultMedia audioAsVoice detection
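The extraction change listed above can be sketched as follows. This is a simplified illustration rather than the repository's code: `extractMediaFromText` and `extractPathsFromText` are hypothetical names, they take raw text instead of a tool-result record, and the parsing rule (one path per `MEDIA:` line, tag anywhere in the text) is an assumption standing in for splitMediaFromOutput.

```typescript
// Hypothetical, simplified stand-ins: the real extractToolResultMedia works on
// a tool-result record and delegates parsing to splitMediaFromOutput.
type ExtractedMedia = { paths: string[]; audioAsVoice?: boolean };

const VOICE_TAG = "[[audio_as_voice]]";
const MEDIA_PREFIX = "MEDIA:";

// Assumed parsing rule: each line starting with "MEDIA:" carries one path,
// and the voice tag may appear anywhere in the text.
function extractMediaFromText(text: string): ExtractedMedia {
  const paths: string[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.startsWith(MEDIA_PREFIX)) {
      const p = trimmed.slice(MEDIA_PREFIX.length).trim();
      if (p) paths.push(p);
    }
  }
  const audioAsVoice = text.includes(VOICE_TAG);
  // The key is only present when the tag was seen, so existing consumers
  // that ignore it observe no change in shape.
  return { paths, ...(audioAsVoice ? { audioAsVoice: true } : {}) };
}

// Backward-compatible wrapper: existing callers keep receiving string[].
function extractPathsFromText(text: string): string[] {
  return extractMediaFromText(text).paths;
}
```

Keeping the wrapper means call sites that only need paths do not have to change, while the two emit paths can switch to the structured result.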
Greptile Summary
This PR fixes a bug where the
Changes:
Minor observations:
Confidence Score: 4/5
  if (paths.length > 0) {
-   return paths;
+   return { paths, ...(audioAsVoice ? { audioAsVoice: true } : {}) };
  }

  // Fall back to details.path when image content exists but no MEDIA: text.
  if (hasImageContent) {
    const details = record.details as Record<string, unknown> | undefined;
    const p = typeof details?.path === "string" ? details.path.trim() : "";
    if (p) {
-     return [p];
+     return { paths: [p] };
    }
  }

- return [];
+ return { paths: [] };
}
audioAsVoice silently dropped in the image-fallback path
When paths.length === 0 but audioAsVoice was detected (e.g. a content block contains [[audio_as_voice]] but no valid MEDIA: token), the code falls through to the details.path image-fallback branch. Both return { paths: [p] } and the final return { paths: [] } discard the audioAsVoice flag.
The TTS tool always emits a MEDIA: path alongside the tag, so this won't bite in today's usage. But as an invariant it's fragile — future tools that emit [[audio_as_voice]] without a MEDIA: line (or where the MEDIA: token fails validation) would silently lose the flag. A defensive approach would be to carry audioAsVoice through:
if (paths.length > 0) {
  return { paths, ...(audioAsVoice ? { audioAsVoice: true } : {}) };
}
// Fall back to details.path when image content exists but no MEDIA: text.
if (hasImageContent) {
  const details = record.details as Record<string, unknown> | undefined;
  const p = typeof details?.path === "string" ? details.path.trim() : "";
  if (p) {
    return { paths: [p], ...(audioAsVoice ? { audioAsVoice: true } : {}) };
  }
}
return { paths: [], ...(audioAsVoice ? { audioAsVoice: true } : {}) };
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5133827a52
- void ctx.params.onToolResult({ mediaUrls: mediaPaths });
+ void ctx.params.onToolResult({
+   mediaUrls: mediaPaths,
+   ...(extracted.audioAsVoice ? { audioAsVoice: true } : {}),
Gate audioAsVoice on surviving media after URL filtering
audioAsVoice is propagated even though media URLs are filtered separately, so the flag can survive while its intended audio file is removed. In a mixed tool result (for example [[audio_as_voice]] with both a filtered local path and a remaining remote image URL), the remaining non-audio media is still marked as voice and downstream delivery can take the wrong path; Discord unconditionally routes audioAsVoice payloads through its voice API (extensions/discord/src/monitor/reply-delivery.ts:352), which can fail for non-audio media. Please clear/recompute audioAsVoice after filtering so only retained audio media keep the voice flag.
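One way to address this suggestion is to recompute the flag after filtering, as in the sketch below. It is an illustration under stated assumptions, not the project's implementation: `buildGatedPayload` and `looksLikeAudio` are hypothetical helpers, the extension list is a guess, and the URL filter is passed in as a plain predicate.

```typescript
type ExtractedMedia = { paths: string[]; audioAsVoice?: boolean };
type ToolResultPayload = { mediaUrls: string[]; audioAsVoice?: boolean };

// Assumed audio extensions; real channels may accept a different set.
const AUDIO_EXTENSIONS = [".mp3", ".ogg", ".opus", ".m4a", ".wav"];

// Hypothetical helper: decide by file extension, ignoring query and fragment.
function looksLikeAudio(url: string): boolean {
  const path = url.replace(/[?#].*$/, "").toLowerCase();
  return AUDIO_EXTENSIONS.some((ext) => path.endsWith(ext));
}

// Hypothetical helper: filter first, then keep the voice flag only if at
// least one surviving URL still looks like audio.
function buildGatedPayload(
  extracted: ExtractedMedia,
  isAllowed: (url: string) => boolean,
): ToolResultPayload {
  const mediaUrls = extracted.paths.filter(isAllowed);
  const keepVoice =
    extracted.audioAsVoice === true && mediaUrls.some(looksLikeAudio);
  return { mediaUrls, ...(keepVoice ? { audioAsVoice: true } : {}) };
}
```

This sketch gates the flag for the payload as a whole. If audio and non-audio media both survive filtering, the flag still covers every URL, so a stricter variant would split the payload per media item.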
Maintainer deep-review update from current This is mostly superseded for the built-in Current code now returns structured media from the TTS tool:
Result: 2 files / 46 tests passed. Caveat: the legacy text-only path still calls Recommended next step: either close this as superseded by structured TTS media if the original production issue was only the built-in
Thanks @azade-c. I deep-reviewed this against current What landed:
Why I closed instead of merging this PR: current Validation:
Note:
Superseded by current main in 60f9358348.
Problem
When the TTS tool returns [[audio_as_voice]] alongside a MEDIA: path in its tool result, the voice-bubble flag is lost during tool result extraction. This causes TTS audio to be sent as a file attachment instead of a voice bubble on Telegram (and other voice-capable channels like WhatsApp/Feishu).
Root cause
extractToolResultMediaPaths() only extracts file paths from tool results, discarding the audioAsVoice field that splitMediaFromOutput() correctly parses from [[audio_as_voice]] tags.
The extracted media is then passed to onToolResult({ mediaUrls }) without the voice flag, so Telegram's sendVoice path is never triggered; delivery falls through to sendAudio (file attachment).
Two code paths are affected:
- Non-verbose path (emitToolResultOutput in handlers.tools.ts): uses extractToolResultMediaPaths → loses audioAsVoice
- Verbose path (emitToolResultMessage in pi-embedded-subscribe.ts): calls parseReplyDirectives, which returns audioAsVoice, but the field was not propagated to the onToolResult payload
Fix
- Add extractToolResultMedia() that returns { paths: string[], audioAsVoice?: boolean }
- Keep extractToolResultMediaPaths() as a backward-compatible wrapper
- Propagate audioAsVoice to the onToolResult payload in both code paths
- Add tests for [[audio_as_voice]] detection in tool results
Testing
- pi-embedded-subscribe.tools.media.test.ts
- A tool result of [[audio_as_voice]]\nMEDIA:path.mp3 was arriving as a file attachment on Telegram; after the fix, audioAsVoice: true is correctly set on the delivery payload
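The before/after behaviour described in this PR can be illustrated with a small sketch. All three function names are hypothetical, and `pickTelegramMethod` is only a stand-in for the channel's delivery logic; the method names sendVoice and sendAudio come from the description above.

```typescript
type Extracted = { paths: string[]; audioAsVoice?: boolean };
type DeliveryPayload = { mediaUrls: string[]; audioAsVoice?: boolean };

// Before the fix: only the paths reach the payload, so the flag is lost.
function payloadBeforeFix(extracted: Extracted): DeliveryPayload {
  return { mediaUrls: extracted.paths };
}

// After the fix: the flag is spread in only when it was actually set.
function payloadAfterFix(extracted: Extracted): DeliveryPayload {
  return {
    mediaUrls: extracted.paths,
    ...(extracted.audioAsVoice ? { audioAsVoice: true } : {}),
  };
}

// Hypothetical selector standing in for the Telegram delivery decision.
function pickTelegramMethod(
  payload: DeliveryPayload,
): "sendVoice" | "sendAudio" {
  return payload.audioAsVoice ? "sendVoice" : "sendAudio";
}

// Usage: the same extraction result yields different delivery methods.
const extracted: Extracted = { paths: ["path.mp3"], audioAsVoice: true };
const methodBefore = pickTelegramMethod(payloadBeforeFix(extracted));
const methodAfter = pickTelegramMethod(payloadAfterFix(extracted));
```

The conditional spread keeps the key out of the payload entirely when the tag is absent, which matches the pattern used in the diff hunks earlier in this PR.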