Audio media understanding skipped for followup-queued messages #44682

@joeykrug

Summary

Voice notes (and likely other media) that arrive while the agent is mid-turn are queued as "followup" messages. The followup runner (createFollowupRunner) calls runEmbeddedPiAgent directly without first calling applyMediaUnderstanding. This means audio transcription, image understanding, and video understanding are silently skipped for all queued messages.

Steps to Reproduce

  1. Configure tools.media.audio with a provider model (e.g., openai/gpt-4o-transcribe)
  2. Send a text message to the agent via Signal (or any channel)
  3. While the agent is still generating its reply, send a voice note
  4. The voice note arrives as a followup-queued message
  5. The agent receives <media:audio> but no transcript, because applyMediaUnderstanding was never called

Expected Behavior

applyMediaUnderstanding should run on followup-queued messages before they are passed to runEmbeddedPiAgent, just as it does in the primary getReplyFromConfig path.

Root Cause

In the source (e.g., sessions-DRG4gFa3.js):

  • Line ~124182: applyMediaUnderstanding is called in getReplyFromConfig for the initial message ✅
  • Line ~125007: A second call exists in the ACP dispatch path with a guard (if (!params.ctx.MediaUnderstanding?.length)) ✅
  • createFollowupRunner (~line 121676): Calls runEmbeddedPiAgent with queued.prompt directly — no applyMediaUnderstanding call

Suggested Fix

Add an applyMediaUnderstanding call inside the followup runner before runEmbeddedPiAgent is invoked, similar to the guard pattern used in the ACP path:

if (!queued.ctx?.MediaUnderstanding?.length) {
  await applyMediaUnderstanding({
    ctx: queued.ctx,  // or reconstruct from queued.run
    cfg: queued.run.config,
  });
}
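To make the guard's intended behavior concrete, here is a minimal, self-contained sketch. The two stub functions below are hypothetical stand-ins for the real applyMediaUnderstanding and runEmbeddedPiAgent, whose actual signatures in OpenClaw may differ. The shape of `queued` (`prompt`, `ctx`, `run.config`) and the `attachments` field are also assumptions made for illustration.

```javascript
// Hypothetical stub: the real applyMediaUnderstanding lives in OpenClaw.
// Here it "transcribes" audio attachments and records the results on ctx.
async function applyMediaUnderstanding({ ctx, cfg }) {
  ctx.MediaUnderstanding = (ctx.attachments ?? [])
    .filter((a) => a.kind === "audio")
    .map((a) => ({ kind: "audio.transcription", text: `transcript of ${a.name}` }));
}

// Hypothetical stub: returns what the agent would see for this turn.
async function runEmbeddedPiAgent({ prompt, ctx }) {
  const transcripts = (ctx.MediaUnderstanding ?? []).map((m) => m.text);
  return { prompt, transcripts };
}

// Sketch of a followup runner that applies media understanding first.
function createFollowupRunner() {
  return async function runFollowup(queued) {
    // Assumption: a missing ctx is replaced with an empty one. The real
    // code may need to reconstruct it from queued.run instead.
    queued.ctx ??= {};

    // Same guard as the ACP path: skip if results are already present.
    if (!queued.ctx.MediaUnderstanding?.length) {
      await applyMediaUnderstanding({ ctx: queued.ctx, cfg: queued.run.config });
    }
    return runEmbeddedPiAgent({ prompt: queued.prompt, ctx: queued.ctx });
  };
}
```

With this ordering, a queued voice note reaches the agent with its transcript attached, and a message that was already processed is not processed twice.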

Workaround

Manually transcribe audio using the OpenAI Whisper API when <media:audio> is received without a transcript.
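As a rough illustration of the workaround, the sketch below posts raw audio bytes to OpenAI's `/v1/audio/transcriptions` endpoint and returns the transcript text. It assumes Node 18+ (for the global fetch, FormData, and Blob) and that the audio bytes have already been retrieved from the channel, which is outside the scope of this sketch. The function name, the default filename, and the injectable `fetchImpl` parameter are illustrative choices, not part of OpenClaw.

```javascript
// Sketch only: transcribe raw audio bytes via the OpenAI transcription API.
// `fetchImpl` is injectable so the function can be exercised offline.
async function transcribeAudio(
  bytes,
  { apiKey, filename = "voice-note.m4a", model = "whisper-1", fetchImpl = fetch } = {}
) {
  // The endpoint expects multipart/form-data with `file` and `model` fields.
  const form = new FormData();
  form.append("file", new Blob([bytes]), filename);
  form.append("model", model);

  const res = await fetchImpl("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) {
    throw new Error(`Transcription request failed with status ${res.status}`);
  }

  // The default JSON response carries the transcript in `text`.
  const { text } = await res.json();
  return text;
}
```

The returned text can then be appended to the prompt wherever <media:audio> appears without a transcript.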

Environment

  • OpenClaw version: 2026.3.11
  • Channel: Signal
  • Audio model: openai/gpt-4o-transcribe
  • Agent model: anthropic/claude-opus-4-6
