feat(middleware): propagate multimodal media through ChannelBridge and auto-reply #387

## Problem

Even with the multimodal AgentRuntime contract (#385 ✅) and per-runtime implementations (#386), media still won't flow end-to-end because the middleware layers don't propagate it:

1. **Inbound**: Channel plugins produce media URLs/paths. `ChannelMessage.mediaUrls` existed but was never populated by `buildChannelMessage` (#384 ✅, now fixed), and `ChannelBridge` still never passes media to `AgentExecuteParams.media`.
2. **Outbound**: `AgentRunResult` has a `media` field (#385 ✅), but `ChannelBridge` only extracts `text`, so `ReplyPayload` only gets text and the media is lost.

## Scope

### Inbound path (channel → runtime)

1. **ChannelBridge media resolution**: When `ChannelMessage.mediaUrls` is populated:
   - Download/resolve media URLs to local file paths (temp files)
   - Build `MediaAttachment[]` with MIME type detection
   - Check the runtime's `mediaCapabilities.acceptsInbound`
   - For supported types: pass through as `AgentExecuteParams.media`
   - For unsupported types: delegate to middleware fallback (STT for audio, vision API for images; see below)

2. **Middleware fallback layer**: For runtimes that can't handle certain media types:
   - **Audio → STT**: Use the `src/stt/` module (#424). Convert voice messages to text and prepend the transcript to the prompt. This is a middleware concern because every runtime accepts text.
   - **Image/video → text description**: Thin fallback using the runtime's own API key (from auth profiles). Only for runtimes that declare no image support.
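The inbound routing described above can be sketched as a pure function that sorts attachments into pass-through and fallback buckets. This is only an illustration: the field names on `MediaAttachment`, the shape of `mediaCapabilities.acceptsInbound` (assumed here to be a list of media kinds), and the helpers `kindOf` and `planInbound` are all hypothetical and not taken from the codebase.

```typescript
// Hypothetical shapes: only the identifiers MediaAttachment and
// mediaCapabilities.acceptsInbound come from the issue; fields are assumed.
type MediaKind = "image" | "audio" | "video";

interface MediaAttachment {
  path: string; // local temp file path after download (assumed field)
  mimeType: string; // detected MIME type (assumed field)
}

interface MediaCapabilities {
  acceptsInbound: MediaKind[]; // assumed: kinds the runtime handles natively
}

interface InboundPlan {
  passThrough: MediaAttachment[]; // forwarded as AgentExecuteParams.media
  needsStt: MediaAttachment[]; // audio routed to the STT fallback
  needsDescription: MediaAttachment[]; // image/video routed to the vision fallback
  unhandled: MediaAttachment[]; // no handler; candidate for a limitation notice
}

// Map a MIME type to a coarse media kind using its top-level type.
function kindOf(mimeType: string): MediaKind | undefined {
  const top = mimeType.split("/")[0];
  return top === "image" || top === "audio" || top === "video" ? top : undefined;
}

// Decide, per attachment, whether the runtime takes it or a fallback must.
function planInbound(attachments: MediaAttachment[], caps: MediaCapabilities): InboundPlan {
  const plan: InboundPlan = { passThrough: [], needsStt: [], needsDescription: [], unhandled: [] };
  for (const attachment of attachments) {
    const kind = kindOf(attachment.mimeType);
    if (kind === undefined) {
      plan.unhandled.push(attachment);
    } else if (caps.acceptsInbound.includes(kind)) {
      plan.passThrough.push(attachment);
    } else if (kind === "audio") {
      plan.needsStt.push(attachment);
    } else {
      plan.needsDescription.push(attachment);
    }
  }
  return plan;
}
```

Keeping the decision separate from the download and fallback calls would make the routing testable without network or STT access, though the real bridge may well structure this differently.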

### Outbound path (runtime → channel)

1. **ChannelBridge**: Handle `AgentMediaEvent` and `AgentRunResult.media`:
   - Convert `MediaAttachment` to `ReplyPayload.mediaUrl` / `ReplyPayload.mediaUrls`
   - Temp file management: serve from local path, clean up after delivery

2. **Auto-reply delivery**: Already handles `ReplyPayload.mediaUrl`, so it should work once ChannelBridge populates it

3. **TTS integration point**: Outbound audio can come from either:
   - AgentRuntime (native media emission — future)
   - TTS module (text → speech conversion — existing)
   - Both paths produce `ReplyPayload.mediaUrl`, so delivery is unified
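A minimal sketch of the outbound conversion and temp-file cleanup might look like the following. The field layouts, the rule that a single attachment goes to `mediaUrl` while several go to `mediaUrls`, and the helpers `toReplyPayload`, `mediaPathsOf`, and `deliverThenCleanUp` are assumptions made for illustration. Delivery is modelled as a synchronous callback to keep the example short; the real delivery path is presumably asynchronous.

```typescript
// Hypothetical shapes: only the names MediaAttachment, ReplyPayload,
// mediaUrl and mediaUrls come from the issue; everything else is assumed.
interface MediaAttachment {
  path: string; // local temp file path (assumed field)
  mimeType: string; // assumed field
}

interface ReplyPayload {
  text?: string;
  mediaUrl?: string;
  mediaUrls?: string[];
}

// Assumed convention: one attachment fills mediaUrl, several fill mediaUrls.
function toReplyPayload(text: string | undefined, media: MediaAttachment[]): ReplyPayload {
  const payload: ReplyPayload = {};
  if (text) payload.text = text;
  const paths = media.map((m) => m.path);
  if (paths.length === 1) {
    payload.mediaUrl = paths[0];
  } else if (paths.length > 1) {
    payload.mediaUrls = paths;
  }
  return payload;
}

// Collect every media path referenced by a payload.
function mediaPathsOf(payload: ReplyPayload): string[] {
  const single = payload.mediaUrl ? [payload.mediaUrl] : [];
  return single.concat(payload.mediaUrls ?? []);
}

// Deliver, then remove temp files even if delivery throws.
function deliverThenCleanUp(
  payload: ReplyPayload,
  deliver: (p: ReplyPayload) => void,
  removeFile: (path: string) => void,
): void {
  try {
    deliver(payload);
  } finally {
    for (const path of mediaPathsOf(payload)) removeFile(path);
  }
}
```

Running cleanup in `finally` is one way to keep failed deliveries from leaking temp files; whether failed deliveries should instead keep the file for a retry is a design choice the issue leaves open.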

## Architecture diagram

```
Inbound:
  Channel plugin → ChannelMessage { mediaUrls }
    → ChannelBridge
      → runtime.mediaCapabilities check
        → supported: MediaAttachment[] → AgentExecuteParams.media
        → unsupported audio: STT middleware → text in prompt
        → unsupported image/video: fallback vision API → text in prompt
    → runtime.execute(params)

Outbound:
  runtime.execute() yields AgentMediaEvent / AgentRunResult.media
    → ChannelBridge
      → ReplyPayload { mediaUrl, mediaUrls }
    → channel delivery
```
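Because the outbound half of the diagram has two media sources (streamed `AgentMediaEvent`s and the final `AgentRunResult.media`), the bridge probably needs to merge them before building the payload. The sketch below shows one possible merge. The event union, the field names, the de-duplication by path, and the helper `collectOutboundMedia` are all assumptions; a plain array stands in for whatever stream `runtime.execute()` actually yields.

```typescript
// Hypothetical shapes: AgentMediaEvent and AgentRunResult.media are named in
// the issue, but their fields and the event union below are assumed.
interface MediaAttachment {
  path: string;
  mimeType: string;
}

type AgentEvent =
  | { type: "text"; text: string }
  | { type: "media"; media: MediaAttachment }; // stands in for AgentMediaEvent

interface AgentRunResult {
  text: string;
  media?: MediaAttachment[];
}

// Merge streamed media with the final result, skipping repeated paths
// (assumption: a runtime may report the same file in both places).
function collectOutboundMedia(events: AgentEvent[], result: AgentRunResult): MediaAttachment[] {
  const seen = new Set<string>();
  const collected: MediaAttachment[] = [];
  const add = (attachment: MediaAttachment): void => {
    if (seen.has(attachment.path)) return;
    seen.add(attachment.path);
    collected.push(attachment);
  };
  for (const event of events) {
    if (event.type === "media") add(event.media);
  }
  for (const attachment of result.media ?? []) add(attachment);
  return collected;
}
```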

## Depends on

- #385 — AgentRuntime multimodal contract (done ✅)
- #384 — `buildChannelMessage` mediaUrls fix (done ✅)
- #397 — Gemini runtime multimodal (at least one runtime needed to test end-to-end)

## Related

- #415 — implementation plan (parent, Phase 3)
- #386 — per-runtime multimodal tracking
- #424 — STT extraction (provides audio fallback)
- #400 — limitation notices (when no handler available)
