Problem
The AgentRuntime interface is text-only. Both inbound (AgentExecuteParams.prompt: string) and outbound (AgentRunResult.text: string) are plain strings. There is no way to pass media (images, audio, video, documents) to or from CLI runtimes at the contract level.
This forces media conversion to happen outside the runtime contract:
- Inbound: applyMediaUnderstanding() converts media to text descriptions before the prompt reaches the runtime
- Outbound: Media is only emitted via MCP side effects (sentMediaUrls), not as first-class runtime output
Proposed contract changes
Inbound: AgentExecuteParams
export type MediaAttachment = {
  /** MIME type (e.g., "image/jpeg", "audio/ogg", "video/mp4"). */
  mimeType: string;
  /** Local file path to the media (preferred for CLI runtimes that accept file paths). */
  filePath?: string;
  /** Base64-encoded content (for runtimes that accept inline data). */
  base64?: string;
  /** Original URL (for reference/logging; runtimes should prefer filePath or base64). */
  sourceUrl?: string;
  /** Original filename (for display/logging). */
  fileName?: string;
};

export type AgentExecuteParams = {
  prompt: string;
  /** Media attachments to include with the prompt. */
  media?: MediaAttachment[];
  // ... existing fields
};
Outbound: AgentEvent / AgentRunResult
export type AgentMediaEvent = {
  type: "media";
  media: MediaAttachment;
};

export type AgentEvent =
  | AgentTextEvent
  | AgentMediaEvent // ← new
  | AgentToolUseEvent
  | AgentToolResultEvent
  | AgentErrorEvent
  | AgentDoneEvent;

export type AgentRunResult = {
  text: string;
  /** Media attachments produced by the agent (non-MCP path). */
  media?: MediaAttachment[];
  // ... existing fields
};
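A minimal sketch of how a caller might fold the proposed media events into AgentRunResult.media while consuming the event stream. The helper name accumulateEvent and the shapes of AgentTextEvent and AgentDoneEvent are assumptions made for illustration; only AgentMediaEvent, MediaAttachment and AgentRunResult come from the proposal.

```typescript
// Types mirrored from the proposal so the sketch is self-contained.
type MediaAttachment = {
  mimeType: string;
  filePath?: string;
  base64?: string;
  sourceUrl?: string;
  fileName?: string;
};
type AgentTextEvent = { type: "text"; text: string }; // assumed shape
type AgentMediaEvent = { type: "media"; media: MediaAttachment };
type AgentDoneEvent = { type: "done" }; // assumed shape
type AgentEvent = AgentTextEvent | AgentMediaEvent | AgentDoneEvent;
type AgentRunResult = { text: string; media?: MediaAttachment[] };

// Hypothetical reducer: folds one event into the running result
// without mutating the previous result object.
function accumulateEvent(result: AgentRunResult, event: AgentEvent): AgentRunResult {
  switch (event.type) {
    case "text":
      return { ...result, text: result.text + event.text };
    case "media":
      return { ...result, media: [...(result.media ?? []), event.media] };
    default:
      return result; // other event types do not affect text or media
  }
}

// Example stream: one text chunk followed by one generated image.
const sampleEvents: AgentEvent[] = [
  { type: "text", text: "Here is the chart." },
  { type: "media", media: { mimeType: "image/png", filePath: "/tmp/chart.png" } },
  { type: "done" },
];
const initialResult: AgentRunResult = { text: "" };
const sampleResult = sampleEvents.reduce<AgentRunResult>(accumulateEvent, initialResult);
```

Inside a real consumer the same reducer could be applied in a `for await` loop over `runtime.execute(params)`; runtimes that never emit media events simply leave `media` undefined.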
Runtime capability declaration
export interface AgentRuntime {
  execute(params: AgentExecuteParams): AsyncIterable<AgentEvent>;
  /** Declare which media types this runtime can handle natively. */
  readonly mediaCapabilities?: {
    /** MIME type prefixes accepted as inbound media (e.g., ["image/", "audio/", "video/"]). */
    acceptsInbound?: string[];
    /** Whether the runtime can emit media in responses. */
    emitsOutbound?: boolean;
  };
}
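To illustrate how the prefix-based declaration could be consumed, here is a sketch that matches a MIME type against acceptsInbound and splits attachments into those the runtime takes natively and those that need a text fallback. The helper names acceptsMime and partitionMedia are hypothetical, and case-insensitive prefix matching is an assumption rather than part of the proposal.

```typescript
// Types mirrored from the proposal so the sketch is self-contained.
type MediaAttachment = {
  mimeType: string;
  filePath?: string;
  base64?: string;
  sourceUrl?: string;
  fileName?: string;
};
type MediaCapabilities = {
  acceptsInbound?: string[];
  emitsOutbound?: boolean;
};

// Hypothetical helper: true when mimeType starts with any declared prefix.
// A runtime with no declaration accepts nothing natively.
function acceptsMime(caps: MediaCapabilities | undefined, mimeType: string): boolean {
  const normalized = mimeType.toLowerCase();
  return (caps?.acceptsInbound ?? []).some((prefix) =>
    normalized.startsWith(prefix.toLowerCase()),
  );
}

// Hypothetical helper: splits attachments into native pass-through
// and those that need conversion to a text description.
function partitionMedia(
  caps: MediaCapabilities | undefined,
  media: MediaAttachment[],
): { native: MediaAttachment[]; fallback: MediaAttachment[] } {
  const native: MediaAttachment[] = [];
  const fallback: MediaAttachment[] = [];
  for (const item of media) {
    (acceptsMime(caps, item.mimeType) ? native : fallback).push(item);
  }
  return { native, fallback };
}

// Example: a runtime that only declares image support.
const imageOnly: MediaCapabilities = { acceptsInbound: ["image/"] };
const inbound: MediaAttachment[] = [
  { mimeType: "image/jpeg", fileName: "photo.jpg" },
  { mimeType: "audio/ogg", fileName: "voice.ogg" },
];
const routed = partitionMedia(imageOnly, inbound);
```

With these inputs the JPEG lands in `routed.native` and the OGG voice note in `routed.fallback`, where a caller could apply the existing text-description path.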
Design notes
- Runtimes that don't support media can ignore the media field; prompt text still works as before
- ChannelBridge uses mediaCapabilities to decide: pass media through natively vs. fall back to text description (STT for audio, vision API for images)
- filePath is preferred over base64 for disk-based CLIs (Gemini uses @path, Claude could use file references)
- base64 is available for runtimes that need inline data (Claude's --input-format stream-json)
- Outbound AgentMediaEvent enables runtimes to produce media directly (e.g., generated images) without relying on MCP tools
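The filePath versus base64 preference in the notes above could be expressed as a small selection step inside each runtime adapter. The sketch below is illustrative only: selectPayload and the MediaPayload union are hypothetical names, and how a given CLI actually receives the chosen payload is outside what this example covers.

```typescript
// Type mirrored from the proposal so the sketch is self-contained.
type MediaAttachment = {
  mimeType: string;
  filePath?: string;
  base64?: string;
  sourceUrl?: string;
  fileName?: string;
};

// Hypothetical result type: what the adapter will hand to its CLI.
type MediaPayload =
  | { kind: "path"; path: string }
  | { kind: "inline"; mimeType: string; data: string }
  | { kind: "unavailable" };

// Hypothetical helper: picks the preferred representation, falls back to
// the other one, and reports "unavailable" when neither is present
// (sourceUrl is treated as reference-only, as the proposal suggests).
function selectPayload(
  attachment: MediaAttachment,
  prefer: "filePath" | "base64",
): MediaPayload {
  const asPath: MediaPayload | undefined = attachment.filePath
    ? { kind: "path", path: attachment.filePath }
    : undefined;
  const asInline: MediaPayload | undefined = attachment.base64
    ? { kind: "inline", mimeType: attachment.mimeType, data: attachment.base64 }
    : undefined;
  const none: MediaPayload = { kind: "unavailable" };
  const [first, second] = prefer === "filePath" ? [asPath, asInline] : [asInline, asPath];
  return first ?? second ?? none;
}

// Example attachment carrying both representations.
const both: MediaAttachment = {
  mimeType: "image/png",
  filePath: "/tmp/a.png",
  base64: "aGVsbG8=",
};
const forDiskCli = selectPayload(both, "filePath");
const forInlineCli = selectPayload(both, "base64");
```

A disk-based adapter would call this with `"filePath"` and an inline-data adapter with `"base64"`; an `"unavailable"` result is a natural point to fall back to the text-description path.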
Related
- buildChannelMessage never populates mediaUrls
- runtimeEnv config field