feat(runtime/claude): multimodal I/O support #396

@alexey-pelykh

Problem

The Claude CLI runtime (src/middleware/runtimes/claude.ts) currently handles only text prompts. It needs to accept inbound image attachments via the --input-format stream-json stdin protocol.

Current state

  • Delivers the prompt as an argument to the --print flag, which leaves no way to attach media
  • buildEnv() returns {}
  • Output parsing via --output-format stream-json --verbose --include-partial-messages

Implementation

Inbound (images only)

  • When params.media is present, stop passing the prompt as a --print argument and deliver it over stdin using --input-format stream-json
  • Construct image content blocks for the stdin message, one per attachment:
    { "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": "..." } }
  • Only images are supported — audio/video must be handled by middleware STT/fallback
  • Declare capability: acceptsInbound: ["image/"]
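
The inbound steps above could be sketched roughly as follows. This is a minimal illustration, not the runtime's actual code: the MediaAttachment type mirrors the { mimeType, base64 } shape used in the unit-test bullets, buildStdinLine is a hypothetical helper name, and the outer { type: "user", message: ... } envelope is an assumption about the stream-json stdin protocol that should be checked against the CLI before relying on it.

```typescript
// Hypothetical attachment shape, mirroring the unit-test example in this issue.
interface MediaAttachment {
  mimeType: string;
  base64: string;
}

// Image content block, as shown in the Inbound section.
interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
}

interface TextBlock {
  type: "text";
  text: string;
}

// ASSUMPTION: one stdin line is a user message whose content is a list of blocks.
interface StdinUserMessage {
  type: "user";
  message: { role: "user"; content: Array<TextBlock | ImageBlock> };
}

// Hypothetical helper: serialize the prompt plus image attachments as one JSON line.
function buildStdinLine(prompt: string, media: MediaAttachment[]): string {
  const images = media
    // Non-image attachments (audio/video) are dropped here; middleware handles them.
    .filter((m) => m.mimeType.startsWith("image/"))
    .map((m): ImageBlock => ({
      type: "image",
      source: { type: "base64", media_type: m.mimeType, data: m.base64 },
    }));

  const payload: StdinUserMessage = {
    type: "user",
    message: {
      role: "user",
      content: [{ type: "text", text: prompt }, ...images],
    },
  };

  // Newline-delimited JSON: one message per line.
  return JSON.stringify(payload) + "\n";
}
```

The returned string would be written to the child process's stdin, after which stdin is closed so the CLI knows no further messages are coming.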

Outbound

  • No native media emission capability
  • emitsOutbound: false

Capability declaration

readonly mediaCapabilities = {
  acceptsInbound: ["image/"],
  emitsOutbound: false,
};
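
One way the middleware might consume this declaration is sketched below. It assumes that each acceptsInbound entry is a MIME-type prefix (so "image/" matches "image/png", "image/jpeg", and so on); the MediaCapabilities interface and the acceptsMimeType helper are hypothetical names introduced only for illustration.

```typescript
// Hypothetical interface matching the declaration above.
interface MediaCapabilities {
  acceptsInbound: readonly string[];
  emitsOutbound: boolean;
}

const claudeCapabilities: MediaCapabilities = {
  acceptsInbound: ["image/"],
  emitsOutbound: false,
};

// ASSUMPTION: entries are treated as MIME-type prefixes, compared case-insensitively.
function acceptsMimeType(caps: MediaCapabilities, mimeType: string): boolean {
  const normalized = mimeType.trim().toLowerCase();
  return caps.acceptsInbound.some((prefix) => normalized.startsWith(prefix));
}
```

With this reading, acceptsMimeType(claudeCapabilities, "image/png") is true and acceptsMimeType(claudeCapabilities, "audio/ogg") is false, which is the signal the middleware would need to route audio through STT/fallback instead.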

Tests

Unit tests (src/middleware/runtimes/claude.test.ts)

  • execute() with media: [{ mimeType: "image/jpeg", base64: "..." }] constructs correct stdin JSON
  • execute() with unsupported media type (audio/video) ignores attachment
  • execute() without media uses existing --print path (backwards compat)
  • mediaCapabilities reports correct values
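
The path selection these unit tests exercise could look roughly like the sketch below. ExecuteParams and buildArgs are hypothetical names, not the runtime's real API. Two assumptions are baked in and should be verified: that --print stays on the command line in stdin mode (the CLI documents --input-format as a print-mode option), and that a request carrying only unsupported media falls back to the text-only path.

```typescript
// Hypothetical parameter shape; only `media` and the attachment fields come from this issue.
interface ExecuteParams {
  prompt: string;
  media?: { mimeType: string; base64: string }[];
}

// Output flags listed under "Current state".
const OUTPUT_FLAGS = [
  "--output-format",
  "stream-json",
  "--verbose",
  "--include-partial-messages",
];

// Hypothetical helper: choose CLI arguments based on whether usable images are attached.
function buildArgs(params: ExecuteParams): string[] {
  const hasImages = (params.media ?? []).some((m) =>
    m.mimeType.startsWith("image/"),
  );

  if (!hasImages) {
    // Existing behavior: the prompt travels as the --print argument.
    return ["--print", params.prompt, ...OUTPUT_FLAGS];
  }

  // ASSUMPTION: the prompt and images are sent over stdin, so no prompt argument here.
  return ["--print", "--input-format", "stream-json", ...OUTPUT_FLAGS];
}
```

Keeping the decision in a pure function like this makes the backwards-compatibility case easy to assert without spawning the CLI.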

Live smoke tests (src/middleware/__smoke__/claude.live.test.ts)

  • Send image attachment, verify Claude describes the image content
  • Send text-only prompt, verify existing behavior unchanged

Depends on

Related
