feat(runtime/opencode): multimodal I/O support (blocked upstream) #399

@alexey-pelykh

Problem

The OpenCode CLI runtime (src/middleware/runtimes/opencode.ts) wraps a CLI that has a --file flag for file input, but the flag hardcodes the text/plain MIME type for every file (source). Binary media (images, audio, video) is therefore sent as text, which makes multimodal input non-functional.

Current state

  • The runtime invokes OpenCode with run --format json
  • The --file <path> flag exists but hardcodes the text/plain MIME type
  • Any binary media passed through --file would be corrupted or ignored

Implementation

While upstream bug persists

  • Declare acceptsInbound: [] (no media support)
  • Route all media through the middleware fallback (STT for audio, vision API for images)
  • Declare emitsOutbound: false

readonly mediaCapabilities = {
  acceptsInbound: [],
  emitsOutbound: false,
};
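
A minimal sketch of how middleware might act on this declaration, routing each attachment either to the runtime or to a fallback. The `MediaCapabilities` and `Attachment` shapes, the wildcard matching, and the `routeMedia` helper are all assumptions made for illustration; only `acceptsInbound`, `emitsOutbound`, and the STT/vision fallbacks come from this issue.

```typescript
// Hypothetical shapes: the issue only shows acceptsInbound and emitsOutbound.
interface MediaCapabilities {
  acceptsInbound: string[]; // assumed: MIME types or wildcards such as "image/*"
  emitsOutbound: boolean;
}

interface Attachment {
  path: string;
  mimeType: string;
}

type Fallback = "stt" | "vision" | "unsupported";

// True when the MIME type matches an exact entry or a "type/*" wildcard.
function accepts(caps: MediaCapabilities, mimeType: string): boolean {
  return caps.acceptsInbound.some((pattern) =>
    pattern.endsWith("/*")
      ? mimeType.startsWith(pattern.slice(0, -1))
      : pattern === mimeType,
  );
}

// Fallbacks named in the issue: STT for audio, vision API for images.
function fallbackFor(mimeType: string): Fallback {
  if (mimeType.startsWith("audio/")) return "stt";
  if (mimeType.startsWith("image/")) return "vision";
  return "unsupported";
}

// Split attachments into those the runtime takes natively and those
// that the middleware has to handle itself.
function routeMedia(caps: MediaCapabilities, attachments: Attachment[]) {
  const native: Attachment[] = [];
  const fallback: { attachment: Attachment; via: Fallback }[] = [];
  for (const attachment of attachments) {
    if (accepts(caps, attachment.mimeType)) {
      native.push(attachment);
    } else {
      fallback.push({ attachment, via: fallbackFor(attachment.mimeType) });
    }
  }
  return { native, fallback };
}

// With the declaration above, nothing is passed to OpenCode natively.
const opencodeCaps: MediaCapabilities = { acceptsInbound: [], emitsOutbound: false };
const routed = routeMedia(opencodeCaps, [
  { path: "/tmp/voice.ogg", mimeType: "audio/ogg" },
  { path: "/tmp/photo.png", mimeType: "image/png" },
]);
```

Because `acceptsInbound` is empty, both attachments land in `routed.fallback`, the audio file tagged for STT and the image for the vision API.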

When upstream fixes MIME detection

  • Pass --file <path> for media attachments once OpenCode sends the correct MIME types
  • Manage temp files similarly to Gemini
  • Update mediaCapabilities to list the MIME types OpenCode supports
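
A rough sketch of the post-fix argument building, assuming the runtime knows which MIME types OpenCode accepts. The `MediaFile` shape, the `buildFileArgs` helper, and the `SUPPORTED_AFTER_FIX` set are hypothetical; whether `--file` may be repeated once per attachment is also an assumption to verify against the upstream CLI. Temp-file handling is omitted here.

```typescript
// Hypothetical attachment shape; not taken from the runtime source.
interface MediaFile {
  path: string;
  mimeType: string;
}

// Build only the --file portion of the argument list, skipping any
// attachment whose MIME type is not in the supported set.
function buildFileArgs(files: MediaFile[], supported: ReadonlySet<string>): string[] {
  const args: string[] = [];
  for (const file of files) {
    if (supported.has(file.mimeType)) {
      args.push("--file", file.path);
    }
  }
  return args;
}

// Placeholder list: the real set depends on what OpenCode ends up supporting.
const SUPPORTED_AFTER_FIX: ReadonlySet<string> = new Set(["image/png", "image/jpeg"]);

const fileArgs = buildFileArgs(
  [
    { path: "/tmp/photo.png", mimeType: "image/png" },
    { path: "/tmp/voice.ogg", mimeType: "audio/ogg" },
  ],
  SUPPORTED_AFTER_FIX,
);
```

Here only the PNG produces a `--file` pair; the audio file would still go through the middleware fallback.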

Tests

Unit tests (src/middleware/runtimes/opencode.test.ts)

  • execute() with media present does NOT pass --file flag (upstream bug workaround)
  • mediaCapabilities reports empty acceptsInbound
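
The two checks above could look roughly like the following. This is a self-contained sketch: `FakeOpenCodeRuntime`, its `execute()` signature, the injected `spawn` callback, the `opencode` binary name, and the position of the prompt argument are stand-ins, not the real class in opencode.ts, and the plain `check` helper takes the place of whatever test runner the project uses.

```typescript
type Spawn = (command: string, args: string[]) => void;

// Hypothetical stand-in for the runtime under test.
class FakeOpenCodeRuntime {
  readonly mediaCapabilities = { acceptsInbound: [] as string[], emitsOutbound: false };
  private readonly spawn: Spawn;

  constructor(spawn: Spawn) {
    this.spawn = spawn;
  }

  execute(prompt: string, mediaPaths: string[]): void {
    // Upstream bug workaround: mediaPaths are deliberately not forwarded as --file.
    void mediaPaths;
    this.spawn("opencode", ["run", "--format", "json", prompt]);
  }
}

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

// Record every argument list the runtime tries to spawn.
const calls: string[][] = [];
const runtime = new FakeOpenCodeRuntime((_command, args) => {
  calls.push(args);
});

runtime.execute("describe this", ["/tmp/photo.png"]);

check(calls.length === 1, "execute() should spawn exactly once");
check(
  calls.every((args) => args.indexOf("--file") === -1),
  "execute() must not pass --file while the upstream bug persists",
);
check(
  runtime.mediaCapabilities.acceptsInbound.length === 0,
  "mediaCapabilities should report an empty acceptsInbound",
);
```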

Future tests (when upstream is fixed)

  • execute() with image attachment passes --file flag
  • Live smoke test: send image, verify OpenCode describes content

Upstream tracking

Depends on

Related
