feat(runtime/gemini): multimodal I/O support #397

@alexey-pelykh

Description

Problem

The Gemini CLI runtime (src/middleware/runtimes/gemini.ts) currently handles only text prompts. Gemini has the broadest multimodal support of all the runtimes: it can natively handle images, audio, video, and PDF files referenced with the inline @path syntax.

Current state

  • Uses --output-format stream-json --prompt for invocation
  • buildEnv() returns {}
  • No media handling

Implementation

Inbound (images, audio, video, PDF)

  • Save MediaAttachment content to temp files when params.media is present
  • Inject @/path/to/media.jpg references inline in the prompt text
  • Gemini CLI resolves @path references and includes the media natively
  • Declare capability: acceptsInbound: ["image/", "audio/", "video/", "application/pdf"]
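A minimal sketch of the inbound prompt-building step, assuming each attachment is identified by its MIME type. The names extensionFor and injectMediaReferences and the extension table are hypothetical, not existing code in the repo. Placing the @path references ahead of the prompt text is one possible choice, and the sketch assumes temp paths contain no spaces.

```typescript
// Assumed MIME-to-extension table (hypothetical); only common types listed.
const EXTENSIONS: Record<string, string> = {
  "image/jpeg": ".jpg",
  "image/png": ".png",
  "image/webp": ".webp",
  "audio/mpeg": ".mp3",
  "audio/wav": ".wav",
  "video/mp4": ".mp4",
  "application/pdf": ".pdf",
};

// Returns a file extension for a MIME type, falling back to ".bin".
function extensionFor(mimeType: string): string {
  return EXTENSIONS[mimeType.toLowerCase()] ?? ".bin";
}

// Prepends one @path reference per temp file to the user's prompt text.
// With no files, the prompt is returned unchanged (text-only path).
function injectMediaReferences(prompt: string, filePaths: string[]): string {
  if (filePaths.length === 0) return prompt;
  const refs = filePaths.map((p) => `@${p}`).join(" ");
  return `${refs}\n\n${prompt}`;
}
```

Returning the prompt untouched when there are no files keeps the existing text-only behavior intact.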

Outbound

  • No native media emission capability
  • emitsOutbound: false

Capability declaration

readonly mediaCapabilities = {
  acceptsInbound: ["image/", "audio/", "video/", "application/pdf"],
  emitsOutbound: false,
};
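One way to type the capability declaration and check an incoming attachment against it. The MediaCapabilities interface, the GeminiRuntimeSketch class, and acceptsMime are hypothetical; treating entries that end in "/" as MIME prefixes (and all other entries as exact matches) is an assumed reading of the declaration above.

```typescript
// Assumed shape of the capability object; the project's real type may differ.
interface MediaCapabilities {
  readonly acceptsInbound: readonly string[];
  readonly emitsOutbound: boolean;
}

// Hypothetical stand-in for the Gemini runtime class.
class GeminiRuntimeSketch {
  readonly mediaCapabilities: MediaCapabilities = {
    acceptsInbound: ["image/", "audio/", "video/", "application/pdf"],
    emitsOutbound: false,
  };
}

// True when the MIME type matches an entry: by prefix for entries ending
// in "/", exactly otherwise.
function acceptsMime(caps: MediaCapabilities, mimeType: string): boolean {
  const mime = mimeType.toLowerCase();
  return caps.acceptsInbound.some((entry) =>
    entry.endsWith("/") ? mime.startsWith(entry) : mime === entry,
  );
}
```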

Temp file management

  • Create temp directory before execution
  • Write media attachments as temp files with appropriate extensions
  • Clean up temp directory after done event (or on error)
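A sketch of the temp-file lifecycle using Node's fs/promises API. withTempMedia, the TempMedia shape, the extension table, and the attachment-N file naming are hypothetical. In the runtime, fn would stand for the Gemini CLI invocation, and cleanup would be tied to the done event rather than to a single promise settling.

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical attachment shape; the real MediaAttachment type may differ.
interface TempMedia {
  mimeType: string;
  data: Uint8Array;
}

// Minimal, assumed extension lookup for this sketch.
function extFor(mimeType: string): string {
  const table: Record<string, string> = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "audio/wav": ".wav",
    "application/pdf": ".pdf",
  };
  return table[mimeType] ?? ".bin";
}

// Writes each attachment into a fresh temp directory, runs `fn` with the
// file paths, and removes the directory afterwards, even if `fn` throws.
async function withTempMedia<T>(
  media: TempMedia[],
  fn: (paths: string[]) => Promise<T>,
): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), "gemini-media-"));
  try {
    const paths: string[] = [];
    for (let i = 0; i < media.length; i++) {
      const p = join(dir, `attachment-${i}${extFor(media[i].mimeType)}`);
      await writeFile(p, media[i].data);
      paths.push(p);
    }
    return await fn(paths);
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Putting the cleanup in a finally block covers both the success and error cases listed under Tests with a single code path.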

Tests

Unit tests (src/middleware/runtimes/gemini.test.ts)

  • execute() with image attachment creates temp file and injects @path in prompt
  • execute() with audio attachment creates temp file and injects @path
  • execute() with multiple attachments creates multiple temp files
  • execute() without media uses existing prompt path (backwards compat)
  • Temp files are cleaned up after execution completes
  • Temp files are cleaned up on execution error
  • mediaCapabilities reports correct values

Live smoke tests (src/middleware/__smoke__/gemini.live.test.ts)

  • Send image attachment, verify Gemini describes the image content
  • Send audio attachment, verify Gemini transcribes/describes audio
  • Send text-only prompt, verify existing behavior unchanged

Depends on

Related

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
