
feat(middleware): communicate multimodal limitations to users when runtime can't handle media #400

@alexey-pelykh

Description


Problem

When a user sends an image or voice message on WhatsApp/Telegram and the configured runtime doesn't support that media type (e.g., Codex/OpenCode for any media, Claude for audio/video), the system should inform the user rather than silently dropping or degrading the media.

Currently, unsupported media is either:

  • Silently converted to a text description via applyMediaUnderstanding() (if API keys are configured)
  • Silently dropped (if no fallback is configured)

Neither path tells the user what happened.

Proposed behavior

When runtime handles media natively

No user-visible change — media flows through to the runtime.

When middleware fallback handles media (STT, vision API)

Transparent to the user: the media is processed and the result is included in the prompt. Optionally, the conversion can be noted in a status/debug channel.

When no handler is available for the media type

Inform the user with a clear message. Examples:

  • Image with no support: "⚠️ Your image was received but the current runtime (codex) doesn't support image input. The image content was not included in the conversation."
  • Voice message with no STT: "⚠️ Voice message received but speech-to-text is not configured. Please send your message as text."
  • Video with no support: "⚠️ Video attachments are not supported by the current runtime (claude). Only the text caption was included."
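The example notices above distinguish images, voice messages, and video, so the notifier has to bucket each attachment's MIME type into one of those categories. The following is a minimal sketch under two assumptions. First, every audio/* attachment is treated as a voice message; Telegram and WhatsApp voice notes typically arrive as Ogg/Opus audio. Second, anything else falls into a catch-all "file" bucket. The names mediaKindOf and MediaKind are hypothetical.

```typescript
// Hypothetical buckets matching the notice examples (image, voice, video),
// plus a catch-all for documents and anything else.
type MediaKind = "image" | "voice" | "video" | "file";

// Map a MIME type such as "audio/ogg; codecs=opus" to a media kind.
// MIME types are case-insensitive, so compare in lower case.
function mediaKindOf(mimeType: string): MediaKind {
  const type = mimeType.trim().toLowerCase();
  if (type.startsWith("image/")) return "image";
  // Assumption: any audio attachment is presented to the user as a voice message.
  if (type.startsWith("audio/")) return "voice";
  if (type.startsWith("video/")) return "video";
  return "file";
}

console.log(mediaKindOf("audio/ogg; codecs=opus"));
```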

Where to communicate

The notification should be delivered in one of two ways:

  1. Inline: appended to the ReplyPayload as a prefix/suffix warning (visible in the agent's reply)
  2. Separate: sent as its own message before the agent reply

The choice should be configurable via agents.defaults.mediaFallbackNotice: "inline" | "separate" | "silent" (default: "inline"), where "silent" suppresses the notice entirely.
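The three values of mediaFallbackNotice could be applied to an outgoing reply roughly as follows. This is a sketch only. The ReplyPayload shape here is a simplified, hypothetical stand-in with just a text field, and applyNotice is a made-up name. The sketch uses the prefix variant for "inline", although the issue allows either a prefix or a suffix.

```typescript
// Mirrors the proposed agents.defaults.mediaFallbackNotice setting.
type MediaFallbackNotice = "inline" | "separate" | "silent";

// Hypothetical, simplified reply shape; the real ReplyPayload may differ.
interface ReplyPayload {
  text: string;
}

// Returns the messages to send, in order, for one agent reply.
function applyNotice(
  reply: ReplyPayload,
  notices: string[],
  mode: MediaFallbackNotice = "inline",
): ReplyPayload[] {
  // Nothing to report, or the user opted out: send the reply unchanged.
  if (notices.length === 0 || mode === "silent") return [reply];
  const notice = notices.join("\n");
  if (mode === "separate") {
    // The notice goes out as its own message before the agent reply.
    return [{ text: notice }, reply];
  }
  // "inline": prefix the warning to the agent's reply text.
  return [{ ...reply, text: `${notice}\n\n${reply.text}` }];
}
```

Returning a list of messages keeps the caller's send loop identical for all three modes. Only the number of entries changes.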

Implementation

ChannelBridge media routing decision point

After checking runtime.mediaCapabilities:

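```typescript
// Each attachment lands in exactly one bucket.
const nativeMedia: typeof message.media = [];      // runtime accepts it natively
const fallbackMedia: typeof message.media = [];    // middleware converts it (STT, vision API)
const unsupportedMedia: typeof message.media = []; // no handler: notify the user

// acceptsInbound lists MIME prefixes (e.g. "image/"); a missing list means no native media.
const acceptedPrefixes = runtime.mediaCapabilities?.acceptsInbound ?? [];

for (const attachment of message.media) {
  const supported = acceptedPrefixes.some((prefix) =>
    attachment.mimeType.startsWith(prefix),
  );

  if (supported) {
    nativeMedia.push(attachment);
  } else if (hasFallback(attachment.mimeType)) {
    fallbackMedia.push(attachment);
  } else {
    unsupportedMedia.push(attachment);
  }
}
```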
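The routing loop relies on message, runtime, and hasFallback, none of which it defines. The following is a self-contained sketch of the same three-way split. Attachment, MediaCapabilities, MediaRouting, and routeMedia are hypothetical stand-ins, not the project's actual types. hasFallback is passed in as a predicate, so the routing logic can be tested without a real runtime or STT/vision configuration.

```typescript
// Minimal stand-ins for the types the routing loop relies on (hypothetical).
interface Attachment {
  mimeType: string;
}
interface MediaCapabilities {
  acceptsInbound?: string[]; // MIME prefixes, e.g. "image/"
}
interface MediaRouting {
  nativeMedia: Attachment[];
  fallbackMedia: Attachment[];
  unsupportedMedia: Attachment[];
}

// Splits attachments into native, fallback, and unsupported buckets.
function routeMedia(
  media: Attachment[],
  capabilities: MediaCapabilities | undefined,
  hasFallback: (mimeType: string) => boolean,
): MediaRouting {
  const accepted = capabilities?.acceptsInbound ?? [];
  const routing: MediaRouting = { nativeMedia: [], fallbackMedia: [], unsupportedMedia: [] };
  for (const attachment of media) {
    if (accepted.some((prefix) => attachment.mimeType.startsWith(prefix))) {
      routing.nativeMedia.push(attachment);
    } else if (hasFallback(attachment.mimeType)) {
      routing.fallbackMedia.push(attachment);
    } else {
      routing.unsupportedMedia.push(attachment);
    }
  }
  return routing;
}

// Example: a runtime that accepts images only, with STT as the sole fallback.
const routed = routeMedia(
  [{ mimeType: "image/png" }, { mimeType: "audio/ogg" }, { mimeType: "video/mp4" }],
  { acceptsInbound: ["image/"] },
  (mime) => mime.startsWith("audio/"),
);
console.log(routed.unsupportedMedia.map((a) => a.mimeType));
```

Only unsupportedMedia feeds the user notice. The other two buckets stay invisible to the user, as the proposed behavior requires.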

Notification format

Keep notifications concise and actionable:

  • State what was received
  • State why it couldn't be processed
  • Suggest an alternative if available
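The three guidelines map directly onto a small composer. The following is a sketch only. NoticeParts and composeNotice are hypothetical names, and the leading "⚠️" follows the example messages earlier in this issue. The alternative is optional, so a notice with no useful suggestion still reads cleanly.

```typescript
// The three parts suggested above; `alternative` is optional.
interface NoticeParts {
  received: string;     // what was received
  reason: string;       // why it couldn't be processed
  alternative?: string; // what the user can do instead, if anything
}

// Hypothetical helper: joins the non-empty parts into one short warning line.
function composeNotice({ received, reason, alternative }: NoticeParts): string {
  const parts = [received, reason, alternative].filter(
    (part): part is string => Boolean(part && part.trim()),
  );
  return `⚠️ ${parts.join(" ")}`;
}

const voiceNotice = composeNotice({
  received: "Voice message received",
  reason: "but speech-to-text is not configured.",
  alternative: "Please send your message as text.",
});
console.log(voiceNotice);
```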

Depends on

Related

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
