media: hardcoded image size limits (bytes/input-pixels) are not user-configurable across sanitize layers #67031

@gucasbrg

Bug type

Feature request (configurability gap)

Summary

OpenClaw has at least 10 hardcoded image size limits scattered across different bundles, and only one of them (DEFAULT_IMAGE_MAX_DIMENSION_PX in src/agents/tool-images.ts) is user-configurable (via agents.defaults.imageMaxDimensionPx). The corresponding byte limits for tool-result image sanitization, the generic media/mime layer, and the OpenAI-compat HTTP client outbound path are all hardcoded, so operators running OpenClaw against vision models that comfortably handle high-resolution photos (Qwen3-VL, GPT-4o, etc.) still get rejection errors on routine inputs like smartphone camera photos.

This is a companion request to the sandbox-only #40880 / #40950, which cover only src/media/store.ts.

Hardcoded limits I could find in dist/ (v2026.4.14)

| File | Constant | Value | Scope |
|---|---|---|---|
| mime-*.js | MAX_IMAGE_BYTES | 6 MB | generic channel inbound media (src/media/constants.ts) |
| tool-images-*.js | DEFAULT_IMAGE_MAX_BYTES | 5 MB | agent tool-result image sanitize output target |
| tool-images-*.js | DEFAULT_IMAGE_MAX_DIMENSION_PX | 1200 | already configurable via agents.defaults.imageMaxDimensionPx |
| image-ops-*.js | MAX_IMAGE_INPUT_PIXELS | 25e6 | sharp limitInputPixels hard cap (rejects input before resize) |
| store-*.js | MEDIA_MAX_BYTES | 5 MB | sandbox media staging; covered by #40880 / #40950 |
| constants-*.js | DEFAULT_WEB_MEDIA_BYTES | 5 MB | web channel attachment |
| input-files-*.js | DEFAULT_INPUT_IMAGE_MAX_BYTES | 10 MB | per-image limit used by OpenAI-compat client input pipeline |
| openai-http-*.js | DEFAULT_OPENAI_MAX_TOTAL_IMAGE_BYTES | 20 MB | total image payload across one OpenAI chat.completions request |
| dispatch-acp-*.js | ACP_ATTACHMENT_MAX_BYTES | 10 MB | ACP protocol attachments |
| subagent-spawn-*.js | attachments.maxTotalBytes fallback | 5 MB | subagent spawn (existing attachments.maxTotalBytes config) |

Why the current situation is painful

  • 25 MP sharp cap in image-ops: iPhone 14 Pro Max (48 MP), Samsung S24 Ultra (200 MP), and Xiaomi/OPPO/Vivo flagships (50–200 MP) routinely exceed 25 MP. These photos get rejected by exceedsImagePixelLimit() in image-ops.ts before any resize is attempted.

  • 5 MB DEFAULT_IMAGE_MAX_BYTES in tool-images: the resize loop (resizeImageBase64IfNeeded) targets 5 MB post-resize. The iteration over sideGrid × IMAGE_REDUCE_QUALITY_STEPS always fits typical JPEGs, but there is no config to loosen the target when the downstream provider can handle 10–20 MB (GPT-4o, Gemini 1.5, Qwen3-VL).

  • 1200 px DEFAULT_IMAGE_MAX_DIMENSION_PX: a reasonable default, but too small for vision models that prefer 2048 px native input (Qwen3-VL / Qwen2-VL mmproj reports image_max_pixels: 4194304 = 2048×2048). This one is already configurable, but discoverability is poor: there's no docs/reference/config.md entry I could find.

  • Outbound openai-http limits (10 MB per image, 20 MB total): when OpenClaw acts as an OpenAI-compat client and forwards image_url parts to the underlying provider, these limits trigger with error strings like Total image payload too large (…; limit 20971520). There is no obvious config path; gateway.http.endpoints.chatCompletions.images accepts allowUrl but not maxBytes / maxTotalBytes.
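To make the arithmetic behind the pixel cap and the outbound byte limits concrete, here is a minimal TypeScript sketch. It is not OpenClaw's actual code: the function names, signatures, and result shape are hypothetical, and the byte values assume binary megabytes (consistent with the 20971520 figure in the error string above). The 8064×6048 frame size for a 48 MP photo is also an illustrative assumption.

```typescript
// Hypothetical sketch only; names and signatures are NOT OpenClaw's real API.
// Values mirror the constants reported in the table above.
const PIXEL_CAP = 25e6;                        // MAX_IMAGE_INPUT_PIXELS
const PER_IMAGE_BYTES = 10 * 1024 * 1024;      // DEFAULT_INPUT_IMAGE_MAX_BYTES (assumed binary MB)
const TOTAL_IMAGE_BYTES = 20 * 1024 * 1024;    // DEFAULT_OPENAI_MAX_TOTAL_IMAGE_BYTES = 20971520

// True when the decoded frame has more pixels than the cap allows.
function exceedsPixelLimit(width: number, height: number, limit: number = PIXEL_CAP): boolean {
  return width * height > limit;
}

interface PayloadCheck {
  status: "ok" | "per-image-exceeded" | "total-exceeded";
  bytes: number; // size that was compared against the limit
  limit: number; // the limit it was compared against
}

// Walks the images of one request, checking each size and the running total.
function checkImagePayload(
  sizes: number[],
  perImage: number = PER_IMAGE_BYTES,
  total: number = TOTAL_IMAGE_BYTES,
): PayloadCheck {
  let sum = 0;
  for (const size of sizes) {
    if (size > perImage) {
      return { status: "per-image-exceeded", bytes: size, limit: perImage };
    }
    sum += size;
    if (sum > total) {
      return { status: "total-exceeded", bytes: sum, limit: total };
    }
  }
  return { status: "ok", bytes: sum, limit: total };
}
```

Under these assumptions, `exceedsPixelLimit(8064, 6048)` is true (about 48.8 MP against a 25 MP cap), while a 12 MP frame such as 4032×3024 passes. Three 8 MB images pass the per-image check individually but trip the 20 MB total on the third.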

Proposed schema

A single top-level media.image.* block in openclaw.json, threaded through all the layers above (similar to how #40950 adds media.maxBytes), with optional per-scope overrides for advanced users:

```jsonc
{
  "media": {
    "image": {
      // applies to every image-sizing layer unless overridden below
      "maxBytes": 33554432,            // 32 MB
      "maxInputPixels": 150000000,     // 150 MP (covers modern smartphone cameras)
      "maxDimensionPx": 2048,          // post-resize target
      "maxTotalBytesPerRequest": 67108864, // 64 MB

      // per-scope overrides (optional)
      "scopes": {
        "channel":     { "maxBytes": 33554432 },   // replaces MAX_IMAGE_BYTES
        "toolResult":  { "maxBytes": 16777216, "maxDimensionPx": 2048 },
        "openaiHttp":  { "maxBytes": 33554432, "maxTotalBytes": 67108864 },
        "sandbox":     { "maxBytes": 33554432 }    // overlaps with #40950 `media.maxBytes`
      }
    }
  }
}
```

Minimum ask: expose media.image.maxBytes + media.image.maxInputPixels + media.image.maxDimensionPx and have each layer read them at boot with sensible defaults (current hardcoded values). agents.defaults.imageMaxDimensionPx can be kept as an alias for back-compat.
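One way each layer could resolve its limits at boot is sketched below in TypeScript. This is an illustration of the proposal, not existing OpenClaw code. The type names, `resolveImageLimits`, and the precedence order (scope override, then media.image.*, then the agents.defaults.imageMaxDimensionPx alias, then the current hardcoded value) are all assumptions. Applying the 25e6 pixel cap and the 1200 px dimension default to every scope is likewise a simplification for the sketch.

```typescript
// Hypothetical sketch of the proposed config resolution; none of these names
// exist in OpenClaw today.
type ImageScope = "channel" | "toolResult" | "openaiHttp" | "sandbox";

interface ImageLimits {
  maxBytes?: number;
  maxInputPixels?: number;
  maxDimensionPx?: number;
}

interface ConfigSketch {
  media?: { image?: ImageLimits & { scopes?: Partial<Record<ImageScope, ImageLimits>> } };
  agents?: { defaults?: { imageMaxDimensionPx?: number } };
}

const MB = 1024 * 1024;

// Fallbacks mirror the hardcoded byte values listed in the table above.
const HARDCODED_DEFAULTS: Record<ImageScope, Required<ImageLimits>> = {
  channel:    { maxBytes: 6 * MB,  maxInputPixels: 25e6, maxDimensionPx: 1200 },
  toolResult: { maxBytes: 5 * MB,  maxInputPixels: 25e6, maxDimensionPx: 1200 },
  openaiHttp: { maxBytes: 10 * MB, maxInputPixels: 25e6, maxDimensionPx: 1200 },
  sandbox:    { maxBytes: 5 * MB,  maxInputPixels: 25e6, maxDimensionPx: 1200 },
};

// Assumed precedence: scope override > media.image.* > legacy alias > default.
function resolveImageLimits(config: ConfigSketch, scope: ImageScope): Required<ImageLimits> {
  const image = config.media?.image;
  const override = image?.scopes?.[scope];
  const defaults = HARDCODED_DEFAULTS[scope];
  return {
    maxBytes: override?.maxBytes ?? image?.maxBytes ?? defaults.maxBytes,
    maxInputPixels:
      override?.maxInputPixels ?? image?.maxInputPixels ?? defaults.maxInputPixels,
    maxDimensionPx:
      override?.maxDimensionPx ??
      image?.maxDimensionPx ??
      config.agents?.defaults?.imageMaxDimensionPx ?? // back-compat alias
      defaults.maxDimensionPx,
  };
}
```

With an empty config this returns today's values, so existing deployments would see no behavior change. Setting only media.image.maxBytes raises every scope at once, and a scopes.toolResult entry narrows just the tool-result sanitizer.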

Why this matters to us

We're running OpenClaw against a local Qwen3-VL 8B (llama.cpp + mmproj) that happily decodes up to 48 MP JPEG inputs when called directly. But OpenClaw's built-in tool-images sanitizer forces all agent-observable images through a 5 MB / 1200 px funnel, so our agents see low-resolution thumbnails when the model could see the real pixels. We worked around it by sed-patching the bundles after every docker compose build, but that's obviously not sustainable.

Related

Environment

  • OpenClaw v2026.4.14 (via ghcr.io/openclaw/openclaw:2026.4.14)
  • Docker deployment on macOS 18.8 host
  • Provider: local Qwen3-VL 8B via llama.cpp llama-server (mmproj image_max_pixels: 4194304)
