Oversized image (>8000px) permanently bricks a conversation thread — image guards check bytes, never pixel dimensions #37677

@KodoRe

Summary

A single oversized image (specifically, one exceeding Anthropic's 8000px per-side dimension cap) can permanently brick a conversation thread. Once such an image is baked into history (e.g. as a browser_vision / vision_analyze tool-result), every subsequent turn replays it, Anthropic returns a non-retryable HTTP 400, and the session is wedged permanently. The user sees "⚠️ The model provider failed after retries" on every message, and the only recovery is manually un-pinning the session on disk.

This bit a live multi-agent deployment: one full-page dashboard screenshot (the file on disk was 591×1280, but the inline vision tool-result held a tall full-page capture over 8000px) bricked an agent's DM thread until the session was surgically un-pinned in sessions.json.

Root cause: every image guard reasons about BYTES, never PIXELS

Anthropic enforces two independent ceilings per image:

  1. 5 MB encoded byte size → "image exceeds 5 MB maximum"
  2. 8000px longest side → "image dimensions exceed max allowed size: 8000 pixels"

Hermes only ever guards #1. A tall full-page screenshot can be well under 5 MB yet far over 8000px — e.g. 1200×12000 at 0.06 MB — so it slips through every guard:

  • error_classifier._IMAGE_TOO_LARGE_PATTERNS matches "image exceeds" but NOT the dimension message → the 400 is never classified image_too_large → the shrink/retry path never fires → falls through to a generic non-retryable error → brick.
  • conversation_compression.try_shrink_image_parts_in_messages (the reactive recovery) early-returns when len(url) <= target_bytes (4 MB). A tall <4 MB image is judged "not the oversized one", shrink returns False, and the retry re-sends the identical payload → still 400 → brick.
  • vision_tools._resize_image_for_vision resizes purely by byte budget; it never inspects pixel dimensions, so even when invoked it wouldn't fix a dimension violation.
  • The embed-time proactive cap in vision_tools only triggers on len(data_url) > _EMBED_TARGET_BYTES — a byte test — so a tall small-byte image is baked into immutable history un-resized.
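To make the gap concrete, here is a minimal sketch of a guard that checks both ceilings instead of bytes alone. It is not Hermes code: `png_dimensions` and `violated_ceilings` are hypothetical names, the limits are the ones quoted in this issue, and the sketch only handles PNG by reading the IHDR header with the standard library.

```python
import struct

# Ceilings as quoted in this issue; treat the exact values as assumptions.
MAX_IMAGE_BYTES = 5 * 1024 * 1024
MAX_IMAGE_SIDE = 8000

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk without decoding pixels."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    # IHDR payload starts at byte 16: 4-byte width, then 4-byte height, big-endian.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def violated_ceilings(data: bytes) -> list[str]:
    """Hypothetical helper: name every ceiling this PNG payload exceeds."""
    violations = []
    if len(data) > MAX_IMAGE_BYTES:
        violations.append("bytes")
    width, height = png_dimensions(data)
    if max(width, height) > MAX_IMAGE_SIDE:
        violations.append("pixels")
    return violations
```

A byte-only guard corresponds to checking just the first condition, which is why a 1200×12000 image at 0.06 MB passes it while still violating the pixel ceiling.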

Secondary finding: Pillow was not installed in the gateway venv and is an undeclared soft dependency. Without it, all image-resize recovery (old byte path included) silently no-ops.

Reproduction

  1. Capture/attach an image taller than 8000px but under 5 MB (any full-page screenshot of a long page).
  2. It embeds into history un-resized.
  3. Next message → Anthropic 400 image dimensions exceed max allowed size: 8000 pixels.
  4. Classifier doesn't recognize it → non-retryable → thread bricked on every subsequent message.
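For step 1, a tall test image can be generated without any screenshot tooling. The sketch below writes a solid grayscale PNG using only the standard library; `make_solid_png` is a hypothetical helper for reproduction purposes, not part of Hermes.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, payload: bytes) -> bytes:
    """Frame one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def make_solid_png(width: int, height: int, gray: int = 255) -> bytes:
    """Build an 8-bit grayscale PNG filled with a single gray level."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter-type byte (0 = none) followed by the samples.
    row = b"\x00" + bytes([gray]) * width
    idat = zlib.compress(row * height, 9)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )


if __name__ == "__main__":
    png = make_solid_png(1200, 12000)
    with open("tall_repro.png", "wb") as fh:
        fh.write(png)
    print(f"wrote {len(png)} bytes")
```

Because the image is a single flat color it compresses to far below the 5 MB byte ceiling while its height stays well over 8000px, which is the combination that slips past the byte-only guards.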

Fix (4 coordinated patches)

  1. error_classifier.py — add dimension-cap patterns ("dimensions exceed max allowed size", "image dimensions exceed", "max allowed size: 8000") to _IMAGE_TOO_LARGE_PATTERNS so the 400 is classified image_too_large and routed into the existing shrink/retry recovery.
  2. vision_tools.py — add a _MAX_IMAGE_DIMENSION = 7900 constant and an _image_exceeds_dimension() helper; make _resize_image_for_vision enforce a pixel ceiling (pre-cap the longest side before the byte loop, preserving aspect ratio) in addition to the byte ceiling.
  3. vision_tools.py (embed site) — trigger the proactive resize when the image exceeds the byte budget OR the pixel cap, so tall small-byte screenshots are shrunk before being baked into immutable history.
  4. conversation_compression.py — make the reactive shrinker decode the image and shrink when either ceiling is exceeded, instead of gating purely on len(url) > 4 MB.
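Patch 1 can be sketched as follows. The tuple stands in for `error_classifier._IMAGE_TOO_LARGE_PATTERNS`; the case-insensitive substring matching and the `is_image_too_large` name are assumptions for illustration, since the issue does not show how the real classifier applies its patterns.

```python
# Hypothetical stand-in for error_classifier._IMAGE_TOO_LARGE_PATTERNS.
IMAGE_TOO_LARGE_PATTERNS = (
    "image exceeds",                       # existing byte-size message
    "dimensions exceed max allowed size",  # added: pixel-dimension message
    "image dimensions exceed",             # added
    "max allowed size: 8000",              # added
)


def is_image_too_large(error_message: str) -> bool:
    """Return True when a provider error matches any image-size pattern.

    Assumption: patterns are compared as lowercase substrings.
    """
    lowered = error_message.lower()
    return any(pattern in lowered for pattern in IMAGE_TOO_LARGE_PATTERNS)
```

With the three added patterns, both provider messages quoted above are classified the same way, so the dimension error can reach the existing shrink/retry path instead of falling through to the generic non-retryable branch.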

Plus: install Pillow into the gateway venv. It should also become a declared dependency, because none of this recovery works without it.
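One way to stop the missing dependency from failing silently is an explicit availability check at startup or at the resize entry point. This is a sketch only: `pillow_available` and `require_pillow` are hypothetical names, and whether to raise or merely log is a design choice left to the maintainers.

```python
import importlib.util


def pillow_available() -> bool:
    """Report whether the PIL package (Pillow) can be imported in this environment."""
    return importlib.util.find_spec("PIL") is not None


def require_pillow() -> None:
    """Raise instead of letting image-resize recovery turn into a silent no-op."""
    if not pillow_available():
        raise RuntimeError(
            "Pillow is not installed; image resize recovery cannot run. "
            "Install it with: pip install Pillow"
        )
```

Calling `require_pillow()` once when the gateway starts would surface the problem immediately rather than on the first oversized image.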

Verification

  • 1200×12000 @ 0.06 MB → resized to 790×7900 (under cap). ✅
  • 800×600 small image → untouched. ✅
  • Reactive shrinker on a 0.07 MB / 1000×11000 tool-result (old gate would skip) → shrinks to 718×7900, retry succeeds. ✅
  • Classifier matches both the 8000px message and the legacy 5 MB byte message. ✅
  • All three modules compile cleanly under py_compile.
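The resized dimensions reported above follow from scaling the longest side down to 7900 while keeping the aspect ratio. A minimal sketch of that arithmetic, assuming integer floor division for the shorter side (the issue does not state the rounding mode, but floor reproduces the reported figures) and using the hypothetical name `capped_size`:

```python
MAX_IMAGE_DIMENSION = 7900  # value proposed in patch 2, below the 8000px cap


def capped_size(width: int, height: int, max_side: int = MAX_IMAGE_DIMENSION) -> tuple[int, int]:
    """Return (width, height) scaled so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already within the pixel ceiling
    # Integer arithmetic avoids floating-point error; never go below 1px.
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height


print(capped_size(1200, 12000))  # (790, 7900)
print(capped_size(1000, 11000))  # (718, 7900)
print(capped_size(800, 600))     # (800, 600)
```

The computed size would then be handed to whatever resize routine the module uses; the byte-budget loop can run afterwards unchanged.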

Impact: prevents one oversized image from bricking a thread for any agent on Anthropic models. The 8000px violation returns a non-retryable 400, and the offending image cannot be changed once it is in history.

Metadata

    Labels

    P1 (High: major feature broken, no workaround)
    duplicate (This issue or pull request already exists)
    tool/vision (Vision analysis and image generation)
    type/bug (Something isn't working)