Oversized image (>8000px) permanently bricks a conversation thread — image guards check bytes, never pixel dimensions

## Summary

A single image that exceeds Anthropic's **8000px per-side dimension cap** can permanently brick a conversation thread. Once such an image is baked into history (e.g. as a `browser_vision` / `vision_analyze` tool-result), every subsequent turn replays it and Anthropic returns a **non-retryable HTTP 400**. The session stays wedged: the user sees `⚠️ The model provider failed after retries` on every message, and the only recovery is to manually un-pin the session on disk.

This hit a live multi-agent deployment. One full-page dashboard screenshot bricked an agent's DM thread until the session was manually un-pinned in `sessions.json`. The file on disk was only 591×1280, but the inline vision tool-result was a tall full-page capture with a side over 8000px.

## Root cause: every image guard reasons about BYTES, never PIXELS

Anthropic enforces **two independent ceilings** per image:
1. **5 MB** encoded byte size → `"image exceeds 5 MB maximum"`
2. **8000px** longest side → `"image dimensions exceed max allowed size: 8000 pixels"`

Hermes only ever guards #1. A tall full-page screenshot can be **well under 5 MB yet far over 8000px** — e.g. 1200×12000 at 0.06 MB — so it slips through every guard:

- **`error_classifier._IMAGE_TOO_LARGE_PATTERNS`** matches `"image exceeds"` but NOT the dimension message → the 400 is never classified `image_too_large` → the shrink/retry path never fires → falls through to a generic non-retryable error → brick.
- **`conversation_compression.try_shrink_image_parts_in_messages`** (the reactive recovery) early-returns when `len(url) <= target_bytes` (4 MB). A tall <4 MB image is judged "not the oversized one", shrink returns `False`, and the retry re-sends the identical payload → still 400 → brick.
- **`vision_tools._resize_image_for_vision`** resizes purely by byte budget; it never inspects pixel dimensions, so even when invoked it wouldn't fix a dimension violation.
- **The embed-time proactive cap** in `vision_tools` only triggers on `len(data_url) > _EMBED_TARGET_BYTES` — a byte test — so a tall small-byte image is baked into immutable history un-resized.
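The gap shared by the guards above can be illustrated with a minimal sketch. It assumes PNG input and reads the dimensions straight from the IHDR header using only the standard library. The names `png_dimensions` and `violated_ceilings` are hypothetical and are not Hermes code; the two constants are the ceilings quoted in this issue, and measuring `len(data_url)` mirrors the `len(url)` gates described above.

```python
import base64
import struct

MAX_BYTES = 5 * 1024 * 1024  # byte ceiling quoted in this issue
MAX_SIDE = 8000              # per-side pixel ceiling quoted in this issue

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from a PNG's leading IHDR chunk."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def violated_ceilings(data_url: str) -> list[str]:
    """List which of the two ceilings a base64 PNG data URL exceeds."""
    payload = base64.b64decode(data_url.split(",", 1)[1])
    width, height = png_dimensions(payload)
    problems = []
    if len(data_url) > MAX_BYTES:          # the only test the old guards ran
        problems.append("bytes")
    if max(width, height) > MAX_SIDE:      # the test that was missing
        problems.append("pixels")
    return problems
```

For a 1200×12000 capture that compresses well, a check like this would report only `"pixels"`, which is exactly the case a byte-only gate waves through.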

Secondary finding: **Pillow was not installed in the gateway venv** and is an undeclared soft dependency. Without it, *all* image-resize recovery (old byte path included) silently no-ops.

## Reproduction

1. Capture/attach an image taller than 8000px but under 5 MB (any full-page screenshot of a long page).
2. It embeds into history un-resized.
3. Next message → Anthropic 400 `image dimensions exceed max allowed size: 8000 pixels`.
4. Classifier doesn't recognize it → non-retryable → thread bricked on every subsequent message.
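For step 1, a test image can be produced without any third-party library. The following is a sketch only: `make_blank_png` is a hypothetical helper, not part of Hermes, and it emits an all-black 8-bit grayscale PNG, which compresses to a tiny file regardless of its pixel height.

```python
import struct
import zlib


def _chunk(tag: bytes, body: bytes) -> bytes:
    """Serialize one PNG chunk: length, tag, body, CRC over tag + body."""
    crc = zlib.crc32(tag + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + tag + body + struct.pack(">I", crc)


def make_blank_png(width: int, height: int) -> bytes:
    """Build an all-black 8-bit grayscale PNG of the given size."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    row = b"\x00" + bytes(width)  # filter-type byte + one byte per pixel
    idat = zlib.compress(row * height, 9)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )
```

Writing `make_blank_png(1200, 12000)` to a file gives an image that is far below 5 MB yet 12000px tall, matching the shape of the screenshot described in step 1.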

## Fix (4 coordinated patches)

1. **`error_classifier.py`** — add dimension-cap patterns (`"dimensions exceed max allowed size"`, `"image dimensions exceed"`, `"max allowed size: 8000"`) to `_IMAGE_TOO_LARGE_PATTERNS` so the 400 is classified `image_too_large` and routed into the existing shrink/retry recovery.
2. **`vision_tools.py`** — add `_MAX_IMAGE_DIMENSION = 7900` + `_image_exceeds_dimension()` helper; make `_resize_image_for_vision` enforce a **pixel** ceiling (pre-cap longest side before the byte loop, preserving aspect ratio) in addition to the byte ceiling.
3. **`vision_tools.py`** (embed site) — trigger the proactive resize when the image exceeds the byte budget **OR** the pixel cap, so tall small-byte screenshots are shrunk before being baked into immutable history.
4. **`conversation_compression.py`** — make the reactive shrinker decode the image and shrink when **either** ceiling is exceeded, instead of gating purely on `len(url) > 4 MB`.
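Patch 1 can be sketched as follows. The pattern strings are the ones listed in this issue; the matching strategy (case-insensitive substring search) and the function name `is_image_too_large` are assumptions made for illustration, and the real classifier may be structured differently.

```python
# Pattern strings taken from this issue; everything else is illustrative.
_IMAGE_TOO_LARGE_PATTERNS = (
    "image exceeds",                       # legacy 5 MB byte message
    "dimensions exceed max allowed size",  # 8000px dimension message
    "image dimensions exceed",
    "max allowed size: 8000",
)


def is_image_too_large(error_message: str) -> bool:
    """Return True if a provider error looks like an image-size rejection."""
    lowered = error_message.lower()
    return any(pattern in lowered for pattern in _IMAGE_TOO_LARGE_PATTERNS)
```

With the dimension patterns present, the 400 would be routed into the same shrink/retry recovery as the byte-size error instead of falling through as a generic non-retryable failure.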

Additionally, install Pillow into the gateway venv and declare it as a dependency, since none of this recovery functions without it.
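One way to keep a missing Pillow from silently disabling recovery is a startup check that fails loudly. This is a sketch under assumptions: `missing_modules` and `require_image_support` are hypothetical names, and where such a check would be called in the gateway is not specified by this issue. `PIL` is the import name Pillow installs under.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_image_support():
    """Raise at startup instead of letting image resizing no-op later."""
    missing = missing_modules(["PIL"])
    if missing:
        raise RuntimeError(
            "image resize recovery needs Pillow; missing modules: "
            + ", ".join(missing)
        )
```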

## Verification

- 1200×12000 @ 0.06 MB → resized to **790×7900** (under cap). ✅
- 800×600 small image → untouched. ✅
- Reactive shrinker on a 0.07 MB / 1000×11000 tool-result (old gate would skip) → shrinks to **718×7900**, retry succeeds. ✅
- Classifier matches both the 8000px message and the legacy 5 MB byte message. ✅
- All three modified modules (`error_classifier.py`, `vision_tools.py`, `conversation_compression.py`) compile cleanly under `py_compile`.
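The resized dimensions above follow from scaling the longest side down to `_MAX_IMAGE_DIMENSION = 7900` while preserving aspect ratio. The arithmetic can be checked with a small sketch; `capped_size` is a hypothetical helper, and rounding the shorter side down is an assumption that happens to agree with the reported numbers.

```python
_MAX_IMAGE_DIMENSION = 7900  # value proposed in patch 2


def capped_size(width: int, height: int, max_side: int = _MAX_IMAGE_DIMENSION):
    """Scale (width, height) so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already within the cap: leave untouched
    # Integer arithmetic avoids float rounding surprises; floors the result.
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height


print(capped_size(1200, 12000))  # (790, 7900)
print(capped_size(1000, 11000))  # (718, 7900)
print(capped_size(800, 600))     # (800, 600)
```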

Impact: prevents one oversized image from bricking a thread for any agent on Anthropic models. The 8000px violation is non-retryable, and the offending image is immutable once in history.


Oversized image (>8000px) permanently bricks a conversation thread — image guards check bytes, never pixel dimensions #37677
