fix(vision): guard image pixel dimensions (8000px cap), not just bytes #39033

Merged: teknium1 merged 2 commits into main from hermes/hermes-d189e023 on Jun 4, 2026

teknium1 (Contributor) commented Jun 4, 2026

Summary

One oversized image no longer permanently bricks a conversation thread on Anthropic. A tall screenshot (e.g. 1200×12000 @ 0.06 MB) passes every byte check but trips Anthropic's independent 8000px-per-side cap → non-retryable HTTP 400 → every subsequent turn replays the immutable image and re-fails.

Salvages @kyssta-exe's #37727 (cherry-picked, authorship preserved) + adds the proactive embed-time fix #37727 missed.

Changes

  • agent/error_classifier.py: add "image dimensions exceed" + two wording variants to _IMAGE_TOO_LARGE_PATTERNS so the dimension 400 is classified image_too_large and routed into the shrink/retry path. (contributor + follow-up)
  • tools/vision_tools.py:
    • _resize_image_for_vision gains a max_dimension param (pre-cap + loop check). (contributor)
    • NEW _image_exceeds_dimension() + _EMBED_MAX_DIMENSION=7900; the proactive embed cap now fires on bytes OR pixels and passes max_dimension, so a tall small-byte image is shrunk before it's baked into immutable history, not just reactively after a failed round-trip. (follow-up; the gap #37727 left open)
    • best-effort lazy-install of Pillow in the resize ImportError fallback. (follow-up)
  • agent/conversation_compression.py: reactive shrinker decodes + checks pixels when bytes are fine. (contributor)
  • pyproject.toml + tools/lazy_deps.py + uv.lock: declare Pillow as the [vision] extra / tool.vision lazy dep — it was undeclared everywhere; without it ALL resize recovery silently no-ops. (follow-up)
  • tests/: contributor's shrink-recovery tests + 5 new _image_exceeds_dimension cases.
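
The "bytes OR pixels" embed-time gate described above can be sketched roughly as follows. This is not the code from tools/vision_tools.py: the PR's `_image_exceeds_dimension()` uses Pillow, whereas this sketch reads the PNG header with the standard library so it runs without extra dependencies. The 7900 value comes from the PR; `EMBED_MAX_BYTES`, `png_dimensions`, `exceeds_dimension`, `needs_shrink`, and `fake_png_header` are hypothetical names and values chosen for illustration.

```python
import struct

# 7900 is the PR's _EMBED_MAX_DIMENSION. The byte budget below is an assumed
# placeholder, not a value taken from the source.
EMBED_MAX_DIMENSION = 7900
EMBED_MAX_BYTES = 4 * 1024 * 1024

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes):
    """Return (width, height) from a PNG's IHDR chunk, or None if unreadable."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte type "IHDR",
    # then big-endian uint32 width and height.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])


def exceeds_dimension(data: bytes, max_dimension: int = EMBED_MAX_DIMENSION) -> bool:
    """True only when the longest side is known and strictly over the cap."""
    dims = png_dimensions(data)
    return dims is not None and max(dims) > max_dimension


def needs_shrink(data: bytes) -> bool:
    """Embed-time gate: fire on bytes OR pixels, whichever trips first."""
    return len(data) > EMBED_MAX_BYTES or exceeds_dimension(data)


def fake_png_header(width: int, height: int) -> bytes:
    """Build just enough of a PNG (signature + start of IHDR) for the demo."""
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", width, height)


if __name__ == "__main__":
    tall = fake_png_header(1200, 12000)   # tiny in bytes, too tall in pixels
    small = fake_png_header(800, 600)
    print(needs_shrink(tall), needs_shrink(small))
```

Unreadable input returns False here, mirroring the "no-Pillow/corrupt → False" behaviour listed under Validation: an image the check cannot decode is left to the byte gate and the reactive path.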

Validation

| Path | Result |
|---|---|
| _image_exceeds_dimension | (1200×12000) True; (800×600) False; edge/no-Pillow/corrupt → False |
| _resize_image_for_vision(tall, max_dim=7900) | 1200×12000 → 600×6000, under both caps; small image untouched |
| reactive shrinker on 1000×11000 / 74 KB data URL (old byte gate would skip) | shrinks to 500×5500 |
| classifier | matches all dimension-message variants + legacy 5 MB byte message |
| test_vision_tools.py / test_image_shrink_recovery.py / test_error_classifier.py | 78 / 24 / 145 pass |
| uv lock --check | clean (only pillow added) |
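
The resize results above (1200×12000 → 600×6000 and 1000×11000 → 500×5500) are consistent with halving both sides until the longest one fits under the cap. The sketch below reproduces only that arithmetic. It assumes a halving strategy, which the PR does not state, and `plan_downscale` is a hypothetical helper, not a function from the repository.

```python
def plan_downscale(width: int, height: int, max_dimension: int) -> tuple[int, int]:
    """Halve both sides until the longest side is at or under max_dimension.

    Assumption: a simple halving loop. The real resizer also re-checks the
    encoded byte size on each pass, which is omitted here.
    """
    if max_dimension < 1:
        raise ValueError("max_dimension must be at least 1")
    while max(width, height) > max_dimension:
        # Integer halving keeps the aspect ratio (up to rounding) and
        # never drops a side below one pixel.
        width = max(1, width // 2)
        height = max(1, height // 2)
    return width, height


if __name__ == "__main__":
    print(plan_downscale(1200, 12000, 7900))  # tall screenshot, embed-time cap
    print(plan_downscale(1000, 11000, 8000))  # reactive path, provider cap
    print(plan_downscale(800, 600, 7900))     # already small, left untouched
```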

Closes #37677. Supersedes #37727 (cherry-picked with authorship preserved).

[Infographic: two-ceilings-not-one]

teknium1 requested a review from a team June 4, 2026 12:49
github-actions (bot, Contributor) commented Jun 4, 2026

🔎 Lint report: hermes/hermes-d189e023 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9837 on HEAD, 9829 on base (🆕 +8)

🆕 New issues (1):

| Rule | Count |
|---|---|
| unresolved-import | 1 |

First entries:

agent/conversation_compression.py:679: [unresolved-import] unresolved-import: Cannot resolve imported module `PIL`

✅ Fixed issues: none

Unchanged: 5100 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

alt-glitch added the type/bug, P2, comp/agent, and tool/vision labels Jun 4, 2026
kyssta-exe and others added 2 commits June 4, 2026 06:16
Anthropic enforces two independent ceilings per image:
1. 5 MB encoded byte size
2. 8000 px longest side

Hermes only guarded #1. A tall screenshot (e.g. 1200x12000 at 0.06 MB)
passes every byte check but fails the pixel check, returning a
non-retryable HTTP 400 that permanently bricks the conversation thread.

Fixes:
- error_classifier: add 'image dimensions exceed' pattern to
  _IMAGE_TOO_LARGE_PATTERNS so the 400 is classified as image_too_large
  and triggers the shrink/retry path instead of falling through to
  non-retryable error.
- conversation_compression: check pixel dimensions (via Pillow) even
  when byte size is under the 4 MB target. If max(dims) > 8000, force
  shrink.
- vision_tools._resize_image_for_vision: add optional max_dimension param.
  When set, images exceeding the pixel cap are downscaled even if they're
  under the byte budget. The resize loop now checks both byte AND pixel
  limits before accepting a candidate.

Closes #37677
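
A minimal sketch of the classifier change described in this commit: substring patterns decide whether an HTTP 400 is routed to the shrink/retry path. Only the "image dimensions exceed" pattern is taken from the commit message. The second pattern, the return labels other than image_too_large, and the sample messages are illustrative assumptions rather than verbatim provider wording, and `classify_error` is a hypothetical stand-in for the logic in agent/error_classifier.py.

```python
# Lower-case substrings that mark an "image too large" rejection.
IMAGE_TOO_LARGE_PATTERNS = (
    "image dimensions exceed",  # pixel-cap wording named in the commit
    "image exceeds 5 mb",       # assumed stand-in for the legacy byte-cap message
)


def classify_error(status_code: int, message: str) -> str:
    """Return a coarse error class for a provider response."""
    lowered = message.lower()
    if status_code == 400 and any(p in lowered for p in IMAGE_TOO_LARGE_PATTERNS):
        # Recoverable: the caller can shrink the image and retry.
        return "image_too_large"
    if status_code == 400:
        # Any other 400 stays non-retryable.
        return "non_retryable"
    return "unknown"


if __name__ == "__main__":
    # Illustrative messages, not captured provider output.
    print(classify_error(400, "At least one of the image dimensions exceed max allowed size: 8000 pixels"))
    print(classify_error(400, "image exceeds 5 MB maximum"))
    print(classify_error(400, "invalid request"))
```

Matching on a lower-cased substring keeps the classifier tolerant of prefixes such as field paths, at the cost of needing a new pattern for each wording variant.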
… Pillow

Follow-up to the salvaged #37727. That PR fixed the reactive recovery path
(classifier + post-failure shrinker) but left the PROACTIVE embed-time guard
in vision_tools byte-only — a tall small-byte screenshot (e.g. 1200x12000 at
0.06 MB) still baked into immutable history un-resized, relying on a failed
round-trip to trigger reactive shrink.

- vision_tools: add _image_exceeds_dimension() + _EMBED_MAX_DIMENSION (7900px);
  the embed-time cap now fires on bytes OR pixels and passes max_dimension to
  the resizer, so tall small-byte images are shrunk before they're embedded.
- vision_tools: best-effort lazy-install of Pillow (tool.vision) in the resize
  ImportError fallback so the soft dep self-heals (respects allow_lazy_installs).
- error_classifier: add two more Anthropic dimension-cap wording variants.
- pyproject + lazy_deps: declare Pillow as the [vision] extra / tool.vision
  lazy dep (it was undeclared everywhere; without it ALL resize recovery no-ops).
- tests: cover _image_exceeds_dimension (tall/small/edge/no-Pillow/corrupt).

Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
teknium1 force-pushed the hermes/hermes-d189e023 branch from a4b2208 to 820804b June 4, 2026 13:16
teknium1 merged commit dd4ba4c into main Jun 4, 2026
19 checks passed
teknium1 deleted the hermes/hermes-d189e023 branch June 4, 2026 13:16
Development

Successfully merging this pull request may close these issues.

Oversized image (>8000px) permanently bricks a conversation thread — image guards check bytes, never pixel dimensions

3 participants