Skip to content

fix(vision): cap image pixel dimensions at 7900px on embed + reactive recover#37701

Closed
Fearvox wants to merge 2 commits into
NousResearch:mainfrom
Fearvox:fix/oversized-image-dimension-guard
Closed

fix(vision): cap image pixel dimensions at 7900px on embed + reactive recover#37701
Fearvox wants to merge 2 commits into
NousResearch:mainfrom
Fearvox:fix/oversized-image-dimension-guard

Conversation

@Fearvox

@Fearvox Fearvox commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Problem

Anthropic enforces two independent per-image ceilings:

  1. 5 MB encoded base64 byte size
  2. 8000px longest side

Pre-fix, every image-oversize guard in Hermes reasoned about bytes only. A tall full-page screenshot (e.g. 1200×12000 at 0.06 MB) is well under 5 MB but vastly over 8000px, slipped through every guard, got baked into immutable conversation history, and bricked the session on every subsequent replay with a non-retryable HTTP 400. Recovery required manually un-pinning the session in sessions.json. See #25837 and #37677 for the original reports.

Fix (four coordinated patches, each covers a different layer)

  1. agent/error_classifier.py_IMAGE_TOO_LARGE_PATTERNS adds the dimension-cap patterns ("image dimensions exceed", "dimensions exceed max allowed", "max allowed size: 8000") alongside the existing 5 MB byte patterns. Without this the 400 falls through to a generic non-retryable bucket and the shrink/retry path never fires.

  2. tools/vision_tools.py — new _MAX_IMAGE_DIMENSION = 7900 constant + _image_exceeds_dimension() helper. Plus a new pre-cap pass in _resize_image_for_vision that decodes the image dimensions before the byte-sizing fast-path and downscales the longest side to 7900px (100px headroom under Anthropic's hard 8000 cap) preserving aspect ratio. The pre-cap writes a downscaled tempfile and rebinds image_path so the subsequent byte pass operates on the resized version. The byte fast-path was previously bailing out on a tiny-but-tall image before the resize logic could run; the pre-cap fixes that ordering bug.

  3. tools/vision_tools.py — new proactive dimension check at the vision_analyze_tool embed site, before the existing embed-time byte cap. Decodes the image with PIL (soft dependency, no-ops gracefully when missing — the reactive recovery in step 4 is the safety net), resizes via _resize_image_for_vision when either axis exceeds 7900px. Prevents the oversized image from ever being baked into history.

  4. agent/conversation_compression.py — reactive shrinker decodes the image dimensions and shrinks on dimension violation, not just on byte violation. Old gate was if len(url) <= target_bytes: return None which silently returned False for a 0.07 MB / 1000×11000 tool-result. New gate is if not dim_exceeds and len(url) <= target_bytes: return None. Falls back to byte-only behaviour if Pillow is unavailable, so the pre-fix regression cannot recur.

Plus: Pillow as a vision extra

pyproject.toml [project.optional-dependencies] adds a new vision extra pinning Pillow==11.1.0. Pillow is the soft dependency that powers steps 2/3/4; without it the dimension-aware paths silently no-op. Per the existing "smaller dependencies = smaller blast radius" rule, Pillow stays out of core dependencies and is opt-in via pip install hermes-agent[vision].

Tests

tests/tools/test_oversized_image_dimension.py — 11 regression tests covering all four fix sites plus legacy byte-pattern guard. Uses real PIL.Image.new() / save() to construct actual 1200×12000 / 1000×11000 fixtures so the dimension-decode path is exercised end-to-end. Pinned 8000px image is over our 7900 target so the resize target doesn't drift.

$ python3 -m pytest tests/tools/test_oversized_image_dimension.py -v
============================= 11 passed in 0.58s ==============================

Diff stat: 5 files changed, 543 insertions(+), 6 deletions(-) — including a 213-line test file that exercises the real bug reproducer.

Refs

@Fearvox Fearvox requested review from a team and Copilot June 2, 2026 22:53

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR prevents Anthropic vision requests from being permanently “bricked” by tall-but-small images that exceed the provider’s 8000px longest-side cap, adding proactive and reactive resizing plus regression tests.

Changes:

  • Add a pixel-dimension cap helper and apply dimension-based pre-resizing in vision image handling.
  • Extend error classification to detect Anthropic’s dimension-cap 400 responses as image_too_large.
  • Add regression tests and introduce an opt-in vision extra to install Pillow.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tools/vision_tools.py Adds dimension-cap logic and pre-resizing to keep images under Anthropic’s pixel limit.
agent/error_classifier.py Recognizes dimension-cap error messages as image-too-large for proper recovery.
agent/conversation_compression.py Extends shrinker to decode images and shrink on dimension violations, not only bytes.
tests/tools/test_oversized_image_dimension.py Adds regression tests covering classifier patterns, dimension helper, and resizing paths.
pyproject.toml Adds opt-in vision extra with Pillow for dimension-aware resizing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tools/vision_tools.py
Comment on lines +421 to +432
import tempfile as _tf
_precap_path = Path(_tf.NamedTemporaryFile(
prefix="hermes_precap_", suffix=".png", delete=False,
).name)
with _PILImage_pre.open(image_path) as _img:
_img = _img.resize((new_w, new_h), _PILImage_pre.LANCZOS)
# Preserve mime if it was PNG; otherwise convert to RGB JPEG.
_fmt = "PNG" if (mime_type or "").endswith("png") else "JPEG"
if _fmt == "JPEG" and _img.mode in {"RGBA", "P"}:
_img = _img.convert("RGB")
_img.save(str(_precap_path), format=_fmt)
image_path = _precap_path
Comment thread tools/vision_tools.py
Comment on lines +407 to +436
_precap_path: Optional[Path] = None
try:
from PIL import Image as _PILImage_pre # type: ignore
with _PILImage_pre.open(image_path) as _dim_probe:
_w, _h = _dim_probe.size
if _image_exceeds_dimension(_w, _h):
scale = _MAX_IMAGE_DIMENSION / max(_w, _h)
new_w = max(int(_w * scale), 1)
new_h = max(int(_h * scale), 1)
logger.info(
"Image %dx%d exceeds %dpx cap, pre-resizing to %dx%d "
"before byte-sizing pass",
_w, _h, _MAX_IMAGE_DIMENSION, new_w, new_h,
)
import tempfile as _tf
_precap_path = Path(_tf.NamedTemporaryFile(
prefix="hermes_precap_", suffix=".png", delete=False,
).name)
with _PILImage_pre.open(image_path) as _img:
_img = _img.resize((new_w, new_h), _PILImage_pre.LANCZOS)
# Preserve mime if it was PNG; otherwise convert to RGB JPEG.
_fmt = "PNG" if (mime_type or "").endswith("png") else "JPEG"
if _fmt == "JPEG" and _img.mode in {"RGBA", "P"}:
_img = _img.convert("RGB")
_img.save(str(_precap_path), format=_fmt)
image_path = _precap_path
except ImportError:
pass
except Exception as _exc:
logger.debug("pre-cap decode failed for %s: %s", image_path, _exc)
Comment thread agent/conversation_compression.py Outdated
Comment on lines +671 to +688
decode_error: Exception | None = None
dim_exceeds = False
try:
from tools.vision_tools import _image_exceeds_dimension as _dim_check
except ImportError:
_dim_check = None # type: ignore
if _dim_check is not None:
try:
import base64 as _b64_dim
_header, _, _data = url.partition(",")
_raw = _b64_dim.b64decode(_data)
import io as _io
from PIL import Image as _PILImage # type: ignore
with _PILImage.open(_io.BytesIO(_raw)) as _img:
_w, _h = _img.size
if _dim_check(_w, _h):
dim_exceeds = True
except ImportError:
# the resize itself will be a no-op below. Defer to
# the byte-only gate in that case.
pass
except Exception as exc:
Comment thread agent/conversation_compression.py Outdated
Comment on lines +677 to +687
if _dim_check is not None:
try:
import base64 as _b64_dim
_header, _, _data = url.partition(",")
_raw = _b64_dim.b64decode(_data)
import io as _io
from PIL import Image as _PILImage # type: ignore
with _PILImage.open(_io.BytesIO(_raw)) as _img:
_w, _h = _img.size
if _dim_check(_w, _h):
dim_exceeds = True
Comment thread tools/vision_tools.py Outdated
Comment on lines +333 to +336
# Anthropic also enforces a per-image **pixel-dimension** ceiling of 8000
# on the longest side. A tall full-page screenshot (e.g. 1200×12000) can
# be well under 5 MB yet vastly over 8000px, and once baked into history
# it brick the session on every subsequent replay (the 400 is
Comment on lines +31 to +32
from typing import Optional
from unittest.mock import patch
… recover

Anthropic enforces two independent per-image ceilings:
  1. 5 MB encoded base64 byte size
  2. 8000px longest side
Pre-fix, every image-oversize guard in Hermes reasoned about
BYTES ONLY. A tall full-page screenshot (e.g. 1200×12000 at
0.06 MB) is well under 5 MB but vastly over 8000px, slipped
through every guard, got baked into immutable conversation
history, and bricked the session on every subsequent replay
with a non-retryable HTTP 400. Recovery required manually
un-pinning the session in sessions.json. See NousResearch#25837 and
NousResearch#37677.

The fix is four coordinated patches, each covering one layer
of the guard chain so any one catches the regression even if
another is bypassed:

1. agent/error_classifier.py — _IMAGE_TOO_LARGE_PATTERNS
   Add the dimension-cap patterns ('image dimensions exceed',
   'dimensions exceed max allowed', 'max allowed size: 8000')
   alongside the existing 5 MB byte patterns. Without this the
   400 falls through to a generic non-retryable bucket and the
   shrink/retry path never fires.

2. tools/vision_tools.py — _MAX_IMAGE_DIMENSION = 7900
   constant + _image_exceeds_dimension() helper. Plus a NEW
   pre-cap pass in _resize_image_for_vision that decodes the
   image dimensions BEFORE the byte-sizing fast-path and
   downscales the longest side to 7900px (100px headroom under
   Anthropic's hard 8000 cap) preserving aspect ratio. The
   pre-cap writes a downscaled tempfile and rebinds image_path
   so the subsequent byte pass operates on the resized version.
   The byte fast-path was previously bailing out on a tiny-but-
   tall image before the resize logic could run; the pre-cap
   fixes that ordering bug.

3. tools/vision_tools.py — embed-time dimension check
   A new proactive check at the vision_analyze_tool / vision
   site, BEFORE the existing embed-time byte cap. Decodes the
   image with PIL (soft dependency, no-ops gracefully when
   Pillow is missing — the reactive recovery in step 4 is the
   safety net), resizes via _resize_image_for_vision when
   either axis exceeds 7900px. This prevents the oversized
   image from ever being baked into history in the first place.

4. agent/conversation_compression.py — reactive shrinker
   Decodes the image dimensions and shrinks on dimension
   violation, not just on byte violation. The old gate was
   'if len(url) <= target_bytes: return None' which silently
   returned False for a 0.07 MB / 1000×11000 tool-result. Now
   the gate is 'if not dim_exceeds and len(url) <= target_bytes:
   return None'. Falls back to byte-only behaviour if Pillow
   is unavailable, so the pre-fix regression cannot recur.

Plus: a new pyproject.toml [project.optional-dependencies] 'vision'
extra pins Pillow==11.1.0. Pillow is the soft dependency that
powers steps 2/3/4; without it the dimension-aware paths
silently no-op. Per the existing 'smaller dependencies =
smaller blast radius' rule, Pillow stays out of core
dependencies and is opt-in via 'pip install
hermes-agent[vision]'.

Tests: tests/tools/test_oversized_image_dimension.py has 11
regression tests covering all four fix sites plus the legacy
byte-pattern guard. Uses real PIL Image.new() / save() to
construct actual 1200×12000 / 1000×11000 fixtures so the
dimension-decode path is exercised end-to-end. The previously
missing assertion that '8000px image is over our 7900 target'
is also pinned to lock the resize target against accidental
drift.

Refs: NousResearch#25837, NousResearch#37677
@Fearvox Fearvox force-pushed the fix/oversized-image-dimension-guard branch from 9b3bebb to 21c30b9 Compare June 2, 2026 22:55
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder tool/vision Vision analysis and image generation labels Jun 2, 2026
Seven Copilot inline review comments on NousResearch#37701, all worth landing
in a polish pass before merge:

1. tools/vision_tools.py:432 — temp file leak + Windows handle.
   Switched tempfile.NamedTemporaryFile(...).name (which leaves
   a stale open file handle on Windows) to tempfile.mkstemp +
   explicit os.close(fd) before PIL reopens the path. The
   hard-coded .png suffix is now derived from the actual format
   we save (PNG vs JPEG), so downstream MIME detection that reads
   the file extension doesn't get lied to.

2. tools/vision_tools.py:436 — pre-cap tempfile orphaned on
   Pillow-resize path. Pre-fix, the cleanup only ran on the early
   fast-path return; the Pillow-resize loop returned a candidate
   with the tempfile still on disk, accumulating orphans. Fix uses
   contextlib.ExitStack: register the unlink as a callback when
   the tempfile is created, run the entire function body in a
   try/finally that calls _precap_stack.__exit__() exactly
   once at the end (regardless of which return path was taken).
   Failure paths also pop_all() the stack to avoid double-unlink.

3. agent/conversation_compression.py:688 — decode_error was
   assigned but never used. Now passed to logger.debug() so the
   operator has a breadcrumb when dimension decode fails. Removed
   the variable name from the public docstring contract.

4. agent/conversation_compression.py:694 — same as NousResearch#3 (duplicate
   on a different line of the pre-fix version). Same fix.

5. agent/conversation_compression.py:687 — short-circuit by byte
   condition first, only perform dimension decode when URL is
   within target_bytes (the tiny-but-tall case). Polarity was
   initially inverted in my first polish pass; corrected so the
   decode fires when the byte gate alone would miss the violation
   (the exact case NousResearch#25837/NousResearch#37677 reported).

6. tools/vision_tools.py:336 — grammar: "it brick the session"
   → "it bricks the session". One-character fix.

7. tests/tools/test_oversized_image_dimension.py:32 — unused
   imports Optional and patch. Removed.

All 252 tests on this branch pass (11 vision + 29 reconnect + 47
minimax + 165 api_key_providers; test_platform_reconnect_fd_leak
lives on the separate NousResearch#37679 fd-leak branch).

Diff stat: 3 files changed, 205 insertions, 135 deletions.
@Fearvox

Fearvox commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

Addressed all 7 Copilot review comments in a single polish commit (381095a6e):

#1 — tools/vision_tools.py:432 — Switched NamedTemporaryFile(...).name to tempfile.mkstemp + explicit os.close(fd) before PIL reopens the path. The hard-coded .png suffix is now derived from the actual format we save.

#2 — tools/vision_tools.py:436 — Pre-cap tempfile was orphaned on Pillow-resize path. Fixed via contextlib.ExitStack: register the unlink as a callback when the tempfile is created, run the entire function body in try/finally that runs the stack's exit exactly once at the end regardless of which return path was taken. Failure paths also pop_all() to avoid double-unlink.

#3, #4 — agent/conversation_compression.py:688, 694 — decode_error was assigned-but-unused. Now passed to logger.debug() for a breadcrumb when dimension decode fails. Removed from the public contract.

#5 — agent/conversation_compression.py:687 — Short-circuit: dimension decode only when len(url) <= target_bytes (the tiny-but-tall case where the byte gate alone would miss the violation). Polarity was initially inverted in my first polish attempt; corrected to fire on the right side per the comment.

#6 — tools/vision_tools.py:336 — bricks grammar.

#7 — tests/tools/test_oversized_image_dimension.py:32 — Removed unused imports Optional and patch.

Local test runs:

$ pytest tests/tools/test_oversized_image_dimension.py
============================= 11 passed in 0.60s ==============================

All 7 comments addressed; no skips.

@austinpickett austinpickett left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: REQUEST CHANGES — superseded by main, and regresses the Pillow dependency

Thanks for the thorough write-up and the dedicated regression test file — the original root-cause analysis (Anthropic enforces a per-side 8000px cap independently of the 5 MB byte cap, and a tall-but-small screenshot bricks immutable history on a non-retryable 400) is exactly right.

Unfortunately this branch was cut before the equivalent fix already landed on main, so all four code patches are now duplicates of existing main code — in a divergent, conflicting form — and the pyproject.toml change is an outright regression. Recommending this be closed (or rebased down to just the test file).

1. error_classifier.py — already on main

main's _IMAGE_TOO_LARGE_PATTERNS already contains all three dimension patterns this PR adds ("image dimensions exceed", "dimensions exceed max allowed size", "max allowed size: 8000"). The PR re-adds them in a different order/formatting → pure merge conflict, zero new behaviour.

2. tools/vision_tools.py — conflicting reimplementation

main already ships _EMBED_MAX_DIMENSION = 7900, an _image_exceeds_dimension(image_path, max_dimension) helper, and a max_dimension-aware _resize_image_for_vision that pre-checks pixels before the byte loop. This PR introduces a second, divergent implementation: a differently-named _MAX_IMAGE_DIMENSION, a different _image_exceeds_dimension(width, height, max_dim=...) signature, and a tempfile.mkstemp/ExitStack pre-cap pass. Two incompatible versions of the same helper — these cannot both exist.

3. agent/conversation_compression.py — already on main

main's reactive _shrink_data_url already decodes pixels and forces a shrink on dimension violation (needs_shrink, max_dimension = 8000). The PR's dim_exceeds rewrite is a divergent duplicate.

4. pyproject.tomlactive regression (blocking)

This is the most important point. main deliberately promoted Pillow==12.2.0 to a CORE dependency and reduced vision = [] to a no-op back-compat alias, specifically because the optional/lazy-install design deadlocked the CLI under prompt_toolkit (#40490) and left the shrink paths silently no-op'ing on the default vision path. This PR reverts that decision:

  • Re-gates Pillow behind the optional [vision] extra (reintroducing the #40490 deadlock class and the silent-no-op failure mode main fixed), and
  • Downgrades Pillow 12.2.0 → 11.1.0.

Merging this would undo a deliberate fix.

5. scripts/release.py — already on main + undisclosed

The fearvox1015@gmail.com AUTHOR_MAP entry is already present on main. It's also not mentioned anywhere in the PR description.

Minor integrity note

PR body claims "5 files changed, 543 insertions(+), 6 deletions(-)" but the actual diff is 6 files, 701 insertions(+), 93 deletions(-), and the embedded pytest "11 passed" output predates the current main.

Salvageable

The one thing main lacks is the dedicated tests/tools/test_oversized_image_dimension.py. If the contributor wants, that test file could be rebased onto main and adapted to main's actual API (_EMBED_MAX_DIMENSION, _image_exceeds_dimension(image_path, max_dimension)) — that would be a welcome standalone PR. Everything else here should be dropped.

Verdict: Request changes / close — the fix is already in main, and the dependency change regresses #40490.

@Fearvox

Fearvox commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Review: REQUEST CHANGES — superseded by main, and regresses the Pillow dependency

Thanks for the thorough write-up and the dedicated regression test file — the original root-cause analysis (Anthropic enforces a per-side 8000px cap independently of the 5 MB byte cap, and a tall-but-small screenshot bricks immutable history on a non-retryable 400) is exactly right.

Unfortunately this branch was cut before the equivalent fix already landed on main, so all four code patches are now duplicates of existing main code — in a divergent, conflicting form — and the pyproject.toml change is an outright regression. Recommending this be closed (or rebased down to just the test file).

1. error_classifier.py — already on main

main's _IMAGE_TOO_LARGE_PATTERNS already contains all three dimension patterns this PR adds ("image dimensions exceed", "dimensions exceed max allowed size", "max allowed size: 8000"). The PR re-adds them in a different order/formatting → pure merge conflict, zero new behaviour.

2. tools/vision_tools.py — conflicting reimplementation

main already ships _EMBED_MAX_DIMENSION = 7900, an _image_exceeds_dimension(image_path, max_dimension) helper, and a max_dimension-aware _resize_image_for_vision that pre-checks pixels before the byte loop. This PR introduces a second, divergent implementation: a differently-named _MAX_IMAGE_DIMENSION, a different _image_exceeds_dimension(width, height, max_dim=...) signature, and a tempfile.mkstemp/ExitStack pre-cap pass. Two incompatible versions of the same helper — these cannot both exist.

3. agent/conversation_compression.py — already on main

main's reactive _shrink_data_url already decodes pixels and forces a shrink on dimension violation (needs_shrink, max_dimension = 8000). The PR's dim_exceeds rewrite is a divergent duplicate.

4. pyproject.tomlactive regression (blocking)

This is the most important point. main deliberately promoted Pillow==12.2.0 to a CORE dependency and reduced vision = [] to a no-op back-compat alias, specifically because the optional/lazy-install design deadlocked the CLI under prompt_toolkit (#40490) and left the shrink paths silently no-op'ing on the default vision path. This PR reverts that decision:

Merging this would undo a deliberate fix.

5. scripts/release.py — already on main + undisclosed

The fearvox1015@gmail.com AUTHOR_MAP entry is already present on main. It's also not mentioned anywhere in the PR description.

Minor integrity note

PR body claims "5 files changed, 543 insertions(+), 6 deletions(-)" but the actual diff is 6 files, 701 insertions(+), 93 deletions(-), and the embedded pytest "11 passed" output predates the current main.

Salvageable

The one thing main lacks is the dedicated tests/tools/test_oversized_image_dimension.py. If the contributor wants, that test file could be rebased onto main and adapted to main's actual API (_EMBED_MAX_DIMENSION, _image_exceeds_dimension(image_path, max_dimension)) — that would be a welcome standalone PR. Everything else here should be dropped.

Verdict: Request changes / close — the fix is already in main, and the dependency change regresses #40490.

Thank you Austin! I believe the reason why this looks like a duplicate PR is that this fix was shipped but this PR remained opened. Closing it now and please see https://github.com/NousResearch/hermes-agent/issues/37677 for reference!

@Fearvox Fearvox closed this Jun 10, 2026
@Fearvox

Fearvox commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

#37677

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround tool/vision Vision analysis and image generation type/bug Something isn't working

Projects

None yet

4 participants