fix(vision): cap image pixel dimensions at 7900px on embed + reactive recover#37701
fix(vision): cap image pixel dimensions at 7900px on embed + reactive recover#37701Fearvox wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR prevents Anthropic vision requests from being permanently “bricked” by tall-but-small images that exceed the provider’s 8000px longest-side cap, adding proactive and reactive resizing plus regression tests.
Changes:
- Add a pixel-dimension cap helper and apply dimension-based pre-resizing in vision image handling.
- Extend error classification to detect Anthropic’s dimension-cap 400 responses as
image_too_large. - Add regression tests and introduce an opt-in
visionextra to install Pillow.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/vision_tools.py | Adds dimension-cap logic and pre-resizing to keep images under Anthropic’s pixel limit. |
| agent/error_classifier.py | Recognizes dimension-cap error messages as image-too-large for proper recovery. |
| agent/conversation_compression.py | Extends shrinker to decode images and shrink on dimension violations, not only bytes. |
| tests/tools/test_oversized_image_dimension.py | Adds regression tests covering classifier patterns, dimension helper, and resizing paths. |
| pyproject.toml | Adds opt-in vision extra with Pillow for dimension-aware resizing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| import tempfile as _tf | ||
| _precap_path = Path(_tf.NamedTemporaryFile( | ||
| prefix="hermes_precap_", suffix=".png", delete=False, | ||
| ).name) | ||
| with _PILImage_pre.open(image_path) as _img: | ||
| _img = _img.resize((new_w, new_h), _PILImage_pre.LANCZOS) | ||
| # Preserve mime if it was PNG; otherwise convert to RGB JPEG. | ||
| _fmt = "PNG" if (mime_type or "").endswith("png") else "JPEG" | ||
| if _fmt == "JPEG" and _img.mode in {"RGBA", "P"}: | ||
| _img = _img.convert("RGB") | ||
| _img.save(str(_precap_path), format=_fmt) | ||
| image_path = _precap_path |
| _precap_path: Optional[Path] = None | ||
| try: | ||
| from PIL import Image as _PILImage_pre # type: ignore | ||
| with _PILImage_pre.open(image_path) as _dim_probe: | ||
| _w, _h = _dim_probe.size | ||
| if _image_exceeds_dimension(_w, _h): | ||
| scale = _MAX_IMAGE_DIMENSION / max(_w, _h) | ||
| new_w = max(int(_w * scale), 1) | ||
| new_h = max(int(_h * scale), 1) | ||
| logger.info( | ||
| "Image %dx%d exceeds %dpx cap, pre-resizing to %dx%d " | ||
| "before byte-sizing pass", | ||
| _w, _h, _MAX_IMAGE_DIMENSION, new_w, new_h, | ||
| ) | ||
| import tempfile as _tf | ||
| _precap_path = Path(_tf.NamedTemporaryFile( | ||
| prefix="hermes_precap_", suffix=".png", delete=False, | ||
| ).name) | ||
| with _PILImage_pre.open(image_path) as _img: | ||
| _img = _img.resize((new_w, new_h), _PILImage_pre.LANCZOS) | ||
| # Preserve mime if it was PNG; otherwise convert to RGB JPEG. | ||
| _fmt = "PNG" if (mime_type or "").endswith("png") else "JPEG" | ||
| if _fmt == "JPEG" and _img.mode in {"RGBA", "P"}: | ||
| _img = _img.convert("RGB") | ||
| _img.save(str(_precap_path), format=_fmt) | ||
| image_path = _precap_path | ||
| except ImportError: | ||
| pass | ||
| except Exception as _exc: | ||
| logger.debug("pre-cap decode failed for %s: %s", image_path, _exc) |
| decode_error: Exception | None = None | ||
| dim_exceeds = False | ||
| try: | ||
| from tools.vision_tools import _image_exceeds_dimension as _dim_check | ||
| except ImportError: | ||
| _dim_check = None # type: ignore | ||
| if _dim_check is not None: | ||
| try: | ||
| import base64 as _b64_dim | ||
| _header, _, _data = url.partition(",") | ||
| _raw = _b64_dim.b64decode(_data) | ||
| import io as _io | ||
| from PIL import Image as _PILImage # type: ignore | ||
| with _PILImage.open(_io.BytesIO(_raw)) as _img: | ||
| _w, _h = _img.size | ||
| if _dim_check(_w, _h): | ||
| dim_exceeds = True | ||
| except ImportError: |
| # the resize itself will be a no-op below. Defer to | ||
| # the byte-only gate in that case. | ||
| pass | ||
| except Exception as exc: |
| if _dim_check is not None: | ||
| try: | ||
| import base64 as _b64_dim | ||
| _header, _, _data = url.partition(",") | ||
| _raw = _b64_dim.b64decode(_data) | ||
| import io as _io | ||
| from PIL import Image as _PILImage # type: ignore | ||
| with _PILImage.open(_io.BytesIO(_raw)) as _img: | ||
| _w, _h = _img.size | ||
| if _dim_check(_w, _h): | ||
| dim_exceeds = True |
| # Anthropic also enforces a per-image **pixel-dimension** ceiling of 8000 | ||
| # on the longest side. A tall full-page screenshot (e.g. 1200×12000) can | ||
| # be well under 5 MB yet vastly over 8000px, and once baked into history | ||
| # it brick the session on every subsequent replay (the 400 is |
| from typing import Optional | ||
| from unittest.mock import patch |
… recover Anthropic enforces two independent per-image ceilings: 1. 5 MB encoded base64 byte size 2. 8000px longest side Pre-fix, every image-oversize guard in Hermes reasoned about BYTES ONLY. A tall full-page screenshot (e.g. 1200×12000 at 0.06 MB) is well under 5 MB but vastly over 8000px, slipped through every guard, got baked into immutable conversation history, and bricked the session on every subsequent replay with a non-retryable HTTP 400. Recovery required manually un-pinning the session in sessions.json. See NousResearch#25837 and NousResearch#37677. The fix is four coordinated patches, each covering one layer of the guard chain so any one catches the regression even if another is bypassed: 1. agent/error_classifier.py — _IMAGE_TOO_LARGE_PATTERNS Add the dimension-cap patterns ('image dimensions exceed', 'dimensions exceed max allowed', 'max allowed size: 8000') alongside the existing 5 MB byte patterns. Without this the 400 falls through to a generic non-retryable bucket and the shrink/retry path never fires. 2. tools/vision_tools.py — _MAX_IMAGE_DIMENSION = 7900 constant + _image_exceeds_dimension() helper. Plus a NEW pre-cap pass in _resize_image_for_vision that decodes the image dimensions BEFORE the byte-sizing fast-path and downscales the longest side to 7900px (100px headroom under Anthropic's hard 8000 cap) preserving aspect ratio. The pre-cap writes a downscaled tempfile and rebinds image_path so the subsequent byte pass operates on the resized version. The byte fast-path was previously bailing out on a tiny-but- tall image before the resize logic could run; the pre-cap fixes that ordering bug. 3. tools/vision_tools.py — embed-time dimension check A new proactive check at the vision_analyze_tool / vision site, BEFORE the existing embed-time byte cap. Decodes the image with PIL (soft dependency, no-ops gracefully when Pillow is missing — the reactive recovery in step 4 is the safety net), resizes via _resize_image_for_vision when either axis exceeds 7900px. This prevents the oversized image from ever being baked into history in the first place. 4. agent/conversation_compression.py — reactive shrinker Decodes the image dimensions and shrinks on dimension violation, not just on byte violation. The old gate was 'if len(url) <= target_bytes: return None' which silently returned False for a 0.07 MB / 1000×11000 tool-result. Now the gate is 'if not dim_exceeds and len(url) <= target_bytes: return None'. Falls back to byte-only behaviour if Pillow is unavailable, so the pre-fix regression cannot recur. Plus: a new pyproject.toml [project.optional-dependencies] 'vision' extra pins Pillow==11.1.0. Pillow is the soft dependency that powers steps 2/3/4; without it the dimension-aware paths silently no-op. Per the existing 'smaller dependencies = smaller blast radius' rule, Pillow stays out of core dependencies and is opt-in via 'pip install hermes-agent[vision]'. Tests: tests/tools/test_oversized_image_dimension.py has 11 regression tests covering all four fix sites plus the legacy byte-pattern guard. Uses real PIL Image.new() / save() to construct actual 1200×12000 / 1000×11000 fixtures so the dimension-decode path is exercised end-to-end. The previously missing assertion that '8000px image is over our 7900 target' is also pinned to lock the resize target against accidental drift. Refs: NousResearch#25837, NousResearch#37677
9b3bebb to
21c30b9
Compare
Seven Copilot inline review comments on NousResearch#37701, all worth landing in a polish pass before merge: 1. tools/vision_tools.py:432 — temp file leak + Windows handle. Switched tempfile.NamedTemporaryFile(...).name (which leaves a stale open file handle on Windows) to tempfile.mkstemp + explicit os.close(fd) before PIL reopens the path. The hard-coded .png suffix is now derived from the actual format we save (PNG vs JPEG), so downstream MIME detection that reads the file extension doesn't get lied to. 2. tools/vision_tools.py:436 — pre-cap tempfile orphaned on Pillow-resize path. Pre-fix, the cleanup only ran on the early fast-path return; the Pillow-resize loop returned a candidate with the tempfile still on disk, accumulating orphans. Fix uses contextlib.ExitStack: register the unlink as a callback when the tempfile is created, run the entire function body in a try/finally that calls _precap_stack.__exit__() exactly once at the end (regardless of which return path was taken). Failure paths also pop_all() the stack to avoid double-unlink. 3. agent/conversation_compression.py:688 — decode_error was assigned but never used. Now passed to logger.debug() so the operator has a breadcrumb when dimension decode fails. Removed the variable name from the public docstring contract. 4. agent/conversation_compression.py:694 — same as NousResearch#3 (duplicate on a different line of the pre-fix version). Same fix. 5. agent/conversation_compression.py:687 — short-circuit by byte condition first, only perform dimension decode when URL is within target_bytes (the tiny-but-tall case). Polarity was initially inverted in my first polish pass; corrected so the decode fires when the byte gate alone would miss the violation (the exact case NousResearch#25837/NousResearch#37677 reported). 6. tools/vision_tools.py:336 — grammar: "it brick the session" → "it bricks the session". One-character fix. 7. tests/tools/test_oversized_image_dimension.py:32 — unused imports Optional and patch. Removed. All 252 tests on this branch pass (11 vision + 29 reconnect + 47 minimax + 165 api_key_providers; test_platform_reconnect_fd_leak lives on the separate NousResearch#37679 fd-leak branch). Diff stat: 3 files changed, 205 insertions, 135 deletions.
|
Addressed all 7 Copilot review comments in a single polish commit ( ✅ #1 — tools/vision_tools.py:432 — Switched ✅ #2 — tools/vision_tools.py:436 — Pre-cap tempfile was orphaned on Pillow-resize path. Fixed via ✅ #3, #4 — agent/conversation_compression.py:688, 694 — ✅ #5 — agent/conversation_compression.py:687 — Short-circuit: dimension decode only when ✅ #6 — tools/vision_tools.py:336 — ✅ #7 — tests/tools/test_oversized_image_dimension.py:32 — Removed unused imports Local test runs: All 7 comments addressed; no skips. |
austinpickett
left a comment
There was a problem hiding this comment.
Review: REQUEST CHANGES — superseded by main, and regresses the Pillow dependency
Thanks for the thorough write-up and the dedicated regression test file — the original root-cause analysis (Anthropic enforces a per-side 8000px cap independently of the 5 MB byte cap, and a tall-but-small screenshot bricks immutable history on a non-retryable 400) is exactly right.
Unfortunately this branch was cut before the equivalent fix already landed on main, so all four code patches are now duplicates of existing main code — in a divergent, conflicting form — and the pyproject.toml change is an outright regression. Recommending this be closed (or rebased down to just the test file).
1. error_classifier.py — already on main
main's _IMAGE_TOO_LARGE_PATTERNS already contains all three dimension patterns this PR adds ("image dimensions exceed", "dimensions exceed max allowed size", "max allowed size: 8000"). The PR re-adds them in a different order/formatting → pure merge conflict, zero new behaviour.
2. tools/vision_tools.py — conflicting reimplementation
main already ships _EMBED_MAX_DIMENSION = 7900, an _image_exceeds_dimension(image_path, max_dimension) helper, and a max_dimension-aware _resize_image_for_vision that pre-checks pixels before the byte loop. This PR introduces a second, divergent implementation: a differently-named _MAX_IMAGE_DIMENSION, a different _image_exceeds_dimension(width, height, max_dim=...) signature, and a tempfile.mkstemp/ExitStack pre-cap pass. Two incompatible versions of the same helper — these cannot both exist.
3. agent/conversation_compression.py — already on main
main's reactive _shrink_data_url already decodes pixels and forces a shrink on dimension violation (needs_shrink, max_dimension = 8000). The PR's dim_exceeds rewrite is a divergent duplicate.
4. pyproject.toml — active regression (blocking)
This is the most important point. main deliberately promoted Pillow==12.2.0 to a CORE dependency and reduced vision = [] to a no-op back-compat alias, specifically because the optional/lazy-install design deadlocked the CLI under prompt_toolkit (#40490) and left the shrink paths silently no-op'ing on the default vision path. This PR reverts that decision:
- Re-gates Pillow behind the optional
[vision]extra (reintroducing the #40490 deadlock class and the silent-no-op failure modemainfixed), and - Downgrades
Pillow 12.2.0 → 11.1.0.
Merging this would undo a deliberate fix.
5. scripts/release.py — already on main + undisclosed
The fearvox1015@gmail.com AUTHOR_MAP entry is already present on main. It's also not mentioned anywhere in the PR description.
Minor integrity note
PR body claims "5 files changed, 543 insertions(+), 6 deletions(-)" but the actual diff is 6 files, 701 insertions(+), 93 deletions(-), and the embedded pytest "11 passed" output predates the current main.
Salvageable
The one thing main lacks is the dedicated tests/tools/test_oversized_image_dimension.py. If the contributor wants, that test file could be rebased onto main and adapted to main's actual API (_EMBED_MAX_DIMENSION, _image_exceeds_dimension(image_path, max_dimension)) — that would be a welcome standalone PR. Everything else here should be dropped.
Verdict: Request changes / close — the fix is already in main, and the dependency change regresses #40490.
Thank you Austin! I believe the reason why this looks like a duplicate PR is that this fix was shipped but this PR remained opened. Closing it now and please see |
Problem
Anthropic enforces two independent per-image ceilings:
Pre-fix, every image-oversize guard in Hermes reasoned about bytes only. A tall full-page screenshot (e.g. 1200×12000 at 0.06 MB) is well under 5 MB but vastly over 8000px, slipped through every guard, got baked into immutable conversation history, and bricked the session on every subsequent replay with a non-retryable HTTP 400. Recovery required manually un-pinning the session in
sessions.json. See #25837 and #37677 for the original reports.Fix (four coordinated patches, each covers a different layer)
agent/error_classifier.py—_IMAGE_TOO_LARGE_PATTERNSadds the dimension-cap patterns ("image dimensions exceed","dimensions exceed max allowed","max allowed size: 8000") alongside the existing 5 MB byte patterns. Without this the 400 falls through to a generic non-retryable bucket and the shrink/retry path never fires.tools/vision_tools.py— new_MAX_IMAGE_DIMENSION = 7900constant +_image_exceeds_dimension()helper. Plus a new pre-cap pass in_resize_image_for_visionthat decodes the image dimensions before the byte-sizing fast-path and downscales the longest side to 7900px (100px headroom under Anthropic's hard 8000 cap) preserving aspect ratio. The pre-cap writes a downscaled tempfile and rebindsimage_pathso the subsequent byte pass operates on the resized version. The byte fast-path was previously bailing out on a tiny-but-tall image before the resize logic could run; the pre-cap fixes that ordering bug.tools/vision_tools.py— new proactive dimension check at thevision_analyze_toolembed site, before the existing embed-time byte cap. Decodes the image with PIL (soft dependency, no-ops gracefully when missing — the reactive recovery in step 4 is the safety net), resizes via_resize_image_for_visionwhen either axis exceeds 7900px. Prevents the oversized image from ever being baked into history.agent/conversation_compression.py— reactive shrinker decodes the image dimensions and shrinks on dimension violation, not just on byte violation. Old gate wasif len(url) <= target_bytes: return Nonewhich silently returned False for a 0.07 MB / 1000×11000 tool-result. New gate isif not dim_exceeds and len(url) <= target_bytes: return None. Falls back to byte-only behaviour if Pillow is unavailable, so the pre-fix regression cannot recur.Plus: Pillow as a vision extra
pyproject.toml[project.optional-dependencies]adds a newvisionextra pinningPillow==11.1.0. Pillow is the soft dependency that powers steps 2/3/4; without it the dimension-aware paths silently no-op. Per the existing "smaller dependencies = smaller blast radius" rule, Pillow stays out of core dependencies and is opt-in viapip install hermes-agent[vision].Tests
tests/tools/test_oversized_image_dimension.py— 11 regression tests covering all four fix sites plus legacy byte-pattern guard. Uses realPIL.Image.new()/save()to construct actual 1200×12000 / 1000×11000 fixtures so the dimension-decode path is exercised end-to-end. Pinned8000px image is over our 7900 targetso the resize target doesn't drift.Diff stat:
5 files changed, 543 insertions(+), 6 deletions(-)— including a 213-line test file that exercises the real bug reproducer.Refs