fix(vision): clamp image dimensions before inline base64 encode#25838

Open
yoniebans wants to merge 8 commits into main from fix/vision-dimension-cap

Conversation

@yoniebans yoniebans commented May 14, 2026

Collaborator

What does this PR do?

Fixes a session-bricking bug on Anthropic vision. The Messages API rejects images >8000 px on either axis with a non-retryable 400:

At least one of the image dimensions exceed max allowed size: 8000 pixels

The native vision fast path inlined oversized images (e.g. tall page screenshots) into the tool-result envelope before any size check. Once the bad image lands in message history, every subsequent call hits the same error — session unrecoverable without manually editing the JSON.

The fix proportionally clamps oversized images to 7999 px before sending. Critically, the clamp is gated on Anthropic-shaped providers only (native Anthropic + aliases + Claude-routing aggregators: openrouter, nous, vertex, bedrock, anthropic-vertex, google-vertex). Other providers (OpenAI, Gemini, custom hosts) auto-downscale server-side; clamping universally would silently degrade them.
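The proportional clamp reduces to a small piece of integer arithmetic. The sketch below is illustrative only: `clamped_size` is a hypothetical helper, not the function in tools/vision_tools.py, and the use of floor division is an assumption chosen so that neither side can exceed the cap.

```python
MAX_IMAGE_DIMENSION = 7999  # one px under Anthropic's 8000 px limit


def clamped_size(width: int, height: int, cap: int = MAX_IMAGE_DIMENSION) -> tuple[int, int]:
    """Return (width, height) scaled so that the longer side is at most `cap`."""
    longest = max(width, height)
    if longest <= cap:
        return width, height  # already within the cap, leave untouched
    # Scale both axes by cap / longest. Floor division keeps each side <= cap,
    # and max(1, ...) stops a very thin image from collapsing to zero pixels.
    new_width = max(1, width * cap // longest)
    new_height = max(1, height * cap // longest)
    return new_width, new_height


print(clamped_size(8500, 100))  # (7999, 94)
```

The 8500×100 case matches the live verification reported under How to Test, where the clamped image came out at 7999×94.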

Related Issue

Fixes #25837

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • tools/vision_tools.py — _MAX_IMAGE_DIMENSION = 7999; _get_image_dimensions / _image_exceeds_pixel_cap helpers (header-only Pillow read); _ANTHROPIC_IMAGE_PROVIDERS frozenset + _is_anthropic_provider() predicate; _resize_image_for_vision(..., clamp_dimensions=False) kwarg that proportionally shrinks before the byte-size halving loop
  • tools/vision_tools.py — three call sites in _vision_analyze_native and vision_analyze_tool now pass clamp_dimensions=_is_anthropic_provider()
  • tools/browser_tool.py — screenshot resize now gated on Anthropic provider
  • agent/conversation_compression.py — image-shrink recovery path now gated on Anthropic provider
  • tests/tools/test_vision_tools.py — 33 new tests (cap helpers, proportional resize, parametrized provider matrix, wiring regressions)
  • tests/run_agent/test_image_shrink_recovery.py — mock signature aligned with the new kwarg
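
The provider gating listed above can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in the PR: the real `_is_anthropic_provider()` is called without arguments, so it presumably reads the active provider from configuration, whereas this version takes the provider name explicitly. The set members are copied from the PR description, and the lowercase/strip normalization and the `resize_kwargs` helper are hypothetical.

```python
# Provider names taken from the PR description; the real frozenset may differ.
ANTHROPIC_IMAGE_PROVIDERS = frozenset({
    "anthropic", "claude", "claude-code",
    "openrouter", "nous", "vertex", "bedrock",
    "anthropic-vertex", "google-vertex",
})


def is_anthropic_provider(provider):
    """True when `provider` names a backend that may route to Claude."""
    if not provider:
        return False  # unset provider: do not clamp
    # Normalization is an assumption made for this sketch.
    return provider.strip().lower() in ANTHROPIC_IMAGE_PROVIDERS


def resize_kwargs(provider):
    """Keyword arguments a call site would forward to the resize helper."""
    return {"clamp_dimensions": is_anthropic_provider(provider)}


print(resize_kwargs("bedrock"))  # {'clamp_dimensions': True}
print(resize_kwargs("openai"))   # {'clamp_dimensions': False}
```

Keeping the default at `clamp_dimensions=False` means any call site that is not explicitly wired up behaves as before, which is why the browser and compression paths needed their own changes.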

How to Test

  1. With a vision-capable Anthropic model as the main model, have the agent call browser_vision on any page that produces a screenshot taller or wider than 8000 px (long articles, wide dashboards).
  2. Before fix: 400 on first call, session permanently broken — every retry replays the bad image.
  3. After fix: image clamped to 7999 px before send; Anthropic accepts the request.

Local verification: 188/188 pass across test_vision_tools.py, test_image_shrink_recovery.py, test_image_rejection_fallback.py, test_image_routing.py. End-to-end live-verified against the Anthropic Messages API (raw 8500×100 PNG rejected with the expected error; clamped 7999×94 accepted).
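
For anyone repeating the live check, an oversized test image like the 8500×100 PNG mentioned above can be produced without extra dependencies. The sketch below is not the script used for this PR; `make_gray_png` and `png_dimensions` are hypothetical helpers that write a solid white grayscale PNG with the standard library and read its size back from the IHDR chunk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_gray_png(width: int, height: int) -> bytes:
    """Return a solid white 8-bit grayscale PNG of the given size."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter-type byte (0 = none) followed by pixel bytes.
    raw = (b"\x00" + b"\xff" * width) * height
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk without decoding pixels."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


oversized = make_gray_png(8500, 100)
print(png_dimensions(oversized))  # (8500, 100)
```

Writing `oversized` to disk gives a file that should trip the 8000 px limit when sent unclamped.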

Checklist

Code

  • Conventional Commits in all commit messages (fix(vision):, test(vision):, refactor(vision):)
  • PR contains only changes related to this fix
  • Tests added for the fix (cap helpers, provider gating, wiring regressions)
  • Tested on Linux (Ubuntu, Python 3.11)

Documentation & Housekeeping

  • No config key changes — N/A cli-config.yaml.example
  • No architecture/workflow changes — N/A CONTRIBUTING.md / AGENTS.md
  • No tool schema changes — vision_analyze / browser_vision external behaviour unchanged
  • Cross-platform: Pillow remains a soft dep (same pattern as other Pillow consumers in the codebase); fix is pure Python

github-actions Bot commented May 14, 2026

Contributor

🔎 Lint report: fix/vision-dimension-cap vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9027 on HEAD, 9017 on base (🆕 +10)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4766 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Comment thread tools/vision_tools.py
        with Image.open(image_path) as img:
            return (img.width, img.height)
    except Exception as exc:
        logger.debug("Could not read image dimensions for %s: %s", image_path, exc)
alt-glitch added the type/bug, P2, tool/vision, and provider/anthropic labels on May 14, 2026
yoniebans force-pushed the fix/vision-dimension-cap branch from 7650717 to ffa991c on May 20, 2026 14:50
Anthropic's Messages API rejects any image whose width or height
exceeds 8000 px with a non_retryable_client_error 400:

  messages.N.content.M.image.source.base64.data:
    At least one of the image dimensions exceed max allowed size: 8000 pixels

The native vision fast path inlined oversized screenshots (e.g. tall
or panoramic captures from browser_vision / vision_analyze) directly
into the tool-result envelope before any size check.  Once present in
the message history, every subsequent request replayed the same
oversized image and got the same 400 — permanently bricking the
session, since the error is non-retryable.  Recovery required manually
editing the session JSON to drop the poisoned tool result.

Fix:

  * Add _MAX_IMAGE_DIMENSION = 7999 (one px under Anthropic's cap).
  * Add _get_image_dimensions / _image_exceeds_pixel_cap helpers
    (header-only Pillow read, no full decode).
  * _resize_image_for_vision now clamps proportionally to the cap
    before any byte-size work.
  * Three call sites (native fast path + legacy path initial check)
    trigger resize on dimension overflow as well as byte overflow.

Pillow remains a soft dependency: when missing, the dimension check
returns False and the existing byte-size guard remains the last line
of defence (same behaviour as today).

Adds TestPixelDimensionCap covering the helpers, the Pillow-missing
fallback, and the 10000x100 / 100x10000 regression cases.  All 125
tests pass across vision_tools, vision_native_fast_path,
image_shrink_recovery, and image_rejection_fallback.
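
The soft-dependency behaviour described in this commit message can be sketched as below. The function names are stand-ins for `_get_image_dimensions` and `_image_exceeds_pixel_cap`, and the bodies are assumptions reconstructed from the description and the review excerpt, not the merged code.

```python
import logging

logger = logging.getLogger(__name__)

MAX_IMAGE_DIMENSION = 7999


def get_image_dimensions(image_path):
    """Return (width, height), or None if they cannot be determined."""
    try:
        from PIL import Image  # soft dependency, imported lazily
    except ImportError:
        return None
    try:
        # Image.open is lazy: it parses the header, not the pixel data.
        with Image.open(image_path) as img:
            return (img.width, img.height)
    except Exception as exc:
        logger.debug("Could not read image dimensions for %s: %s", image_path, exc)
        return None


def image_exceeds_pixel_cap(image_path, cap=MAX_IMAGE_DIMENSION):
    """True only when the dimensions are known and one of them exceeds `cap`."""
    dims = get_image_dimensions(image_path)
    if dims is None:
        return False  # unknown size: the byte-size guard remains the fallback
    return max(dims) > cap
```

Returning False for an unknown size keeps the pre-existing behaviour when Pillow is missing or the file cannot be read, at the cost of letting an oversized image through in that case.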
yoniebans force-pushed the fix/vision-dimension-cap branch from ffa991c to 2882899 on May 21, 2026 17:54
yoniebans added 7 commits May 21, 2026 21:53
Anthropic is the only major provider that hard-rejects >8000 px images.
Clamping unconditionally silently downscaled images for OpenAI/Gemini/custom
hosts that could handle larger inputs. Gate the clamp on the active provider
and add an opt-in clamp_dimensions kwarg to _resize_image_for_vision.

Manual script that hits the real Anthropic API to confirm: (1) >8000 px images
are still rejected with the same error message, (2) our clamp produces an
image Anthropic accepts. Run when threshold drift is suspected.
…mpression paths

- Broaden _is_anthropic_provider to cover claude/claude-code aliases and
  aggregators that proxy Claude (openrouter, nous, vertex, bedrock,
  anthropic-vertex, google-vertex) — same set as
  _supports_media_in_tool_results.
- Wire clamp_dimensions through browser_tool screenshot resize and
  conversation_compression image-shrink recovery, both of which were
  bypassing the clamp.
- Promote Pillow-missing log to warning when clamp was requested.
- Add parametrized tests for _is_anthropic_provider covering 19 cases.
…script

- Inline the provider check via _ANTHROPIC_IMAGE_PROVIDERS frozenset
  instead of duplicating the predicate logic in a function body.
- Drop scripts/verify_anthropic_pixel_cap.py — it was a one-off
  development probe, not a repeatable utility. Moved to local workspace.
…ature

The _fake_resize mock in test_image_shrink_recovery.py predates the
clamp_dimensions kwarg on _resize_image_for_vision. Add it to keep the
mock signature aligned.
Labels

P2 Medium — degraded but workaround exists provider/anthropic Anthropic native Messages API tool/vision Vision analysis and image generation type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: vision_analyze / browser_vision can brick session by inlining oversized image

3 participants