
feat(vision): downscale attached images before sending; add detail knob #4210

Merged
esengine merged 1 commit into main-v2 from feat/vision-image-downscale on Jun 12, 2026


@esengine

Owner

Follow-up to #4204. That PR could send images but did nothing about size — an attached photo went out at full resolution (up to the 10 MB cap), wasting both request bytes and image tokens, since vision models downscale server-side anyway.

What changed

  • Downscale before send (internal/control). A vision-only path, visionImageDataURL, downscales an oversized image to 1568px on its longest side (where OpenAI/Anthropic cap server-side) and re-encodes it: PNG/GIF stay lossless (screenshots, text, transparency), while JPEG/WebP become JPEG at q85. It is guarded against decompression bombs: DecodeConfig checks the header dimensions before the full decode. The path is best-effort, so an undecodable format (bmp/tiff/svg) passes through untouched. The desktop preview path (ImageDataURL) stays at full resolution; only the model-send path shrinks.
  • vision_detail knob: a per-model low|high config flag sets the OpenAI image_url.detail hint (empty = auto, field omitted). low pins an image to a fixed ~85 tokens for cheap, coarse reads. Anthropic has no equivalent parameter, so the Anthropic provider ignores the flag.

On compression headers (the original ask)

Request-body gzip was considered and deliberately skipped. It only shrinks the wire (~25%, undoing the base64 inflation), it is provider-support-dependent (OpenAI doesn't guarantee support for gzipped request bodies, so sending one risks a 400), and it does nothing for tokens or context. Downscaling cuts both bytes and tokens, so it's the right lever.

Also

Corrects the comment introduced in #4204 that said embedding images "breaks prefix-cache stability." They don't — images are vision-gated, append-only, and byte-stable across turns, so the prefix cache is unaffected; the real concern was always token cost.

Tests

internal/control: oversized PNG → 1568px (pixel count reduced), in-budget image passes through verbatim, JPEG stays JPEG, undecodable MIME type passes through. openai: detail emitted from config, omitted by default. go vet and the tests for the touched packages pass locally; go.mod/go.sum in both modules (main and desktop) are tidied for golang.org/x/image.

Follow-up to #4204. Sent images had no size control — an attached photo went out
at full resolution (up to the 10 MB cap), wasting request bytes and image tokens
since vision models downscale server-side anyway.

- internal/control: a vision-only send path (visionImageDataURL) downscales an
  oversized image to 1568px on its longest side and re-encodes it — PNG/GIF stay
  lossless (screenshots, text, transparency), JPEG/WebP go to JPEG q85 — guarded
  against decompression bombs. Best-effort: an undecodable format passes through
  untouched. The desktop preview path (ImageDataURL) is unchanged, full res.
- A per-model `vision_detail` (low|high) config flag sets the openai image_url
  detail hint; empty = auto/omit. "low" pins an image to ~85 tokens.
- Deliberately no request-body gzip: it only helps the wire (~25%, and
  provider-support-dependent) and nothing for tokens, so downscaling is the lever.

Also corrects the #4204 comment that claimed images "break prefix-cache
stability" — they don't (vision-gated, append-only, byte-stable); the real
concern was always cost.
@esengine requested a review from SivanCola as a code owner June 12, 2026 15:29
@github-actions (bot) added the labels desktop (Wails desktop app, desktop/**), agent (Core agent loop: internal/agent, internal/control), config (Configuration & setup: internal/config), provider (Model providers & selection: internal/provider), and v2 (Go rewrite (1.x), main-v2 branch, active development) on Jun 12, 2026
@esengine merged commit 62645d1 into main-v2 on Jun 12, 2026 (14 checks passed)
@esengine deleted the feat/vision-image-downscale branch June 12, 2026 15:34