feat(vision): downscale attached images before sending; add detail knob #4210
Merged
Follow-up to #4204. Sent images had no size control: an attached photo went out at full resolution (up to the 10 MB cap), wasting request bytes and image tokens, since vision models downscale server-side anyway.
- internal/control: a vision-only send path (visionImageDataURL) downscales an oversized image to 1568px on its longest side and re-encodes it. PNG/GIF stay lossless (screenshots, text, transparency); JPEG/WebP go to JPEG q85. The decode is guarded against decompression bombs. Best-effort: an undecodable format passes through untouched. The desktop preview path (ImageDataURL) is unchanged and stays at full resolution.
- A per-model `vision_detail` (low|high) config flag sets the openai image_url detail hint; empty = auto/omit. "low" pins an image to ~85 tokens.
- Deliberately no request-body gzip: it only helps the wire (~25%, and provider-support-dependent) and does nothing for tokens, so downscaling is the lever.
Also corrects the #4204 comment that claimed images "break prefix-cache stability". They don't (vision-gated, append-only, byte-stable); the real concern was always cost.
This was referenced Jun 13, 2026
Follow-up to #4204. That PR could send images but did nothing about size — an attached photo went out at full resolution (up to the 10 MB cap), wasting both request bytes and image tokens, since vision models downscale server-side anyway.
What changed
- (`internal/control`) A vision-only path, `visionImageDataURL`, downscales an oversized image to 1568px on its longest side (where OpenAI/Anthropic cap server-side) and re-encodes it: PNG/GIF stay lossless (screenshots, text, transparency), JPEG/WebP → JPEG q85. Guarded against decompression bombs (`DecodeConfig` dimension check before decode). Best-effort: an undecodable format (bmp/tiff/svg) passes through untouched. The desktop preview path (`ImageDataURL`) is left at full resolution; only the model-send path shrinks.
- `vision_detail` knob: a per-model `low|high` config flag sets the openai `image_url.detail` hint (empty = `auto`, field omitted). `low` pins an image to a fixed ~85 tokens for cheap coarse reads; anthropic has no such knob and ignores it.
On compression headers (the original ask)
Request-body gzip was considered and deliberately skipped: it only shrinks the wire (~25%, undoing the base64 inflation) and is provider-support-dependent (OpenAI doesn't guarantee gzipped request bodies — sending one risks a 400), and it does nothing for tokens or context. Downscaling cuts both bytes and tokens, so it's the right lever.
Also
Corrects the comment introduced in #4204 that said embedding images "breaks prefix-cache stability." They don't — images are vision-gated, append-only, and byte-stable across turns, so the prefix cache is unaffected; the real concern was always token cost.
Tests
- internal/control: oversized PNG → 1568px (pixel count reduced), in-budget image passes through verbatim, JPEG stays JPEG, undecodable mime passes through.
- openai: `detail` emitted from config, omitted by default.
- `go vet` + touched packages green locally; both go.mod/go.sum (main + desktop module) tidied for `golang.org/x/image`.