[Bug]: vision_analyze retries non-retryable 4xx (404/403) image downloads 3x with backoff #32296

@jhsmith409

Bug Description

vision_analyze retries non-retryable HTTP 4xx responses. When _download_image()
(tools/vision_tools.py) gets a 404/403, response.raise_for_status() raises an
httpx.HTTPStatusError that is caught by a broad except Exception. The loop then requests the
identical URL again, for 3 attempts in total with 2s and 4s exponential backoff between
them, before failing. A 4xx for a fixed URL is deterministic: the retries can never succeed,
so they only add ~6s of latency and flood the logs with duplicate ERROR tracebacks.

Observed in a gateway session (the model had constructed a bad Wikimedia Commons thumbnail
URL — wrong hash prefix + a non-existent thumb size):

WARNING tools.vision_tools: Image download failed (attempt 1/3): Client error '404 Not Found' ...
WARNING tools.vision_tools: Retrying in 2s...
WARNING tools.vision_tools: Image download failed (attempt 2/3): Client error '404 Not Found' ...
WARNING tools.vision_tools: Retrying in 4s...
ERROR   tools.vision_tools: Image download failed after 3 attempts: Client error '404 Not Found' ...

(The model fabricating the URL is separate model behavior — this issue is only about the
download layer wasting retries on a response that can't change.)

Steps to Reproduce

  1. Call vision_analyze with an image_url that returns 404 (any stable dead image URL).
  2. Watch the logs: 3 attempts with 2s + 4s sleeps between them.

Expected Behavior

Fail fast on non-retryable 4xx client errors (404, 403, etc.) — one request, no backoff.
429 Too Many Requests is the one 4xx worth retrying with backoff, so it should be preserved.

Actual Behavior

All 4xx responses are attempted max_retries (3) times with exponential backoff between
attempts, adding ~6s and duplicate ERROR tracebacks before surfacing the same error.

Affected Component

Tools — tools/vision_tools.py::_download_image

Version

v2026.5.16

Proposed Fix

In the retry loop, re-raise immediately when the caught exception is an
httpx.HTTPStatusError with a 4xx status other than 429. PR to follow.

Labels

  - P2: Medium, degraded but workaround exists
  - comp/tools: Tool registry, model_tools, toolsets
  - tool/vision: Vision analysis and image generation
  - type/bug: Something isn't working
