Skip to content

fix(auxiliary): retry on temperature deprecation + memoize rejecting models#12523

Closed
avaclaw1 wants to merge 1 commit into
NousResearch:mainfrom
avaclaw1:fix/aux-client-temperature-retry
Closed

fix(auxiliary): retry on temperature deprecation + memoize rejecting models#12523
avaclaw1 wants to merge 1 commit into
NousResearch:mainfrom
avaclaw1:fix/aux-client-temperature-retry

Conversation

@avaclaw1

Copy link
Copy Markdown

Summary

The _forbids_sampling_params check in agent/anthropic_adapter.py statically matches the substrings 4-7/4.7, so other restricted families still forward temperature and 400 with `temperature` is deprecated for this model. Confirmed for claude-haiku-4-5-20251001 (observed in production logs); per internal comments, claude-opus-4-6 also rejects it.

Adds a runtime layer of defence in _AnthropicCompletionsAdapter.create() that complements the two existing static strips:

  1. _forbids_sampling_params(model) guard at the call site (unchanged).
  2. Safety-net strip inside build_anthropic_kwargs (unchanged).
  3. New: catch the specific 400, drop temperature, retry once, and memoize the model name in a module-level _TEMP_UNSUPPORTED_MODELS set so subsequent calls skip the param pre-emptively.

One-time ~30ms retry per (model, process) pair; zero overhead on subsequent calls. No regression on older models that still accept temperature (they keep their configured values — avoids quality loss on OCR / vision defaults that rely on temperature=0.1). Self-healing against future restricted families with no code change needed.

Why not just expand the static substring list

Tempting, but 4-5 matches claude-sonnet-4-5 (accepts temperature) as well as claude-haiku-4-5-20251001 (rejects it), so substrings alone can't disambiguate. A runtime-learned set avoids that brittleness and stays correct when Anthropic ships a new restricted family tomorrow.

Test plan

  • tests/agent/test_auxiliary_temperature_retry.py — 4 new tests:
    • deprecated-temperature 400 is retried without the param, response normalized
    • model added to _TEMP_UNSUPPORTED_MODELS and skipped on the next call
    • unrelated BadRequestError propagates unchanged; model not cached
    • deprecated-temperature 400 without temperature in kwargs does not infinite-loop
  • Full tests/agent/ suite (1,402 tests) — all pass
  • Full aux-client-specific suites — all pass

🤖 Generated with Claude Code

…g models

`_forbids_sampling_params` statically matches "4-7"/"4.7" and misses other
families that also 400 with "`temperature` is deprecated for this model."
(confirmed for Haiku 4.5; per upstream, Opus 4.6 rejects it too).

Add a third layer of defence next to the existing static + safety-net strips:
catch the specific 400 from `messages.create`, drop `temperature`, retry once,
and cache the model in `_TEMP_UNSUPPORTED_MODELS` so later calls in the same
process skip the param pre-emptively.

- One-time ~30ms retry per (model, process) pair; zero latency after that.
- Self-healing for future restricted families — no code change required.
- Older models keep their `temperature` value (no OCR/vision quality regression).

Covered by tests for the retry path, memoization, unrelated 400 passthrough,
and the no-op case when `temperature` wasn't sent.
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder provider/anthropic Anthropic native Messages API labels Apr 23, 2026
@teknium1

Copy link
Copy Markdown
Contributor

Thanks for the detailed write-up and the reproduction evidence — this is a real bug and the analysis is solid.

However, the core fix landed on main independently while this PR was open:

The implementation in agent/auxiliary_client.py (lines 3004–3038 sync, lines 3299–3326 async) covers the same claude-haiku-4-5-20251001 / claude-opus-4-6 temperature-rejection scenario described here. The one thing main does not have is the _TEMP_UNSUPPORTED_MODELS memoization set that would skip temperature proactively on subsequent calls — if you think that micro-optimization is worth a follow-up, feel free to open a focused PR for it.

Closing as implemented. — automated hermes-sweeper review

@teknium1 teknium1 closed this Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists provider/anthropic Anthropic native Messages API type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants