fix(auxiliary): retry on temperature deprecation + memoize rejecting models by avaclaw1 · Pull Request #12523 · NousResearch/hermes-agent

avaclaw1 · 2026-04-19T11:08:27Z

Summary

The _forbids_sampling_params check in agent/anthropic_adapter.py statically matches only the substrings 4-7/4.7, so other restricted model families still forward temperature and fail with an HTTP 400: `temperature` is deprecated for this model. This is confirmed for claude-haiku-4-5-20251001 (observed in production logs); per internal comments, claude-opus-4-6 also rejects it.

Adds a runtime layer of defence in _AnthropicCompletionsAdapter.create() that complements the two existing static strips:

- _forbids_sampling_params(model) guard at the call site (unchanged).
- Safety-net strip inside build_anthropic_kwargs (unchanged).
- New: catch the specific 400, drop temperature, retry once, and memoize the model name in a module-level _TEMP_UNSUPPORTED_MODELS set so subsequent calls skip the param pre-emptively.

One-time ~30ms retry per (model, process) pair; zero overhead on subsequent calls. No regression on older models that still accept temperature (they keep their configured values — avoids quality loss on OCR / vision defaults that rely on temperature=0.1). Self-healing against future restricted families with no code change needed.

Why not just expand the static substring list

Tempting, but 4-5 matches claude-sonnet-4-5 (accepts temperature) as well as claude-haiku-4-5-20251001 (rejects it), so substrings alone can't disambiguate. A runtime-learned set avoids that brittleness and stays correct when Anthropic ships a new restricted family tomorrow.
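
To make the ambiguity concrete, here is a small sketch of what adding 4-5 to a substring list would do. The helper `forbids_by_substring` and the `needles` tuple are hypothetical and only mimic the kind of check described above; they are not the body of _forbids_sampling_params.

```python
def forbids_by_substring(model: str, needles: tuple[str, ...]) -> bool:
    """Return True if any needle occurs anywhere in the model name."""
    return any(needle in model for needle in needles)


# Hypothetical list: the existing substrings plus the tempting "4-5".
needles = ("4-7", "4.7", "4-5")

for model in ("claude-sonnet-4-5", "claude-haiku-4-5-20251001"):
    # Both names contain "4-5", so both would be flagged, even though the
    # description above says only the Haiku model rejects temperature.
    print(model, forbids_by_substring(model, needles))
```

Both lines print `True`, so the Sonnet model would lose its configured temperature as collateral damage.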

Test plan

- tests/agent/test_auxiliary_temperature_retry.py, 4 new tests:
  - deprecated-temperature 400 is retried without the param, response normalized
  - model added to _TEMP_UNSUPPORTED_MODELS and skipped on the next call
  - unrelated BadRequestError propagates unchanged; model not cached
  - deprecated-temperature 400 without temperature in kwargs does not infinite-loop
- Full tests/agent/ suite (1,402 tests): all pass
- Full aux-client-specific suites: all pass

🤖 Generated with Claude Code

…g models

`_forbids_sampling_params` statically matches "4-7"/"4.7" and misses other families that also 400 with "`temperature` is deprecated for this model." (confirmed for Haiku 4.5; per upstream, Opus 4.6 rejects it too).

Add a third layer of defence next to the existing static + safety-net strips: catch the specific 400 from `messages.create`, drop `temperature`, retry once, and cache the model in `_TEMP_UNSUPPORTED_MODELS` so later calls in the same process skip the param pre-emptively.

- One-time ~30ms retry per (model, process) pair; zero latency after that.
- Self-healing for future restricted families; no code change required.
- Older models keep their `temperature` value (no OCR/vision quality regression).

Covered by tests for the retry path, memoization, unrelated 400 passthrough, and the no-op case when `temperature` wasn't sent.

teknium1 · 2026-04-27T04:41:01Z

Thanks for the detailed write-up and the reproduction evidence — this is a real bug and the analysis is solid.

However, the core fix landed on main independently while this PR was open:

- PR platforms: split storage from LLM-invocation gate (group-chat 'observe but don't invoke' mode) #15621 (commit facea8455, merged 2026-04-25) added _is_unsupported_temperature_error, wired it into both call_llm and async_call_llm as a reactive retry, and shipped 237 lines of tests in tests/agent/test_unsupported_temperature_retry.py.
- PR fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry #15633 (commit 3c1c65e75, merged 2026-04-25) generalized the detector to _is_unsupported_parameter_error(exc, param) so the same retry strategy covers max_tokens and any future restricted param.
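
For readers unfamiliar with the pattern, a parameter-generic detector of this kind could look roughly like the sketch below. It is an assumption-laden illustration only: the function name `looks_like_unsupported_param_error` and the marker strings are hypothetical, and the actual _is_unsupported_parameter_error on main may inspect the exception differently.

```python
def looks_like_unsupported_param_error(exc: Exception, param: str) -> bool:
    """Guess whether `exc` says the provider rejected `param`.

    Assumption: the provider names the parameter in its error message
    together with one of a few telltale phrases.
    """
    msg = str(exc).lower()
    if param.lower() not in msg:
        return False
    markers = ("deprecated", "unsupported", "not supported")
    return any(marker in msg for marker in markers)
```

A caller would check, for example, `looks_like_unsupported_param_error(exc, "temperature")` or `looks_like_unsupported_param_error(exc, "max_tokens")` before deciding to drop that parameter and retry.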

The implementation in agent/auxiliary_client.py (lines 3004–3038 sync, lines 3299–3326 async) covers the same claude-haiku-4-5-20251001 / claude-opus-4-6 temperature-rejection scenario described here. The one thing main does not have is the _TEMP_UNSUPPORTED_MODELS memoization set that would skip temperature proactively on subsequent calls — if you think that micro-optimization is worth a follow-up, feel free to open a focused PR for it.

Closing as implemented. — automated hermes-sweeper review

alt-glitch added the type/bug, P2, comp/agent, and provider/anthropic labels on Apr 23, 2026

This was referenced Apr 24, 2026:

- fix: retry auxiliary calls without unsupported temperature #15416 (Closed)
- fix: retry auxiliary calls without unsupported temperature #15609 (Closed)
- fix(auxiliary): universal retry when any provider rejects temperature #15627 (Merged)

teknium1 closed this Apr 27, 2026

avaclaw1 wants to merge 1 commit into NousResearch:main from avaclaw1:fix/aux-client-temperature-retry
