fix(kimi): drop client-side temperature overrides for Kimi/Moonshot models by kshitijk4poor · Pull Request #13137 · NousResearch/hermes-agent

kshitijk4poor · 2026-04-20T18:23:52Z

Summary

The Kimi gateway selects the correct temperature server-side based on the active mode (thinking on → 1.0, thinking off → 0.6). Client-side clamping is no longer needed and would conflict if the gateway changes its defaults.

Changes

agent/auxiliary_client.py — Removed all Kimi temperature forcing infrastructure:

_FIXED_TEMPERATURE_MODELS dict (kimi-for-coding → 0.6)
_KIMI_INSTANT_MODELS frozenset (kimi-k2.5, turbo-preview, 0905-preview → 0.6)
_KIMI_THINKING_MODELS frozenset (k2-thinking, k2-thinking-turbo → 1.0)
_KIMI_PUBLIC_API_OVERRIDES dict (kimi-k2.5 on moonshot.ai → 1.0)
All Kimi-specific branches in _fixed_temperature_for_model()

The function signature is preserved (it now returns None for all models), so callers need no changes: they already guard with 'if fixed_temperature is not None:'.

run_agent.py — Updated stale comment referencing "kimi-for-coding → 0.6".

Tests (6 files) — Replaced all "forces temperature" tests with "preserves caller temperature" / "no temperature in kwargs" assertions:

tests/agent/test_auxiliary_client.py — TestKimiTemperatureNotForced (was TestKimiForCodingTemperature)
tests/run_agent/test_run_agent.py — 3 tests updated
tests/run_agent/test_provider_parity.py — TestBuildApiKwargsKimiNoTemperatureOverride
tests/test_trajectory_compressor.py — 3 tests updated
tests/test_trajectory_compressor_async.py — 3 tests updated
tests/test_mini_swe_runner.py — 2 tests updated
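The two assertion styles named above might look roughly like the following. This is a sketch under stated assumptions: build_api_kwargs() is a hypothetical stand-in for the code under test, and the test method names are invented; only the class name TestKimiTemperatureNotForced appears in this PR.

```python
from typing import Any, Dict, Optional


def build_api_kwargs(model: str, temperature: Optional[float] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for the code under test: no Kimi-specific override."""
    kwargs: Dict[str, Any] = {"model": model}
    if temperature is not None:
        kwargs["temperature"] = temperature
    return kwargs


class TestKimiTemperatureNotForced:
    def test_preserves_caller_temperature(self):
        # The caller's value must survive untouched (previously forced to 0.6).
        kwargs = build_api_kwargs("kimi-for-coding", temperature=0.3)
        assert kwargs["temperature"] == 0.3

    def test_no_temperature_in_kwargs(self):
        # With no caller value, the key must be absent rather than defaulted.
        kwargs = build_api_kwargs("kimi-k2.5")
        assert "temperature" not in kwargs
```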

Net: -122 lines (94 added, 216 removed)

Test plan

27 targeted kimi temperature tests pass (sync)
8 async trajectory compressor tests pass
968 passed in broader suite
E2E validation with real imports confirms _fixed_temperature_for_model() returns None for all Kimi models regardless of base URL
Pre-existing CI failures unrelated to this change (insights, gemini catalog, config version, plugin head, approval)

…odels

The Kimi gateway selects the correct temperature server-side based on the active mode (thinking on → 1.0, thinking off → 0.6). Client-side clamping is no longer needed and would conflict if the gateway changes its defaults.

Removed:
- _FIXED_TEMPERATURE_MODELS, _KIMI_INSTANT_MODELS, _KIMI_THINKING_MODELS, _KIMI_PUBLIC_API_OVERRIDES maps from auxiliary_client.py
- All Kimi-specific branches in _fixed_temperature_for_model(); the function now always returns None (kept for future non-Kimi contracts)

Callers already guard with 'if fixed_temperature is not None:', so the change is transparent: temperature is simply omitted from API calls, letting the Kimi gateway use its own defaults.

Updated tests across 5 files to verify temperature is NOT forced.

@kshitijk4poor

Kimi's gateway selects the correct temperature server-side based on the active mode (thinking -> 1.0, non-thinking -> 0.6). Sending any temperature value, even the previously "correct" one, conflicts with gateway-managed defaults.

Replaces the old approach of forcing specific temperature values (0.6 for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel that tells all call sites to strip the temperature key from API kwargs entirely.

Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model() prefix check (covers all kimi-* models), _fixed_temperature_for_model() returns sentinel for kimi models. _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None for kimi (sentinel mapped), direct client calls use kwargs dict to conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced with 'temperature not in kwargs' assertions.

Net: -76 lines (171 added, 247 removed).

Inspired by PR #13137 (@kshitijk4poor).

teknium1 · 2026-04-20T19:23:14Z

Merged via PR #13157, which builds on your approach but goes further: instead of removing the forced values and passing the caller's temperature through, it strips the temperature key from API kwargs entirely for all kimi-* models using an OMIT_TEMPERATURE sentinel. This ensures Kimi's gateway has full control over temperature selection. Thanks for identifying that the client-side clamping was no longer needed!

kshitijk4poor force-pushed the fix/kimi-drop-temperature branch from 3276ecd to ed201cc Compare April 20, 2026 18:30

teknium1 mentioned this pull request Apr 20, 2026

fix(kimi): omit temperature entirely for Kimi/Moonshot models #13157

Merged

teknium1 closed this Apr 20, 2026

kshitijk4poor wants to merge 1 commit into main from fix/kimi-drop-temperature
