fix(kimi): drop client-side temperature overrides for Kimi/Moonshot models #13137

Closed
kshitijk4poor wants to merge 1 commit into main from fix/kimi-drop-temperature

kshitijk4poor (Collaborator) commented Apr 20, 2026

Summary

The Kimi gateway selects the correct temperature server-side based on the active mode (thinking on → 1.0, thinking off → 0.6). Client-side clamping is no longer needed and would conflict if the gateway changes its defaults.

Changes

agent/auxiliary_client.py — Removed all Kimi temperature forcing infrastructure:

  • _FIXED_TEMPERATURE_MODELS dict (kimi-for-coding → 0.6)
  • _KIMI_INSTANT_MODELS frozenset (kimi-k2.5, turbo-preview, 0905-preview → 0.6)
  • _KIMI_THINKING_MODELS frozenset (k2-thinking, k2-thinking-turbo → 1.0)
  • _KIMI_PUBLIC_API_OVERRIDES dict (kimi-k2.5 on moonshot.ai → 1.0)
  • All Kimi-specific branches in _fixed_temperature_for_model()

The function signature is preserved (returns None for all models) so callers don't need changes — they already guard with if fixed_temperature is not None:.

run_agent.py — Updated stale comment referencing "kimi-for-coding → 0.6".

Tests (6 files) — Replaced all "forces temperature" tests with "preserves caller temperature" / "no temperature in kwargs" assertions:

  • tests/agent/test_auxiliary_client.py: TestKimiTemperatureNotForced (was TestKimiForCodingTemperature)
  • tests/run_agent/test_run_agent.py — 3 tests updated
  • tests/run_agent/test_provider_parity.py: TestBuildApiKwargsKimiNoTemperatureOverride
  • tests/test_trajectory_compressor.py — 3 tests updated
  • tests/test_trajectory_compressor_async.py — 3 tests updated
  • tests/test_mini_swe_runner.py — 2 tests updated
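
The two assertion styles named above ("preserves caller temperature" and "no temperature in kwargs") might look roughly like the following. This is an illustrative sketch only: `call_model` is a hypothetical stand-in for the code under test, and the mocked client shape is an assumption, not taken from the test files listed.

```python
from unittest.mock import MagicMock


def call_model(client, model, messages, temperature=None):
    """Hypothetical call site: forwards temperature only if the caller set one."""
    kwargs = {"model": model, "messages": messages}
    if temperature is not None:
        kwargs["temperature"] = temperature
    return client.chat.completions.create(**kwargs)


def test_preserves_caller_temperature():
    client = MagicMock()
    call_model(client, "kimi-for-coding", [], temperature=0.3)
    sent = client.chat.completions.create.call_args.kwargs
    # The caller's value must reach the API untouched (not forced to 0.6).
    assert sent["temperature"] == 0.3


def test_no_temperature_in_kwargs():
    client = MagicMock()
    call_model(client, "kimi-for-coding", [])
    sent = client.chat.completions.create.call_args.kwargs
    # Nothing was requested, so nothing should be injected.
    assert "temperature" not in sent
```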

Net: -122 lines (94 added, 216 removed)

Test plan

  • 27 targeted kimi temperature tests pass (sync)
  • 8 async trajectory compressor tests pass
  • 968 passed in broader suite
  • E2E validation with real imports confirms _fixed_temperature_for_model() returns None for all Kimi models regardless of base URL
  • Pre-existing CI failures unrelated to this change (insights, gemini catalog, config version, plugin head, approval)

…odels

The Kimi gateway selects the correct temperature server-side based on the
active mode (thinking on → 1.0, thinking off → 0.6).  Client-side clamping
is no longer needed and would conflict if the gateway changes its defaults.

Removed:
- _FIXED_TEMPERATURE_MODELS, _KIMI_INSTANT_MODELS, _KIMI_THINKING_MODELS,
  _KIMI_PUBLIC_API_OVERRIDES maps from auxiliary_client.py
- All Kimi-specific branches in _fixed_temperature_for_model() — the
  function now always returns None (kept for future non-Kimi contracts)

Callers already guard with 'if fixed_temperature is not None:' so the
change is transparent — temperature is simply omitted from API calls,
letting the Kimi gateway use its own defaults.

Updated tests across 5 files to verify temperature is NOT forced.

kshitijk4poor force-pushed the fix/kimi-drop-temperature branch from 3276ecd to ed201cc on April 20, 2026 18:30
teknium1 added a commit that referenced this pull request Apr 20, 2026
Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6).  Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.

Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.

Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
  prefix check (covers all kimi-* models), _fixed_temperature_for_model()
  returns sentinel for kimi models.  _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
  paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
  for kimi (sentinel mapped), direct client calls use kwargs dict to
  conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
  with 'temperature not in kwargs' assertions.

Net: -76 lines (171 added, 247 removed).
Inspired by PR #13137 (@kshitijk4poor).

teknium1 (Contributor) commented

Merged via PR #13157, which builds on your approach but goes further: instead of removing the forced values and passing the caller's temperature through, it strips the temperature key from API kwargs entirely for all kimi-* models using an OMIT_TEMPERATURE sentinel. This ensures Kimi's gateway has full control over temperature selection. Thanks for identifying that the client-side clamping was no longer needed!

teknium1 closed this Apr 20, 2026
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…search#13157)

aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
…search#13157)

Luminet2023 pushed a commit to Luminet2023/hermes-agent that referenced this pull request May 1, 2026
…search#13157)

02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…search#13157)

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…search#13157)

Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…search#13157)
