Skip to content

fix(kimi): send max_tokens, reasoning_effort, and thinking for Kimi/Moonshot#13168

Closed
mengjian-github wants to merge 1 commit into
NousResearch:mainfrom
mengjian-github:fix/kimi-reasoning-and-max-tokens
Closed

fix(kimi): send max_tokens, reasoning_effort, and thinking for Kimi/Moonshot#13168
mengjian-github wants to merge 1 commit into
NousResearch:mainfrom
mengjian-github:fix/kimi-reasoning-and-max-tokens

Conversation

@mengjian-github

Copy link
Copy Markdown

Problem

Kimi/Moonshot endpoints return "Response truncated due to output length limit" and exhibit inconsistent reasoning behavior when used through Hermes, even though the same API key works perfectly with Kimi CLI.

Root Cause

Compared Hermes API calls against Kimi CLI source code (kimi.py). Three parameters that Kimi CLI sends were missing from Hermes:

Parameter Kimi CLI Hermes (before)
max_tokens 32000 (default in generate()) Not sent — API picks a very low default
reasoning_effort Top-level param Not sent — _supports_reasoning_extra_body() returns False for non-OpenRouter
extra_body.thinking {"type": "enabled"} via with_thinking() Not sent

Without max_tokens, the Kimi gateway uses a small default. Reasoning tokens share that budget, so the model exhausts it on thinking alone → truncation.

Without reasoning_effort and extra_body.thinking, reasoning mode activation is inconsistent — it depends on gateway heuristics rather than explicit client intent.

Changes

run_agent.py — 2 additions in _build_api_kwargs():

  1. max_tokens + reasoning_effort — Added elif branch for api.kimi.com and api.moonshot URLs in the max_tokens chain (after Qwen Portal, before OpenRouter). Sends max_tokens: 32000 (matching Kimi CLI default) and reasoning_effort as a top-level parameter, respecting reasoning_config.effort when set.

  2. extra_body.thinking — Before the existing _supports_reasoning_extra_body() gate, detect Kimi/Moonshot URLs and inject extra_body.thinking = {"type": "enabled"}. Set to "disabled" when reasoning_config.enabled is False.

tests/run_agent/test_run_agent.py — 6 new test cases:

  • test_kimi_coding_endpoint_sends_max_tokens_and_reasoning
  • test_kimi_coding_endpoint_respects_custom_effort
  • test_kimi_coding_endpoint_sends_thinking_extra_body
  • test_kimi_coding_endpoint_disables_thinking
  • test_moonshot_endpoint_sends_max_tokens_and_reasoning

Testing

28 passed in 5.05s  (TestBuildApiKwargs class)

References

…oonshot

Kimi/Moonshot endpoints require explicit parameters that Hermes was not
sending, causing 'Response truncated due to output length limit' errors
and inconsistent reasoning behavior.

Root cause analysis against Kimi CLI source (MoonshotAI/kimi-cli,
packages/kosong/src/kosong/chat_provider/kimi.py):

1. max_tokens: Kimi's API defaults to a very low value when omitted.
   Reasoning tokens share the output budget — the model exhausts it on
   thinking alone.  Send 32000, matching Kimi CLI's generate() default.

2. reasoning_effort: Kimi CLI sends this as a top-level parameter (not
   inside extra_body).  Hermes was not sending it at all because
   _supports_reasoning_extra_body() returns False for non-OpenRouter
   endpoints.

3. extra_body.thinking: Kimi CLI uses with_thinking() which sets
   extra_body.thinking={"type":"enabled"} alongside reasoning_effort.
   This is a separate control from the OpenAI-style reasoning extra_body
   that Hermes sends for OpenRouter/GitHub.  Without it, the Kimi gateway
   may not activate reasoning mode correctly.

Covers api.kimi.com (Kimi Code) and api.moonshot.ai/cn (Moonshot).

Tests: 6 new test cases for max_tokens, reasoning_effort, and
extra_body.thinking under various configs.
@teknium1

Copy link
Copy Markdown
Contributor

Merged via PR #13503 — your commit was cherry-picked onto current main with your authorship preserved in git log (063bc3c). Thanks @mengjian-github!

Small follow-up adjustments on top of your change:

  • Converted URL checks to base_url_host_matches() to match the hostname-hardening sweep recently merged to main (dbb7e00).
  • Added explicit api.moonshot.cn support alongside api.moonshot.ai.
  • When reasoning_config.enabled=False, we now omit reasoning_effort entirely to mirror upstream Kimi CLI's with_thinking("off")reasoning_effort=None mapping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants