fix(kimi): send max_tokens, reasoning_effort, and thinking for Kimi/Moonshot by mengjian-github · Pull Request #13168 · NousResearch/hermes-agent

mengjian-github · 2026-04-20T19:46:52Z

Problem

Kimi/Moonshot endpoints return "Response truncated due to output length limit" and exhibit inconsistent reasoning behavior when used through Hermes, even though the same API key works perfectly with Kimi CLI.

Root Cause

Compared Hermes API calls against Kimi CLI source code (kimi.py). Three parameters that Kimi CLI sends were missing from Hermes:

Parameter	Kimi CLI	Hermes (before)
`max_tokens`	`32000` (default in `generate()`)	Not sent — API picks a very low default
`reasoning_effort`	Top-level param	Not sent — `_supports_reasoning_extra_body()` returns `False` for non-OpenRouter
`extra_body.thinking`	`{"type": "enabled"}` via `with_thinking()`	Not sent

Without max_tokens, the Kimi gateway uses a small default. Reasoning tokens share that budget, so the model exhausts it on thinking alone → truncation.

Without reasoning_effort and extra_body.thinking, reasoning mode activation is inconsistent — it depends on gateway heuristics rather than explicit client intent.

Changes

run_agent.py — 2 additions in _build_api_kwargs():

max_tokens + reasoning_effort — Added elif branch for api.kimi.com and api.moonshot URLs in the max_tokens chain (after Qwen Portal, before OpenRouter). Sends max_tokens: 32000 (matching Kimi CLI default) and reasoning_effort as a top-level parameter, respecting reasoning_config.effort when set.
extra_body.thinking — Before the existing _supports_reasoning_extra_body() gate, detect Kimi/Moonshot URLs and inject extra_body.thinking = {"type": "enabled"}. Set to "disabled" when reasoning_config.enabled is False.

tests/run_agent/test_run_agent.py — 6 new test cases:

test_kimi_coding_endpoint_sends_max_tokens_and_reasoning
test_kimi_coding_endpoint_respects_custom_effort
test_kimi_coding_endpoint_sends_thinking_extra_body
test_kimi_coding_endpoint_disables_thinking
test_moonshot_endpoint_sends_max_tokens_and_reasoning

Testing

28 passed in 5.05s  (TestBuildApiKwargs class)

References

Kimi CLI source: MoonshotAI/kimi-cli kimi.py
Related: fix(kimi): omit temperature entirely for Kimi/Moonshot models #13157 (temperature omission), [Bug]: run_agent.py uses _default_headers instead of default_headers, dropping routed client headers and breaking Kimi API #8779 (header passthrough), feat: add k2.6-code-preview support for kimi-coding / kimi-coding-cn / moonshot #11168 (k2.6 support)

…oonshot Kimi/Moonshot endpoints require explicit parameters that Hermes was not sending, causing 'Response truncated due to output length limit' errors and inconsistent reasoning behavior. Root cause analysis against Kimi CLI source (MoonshotAI/kimi-cli, packages/kosong/src/kosong/chat_provider/kimi.py): 1. max_tokens: Kimi's API defaults to a very low value when omitted. Reasoning tokens share the output budget — the model exhausts it on thinking alone. Send 32000, matching Kimi CLI's generate() default. 2. reasoning_effort: Kimi CLI sends this as a top-level parameter (not inside extra_body). Hermes was not sending it at all because _supports_reasoning_extra_body() returns False for non-OpenRouter endpoints. 3. extra_body.thinking: Kimi CLI uses with_thinking() which sets extra_body.thinking={"type":"enabled"} alongside reasoning_effort. This is a separate control from the OpenAI-style reasoning extra_body that Hermes sends for OpenRouter/GitHub. Without it, the Kimi gateway may not activate reasoning mode correctly. Covers api.kimi.com (Kimi Code) and api.moonshot.ai/cn (Moonshot). Tests: 6 new test cases for max_tokens, reasoning_effort, and extra_body.thinking under various configs.

teknium1 · 2026-04-21T12:32:42Z

Merged via PR #13503 — your commit was cherry-picked onto current main with your authorship preserved in git log (063bc3c). Thanks @mengjian-github!

Small follow-up adjustments on top of your change:

Converted URL checks to base_url_host_matches() to match the hostname-hardening sweep recently merged to main (dbb7e00).
Added explicit api.moonshot.cn support alongside api.moonshot.ai.
When reasoning_config.enabled=False, we now omit reasoning_effort entirely to mirror upstream Kimi CLI's with_thinking("off") → reasoning_effort=None mapping.

teknium1 mentioned this pull request Apr 21, 2026

fix(kimi): send max_tokens, reasoning_effort, and thinking for Kimi/Moonshot #13503

Merged

teknium1 closed this in #13503 Apr 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(kimi): send max_tokens, reasoning_effort, and thinking for Kimi/Moonshot#13168

fix(kimi): send max_tokens, reasoning_effort, and thinking for Kimi/Moonshot#13168
mengjian-github wants to merge 1 commit into
NousResearch:mainfrom
mengjian-github:fix/kimi-reasoning-and-max-tokens

mengjian-github commented Apr 20, 2026

Uh oh!

teknium1 commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mengjian-github commented Apr 20, 2026

Problem

Root Cause

Changes

Testing

References

Uh oh!

teknium1 commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants