fix(kimi): send max_tokens, reasoning_effort, and thinking for Kimi/Moonshot #13168
Closed
mengjian-github wants to merge 1 commit into
…oonshot
Kimi/Moonshot endpoints require explicit parameters that Hermes was not
sending, causing 'Response truncated due to output length limit' errors
and inconsistent reasoning behavior.
Root cause analysis against Kimi CLI source (MoonshotAI/kimi-cli,
packages/kosong/src/kosong/chat_provider/kimi.py):
1. max_tokens: Kimi's API defaults to a very low value when omitted.
Reasoning tokens share the output budget — the model exhausts it on
thinking alone. Send 32000, matching Kimi CLI's generate() default.
2. reasoning_effort: Kimi CLI sends this as a top-level parameter (not
inside extra_body). Hermes was not sending it at all because
_supports_reasoning_extra_body() returns False for non-OpenRouter
endpoints.
3. extra_body.thinking: Kimi CLI uses with_thinking() which sets
extra_body.thinking={"type":"enabled"} alongside reasoning_effort.
This is a separate control from the OpenAI-style reasoning extra_body
that Hermes sends for OpenRouter/GitHub. Without it, the Kimi gateway
may not activate reasoning mode correctly.
Covers api.kimi.com (Kimi Code) and api.moonshot.ai/cn (Moonshot).
Tests: 6 new test cases for max_tokens, reasoning_effort, and
extra_body.thinking under various configs.
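Item 1 above comes down to simple arithmetic. The following is a minimal sketch, assuming reasoning and answer tokens are drawn from one shared max_tokens budget as described. The function name and the numbers 1024 and 3000 are illustrative assumptions; the gateway's actual default is not stated here.

```python
def visible_output_tokens(max_tokens: int, reasoning_tokens: int) -> int:
    """Tokens left for the visible answer after reasoning has used its share.

    Assumes a single shared budget: reasoning and answer tokens both count
    against max_tokens.
    """
    return max(0, max_tokens - reasoning_tokens)


# Illustrative numbers only (assumptions, not measured values):
# 1024 stands in for the unspecified low gateway default,
# 3000 for the reasoning length of a hypothetical request.
ASSUMED_LOW_DEFAULT = 1024
ASSUMED_REASONING_TOKENS = 3000

# With the low default, reasoning alone exhausts the budget.
print(visible_output_tokens(ASSUMED_LOW_DEFAULT, ASSUMED_REASONING_TOKENS))  # 0
# With max_tokens=32000, most of the budget remains for the answer.
print(visible_output_tokens(32000, ASSUMED_REASONING_TOKENS))  # 29000
```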
Merged via PR #13503 — your commit was cherry-picked onto current main with your authorship preserved in git log (063bc3c). Thanks @mengjian-github! Small follow-up adjustments on top of your change:
Problem
Kimi/Moonshot endpoints return "Response truncated due to output length limit" and exhibit inconsistent reasoning behavior when used through Hermes, even though the same API key works perfectly with Kimi CLI.
Root Cause
Compared Hermes API calls against Kimi CLI source code (kimi.py). Three parameters that Kimi CLI sends were missing from Hermes:
- max_tokens: 32000 (default in generate())
- reasoning_effort: _supports_reasoning_extra_body() returns False for non-OpenRouter
- extra_body.thinking: {"type": "enabled"} via with_thinking()
Without max_tokens, the Kimi gateway uses a small default. Reasoning tokens share that budget, so the model exhausts it on thinking alone and the response is truncated.
Without reasoning_effort and extra_body.thinking, reasoning mode activation is inconsistent: it depends on gateway heuristics rather than explicit client intent.
Changes
run_agent.py (2 additions in _build_api_kwargs()):
1. max_tokens + reasoning_effort: Added an elif branch for api.kimi.com and api.moonshot URLs in the max_tokens chain (after Qwen Portal, before OpenRouter). Sends max_tokens: 32000 (matching the Kimi CLI default) and reasoning_effort as a top-level parameter, respecting reasoning_config.effort when set.
2. extra_body.thinking: Before the existing _supports_reasoning_extra_body() gate, detect Kimi/Moonshot URLs and inject extra_body.thinking = {"type": "enabled"}. Set to "disabled" when reasoning_config.enabled is False.
tests/run_agent/test_run_agent.py (6 new test cases):
- test_kimi_coding_endpoint_sends_max_tokens_and_reasoning
- test_kimi_coding_endpoint_respects_custom_effort
- test_kimi_coding_endpoint_sends_thinking_extra_body
- test_kimi_coding_endpoint_disables_thinking
- test_moonshot_endpoint_sends_max_tokens_and_reasoning
Testing
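The PR's actual tests are not reproduced here. As a rough, self-contained sketch of the behavior they check, the logic described under Changes might look like the following. The helper names (is_kimi_endpoint, kimi_request_kwargs), the dict shape of reasoning_config, and the "medium" fallback effort are assumptions made for illustration; this is not the code in run_agent.py.

```python
from typing import Any, Optional

KIMI_MAX_TOKENS = 32000  # value stated in the PR (Kimi CLI's generate() default)
ASSUMED_DEFAULT_EFFORT = "medium"  # assumption: the fallback is not stated in the PR


def is_kimi_endpoint(base_url: str) -> bool:
    """Hypothetical helper: match the URL substrings named in the PR."""
    url = base_url.lower()
    return "api.kimi.com" in url or "api.moonshot" in url


def kimi_request_kwargs(
    base_url: str, reasoning_config: Optional[dict] = None
) -> dict[str, Any]:
    """Return the extra request kwargs for Kimi/Moonshot, or {} for other hosts."""
    if not is_kimi_endpoint(base_url):
        return {}
    cfg = reasoning_config or {}
    # Thinking is switched off only when enabled is explicitly False.
    thinking = "disabled" if cfg.get("enabled") is False else "enabled"
    return {
        "max_tokens": KIMI_MAX_TOKENS,
        # Top-level parameter, not nested inside extra_body.
        "reasoning_effort": cfg.get("effort") or ASSUMED_DEFAULT_EFFORT,
        "extra_body": {"thinking": {"type": thinking}},
    }


# Checks loosely mirroring the test names listed above.
kimi = kimi_request_kwargs("https://api.kimi.com")
assert kimi["max_tokens"] == 32000
assert kimi["extra_body"]["thinking"] == {"type": "enabled"}

custom = kimi_request_kwargs("https://api.kimi.com", {"effort": "high"})
assert custom["reasoning_effort"] == "high"

off = kimi_request_kwargs("https://api.moonshot.ai/v1", {"enabled": False})
assert off["extra_body"]["thinking"] == {"type": "disabled"}
```

Whether reasoning_effort should still be sent when thinking is disabled is not specified above; the sketch sends it either way.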
References
kimi.py