fix(kilo): workaround DeepSeek thinking-mode 400 on Kilo gateway (v4-pro/v4-flash/reasoner) #15323
Closed
asin76-svg wants to merge 2 commits into
When forwarding to upstream DeepSeek, the Kilo gateway (https://api.kilo.ai/api/gateway) strips reasoning/reasoning_content/reasoning_details (see Kilo-Org/cloud:apps/web/src/lib/ai-gateway/providers/openrouter/request-helpers.ts, removeChatCompletionsReasoning). As a result, DeepSeek v3.2+ thinking models (deepseek-v4-pro, v4-flash, deepseek-reasoner) return 400 'The reasoning_content in the thinking mode must be passed back to the API' in two cases:
1) An assistant message with tool_calls lacks a non-empty reasoning_content (typical when a session is carried over from the Codex API, where reasoning is stored in encrypted codex_reasoning_items and cannot be forwarded);
2) A user message (or a chain of user messages from plugin context injections) follows the last tool result.
The fix follows the Roo Code approach (convertToR1Format with mergeToolResultText: true):
- _needs_deepseek_thinking_tool_merge() detects the combination of api.kilo.ai and a model containing 'deepseek' (except deepseek-chat, which is non-thinking).
- _copy_reasoning_content_for_api is extended: for kilo + DeepSeek thinking, an assistant message with tool_calls and no reasoning gets reasoning_content='.' (an empty string is dropped as falsy on the gateway side, so at least one character is needed).
- _merge_post_tool_text_into_tool() merges user/assistant text that follows the last tool message directly into its content and truncates the tail, so the structure no longer contains tool -> user.
The calls are added at both places where api_messages is assembled: the main agent loop and flush_memories.
References:
- https://api-docs.deepseek.com/guides/thinking_mode
- RooCodeInc/Roo-Code#10171
- SillyTavern/SillyTavern#4857
Tests: tests/run_agent/test_kilo_deepseek_thinking.py (20 tests). The real failing request_dump returns HTTP 200 after the fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
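The detection and placeholder steps in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from run_agent.py: the names needs_thinking_workaround, with_reasoning_placeholder and PLACEHOLDER are hypothetical, and the check against the last path segment of the model name is an assumption about how deepseek-chat is excluded.

```python
from typing import Any
from urllib.parse import urlparse

# Hypothetical constant: a non-empty value, so a truthy filter keeps it.
PLACEHOLDER = "."


def needs_thinking_workaround(base_url: str, model: str) -> bool:
    """True for DeepSeek thinking models reached through api.kilo.ai."""
    host = urlparse(base_url).hostname or ""
    name = model.lower()
    # Assumption: deepseek-chat is identified by the last path segment.
    is_chat = name.rsplit("/", 1)[-1] == "deepseek-chat"
    return host == "api.kilo.ai" and "deepseek" in name and not is_chat


def with_reasoning_placeholder(
    msg: dict[str, Any], base_url: str, model: str
) -> dict[str, Any]:
    """Return a copy of msg that carries a non-empty reasoning_content."""
    out = dict(msg)  # shallow copy, the caller's message is left untouched
    if (
        out.get("role") == "assistant"
        and out.get("tool_calls")
        and not out.get("reasoning_content")
        and needs_thinking_workaround(base_url, model)
    ):
        # Prefer existing reasoning text, fall back to the placeholder.
        out["reasoning_content"] = out.get("reasoning") or PLACEHOLDER
    return out
```

An empty string is falsy in Python just as it is in the gateway's truthy check, which is why the sketch tests the field with `not` and falls back to a one-character value.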
Collaborator
E2E tests through api.kilo.ai showed that the first version of the fix (PR NousResearch#15323) covers only a subset of cases: DeepSeek thinking models behind the kilo gateway need more. What was added:
1. reasoning_content="." is now injected into ANY assistant message (not only assistant messages with tool_calls). If the history contains at least one tool result, a subsequent plain-text assistant message without reasoning_content (rc) also triggers a 400.
2. _merge_post_tool_text_into_tool was reworked: it now merges EVERY user message that follows a tool result in the history into the nearest preceding tool message. Previously only trailing user messages after the last tool message were merged. The new version covers a real hermes scenario: the user enters a new request between two tool cycles (the old version left that untouched, and kilo returned 400 even with rc=".").
3. Assistant text between tool and user stays in place (it does not break the structure), but the user message is still merged into the tool message; the e2e runs showed that assistant text between tool and user does not protect against the 400.
Updated tests (test_kilo_deepseek_thinking.py): added test_user_between_tool_cycles_merged_into_prev_tool, test_asst_text_between_tool_and_user_preserved, test_first_user_before_any_tool_preserved, test_dot_injected_on_assistant_text_without_tool_calls, test_no_injection_on_non_assistant_role.
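The reworked merge described in point 2 could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: merge_users_into_previous_tool is a hypothetical name, message content is assumed to be a plain string, and the "\n\n" separator is an arbitrary choice.

```python
from typing import Any, Optional

Message = dict[str, Any]


def merge_users_into_previous_tool(messages: list[Message]) -> list[Message]:
    """Fold each user message that follows a tool result into that tool message."""
    out: list[Message] = []
    prev_tool: Optional[int] = None  # index in `out` of the nearest tool message
    for msg in messages:
        content = msg.get("content")
        if (
            msg.get("role") == "user"
            and prev_tool is not None
            and isinstance(content, str)
        ):
            tool = out[prev_tool]  # already a copy, safe to modify
            parts = [tool.get("content") or "", content]
            tool["content"] = "\n\n".join(p for p in parts if p)
            continue  # the user message no longer appears on its own
        copy = dict(msg)  # shallow copy so the input list is not mutated
        out.append(copy)
        if copy.get("role") == "tool":
            prev_tool = len(out) - 1
    return out
```

User messages that precede any tool result pass through unchanged, and assistant text between a tool result and a later user message stays where it is, which matches points 2 and 3 above.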
Contributor
Closing as redundant — the DeepSeek
21 regression tests in |
Summary
This PR fixes HTTP 400 errors ("The reasoning_content in the thinking mode must be passed back to the API.") for DeepSeek V3.2+ thinking models (deepseek-v4-pro, deepseek-v4-flash, deepseek-reasoner) accessed via the Kilo Code Gateway (https://api.kilo.ai/api/gateway).
Why this is distinct from #14941, #15228, #15237
Existing open/closed DeepSeek reasoning_content PRs target direct DeepSeek API access (api.deepseek.com) and all assume that setting reasoning_content = "" is sufficient. That assumption does not hold through the Kilo gateway:
- Kilo strips reasoning/reasoning_content/reasoning_details fields on forward: see Kilo-Org/cloud:apps/web/src/lib/ai-gateway/providers/openrouter/request-helpers.ts, function removeChatCompletionsReasoning, and also injectReasoningIntoContent, which falls back to if (reasoning), a truthy check that drops empty strings. So reasoning_content = "" reaches the upstream DeepSeek provider as absent.
- DeepSeek V3.2+ thinking mode additionally rejects conversations shaped as tool → user, i.e. any user turn after the last tool result causes the 400 even when reasoning_content is present. I reproduced this directly against api.kilo.ai with every combination of reasoning/reasoning_content/reasoning_details (empty, whitespace, non-empty, native and unified naming). The only shapes that succeed are those with no user/assistant message after the last tool message, or with the trailing text merged into the tool message's content.
This is the same workaround Roo Code adopted in kilocode-legacy/src/api/transform/r1-format.ts (convertToR1Format(..., { mergeToolResultText: true })). Their comment:
This PR ports that insight into Hermes.
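To make the tool → user problem concrete, here is a small sketch of folding trailing text into the last tool result. It is illustrative only and is neither Roo Code's convertToR1Format nor the code in this PR: the function name merge_trailing_text_into_last_tool is hypothetical, and it assumes every message's content is a plain string.

```python
from typing import Any

Message = dict[str, Any]


def merge_trailing_text_into_last_tool(messages: list[Message]) -> list[Message]:
    """Fold text that follows the last tool result into that tool's content."""
    # Locate the last tool result; nothing to do if there is none.
    last_tool = max(
        (i for i, m in enumerate(messages) if m.get("role") == "tool"),
        default=None,
    )
    if last_tool is None or last_tool == len(messages) - 1:
        return list(messages)

    trailing = messages[last_tool + 1:]
    # Bail out if another tool-call round follows: merging would break it.
    if any(m.get("tool_calls") for m in trailing):
        return list(messages)
    # This sketch only handles plain string content.
    if any(not isinstance(m.get("content"), str) for m in trailing):
        return list(messages)

    merged = dict(messages[last_tool])  # copy, so the input stays untouched
    parts = [merged.get("content") or ""] + [m["content"] for m in trailing]
    merged["content"] = "\n\n".join(p for p in parts if p)
    # Truncate after the tool message so no tool -> user edge remains.
    return list(messages[:last_tool]) + [merged]
```

After the merge the conversation ends on the tool message, which is one of the two shapes reported above as accepted.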
Changes
All in run_agent.py, plus a new test file.
1. _needs_deepseek_thinking_tool_merge() (new)
Returns True when base_url is api.kilo.ai and the model contains "deepseek" but isn't deepseek-chat (which is non-thinking and unaffected).
2. _copy_reasoning_content_for_api() extended
New terminal branch: when the above predicate is true and source_msg has tool_calls but no reasoning_content/reasoning, inject reasoning_content = ".".
Why "." and not "": empirically verified against api.kilo.ai/api/gateway/chat/completions with deepseek-v4-pro:
- reasoning_content = "" → HTTP 400 (falsy, gateway drops it).
- reasoning_content = " " (single space) → HTTP 200.
- reasoning_content = "." → HTTP 200.
Minimum-cost placeholder that survives the gateway's truthy filter.
3. _merge_post_tool_text_into_tool() (new, static)
Walks api_messages, finds the last tool message, and if only user/assistant-text messages (no further tool_calls) follow it, merges their text into the tool's content and truncates the list there. Does not mutate the input; returns a new list. Bails out if another tool-call round follows the last tool result, to preserve structure.
4. Invoked at both api_messages assembly sites
The main agent loop and flush_memories().
Testing
Unit tests (new file tests/run_agent/test_kilo_deepseek_thinking.py)
20 tests covering:
- _needs_deepseek_thinking_tool_merge() matrix: kilo × [v4-pro, v4-flash, deepseek-reasoner, deepseek-chat, non-deepseek] + openrouter-with-deepseek (must be False).
- _merge_post_tool_text_into_tool(): empty list, no tool, no trailing, single trailing user, multiple trailing users, mixed asst-text + user trailing, trailing asst-with-tool-calls (must bail), multi-round with merge only on last, input immutability.
- _copy_reasoning_content_for_api() kilo+deepseek branch: "." injection, preservation of explicit reasoning_content, conversion from reasoning, non-injection on non-kilo, non-injection without tool_calls.
All 20 pass. Existing tests/run_agent/test_provider_parity.py (76 tests) still passes.
End-to-end verification
I replayed a real failing request_dump_*.json captured from a user session (DeepSeek v4-pro via kilo, mid-agent-loop with 7 trailing user messages after the last tool result, caused by plugin context injection). Before fix: HTTP 400. After fix: HTTP 200 with proper content and reasoning fields returned by the model.
Interaction with #14941, #15228
Compatible. This PR only fires when base_url is api.kilo.ai; #15228 paths run for api.deepseek.com. They do not conflict at runtime. If #15228 merges first, a small rebase of this PR removes any duplicated condition around _copy_reasoning_content_for_api. If this merges first, #15228 can follow up unchanged.
References
mergeToolResultText workaround