Skip to content

fix(kilo): workaround DeepSeek thinking-mode 400 on Kilo gateway (v4-pro/v4-flash/reasoner)#15323

Closed
asin76-svg wants to merge 2 commits into
NousResearch:mainfrom
asin76-svg:fix/kilo-deepseek-thinking-mode
Closed

fix(kilo): workaround DeepSeek thinking-mode 400 on Kilo gateway (v4-pro/v4-flash/reasoner)#15323
asin76-svg wants to merge 2 commits into
NousResearch:mainfrom
asin76-svg:fix/kilo-deepseek-thinking-mode

Conversation

@asin76-svg

Copy link
Copy Markdown

Summary

This PR fixes HTTP 400 errors ("The reasoning_content in the thinking mode must be passed back to the API.") for DeepSeek V3.2+ thinking models (deepseek-v4-pro, deepseek-v4-flash, deepseek-reasoner) accessed via the Kilo Code Gateway (https://api.kilo.ai/api/gateway).

Why this is distinct from #14941, #15228, #15237

Existing open/closed DeepSeek reasoning_content PRs target direct DeepSeek API access (api.deepseek.com) and all assume that setting reasoning_content = "" is sufficient. That assumption does not hold through the Kilo gateway:

  1. Kilo strips reasoning / reasoning_content / reasoning_details fields on forward — see Kilo-Org/cloud:apps/web/src/lib/ai-gateway/providers/openrouter/request-helpers.ts, function removeChatCompletionsReasoning, and also injectReasoningIntoContent which falls back to if (reasoning) — a truthy check that drops empty strings. So reasoning_content = "" reaches the upstream DeepSeek provider as absent.

  2. DeepSeek V3.2+ thinking mode additionally rejects conversations shaped as tool → user — i.e. any user turn after the last tool result causes the 400 even when reasoning_content is present. I reproduced this directly against api.kilo.ai with every combination of reasoning / reasoning_content / reasoning_details (empty, whitespace, non-empty, native and unified naming). The only shapes that succeed are:

    • No trailing user/assistant after the last tool, or
    • Such trailing text merged into the last tool message's content.

This is the same workaround Roo Code adopted in kilocode-legacy/src/api/transform/r1-format.ts (convertToR1Format(..., { mergeToolResultText: true })). Their comment:

"environment_details text after tool_results would create user messages that cause DeepSeek to drop all previous reasoning_content."

This PR ports that insight into Hermes.

Changes

All in run_agent.py, plus a new test file.

1. _needs_deepseek_thinking_tool_merge() (new)

Returns True when base_url is api.kilo.ai and the model contains "deepseek" but isn't deepseek-chat (which is non-thinking and unaffected).

2. _copy_reasoning_content_for_api() extended

New terminal branch: when the above predicate is true and source_msg has tool_calls but no reasoning_content/reasoning, inject reasoning_content = ".".

Why "." and not "": empirically verified against api.kilo.ai/api/gateway/chat/completions with deepseek-v4-pro:

  • reasoning_content = "" → HTTP 400 (falsy, gateway drops it).
  • reasoning_content = " " (single space) → HTTP 200.
  • reasoning_content = "." → HTTP 200.

Minimum-cost placeholder that survives the gateway's truthy filter.

3. _merge_post_tool_text_into_tool() (new, static)

Walks api_messages, finds the last tool message, and if only user/assistant-text messages (no further tool_calls) follow it, merges their text into the tool's content and truncates the list there. Does not mutate the input; returns a new list. Bails out if another tool-call round follows the last tool result, to preserve structure.

4. Invoked at both api_messages assembly sites

  • Main agent loop (after the per-message copy+sanitize pass).
  • flush_memories().

Testing

Unit tests (new file tests/run_agent/test_kilo_deepseek_thinking.py)

20 tests covering:

  • _needs_deepseek_thinking_tool_merge() matrix: kilo × [v4-pro, v4-flash, deepseek-reasoner, deepseek-chat, non-deepseek] + openrouter-with-deepseek (must be False).
  • _merge_post_tool_text_into_tool(): empty list, no tool, no trailing, single trailing user, multiple trailing users, mixed asst-text + user trailing, trailing asst-with-tool-calls (must bail), multi-round with merge only on last, input immutability.
  • _copy_reasoning_content_for_api() kilo+deepseek branch: "." injection, preservation of explicit reasoning_content, conversion from reasoning, non-injection on non-kilo, non-injection without tool_calls.

All 20 pass. Existing tests/run_agent/test_provider_parity.py (76 tests) still passes.

End-to-end verification

I replayed a real failing request_dump_*.json captured from a user-session (DeepSeek v4-pro via kilo, mid-agent-loop with 7 trailing user messages after the last tool result — caused by plugin context injection). Before fix: HTTP 400. After fix: HTTP 200 with proper content and reasoning fields returned by the model.

Interaction with #14941, #15228

Compatible. This PR only fires when base_url is api.kilo.ai; #15228 paths run for api.deepseek.com. They do not conflict at runtime. If #15228 merges first, a small rebase of this PR removes any duplicated condition around _copy_reasoning_content_for_api. If this merges first, #15228 can follow up unchanged.

References

Kilo gateway (https://api.kilo.ai/api/gateway) при форварде в upstream
DeepSeek срезает reasoning/reasoning_content/reasoning_details
(см. Kilo-Org/cloud:apps/web/src/lib/ai-gateway/providers/openrouter/
request-helpers.ts — removeChatCompletionsReasoning). В результате
DeepSeek v3.2+ thinking-модели (deepseek-v4-pro, v4-flash,
deepseek-reasoner) возвращают 400 'The reasoning_content in the
thinking mode must be passed back to the API' в двух случаях:

1) У assistant с tool_calls отсутствует непустой reasoning_content
   (типично при переносе сессии с Codex API, где reasoning хранится
   в зашифрованном codex_reasoning_items и недоступен для форварда);
2) После последнего tool-результата идёт user-сообщение (либо
   цепочка user-сообщений от plugin-инъекций контекста).

Решение — по аналогии с Roo Code (convertToR1Format с
mergeToolResultText: true):

- _needs_deepseek_thinking_tool_merge() — детектит связку
  api.kilo.ai + модель с 'deepseek' (кроме deepseek-chat, он
  non-thinking).
- _copy_reasoning_content_for_api расширен: для kilo+DeepSeek thinking
  на assistant+tool_calls без reasoning подставляет
  reasoning_content='.' (пустая строка отбрасывается как falsy на
  стороне gateway, нужен минимум один символ).
- _merge_post_tool_text_into_tool() — сливает user/assistant-text
  после последнего tool прямо в его content и обрезает хвост, чтобы
  структура не имела tool -> user.

Вызовы добавлены в обоих местах сборки api_messages: основной
agent loop и flush_memories.

Ссылки:
- https://api-docs.deepseek.com/guides/thinking_mode
- RooCodeInc/Roo-Code#10171
- SillyTavern/SillyTavern#4857

Тесты: tests/run_agent/test_kilo_deepseek_thinking.py (20 штук).
Реальный упавший request_dump после фикса возвращает HTTP 200.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder provider/kilo Kilo Code labels Apr 24, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related to #15228 and #15250 but distinct — this addresses the Kilo gateway specifically stripping reasoning_content fields and rejecting tool→user message shapes, requiring a different workaround than direct DeepSeek API fixes.

E2E-тесты через api.kilo.ai выявили, что первая версия фикса (PR NousResearch#15323)
покрывает только подмножество случаев: DeepSeek thinking-моделям через
kilo gateway нужно больше.

Что добавлено:

1. reasoning_content="." инжектируется теперь в ЛЮБОЕ assistant-сообщение
   (а не только в assistant с tool_calls). Если в истории есть хотя бы
   один tool-результат, следующий plain-text assistant без rc тоже
   вызывает 400.

2. _merge_post_tool_text_into_tool переработан: теперь сливает КАЖДОЕ
   user-сообщение, идущее в истории после tool-результата, внутрь
   ближайшего предыдущего tool-сообщения. Раньше сливались только
   trailing user после последнего tool. Новая версия покрывает реальный
   сценарий hermes: пользователь вводит новый запрос между двумя
   tool-циклами (старая версия такое не трогала, и kilo возвращал 400
   даже с rc=".").

3. assistant-text между tool и user остаётся на своём месте (не портит
   структуру), но user всё равно сливается в tool — e2e показали, что
   наличие assistant-text между tool и user не защищает от 400.

Обновлены тесты (test_kilo_deepseek_thinking.py): добавлены
test_user_between_tool_cycles_merged_into_prev_tool,
test_asst_text_between_tool_and_user_preserved,
test_first_user_before_any_tool_preserved,
test_dot_injected_on_assistant_text_without_tool_calls,
test_no_injection_on_non_assistant_role.
@teknium1

Copy link
Copy Markdown
Contributor

Closing as redundant — the DeepSeek reasoning_content thinking-mode 400 and cross-provider leak chain of issues is now fully covered on main:

21 regression tests in tests/run_agent/test_deepseek_reasoning_content_echo.py + 2 new tests for the cross-provider scenario exercise every known path. Thanks for the submission — appreciate the digging on this area.

@teknium1 teknium1 closed this Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround provider/kilo Kilo Code type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants