
fix: extract DeepSeek reasoning_content from model_extra#14973

Closed
HiddenPuppy wants to merge 1 commit into NousResearch:main from HiddenPuppy:fix/openrouter-tools-400

Conversation

@HiddenPuppy

Contributor

Problem

DeepSeek V4 Flash has thinking mode enabled by default. When the model makes a tool call, the API returns a reasoning_content field in the response.

However, OpenAI SDK < 1.60 doesn't declare reasoning_content as a ChatCompletionMessage field, so it ends up in Pydantic's model_extra instead. Without this fix, the reasoning_content is lost and subsequent requests fail with HTTP 400: The reasoning_content in the thinking mode must be passed back to the API.

Changes

  • _extract_reasoning() now checks model_extra first, then falls back to the direct attribute for backward compatibility with newer SDK versions.
  • Added tests to verify extraction from both model_extra and direct attributes.
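
The lookup order described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name extract_reasoning and the use of SimpleNamespace as a stand-in for the SDK's message object are assumptions, chosen so the example runs without the OpenAI SDK installed.

```python
from types import SimpleNamespace
from typing import Any, Optional


def extract_reasoning(message: Any) -> Optional[str]:
    """Return reasoning_content from a chat-completion message, if any.

    Hypothetical sketch: check Pydantic's model_extra first, then fall
    back to a directly declared attribute.
    """
    extra = getattr(message, "model_extra", None)
    if isinstance(extra, dict):
        value = extra.get("reasoning_content")
        if value:
            return value
    # Fallback for SDK versions that expose the field as a normal attribute.
    return getattr(message, "reasoning_content", None) or None


# Stand-ins for SDK message objects (assumed shapes, for illustration only).
old_sdk_msg = SimpleNamespace(content="", model_extra={"reasoning_content": "plan A"})
new_sdk_msg = SimpleNamespace(content="", model_extra=None, reasoning_content="plan B")
plain_msg = SimpleNamespace(content="hi", model_extra={})

for msg in (old_sdk_msg, new_sdk_msg, plain_msg):
    print(extract_reasoning(msg))
```

The three stand-in objects loosely mirror the model_extra, direct-attribute, and no-reasoning cases named in the test plan below.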

Fixes

Test Plan

============================= test session starts ==============================
platform linux -- Python 3.11.6, pytest-9.0.3, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /tmp/hermes-agent
configfile: pyproject.toml
plugins: anyio-4.13.0, asyncio-1.3.0, xdist-3.8.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 2/2 workers
2 workers [6 items]

scheduling tests via LoadScheduling

tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_extract_from_model_extra
tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_model_extra_takes_precedence_over_attribute
[gw1] [ 16%] PASSED tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_model_extra_takes_precedence_over_attribute
[gw0] [ 33%] PASSED tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_extract_from_model_extra
tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_extract_from_direct_attribute
tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_no_reasoning_returns_none
[gw0] [ 50%] PASSED tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_extract_from_direct_attribute
[gw1] [ 66%] PASSED tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_no_reasoning_returns_none
tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_none_model_extra
tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_empty_model_extra
[gw1] [ 83%] PASSED tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_empty_model_extra
[gw0] [100%] PASSED tests/run_agent/test_deepseek_reasoning_content.py::TestExtractReasoning::test_none_model_extra

============================== 6 passed in 3.60s ===============================
============================= test session starts ==============================
platform linux -- Python 3.11.6, pytest-9.0.3, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /tmp/hermes-agent
configfile: pyproject.toml
plugins: anyio-4.13.0, asyncio-1.3.0, xdist-3.8.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
created: 2/2 workers
2 workers [39 items]

scheduling tests via LoadScheduling

tests/agent/transports/test_chat_completions.py::TestChatCompletionsBasic::test_convert_messages_strips_codex_fields
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBasic::test_api_mode
[gw1] [ 2%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBasic::test_convert_messages_strips_codex_fields
[gw0] [ 5%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBasic::test_api_mode
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBasic::test_registered
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_basic_kwargs
[gw0] [ 7%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBasic::test_registered
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBasic::test_convert_tools_identity
[gw0] [ 10%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBasic::test_convert_tools_identity
[gw1] [ 12%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_basic_kwargs
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_developer_role_swap
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBasic::test_convert_messages_no_codex_leaks
[gw1] [ 15%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_developer_role_swap
[gw0] [ 17%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBasic::test_convert_messages_no_codex_leaks
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_tools_included
[gw0] [ 20%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_tools_included
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_openrouter_provider_prefs
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_no_developer_swap_for_non_gpt5
[gw0] [ 23%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_openrouter_provider_prefs
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_nous_tags
[gw1] [ 25%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_no_developer_swap_for_non_gpt5
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_ollama_num_ctx
[gw0] [ 28%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_nous_tags
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_reasoning_default
[gw1] [ 30%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_ollama_num_ctx
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_custom_think_false
[gw0] [ 33%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_reasoning_default
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_nous_omits_disabled_reasoning
[gw1] [ 35%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_custom_think_false
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_max_tokens_with_fn
[gw1] [ 38%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_max_tokens_with_fn
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_ephemeral_overrides_max_tokens
[gw1] [ 41%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_ephemeral_overrides_max_tokens
[gw0] [ 43%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_nous_omits_disabled_reasoning
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_nvidia_default_max_tokens
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_fixed_temperature
[gw1] [ 46%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_fixed_temperature
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_omit_temperature
[gw0] [ 48%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_nvidia_default_max_tokens
[gw1] [ 51%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_omit_temperature
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_qwen_default_max_tokens
tests/agent/transports/test_chat_completions.py::TestChatCompletionsKimi::test_kimi_max_tokens_default
[gw1] [ 53%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsKimi::test_kimi_max_tokens_default
tests/agent/transports/test_chat_completions.py::TestChatCompletionsKimi::test_kimi_reasoning_effort_top_level
[gw1] [ 56%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsKimi::test_kimi_reasoning_effort_top_level
[gw0] [ 58%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_qwen_default_max_tokens
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_anthropic_max_output_for_claude_on_aggregator
tests/agent/transports/test_chat_completions.py::TestChatCompletionsKimi::test_kimi_reasoning_effort_omitted_when_thinking_disabled
[gw0] [ 61%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_anthropic_max_output_for_claude_on_aggregator
[gw1] [ 64%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsKimi::test_kimi_reasoning_effort_omitted_when_thinking_disabled
tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_request_overrides_last
tests/agent/transports/test_chat_completions.py::TestChatCompletionsKimi::test_kimi_thinking_enabled_extra_body
[gw0] [ 66%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsBuildKwargs::test_request_overrides_last
[gw1] [ 69%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsKimi::test_kimi_thinking_enabled_extra_body
tests/agent/transports/test_chat_completions.py::TestChatCompletionsKimi::test_kimi_thinking_disabled_extra_body
[gw1] [ 71%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsKimi::test_kimi_thinking_disabled_extra_body
tests/agent/transports/test_chat_completions.py::TestChatCompletionsValidate::test_none
[gw0] [ 74%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsValidate::test_none
tests/agent/transports/test_chat_completions.py::TestChatCompletionsValidate::test_empty_choices
tests/agent/transports/test_chat_completions.py::TestChatCompletionsValidate::test_no_choices
[gw1] [ 76%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsValidate::test_empty_choices
[gw0] [ 79%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsValidate::test_no_choices
tests/agent/transports/test_chat_completions.py::TestChatCompletionsValidate::test_valid
[gw1] [ 82%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsValidate::test_valid
tests/agent/transports/test_chat_completions.py::TestChatCompletionsNormalize::test_text_response
[gw0] [ 84%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsNormalize::test_text_response
tests/agent/transports/test_chat_completions.py::TestChatCompletionsNormalize::test_tool_call_extra_content_preserved
[gw0] [ 87%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsNormalize::test_tool_call_extra_content_preserved
tests/agent/transports/test_chat_completions.py::TestChatCompletionsNormalize::test_tool_call_response
[gw1] [ 89%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsNormalize::test_tool_call_response
tests/agent/transports/test_chat_completions.py::TestChatCompletionsCacheStats::test_no_usage
[gw1] [ 92%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsCacheStats::test_no_usage
tests/agent/transports/test_chat_completions.py::TestChatCompletionsNormalize::test_reasoning_content_preserved_separately
[gw0] [ 94%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsNormalize::test_reasoning_content_preserved_separately
tests/agent/transports/test_chat_completions.py::TestChatCompletionsCacheStats::test_no_details
[gw1] [ 97%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsCacheStats::test_no_details
tests/agent/transports/test_chat_completions.py::TestChatCompletionsCacheStats::test_with_cache
[gw0] [100%] PASSED tests/agent/transports/test_chat_completions.py::TestChatCompletionsCacheStats::test_with_cache

============================== 39 passed in 0.67s ==============================

All tests pass.

…h#14938)

DeepSeek V4 Flash returns reasoning_content in API responses, but OpenAI
SDK < 1.60 doesn't declare it as a ChatCompletionMessage field. It ends up
in Pydantic's model_extra instead.

Without this fix, the reasoning_content is lost and subsequent requests fail
with HTTP 400: The reasoning_content in the thinking mode must be passed back
to the API.

Changes:
- _extract_reasoning() now checks model_extra first, then falls back to the
direct attribute for backward compatibility with newer SDK versions.
- Added tests to verify extraction from both model_extra and direct attributes.

Fixes NousResearch#14938, NousResearch#14933
@alt-glitch added the type/bug, P1, comp/agent, and provider/deepseek labels on Apr 24, 2026
@huacao59109


I hit a related but slightly different reproduction in a real scheduled Hermes run, and wanted to add the evidence here in case #14973 only covers the "DeepSeek generated reasoning_content but Hermes failed to extract it from model_extra" path.

Scenario

  • Primary model/provider: openai-codex / gpt-5.5
  • Fallback provider/model: custom OpenAI-compatible provider deepseek-v4 / deepseek-v4-flash
  • Fallback endpoint: https://api.deepseek.com/v1
  • Execution path: cron job (daily-github-trending-brief) that uses tools (web_extract, browser tools, terminal/lark-cli) and then auto-delivers the final response.

Timeline / logs

The primary model hit rate limits earlier in the run window, and the cron run activated fallback successfully:

2026-04-24 19:00:12,619 INFO [cron_f8de41fd3ff8_20260424_190002] root: Fallback activated: gpt-5.5 → deepseek-v4-flash (deepseek-v4)

But the fallback run then failed before producing a final response:

2026-04-24 19:01:16,685 ERROR [cron_f8de41fd3ff8_20260424_190002] root: Non-retryable client error: Error code: 400 - {'error': {'message': 'The `reasoning_content` in the thinking mode must be passed back to the API.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_request_error'}}

The saved cron output confirms that delivery did not fail; no final assistant response was generated:

## Response

(No response generated)
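
For anyone trying to tell this failure apart from other 400s, the error payload in the log above can be matched with a small check. This is only a sketch: the helper name is_reasoning_replay_error is hypothetical, and it assumes the caller already has the HTTP status code and the decoded JSON body.

```python
from collections.abc import Mapping
from typing import Any


def is_reasoning_replay_error(status_code: int, body: Mapping[str, Any]) -> bool:
    """True if the response looks like the reasoning_content replay 400."""
    if status_code != 400:
        return False
    error = body.get("error")
    if not isinstance(error, Mapping):
        return False
    message = str(error.get("message", ""))
    return (
        error.get("type") == "invalid_request_error"
        and "reasoning_content" in message
        and "passed back" in message
    )


# Body copied from the logged error above.
logged_body = {
    "error": {
        "message": "The `reasoning_content` in the thinking mode must be passed back to the API.",
        "type": "invalid_request_error",
        "param": None,
        "code": "invalid_request_error",
    }
}

print(is_reasoning_replay_error(400, logged_body))
```

Matching on message text is brittle if the provider rewords the error, so a check like this is better suited to log triage than to control flow.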

Request-dump shape

The failing request dump was a DeepSeek chat-completions request:

POST https://api.deepseek.com/v1/chat/completions
model: deepseek-v4-flash

The request history contained assistant messages from the pre-fallback execution with tool calls and reasoning_content, e.g. multiple messages shaped like:

{
  "role": "assistant",
  "content": "",
  "tool_calls": [...],
  "reasoning_content": "..."
}

There were also tool result messages after those assistant tool calls. In the dump I checked, 10+ assistant messages had reasoning_content; most were tool-call turns produced before/around the fallback.
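
A quick way to get the same counts from a request dump is sketched below. The function name summarize_dump and the sample messages are made up for illustration; only the message shape follows the dump described above.

```python
from typing import Any


def summarize_dump(messages: list[dict[str, Any]]) -> dict[str, int]:
    """Count assistant turns and how many carry reasoning_content."""
    assistant = [m for m in messages if m.get("role") == "assistant"]
    with_tools = [m for m in assistant if m.get("tool_calls")]
    return {
        "assistant": len(assistant),
        "with_reasoning": sum(1 for m in assistant if m.get("reasoning_content")),
        # Tool-call turns without reasoning are the ones a thinking-mode
        # replay is most likely to trip over.
        "tool_calls_missing_reasoning": sum(
            1 for m in with_tools if not m.get("reasoning_content")
        ),
    }


# Illustrative history, not taken from the real dump.
sample = [
    {"role": "user", "content": "summarize trending repos"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"id": "call_1", "type": "function",
             "function": {"name": "web_extract", "arguments": "{}"}}
        ],
        "reasoning_content": "fetch the page first",
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "page text"},
    {"role": "assistant", "content": "done"},
]

print(summarize_dump(sample))
```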

Why this may be a separate edge case

#14973 appears to fix the case where DeepSeek itself returns reasoning_content in model_extra, Hermes fails to extract it, and the next DeepSeek request omits it.

This cron failure seems like a cross-provider fallback/migration case:

  1. The turn starts on openai-codex / gpt-5.5.
  2. Hermes records assistant/tool-call history, including reasoning fields from the original provider path.
  3. Hermes switches in-place to deepseek-v4-flash after the primary fails.
  4. DeepSeek receives a mixed-provider tool-call history and rejects the thinking-mode replay with the same reasoning_content error.

I also saw a similar non-cron/gateway case today after switching from the primary model to DeepSeek; the immediate workaround was setting reasoning to none, which suggests this is not limited to cron delivery.

Expected behavior

Fallback to DeepSeek should either:

  • preserve and replay DeepSeek-native reasoning_content correctly when the previous turns were actually generated by DeepSeek; or
  • when migrating an existing conversation/tool-call history from a different provider to DeepSeek, sanitize provider-specific reasoning replay fields and/or disable DeepSeek thinking for that migrated turn, so fallback can complete instead of failing with 400.
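
The second option could look something like the sketch below. It is not how Hermes handles this: the "provider" tag on each message, the function name prepare_history_for_deepseek, and the returned disable-thinking flag are all assumptions made for illustration, and how thinking is actually switched off for DeepSeek is left to the caller.

```python
from typing import Any


def prepare_history_for_deepseek(
    history: list[dict[str, Any]],
    deepseek_provider: str = "deepseek-v4",
) -> tuple[list[dict[str, Any]], bool]:
    """Return (cleaned_history, disable_thinking) for a DeepSeek replay.

    Assumes each assistant message carries a hypothetical "provider" tag
    recording which provider generated it. Untagged assistant messages
    are treated as foreign.
    """
    cleaned: list[dict[str, Any]] = []
    foreign_tool_turn = False
    for msg in history:
        out = dict(msg)  # shallow copy; the original history is not modified
        origin = out.pop("provider", None)  # internal tag, never sent upstream
        if out.get("role") == "assistant" and origin != deepseek_provider:
            # Drop reasoning that DeepSeek did not produce itself.
            out.pop("reasoning_content", None)
            if out.get("tool_calls"):
                foreign_tool_turn = True
        cleaned.append(out)
    # A foreign tool-call turn has no DeepSeek reasoning to replay, so
    # suggest turning thinking off for the migrated turn.
    return cleaned, foreign_tool_turn


call = {"id": "call_1", "type": "function",
        "function": {"name": "web_extract", "arguments": "{}"}}
mixed_history = [
    {"role": "assistant", "content": "", "tool_calls": [call],
     "reasoning_content": "r1", "provider": "openai-codex"},
    {"role": "tool", "tool_call_id": "call_1", "content": "ok"},
    {"role": "assistant", "content": "", "tool_calls": [call],
     "reasoning_content": "r2", "provider": "deepseek-v4"},
]

cleaned_history, disable_thinking = prepare_history_for_deepseek(mixed_history)
print(disable_thinking)
```

Returning a flag instead of mutating request options keeps the sketch independent of any particular client; the caller decides what "disable thinking" means for its transport.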

In short: please consider testing both paths:

  1. DeepSeek → tool call → DeepSeek continuation (the model_extra extraction case fixed by this PR)
  2. OpenAI/Codex → tool call(s) → fallback to DeepSeek continuation (cross-provider fallback case)

Happy to provide more details from the local request dump if useful, but I avoided pasting the full dump because it includes large tool outputs and deployment-specific prompt/config text.

@teknium1

Contributor

Closing as redundant — the DeepSeek reasoning_content thinking-mode 400 and cross-provider leak chain of issues is now fully covered on main:

21 regression tests in tests/run_agent/test_deepseek_reasoning_content_echo.py + 2 new tests for the cross-provider scenario exercise every known path. Thanks for the submission — appreciate the digging on this area.

@teknium1 closed this on Apr 27, 2026

Development

Successfully merging this pull request may close these issues.

  • DeepSeek V4 Flash Toolcall Fails with reasoning_content Error
  • [BUG] DeepSeek V4 thinking mode fails with reasoning_content error

4 participants