Bug Description
Hermes correctly preserves reasoning_content in the main loop (run_agent.py::_copy_reasoning_content_for_api). I verified this with a two-turn round-trip test against deepseek-v4-pro — it passes.
In production, however, a long-running cron job consistently fails with HTTP 400 reasoning_content must be passed back to the API after several auxiliary calls (title_generation, vision_analyze, auxiliary auto-detect). The error happens ~6 minutes after the last title_generation call, with no user intervention between.
This looks like the same passthrough is missing in auxiliary_client and/or the context compressor path, not in the main loop.
Steps to Reproduce
- Set the main model to
deepseek-v4-pro via custom provider (https://api.deepseek.com, api: openai-completions). Thinking is enabled by default on v4-pro.
- Register a cron job that runs a non-trivial multi-step task (with at least one tool call + at least one vision/title/session-search auxiliary trigger).
- Let the job run for 15–30 minutes.
- Observe
Non-retryable client error: Error code: 400 ... The reasoning_content in the thinking mode must be passed back to the API. in the log.
Expected Behavior
Either:
reasoning_content is preserved along the auxiliary / compression / cron paths the same way it is in the main loop, or
- auxiliary calls default to
thinking: disabled (they don't need CoT for title generation / vision descriptions / session search anyway).
Actual Behavior
HTTP 400 on a DeepSeek v4-pro call somewhere in the cron → auxiliary → main-loop chain; the session becomes "poisoned" and cannot recover without clearing history.
Affected Component
CLI (interactive chat)
Messaging Platform (if gateway-related)
N/A (CLI only)
Debug Report
2026-04-24 16:33:46 INFO cron.scheduler: Running job 'qwen-auto-setup' (d33d5e95…)
2026-04-24 16:33:47 INFO agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:33:47 INFO agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:22 INFO agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:38 INFO agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:48:44 INFO agent.auxiliary_client: Auxiliary title_generation: using auto (deepseek-v4-pro) at https://api.deepseek.com
2026-04-24 16:55:18 ERROR [cron_d33d5e95…] root: Non-retryable client error:
Error code: 400 - {'error': {'message': 'The `reasoning_content` in the thinking mode must be passed back to the API.',
'type': 'invalid_request_error', 'code': 'invalid_request_error'}}
Note the 6-minute gap between the last auxiliary call (`title_generation` at 16:48:44) and the 400 (at 16:55:18) — main-loop tool calls happened in between but are at DEBUG level, not in this excerpt. Full log can be attached on request.
Two-turn round-trip test (model `deepseek-v4-pro`, thinking enabled) with history `user → assistant(+reasoning_content) → user` returns HTTP 200. So `_copy_reasoning_content_for_api` itself works — the issue is elsewhere in the request-assembly chain.
1. **`agent/auxiliary_client.py`** — auxiliary tasks (`title_generation`, `vision_analyze`, `session_search`) assemble their own minimal `messages` payload and may not carry `reasoning_content` even when the selected model requires it. Likely related: #9571 (GLM 5.1 `title_generation` produces empty content because reasoning eats the `max_tokens: 30` budget).
2. **Context compressor** (`agent/context_engine.py`) — when rebuilding assistant messages from summaries, `tool_calls` may survive while the matching `reasoning_content` is lost. Analogous to #11096 for Anthropic extended thinking: "The final block in an assistant message cannot be `thinking`."
- #9571 — `title_generation` breaks on reasoning model (GLM 5.1), same auxiliary path
- #11096 — HTTP 400 on compressed assistant messages (Anthropic extended thinking)
- #13927 — HTTP 400 with OpenRouter when tools are enabled
Operating System
Ubuntu: 24.04.2
Python Version
3.11.8
Hermes Version
0.11.0
Additional Logs / Traceback (optional)
Root Cause Analysis (optional)
No response
Proposed Fix (optional)
No response
Are you willing to submit a PR for this?
Bug Description
Hermes correctly preserves
reasoning_contentin the main loop (run_agent.py::_copy_reasoning_content_for_api). I verified this with a two-turn round-trip test againstdeepseek-v4-pro— it passes.In production, however, a long-running cron job consistently fails with HTTP 400
reasoning_content must be passed back to the APIafter several auxiliary calls (title_generation,vision_analyze,auxiliary auto-detect). The error happens ~6 minutes after the lasttitle_generationcall, with no user intervention between.This looks like the same passthrough is missing in
auxiliary_clientand/or the context compressor path, not in the main loop.Steps to Reproduce
deepseek-v4-provia custom provider (https://api.deepseek.com,api: openai-completions). Thinking is enabled by default on v4-pro.Non-retryable client error: Error code: 400 ... The reasoning_content in the thinking mode must be passed back to the API.in the log.Expected Behavior
Either:
reasoning_contentis preserved along the auxiliary / compression / cron paths the same way it is in the main loop, orthinking: disabled(they don't need CoT for title generation / vision descriptions / session search anyway).Actual Behavior
HTTP 400 on a DeepSeek v4-pro call somewhere in the cron → auxiliary → main-loop chain; the session becomes "poisoned" and cannot recover without clearing history.
Affected Component
CLI (interactive chat)
Messaging Platform (if gateway-related)
N/A (CLI only)
Debug Report
Operating System
Ubuntu: 24.04.2
Python Version
3.11.8
Hermes Version
0.11.0
Additional Logs / Traceback (optional)
Root Cause Analysis (optional)
No response
Proposed Fix (optional)
No response
Are you willing to submit a PR for this?