Skip to content

[Bug]: HTTP 400 "reasoning_content must be passed back" with deepseek-v4-pro in cron/auxiliary path (thinking mode works in main loop, breaks elsewhere) #15213

@romannekrasovaillm

Description

@romannekrasovaillm

Bug Description

Hermes correctly preserves reasoning_content in the main loop (run_agent.py::_copy_reasoning_content_for_api). I verified this with a two-turn round-trip test against deepseek-v4-pro — it passes.

In production, however, a long-running cron job consistently fails with HTTP 400 reasoning_content must be passed back to the API after several auxiliary calls (title_generation, vision_analyze, auxiliary auto-detect). The error happens ~6 minutes after the last title_generation call, with no user intervention between.

This looks like the same passthrough is missing in auxiliary_client and/or the context compressor path, not in the main loop.

Steps to Reproduce

  1. Set the main model to deepseek-v4-pro via custom provider (https://api.deepseek.com, api: openai-completions). Thinking is enabled by default on v4-pro.
  2. Register a cron job that runs a non-trivial multi-step task (with at least one tool call + at least one vision/title/session-search auxiliary trigger).
  3. Let the job run for 15–30 minutes.
  4. Observe Non-retryable client error: Error code: 400 ... The reasoning_content in the thinking mode must be passed back to the API. in the log.

Expected Behavior

Either:

  • reasoning_content is preserved along the auxiliary / compression / cron paths the same way it is in the main loop, or
  • auxiliary calls default to thinking: disabled (they don't need CoT for title generation / vision descriptions / session search anyway).

Actual Behavior

HTTP 400 on a DeepSeek v4-pro call somewhere in the cron → auxiliary → main-loop chain; the session becomes "poisoned" and cannot recover without clearing history.

Affected Component

CLI (interactive chat)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

2026-04-24 16:33:46 INFO  cron.scheduler: Running job 'qwen-auto-setup' (d33d5e95…)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:22 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:38 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:48:44 INFO  agent.auxiliary_client: Auxiliary title_generation: using auto (deepseek-v4-pro) at https://api.deepseek.com
2026-04-24 16:55:18 ERROR [cron_d33d5e95…] root: Non-retryable client error:
  Error code: 400 - {'error': {'message': 'The `reasoning_content` in the thinking mode must be passed back to the API.',
                               'type': 'invalid_request_error', 'code': 'invalid_request_error'}}


Note the 6-minute gap between the last auxiliary call (`title_generation` at 16:48:44) and the 400 (at 16:55:18) — main-loop tool calls happened in between but are at DEBUG level, not in this excerpt. Full log can be attached on request.

Two-turn round-trip test (model `deepseek-v4-pro`, thinking enabled) with history `user → assistant(+reasoning_content) → user` returns HTTP 200. So `_copy_reasoning_content_for_api` itself works — the issue is elsewhere in the request-assembly chain.

1. **`agent/auxiliary_client.py`** — auxiliary tasks (`title_generation`, `vision_analyze`, `session_search`) assemble their own minimal `messages` payload and may not carry `reasoning_content` even when the selected model requires it. Likely related: #9571 (GLM 5.1 `title_generation` produces empty content because reasoning eats the `max_tokens: 30` budget).
2. **Context compressor** (`agent/context_engine.py`) — when rebuilding assistant messages from summaries, `tool_calls` may survive while the matching `reasoning_content` is lost. Analogous to #11096 for Anthropic extended thinking: "The final block in an assistant message cannot be `thinking`."

- #9571 — `title_generation` breaks on reasoning model (GLM 5.1), same auxiliary path
- #11096 — HTTP 400 on compressed assistant messages (Anthropic extended thinking)
- #13927 — HTTP 400 with OpenRouter when tools are enabled

Operating System

Ubuntu: 24.04.2

Python Version

3.11.8

Hermes Version

0.11.0

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Metadata

Metadata

Assignees

Labels

P1High — major feature broken, no workaroundcomp/agentCore agent loop, run_agent.py, prompt buildercomp/cronCron scheduler and job managementprovider/deepseekDeepSeek APItype/bugSomething isn't working

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions