[Bug]: persisted assistant messages store reasoning in 'reasoning' (internal) instead of 'reasoning_content', leaving sessions silently poisoned for any future DeepSeek/Kimi thinking-mode replay #16844

@29206394


Summary

run_agent.py writes assistant turns to disk with the chain-of-thought stored under the internal field name reasoning, not the protocol-standard reasoning_content. The standard field is only persisted when the upstream SDK object happens to expose assistant_message.reasoning_content, which is provider-dependent. For most non-DeepSeek providers (GLM, MiniMax, GPT‑5.x via aigw / OpenAI Chat Completions wrappers) the field never gets written.

This means every assistant tool-call turn produced by those providers is silently poisoned at write time. The poison is invisible until the user later switches to a DeepSeek‑v4 / Kimi thinking model — which strictly requires reasoning_content on every replayed assistant turn — at which point HTTP 400 fires:

The reasoning_content in the thinking mode must be passed back to the API.

The recently merged read-side guards (#15213, #15741, #15748, #15353) all attempt to compensate at request-build time. They each fix one build path. But the underlying schema mismatch on disk means every new build path is a candidate for the same 400, and any session created by another provider becomes a latent bomb the moment the user switches model.

This issue is about the write side, not the read side. The proposal is to normalize the field name at persistence time so the read-side compensation code is unnecessary.

Why this is distinct from #15213 / #15741 / #15748 / #15353

Those issues all describe a single read path that fails to copy or inject reasoning_content when building the next API request. Each fix patches one path.

This issue identifies the upstream cause: assistant messages are persisted with the wrong field name, so every read path has to reinvent a "promote reasoning → reasoning_content (or inject "")" dance. Any code path that omits the dance, present or future, will fail.
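The promotion step that each read path has to repeat can be sketched roughly as follows. This is a hypothetical stand-in, not the project's `_copy_reasoning_content_for_api`: the name `promote_reasoning`, the plain-dict message shape, and the decision to drop the internal key before sending are all assumptions made for illustration.

```python
from typing import Any


def promote_reasoning(msg: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of msg that carries reasoning_content on assistant turns.

    Hypothetical helper: assumes messages are plain dicts as persisted on disk.
    """
    out = dict(msg)
    if out.get("role") != "assistant":
        return out  # only assistant turns need the field
    if out.get("reasoning_content") is None:
        # Promote the internal key if it holds text, otherwise inject "".
        internal = out.get("reasoning")
        out["reasoning_content"] = internal if isinstance(internal, str) else ""
    # Assumption: the internal key is not part of the wire format, so drop it.
    out.pop("reasoning", None)
    return out
```

Every request builder that reads history would need a call like this, which is exactly the duplication this issue argues against.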

The cumulative evidence below shows this is not theoretical: a single user's session store accumulated 4 031 poisoned messages across 1 101 session files, every one of which would 400 on DeepSeek replay despite all four landed fixes being present in tree.

Forensic data from a real install

Hermes Agent v0.11.0 (2026.4.23). After encountering the 400 with provider=custom, model=deepseek-v4-pro against https://aigw.netease.com/v1, I scanned the full session store at ~/.hermes/sessions/ and ~/.hermes/profiles/*/sessions/:

Scanned files       : 1 497
Files with poison   : 1 101   (assistant + tool_calls + missing reasoning_content)
Poisoned msgs total : 4 031
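A scan of this kind can be sketched as below. It is a minimal sketch, not the script actually used: it assumes each session file is a JSON object with a top-level "messages" list, and the names `is_poisoned` and `scan_sessions` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any


def is_poisoned(msg: dict[str, Any]) -> bool:
    """Assistant turn with tool calls but no reasoning_content key."""
    return (
        msg.get("role") == "assistant"
        and bool(msg.get("tool_calls"))
        and "reasoning_content" not in msg
    )


def scan_sessions(root: Path) -> Counter:
    """Count scanned files, files with poison, and poisoned messages under root."""
    stats: Counter = Counter()
    for path in sorted(root.rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or non-JSON file: skip rather than abort
        stats["files"] += 1
        # Assumed layout: {"model": ..., "messages": [...]}
        messages = data.get("messages", []) if isinstance(data, dict) else []
        hits = sum(1 for m in messages if isinstance(m, dict) and is_poisoned(m))
        if hits:
            stats["poisoned_files"] += 1
            stats["poisoned_msgs"] += hits
    return stats
```

Calling `scan_sessions(Path.home() / ".hermes" / "sessions")` would return the three counters for the default session store. Profile directories would need a second call.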

Breakdown of the 4 031 poisoned messages:

By session.model (top entries):

count   model
3 651   glm-5.1
  272   MiniMax-M2.7
   74   gpt-5.4
   21   MiniMax-M2.7-highspeed
   11   claude-opus-4-6
    2   deepseek-v4-pro

By message structure:

signal | value | meaning
has internal reasoning field, non-empty string | 3 603 / 4 031 (89%) | hermes captured the chain of thought, just under the wrong key
no reasoning at all | 428 / 4 031 (11%) | message stored without any reasoning info
finish_reason == "tool_calls" | 3 501 / 4 031 (87%) | classic tool-call termination
empty content | 3 027 / 4 031 (75%) | pure function-call turns

Sample poisoned message (from a cron job that ran 2026-04-26 under glm-5.1):

{
  "role": "assistant",
  "content": "",
  "reasoning": "Let me analyze the health check output:\n\n- CRIT: 0\n- WARN: 1 - gateway_state hasn't been updated for over 27 hours (pid=75659)\n\nI need to investigate this warning about the gateway process. Let me che…",
  "finish_reason": "tool_calls",
  "tool_calls": [ … 2 calls … ]
}

Note: the chain of thought was captured (267 chars under reasoning). It just isn't written under the name DeepSeek requires.

Root cause in code

run_agent.py (around line 7755):

msg = {
    ...
    "reasoning": reasoning_text,        # internal canonical name — always written
    "finish_reason": finish_reason,
}
if hasattr(assistant_message, "reasoning_content"):
    raw = getattr(assistant_message, "reasoning_content", None)
    if raw is not None:
        msg["reasoning_content"] = _sanitize_surrogates(raw)   # only when SDK exposed it
    elif msg.get("tool_calls") and self._needs_deepseek_tool_reasoning():
        msg["reasoning_content"] = ""                           # narrow guard, only when current provider is DeepSeek at write time

Two failure modes:

  1. The non-DeepSeek SDK object often doesn't expose reasoning_content as a top-level attribute: the data arrives under delta.reasoning_content in streaming chunks, is accumulated into the local variable reasoning_text, and is then written only to the internal "reasoning" key. The standard field never lands on disk.
  2. The _needs_deepseek_tool_reasoning() guard only fires when the current provider is DeepSeek. If the message is written under glm/minimax/gpt and the user later switches to DeepSeek, the guard did not run at write time, which is the only point where it could have helped.
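Both failure modes can be reproduced in isolation with a simplified re-creation of the write path quoted above. This is a sketch, not the real run_agent.py code: `build_persisted_message` and the `provider_is_deepseek` flag are hypothetical stand-ins, `SimpleNamespace` objects stand in for SDK message objects, and surrogate sanitizing is left out.

```python
from types import SimpleNamespace
from typing import Any


def build_persisted_message(assistant_message: Any, reasoning_text: str,
                            tool_calls: list, provider_is_deepseek: bool) -> dict:
    """Simplified re-creation of the quoted write path (not the real code)."""
    msg = {
        "role": "assistant",
        "content": "",
        "tool_calls": tool_calls,
        "reasoning": reasoning_text,  # internal key, always written
    }
    if hasattr(assistant_message, "reasoning_content"):
        raw = assistant_message.reasoning_content
        if raw is not None:
            msg["reasoning_content"] = raw
        elif tool_calls and provider_is_deepseek:
            msg["reasoning_content"] = ""
    return msg


# Failure mode 1: the SDK object has no such attribute; reasoning came from stream deltas.
glm_like = SimpleNamespace(content="", tool_calls=[{"id": "call_1"}])
m1 = build_persisted_message(glm_like, "thinking...", glm_like.tool_calls, False)

# Failure mode 2: the attribute exists but is None, and the writer is not DeepSeek.
other = SimpleNamespace(content="", tool_calls=[{"id": "call_2"}], reasoning_content=None)
m2 = build_persisted_message(other, "", other.tool_calls, False)

print("reasoning_content" in m1, "reasoning_content" in m2)  # prints: False False
```

In both cases the persisted dict has no "reasoning_content" key, even though m1 carries the full chain of thought under "reasoning".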

The read-side _copy_reasoning_content_for_api does have a path that promotes reasoning → reasoning_content, and after #15748's reordering it does the right thing on the main loop. But every new code path that builds an API request from history (cron, fallback switch, auxiliary clients, ACP adapter, gateway replay, transports/chat_completions, transports/bedrock) is a fresh place where the dance can be forgotten, and #15213 / #15741 are evidence that this happens.

Reproduction

  1. Hermes v0.11.0, any non-DeepSeek thinking model that emits reasoning via delta.reasoning_content in streaming (e.g. glm-5.1 over an aigw or zhipu endpoint).
  2. Have at least one tool-call turn in the conversation.
  3. Inspect the persisted session JSON — the assistant turn will have "reasoning": "…" but no "reasoning_content" key.
  4. Switch the same session (or a new run that loads accumulated context, e.g. cron with persistent session, or an a2a sub-agent) to deepseek-v4-pro / deepseek-v4-flash.
  5. The next API request that replays the message returns HTTP 400.

In my install this happened at message ~100 of a session that had been growing for a day under glm-5.1, the moment the fallback chain promoted DeepSeek to primary.
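Step 3 of the reproduction can be automated with a small pre-flight check over the loaded history. The sketch below assumes the history is a list of plain dicts. The helper name `missing_reasoning_content` is hypothetical and does not exist in Hermes.

```python
from typing import Any


def missing_reasoning_content(messages: list[dict[str, Any]]) -> list[int]:
    """Indexes of assistant turns that lack a reasoning_content key."""
    return [
        i for i, m in enumerate(messages)
        if m.get("role") == "assistant" and "reasoning_content" not in m
    ]


history = [
    {"role": "user", "content": "check gateway health"},
    {"role": "assistant", "content": "", "reasoning": "Let me look...",
     "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "WARN: 1"},
    {"role": "assistant", "content": "One warning.", "reasoning_content": ""},
]
offenders = missing_reasoning_content(history)  # [1]
```

A non-empty result means that, per the behavior described above, replaying this history to a thinking-mode DeepSeek/Kimi model would be expected to fail with the 400.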

Suggested fix

Normalize at write time, not at read time.

In the persistence path that builds the assistant message dict, write the chain of thought to reasoning_content directly (which is the standard cross-provider name; the SDK ecosystem has effectively converged on this), and either drop the reasoning alias or keep both for one release for backward compat.

Concretely: at the point where reasoning_text is finalized for the message, write:

msg["reasoning_content"] = _sanitize_surrogates(reasoning_text or "")

unconditionally for assistant turns. The empty string is the safest default — DeepSeek/Kimi accept it, every other provider ignores unknown empty fields, and the read side no longer needs to compensate.

This makes the four landed read-side fixes redundant safety nets rather than mandatory promotion paths, and prevents the same class of bug from recurring in future build paths.

Defense-in-depth (optional)

A startup-time migration that scans ~/.hermes/sessions/**/*.json and adds reasoning_content: "" (or copies from reasoning) on any assistant turn missing it would clean the existing fleet. I wrote one for my install — happy to PR it if useful. It found and repaired the 4 031 messages above; total run time on 1 497 files was under 10 seconds.
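A migration along these lines could be sketched as follows. This is not the script mentioned above, only a minimal sketch under the same assumed file layout (a JSON object with a top-level "messages" list). `repair_message` and `repair_file` are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def repair_message(msg: dict[str, Any]) -> bool:
    """Add reasoning_content to an assistant turn that lacks it; True if changed."""
    if msg.get("role") != "assistant" or "reasoning_content" in msg:
        return False
    internal = msg.get("reasoning")
    # Copy from the internal key when it holds text, otherwise use "".
    msg["reasoning_content"] = internal if isinstance(internal, str) else ""
    return True


def repair_file(path: Path) -> int:
    """Repair one session file in place; returns the number of messages changed."""
    data = json.loads(path.read_text(encoding="utf-8"))
    messages = data.get("messages", []) if isinstance(data, dict) else []
    changed = sum(repair_message(m) for m in messages if isinstance(m, dict))
    if changed:
        # Write a temp file in the same directory, then rename over the original.
        fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(data, fh, indent=2)
            os.replace(tmp, path)
        except BaseException:
            os.unlink(tmp)
            raise
    return changed
```

The repair is idempotent: a second pass over an already repaired file changes nothing and does not rewrite it. Note that `json.dump` with default settings escapes non-ASCII characters, which changes how the file looks on disk but not what it decodes to.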

Workaround for affected users

Until the write side is fixed, two things have to be done together:

  1. hermes config set agent.reasoning_effort none (stops new poisoned writes when DeepSeek is primary)
  2. Run a one-time repair across the session store to inject reasoning_content: "" on every poisoned message — otherwise switching to DeepSeek at any later date re-triggers the 400.

(1) alone is not enough. (2) alone gets re-poisoned the next time a non-DeepSeek provider is used.

Environment

  • Hermes Agent v0.11.0 (2026.4.23)
  • Python 3.14.3
  • openai 2.26.0
  • macOS 26.4.1 (Darwin 25.4)
  • Provider: custom, base_url https://aigw.netease.com/v1
  • Affected models observed: deepseek-v4-pro (failing), glm-5.1 / MiniMax-M2.7 / gpt-5.4 / claude-opus-4-6 (poisoning sources)

Related

The issues referenced above (#15213, #15741, #15748, #15353) are all read-side fixes. This issue proposes a write-side fix that makes them unnecessary going forward.

Labels

P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), provider/deepseek (DeepSeek API), provider/kimi (Kimi / Moonshot), type/bug (Something isn't working)
