Skip to content

fix(agent): always persist reasoning_content on assistant turns#16884

Closed
Sanjays2402 wants to merge 1 commit into
NousResearch:mainfrom
Sanjays2402:fix/issue-16844
Closed

fix(agent): always persist reasoning_content on assistant turns#16884
Sanjays2402 wants to merge 1 commit into
NousResearch:mainfrom
Sanjays2402:fix/issue-16844

Conversation

@Sanjays2402

Copy link
Copy Markdown
Contributor

What

run_agent.py persists assistant turns with the chain of thought under the internal field reasoning and only writes the protocol-standard reasoning_content when:

  • the upstream SDK exposes assistant_message.reasoning_content as a top-level attribute, or
  • the current provider is DeepSeek and the turn has tool_calls (existing narrow guard).

Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via openai-compat shims, etc.) emit reasoning through delta.reasoning_content chunks that get accumulated into the local reasoning_text string — but never land on the assistant message object as an attribute, so the persisted message is missing reasoning_content.

The bug is silent until the user later replays that history through a DeepSeek‑v4 / Kimi thinking model, which strictly requires reasoning_content on every replayed assistant turn:

The reasoning_content in the thinking mode must be passed back to the API.

Read-side patches (#15213, #15741, #15748, #15353) each fix one build path, but every new path that reads history from disk is a fresh place the same 400 can resurface.

Fix

Normalize at write time. At the point where the assistant message dict is built (around run_agent.py:8085), always populate reasoning_content:

  1. prefer the SDK-supplied assistant_message.reasoning_content when present (may carry structured data);
  2. otherwise fall back to the already-sanitized reasoning_text that was accumulated from streaming deltas;
  3. finally default to "" so non-thinking providers ignore it harmlessly while DeepSeek/Kimi see a valid (empty) value.

The internal reasoning alias is preserved for backward compatibility with existing read paths and downstream consumers — this PR is purely additive on the wire format.

This makes the four landed read-side fixes redundant safety nets rather than mandatory promotion paths.

Diff

run_agent.py | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)

The change is local to the assistant-message persistence block; no behavioural change for currently-working DeepSeek paths (those already get reasoning_content from the SDK or from the existing tool-call guard).

Not in scope

  • Migration of existing poisoned session files under ~/.hermes/sessions/** — happy to follow up in a separate PR if maintainers want it; the original issue author offered one as well.
  • Removing the read-side _copy_reasoning_content_for_api promotion logic — leaving it in place as defense in depth.

Refs #16844

run_agent.py persisted assistant turns with the chain of thought under
the internal field 'reasoning' and only wrote the protocol-standard
'reasoning_content' when (a) the upstream SDK happened to expose it as
a top-level attribute, or (b) the current provider was DeepSeek and the
turn had tool_calls. Streaming-only providers (glm, MiniMax, gpt-5.x via
aigw, Anthropic via openai-compat shim, etc.) accumulate reasoning from
delta.reasoning_content into a local string but never set the SDK
attribute, so the persisted message is missing reasoning_content.

The bug is silent until the user later replays that history through a
DeepSeek-v4 / Kimi thinking model, which requires reasoning_content on
every replayed assistant turn:

    The reasoning_content in the thinking mode must be passed back to
    the API.

Read-side patches (NousResearch#15213, NousResearch#15741, NousResearch#15748, NousResearch#15353) each fix one build
path, but every new path that reads history from disk is a fresh place
the same 400 can resurface.

Normalize at write time:
- prefer the SDK-supplied reasoning_content when present (may carry
  structured data);
- otherwise fall back to the already-sanitized reasoning_text that was
  accumulated from streaming deltas;
- finally default to "" so non-thinking providers ignore the field
  harmlessly while DeepSeek/Kimi see a valid (empty) value.

The internal 'reasoning' alias is preserved for backward compatibility
with existing read paths and downstream consumers.

Refs NousResearch#16844
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder provider/deepseek DeepSeek API provider/kimi Kimi / Moonshot labels Apr 28, 2026
teknium1 added a commit that referenced this pull request Apr 28, 2026
…6844) (#16892)

Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.

The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (#16844).

Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (#15250) rather than replacing them,
so every existing behavior is preserved:

- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
  wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
  but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
  _copy_reasoning_content_for_api's cross-provider leak guard (#15748),
  promote-from-'reasoning' tier, and thinking-pad tier remain live at
  replay time.
- No empty '' gets eagerly written on every assistant turn (which would
  have bypassed the read-side ladder and triggered empty thinking-block
  insertion in the Anthropic adapter).

Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.

Credit @Sanjays2402 for the original diagnosis and patch in #16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.

Refs #16844, #16884, #15250, #15353, #15748.
@teknium1

Copy link
Copy Markdown
Contributor

Superseded by #16892 (merged as d63abbc).

Thank you for the careful diagnosis and the write-side forensic data in #16844 — 4,031 poisoned messages across 1,101 files was the detail that made the bug's scope undeniable. The rework in #16892 keeps your core insight (promote streamed reasoning to reasoning_content at write time) but layers it on top of the existing branch rather than replacing it, so:

Live E2E confirmed the fix is load-bearing — before/after diff at scenario 1 goes from reasoning_content absent to populated from the streamed reasoning text, exactly matching the #16844 bug profile.

Credit preserved via the merged commit message. Thanks again!

@teknium1 teknium1 closed this Apr 28, 2026
cluricaun28 referenced this pull request in cluricaun28/Logos Apr 28, 2026
…6844) (#16892)

Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.

The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (#16844).

Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (#15250) rather than replacing them,
so every existing behavior is preserved:

- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
  wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
  but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
  _copy_reasoning_content_for_api's cross-provider leak guard (#15748),
  promote-from-'reasoning' tier, and thinking-pad tier remain live at
  replay time.
- No empty '' gets eagerly written on every assistant turn (which would
  have bypassed the read-side ladder and triggered empty thinking-block
  insertion in the Anthropic adapter).

Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.

Credit @Sanjays2402 for the original diagnosis and patch in #16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.

Refs #16844, #16884, #15250, #15353, #15748.
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…usResearch#16844) (NousResearch#16892)

Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.

The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (NousResearch#16844).

Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (NousResearch#15250) rather than replacing them,
so every existing behavior is preserved:

- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
  wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
  but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
  _copy_reasoning_content_for_api's cross-provider leak guard (NousResearch#15748),
  promote-from-'reasoning' tier, and thinking-pad tier remain live at
  replay time.
- No empty '' gets eagerly written on every assistant turn (which would
  have bypassed the read-side ladder and triggered empty thinking-block
  insertion in the Anthropic adapter).

Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.

Credit @Sanjays2402 for the original diagnosis and patch in NousResearch#16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.

Refs NousResearch#16844, NousResearch#16884, NousResearch#15250, NousResearch#15353, NousResearch#15748.
donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
…usResearch#16844) (NousResearch#16892)

Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.

The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (NousResearch#16844).

Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (NousResearch#15250) rather than replacing them,
so every existing behavior is preserved:

- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
  wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
  but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
  _copy_reasoning_content_for_api's cross-provider leak guard (NousResearch#15748),
  promote-from-'reasoning' tier, and thinking-pad tier remain live at
  replay time.
- No empty '' gets eagerly written on every assistant turn (which would
  have bypassed the read-side ladder and triggered empty thinking-block
  insertion in the Anthropic adapter).

Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.

Credit @Sanjays2402 for the original diagnosis and patch in NousResearch#16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.

Refs NousResearch#16844, NousResearch#16884, NousResearch#15250, NousResearch#15353, NousResearch#15748.
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…usResearch#16844) (NousResearch#16892)

Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.

The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (NousResearch#16844).

Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (NousResearch#15250) rather than replacing them,
so every existing behavior is preserved:

- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
  wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
  but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
  _copy_reasoning_content_for_api's cross-provider leak guard (NousResearch#15748),
  promote-from-'reasoning' tier, and thinking-pad tier remain live at
  replay time.
- No empty '' gets eagerly written on every assistant turn (which would
  have bypassed the read-side ladder and triggered empty thinking-block
  insertion in the Anthropic adapter).

Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.

Credit @Sanjays2402 for the original diagnosis and patch in NousResearch#16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.

Refs NousResearch#16844, NousResearch#16884, NousResearch#15250, NousResearch#15353, NousResearch#15748.
dannyJ848 pushed a commit to dannyJ848/hermes-agent that referenced this pull request May 17, 2026
…usResearch#16844) (NousResearch#16892)

Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.

The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (NousResearch#16844).

Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (NousResearch#15250) rather than replacing them,
so every existing behavior is preserved:

- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
  wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
  but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
  _copy_reasoning_content_for_api's cross-provider leak guard (NousResearch#15748),
  promote-from-'reasoning' tier, and thinking-pad tier remain live at
  replay time.
- No empty '' gets eagerly written on every assistant turn (which would
  have bypassed the read-side ladder and triggered empty thinking-block
  insertion in the Anthropic adapter).

Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.

Credit @Sanjays2402 for the original diagnosis and patch in NousResearch#16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.

Refs NousResearch#16844, NousResearch#16884, NousResearch#15250, NousResearch#15353, NousResearch#15748.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…usResearch#16844) (NousResearch#16892)

Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.

The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (NousResearch#16844).

Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (NousResearch#15250) rather than replacing them,
so every existing behavior is preserved:

- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
  wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
  but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
  _copy_reasoning_content_for_api's cross-provider leak guard (NousResearch#15748),
  promote-from-'reasoning' tier, and thinking-pad tier remain live at
  replay time.
- No empty '' gets eagerly written on every assistant turn (which would
  have bypassed the read-side ladder and triggered empty thinking-block
  insertion in the Anthropic adapter).

Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.

Credit @Sanjays2402 for the original diagnosis and patch in NousResearch#16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.

Refs NousResearch#16844, NousResearch#16884, NousResearch#15250, NousResearch#15353, NousResearch#15748.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…usResearch#16844) (NousResearch#16892)

Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.

The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (NousResearch#16844).

Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (NousResearch#15250) rather than replacing them,
so every existing behavior is preserved:

- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
  wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
  but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
  _copy_reasoning_content_for_api's cross-provider leak guard (NousResearch#15748),
  promote-from-'reasoning' tier, and thinking-pad tier remain live at
  replay time.
- No empty '' gets eagerly written on every assistant turn (which would
  have bypassed the read-side ladder and triggered empty thinking-block
  insertion in the Anthropic adapter).

Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.

Credit @Sanjays2402 for the original diagnosis and patch in NousResearch#16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.

Refs NousResearch#16844, NousResearch#16884, NousResearch#15250, NousResearch#15353, NousResearch#15748.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround provider/deepseek DeepSeek API provider/kimi Kimi / Moonshot type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants