Skip to content

fix: persist and restore assistant reasoning across gateway session turns (#2936)#2941

Closed
Mibayy wants to merge 2 commits into
NousResearch:mainfrom
Mibayy:fix/gateway-context-files-cwd
Closed

fix: persist and restore assistant reasoning across gateway session turns (#2936)#2941
Mibayy wants to merge 2 commits into
NousResearch:mainfrom
Mibayy:fix/gateway-context-files-cwd

Conversation

@Mibayy

@Mibayy Mibayy commented Mar 25, 2026

Copy link
Copy Markdown
Contributor

Problem

Closes #2936.

When using the Telegram (or any messaging) gateway with a reasoning model — Hermes-4-405B/70B with reasoning_effort: high — the model never invokes tools in a multi-turn session. It generates hallucinated tool outputs as natural language instead. Direct API calls and cron scheduler runs work correctly.

Root cause (corrects the diagnosis in #2936)

The XML system-prompt hypothesis in the issue is incorrect — the _convert_to_trajectory_format block at line 1618 is a datagen/trajectory-saving helper, not the live API path. Tool schemas are already passed natively via api_kwargs["tools"] to the Nous endpoint.

The actual bug is a missing column in the session DB.

Every gateway message creates a fresh AIAgent instance and reloads conversation history from SQLite via get_messages_as_conversation(). When the model is reasoning-capable, assistant messages carry a reasoning field (the chain-of-thought tokens). This field was stored only in the in-process message list — never written to the messages table.

On reload, the restored history contains assistant messages that called tools but apparently did so with no reasoning. The Nous inference API enforces reasoning consistency across turns: when it sees a prior assistant tool-call without accompanying reasoning_content, it treats the conversation as broken and the model falls back to generating free text rather than structured tool calls.

This explains the three-way behavior difference reported:

  • Direct API calls — single turn, no history reload → works
  • Cron scheduler — fresh agent, conversation_history=None → works
  • Telegram gateway — reloads truncated DB history on every message from turn 2 onward → fails

Fix

hermes_state.py — schema v6:

  • Add reasoning TEXT column to the messages table
  • Auto-migration via ALTER TABLE messages ADD COLUMN reasoning TEXT (backward-compatible, existing DBs keep working)
  • Include reasoning in both INSERT (via append_message()) and SELECT (via get_messages_as_conversation())

run_agent.py_flush_messages_to_session_db():

  • Pass msg.get('reasoning') to append_message() for assistant messages

Testing

The fix is safe for existing deployments:

  • ALTER TABLE ... ADD COLUMN is a no-op if the column already exists (guarded by try/except OperationalError)
  • Existing sessions without reasoning continue to work (column is NULL)
  • No data migration needed — old turns without reasoning are simply served as-is; only new turns after the upgrade will carry reasoning through

To reproduce the original bug on a live gateway: send a tool-requiring message (e.g. "create a daily cron job") with Hermes-4 + reasoning_effort: high and observe tool_calls: 0 in the session JSON. After this patch, the reasoning chain is preserved across turns and tool calls resume normally from turn 2 onward.

Hermes added 2 commits March 25, 2026 10:26
…urns (NousResearch#2936)

When a gateway session uses a reasoning model (Hermes-4, DeepSeek-R1, etc.)
with reasoning_effort > none, the model returns assistant messages that
include a 'reasoning' field alongside tool_calls or content.

Previously, this field was stored in the internal message list during a turn
but never persisted to SQLite. On the next gateway message (a fresh AIAgent
instance loads the transcript from the DB), the assistant messages arrived
without their reasoning chain.

The Nous inference API requires reasoning_content to be present in prior
assistant messages when the model originally produced it — receiving a
history where tool-calling assistant turns have no reasoning causes the API
to return a malformed continuation, and the model falls back to describing
tool usage in plain text rather than actually calling tools (zero
tool_calls across the entire 14-message session as reported).

This explains why:
- Direct API calls work (single-turn, no history reload)
- Cron jobs work (fresh session, conversation_history=None, no reload)
- Telegram gateway fails on turn 2+ (loads truncated history each message)

Fix:
- hermes_state.py: add 'reasoning' TEXT column to messages table with
  schema v6 migration (ALTER TABLE ... ADD COLUMN, backward-compatible)
- hermes_state.py: include reasoning in INSERT and SELECT for
  get_messages_as_conversation() so the restored history is complete
- run_agent.py: pass msg.get('reasoning') to append_message() for
  assistant messages in _flush_messages_to_session_db()
…tests

- Update test_schema_version and test_migration_from_v2 to expect v6
- Add test_reasoning_persisted_and_restored: verifies assistant reasoning
  survives a DB round-trip and appears in get_messages_as_conversation()
- Add test_reasoning_not_set_for_non_assistant: verifies reasoning is
  never leaked onto user or tool messages
@teknium1

Copy link
Copy Markdown
Contributor

Superseded by PR #2974, which persists all three reasoning field types (reasoning, reasoning_details, and codex_reasoning_items) rather than just reasoning. The schema migration and DB round-trip approach from your PR informed the final implementation — thanks for identifying the gap.

Note: the original diagnosis (Hermes-4 multi-turn tool calling failure due to missing reasoning) was incorrect — Hermes-4 models don't support multi-turn tool calling regardless of reasoning preservation. The fix is still valuable as future-proofing for reasoning models that do support it (DeepSeek R1, OpenAI o-series, etc.).

@teknium1 teknium1 closed this Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Hermes-4 does not invoke tools via Telegram gateway despite tools being loaded

2 participants