fix(agent): exclude prior-history tool messages from background review summary by luyao618 · Pull Request #14967 · NousResearch/hermes-agent

luyao618 · 2026-04-24T07:34:24Z

What does this PR do?

Fixes a bug where the background memory/skill review's user-visible summary (💾 ...) re-surfaces stale tool successes from the prior conversation as if they had just happened.

_spawn_background_review forks a new AIAgent initialized with conversation_history=messages_snapshot. The forked agent's _session_messages therefore contains tool messages copied from the prior conversation. The post-review scan that builds the summary walked the entire _session_messages list and reported every successful tool result it found, so historical actions (e.g. an earlier Cron job '...' created.) were re-announced — sometimes repeatedly across unrelated background-review runs.

Related Issue

Fixes #14944

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

Extracted the scan into a new AIAgent._summarize_background_review_actions staticmethod for testability.
Before scanning, collect every tool_call_id already present in messages_snapshot and skip review messages whose tool_call_id matches — those are inherited from the prior conversation, not new actions.
For tool messages without a tool_call_id, fall back to content-equality against the prior snapshot's anonymous tool messages.
Hardened data handling so a non-dict JSON payload no longer raises in the data.get("success") branch.

How to Test

Repro per the issue: in a gateway session create a one-shot cron reminder, then continue chatting until the background memory/profile review fires. Before this fix, the next review's 💾 notification could include Cron job '<reminder>' created. even though no cron was created during that review. After the fix it doesn't.
Run the targeted tests: pytest tests/run_agent/test_background_review_summary.py -v
Run the broader run_agent suite: pytest tests/run_agent/test_run_agent.py -q

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix
I've run pytest tests/ -q and all tests pass
I've added tests for my changes
I've tested on my platform: macOS (Darwin 25.4.0, Apple Silicon), Python 3.11

Documentation & Housekeeping

Updated relevant documentation — N/A
Updated cli-config.yaml.example — N/A
Updated contributing / agents docs — N/A
Considered cross-platform impact — N/A (logic-only, no platform-specific paths)
Updated tool descriptions/schemas — N/A

…w summary The background memory/skill review forks a new AIAgent with conversation_history=messages_snapshot. The forked agent's _session_messages therefore contains tool messages copied from the prior conversation. The post-review scan that builds the user-visible 💾 summary walked the entire _session_messages list, so historical successes (e.g. 'Cron job '...' created.') were re-surfaced as if they had just happened — sometimes multiple times across unrelated background-review runs. Extract the scan into a staticmethod and skip any tool message whose tool_call_id was already present in messages_snapshot, with a content-equality fallback for tool messages that lack one. Fixes NousResearch#14944

alt-glitch · 2026-04-24T07:50:29Z

Competing fix for #14944 alongside #14969 and #9696 — all three PRs address the same stale-tool-result bug in background review. Recommend maintainer pick one.

@luyao618

…w summary Cherry-pick-of: 27b6a21 (PR #14967 by @luyao618) Co-authored-by: luyao618 <364939526@qq.com>

teknium1 · 2026-04-24T10:10:22Z

Merged via #15057 — your commit was cherry-picked onto current main with your authorship preserved in git log.

Chose this implementation over the parallel #14969 because:

ID-based matching is robust to any future change to how _session_messages gets populated during agent init
The extracted _summarize_background_review_actions staticmethod is directly testable without mocking the spawn path
Matches the issue author's explicit suggested approach verbatim
Bonus: hardens the data.get('success') branch against non-dict JSON payloads (latent crash)

Thanks for the clean refactor and comprehensive tests.

@luyao618