fix(agent): scan only new tool results in bg review, skip snapshot history (#14944) by Bartok9 · Pull Request #14969 · NousResearch/hermes-agent

Bartok9 · 2026-04-24T07:35:01Z

Problem

Background review notifications were surfacing old successful tool results from the conversation history as if they had just been performed by the review agent.

For example, after a user created a cron job in an earlier turn, a later unrelated background review would repeat:

💾 Cron job '<reminder name>' created. · User profile updated

...even though the review agent never touched cron — it only updated the profile.

Reported in #14944. Related code path also noted in #9055 and #9696.

Root Cause

_spawn_background_review passes conversation_history=messages_snapshot to the review agent, which populates _session_messages with the full prior history (including old role=tool messages). The post-run scan then iterated over all of _session_messages — treating historical tool results as new review actions.

# Before: scans entire _session_messages including snapshot history
for msg in getattr(review_agent, "_session_messages", []):
    ...

Fix

Record the snapshot length before run_conversation(), then slice to only the messages the review agent actually appended:

snapshot_len = len(messages_snapshot)
all_msgs = getattr(review_agent, "_session_messages", [])
new_msgs = all_msgs[snapshot_len:]  # only messages added by the review agent

for msg in new_msgs:
    ...

This is a minimal, zero-risk change — it only affects which messages are summarised in the notification, not the review agent's behaviour or memory writes.

Tests

Added tests/agent/test_bg_review_stale_tool_results.py with 7 unit tests:

Test	Verifies
`test_stale_tool_result_not_surfaced`	Historical tool results ignored
`test_new_tool_result_is_surfaced`	Review agent's own results reported
`test_mixed_stale_and_new_only_new_surfaced`	Mixed history: only new results show
`test_empty_snapshot_all_msgs_scanned`	No snapshot → all messages are new
`test_failed_tool_results_excluded`	Failures never shown
`test_deduplication_preserved`	Duplicate strings collapsed
`test_regression_issue_14944`	Exact scenario from the bug report

All 7 pass ✅

Closes #14944

…story (NousResearch#14944) Background review notifications were incorrectly including successful tool results that were already present in the conversation history *before* the review agent ran. Root cause: _spawn_background_review initialises the review sub-agent with conversation_history=messages_snapshot, which populates _session_messages with the full history. The post-run scan then iterated over ALL messages in _session_messages — including the historical role=tool messages from the snapshot — treating them as newly performed review actions. Fix: record snapshot_len before run_conversation(), then slice _session_messages[snapshot_len:] so the scan covers only messages that the review agent actually appended during its run. Adds 7 unit tests covering: - stale tool results not surfaced (regression for NousResearch#14944) - new tool results correctly surfaced - mixed history: only new results reported - empty snapshot: all messages scanned as new - failed tool results excluded - deduplication preserved - exact reporter scenario from issue description Related: NousResearch#9055, NousResearch#9696 (same code path)

Bartok9 · 2026-04-24T07:35:45Z

I rechecked this against current main 6051fba9 — no conflicts, the change is isolated to the _spawn_background_review function in run_agent.py. CI test failures (e2e and check-attribution) are pre-existing on main and unrelated to this PR.

alt-glitch · 2026-04-24T07:50:27Z

Competing fix for #14944 alongside #14967 and #9696 — all three PRs address the same stale-tool-result bug in background review. Recommend maintainer pick one.

teknium1 · 2026-04-24T10:10:24Z

Thanks for the fix @Bartok9 — same bug was also addressed by #14967 (submitted ~37s before yours) which we just merged via salvage #15057.

Your slice-based approach (_session_messages[len(snapshot):]) was cleaner and smaller, but we went with #14967's ID-based matching because it's robust to any future init-step that might reorder/filter the inherited history (compression, prefix-cache replay, etc.). ID matching survives those changes; slice boundary does not.

Your 216-line test suite was excellent. Really appreciate the detailed root-cause writeup and the PR body table. Please keep contributing — next one where your approach is the clear winner, it gets merged.

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 24, 2026

alt-glitch mentioned this pull request Apr 24, 2026

fix(agent): exclude prior-history tool messages from background review summary #14967

Closed

13 tasks

teknium1 mentioned this pull request Apr 24, 2026

fix(agent): exclude prior-history tool messages from background review summary (salvage #14967) #15057

Merged

teknium1 closed this Apr 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agent): scan only new tool results in bg review, skip snapshot history (#14944)#14969

fix(agent): scan only new tool results in bg review, skip snapshot history (#14944)#14969
Bartok9 wants to merge 1 commit into
NousResearch:mainfrom
Bartok9:fix/14944-bg-review-stale-tool-results

Bartok9 commented Apr 24, 2026

Uh oh!

Bartok9 commented Apr 24, 2026

Uh oh!

alt-glitch commented Apr 24, 2026

Uh oh!

teknium1 commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Bartok9 commented Apr 24, 2026

Problem

Root Cause

Fix

Tests

Uh oh!

Bartok9 commented Apr 24, 2026

Uh oh!

alt-glitch commented Apr 24, 2026

Uh oh!

teknium1 commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants