fix(agent): scan only new tool results in bg review, skip snapshot history (#14944)#14969
fix(agent): scan only new tool results in bg review, skip snapshot history (#14944)#14969Bartok9 wants to merge 1 commit into
Conversation
…story (NousResearch#14944) Background review notifications were incorrectly including successful tool results that were already present in the conversation history *before* the review agent ran. Root cause: _spawn_background_review initialises the review sub-agent with conversation_history=messages_snapshot, which populates _session_messages with the full history. The post-run scan then iterated over ALL messages in _session_messages — including the historical role=tool messages from the snapshot — treating them as newly performed review actions. Fix: record snapshot_len before run_conversation(), then slice _session_messages[snapshot_len:] so the scan covers only messages that the review agent actually appended during its run. Adds 7 unit tests covering: - stale tool results not surfaced (regression for NousResearch#14944) - new tool results correctly surfaced - mixed history: only new results reported - empty snapshot: all messages scanned as new - failed tool results excluded - deduplication preserved - exact reporter scenario from issue description Related: NousResearch#9055, NousResearch#9696 (same code path)
|
I rechecked this against current main |
|
Thanks for the fix @Bartok9 — same bug was also addressed by #14967 (submitted ~37s before yours) which we just merged via salvage #15057. Your slice-based approach ( Your 216-line test suite was excellent. Really appreciate the detailed root-cause writeup and the PR body table. Please keep contributing — next one where your approach is the clear winner, it gets merged. |
Problem
Background review notifications were surfacing old successful tool results from the conversation history as if they had just been performed by the review agent.
For example, after a user created a cron job in an earlier turn, a later unrelated background review would repeat:
...even though the review agent never touched cron — it only updated the profile.
Reported in #14944. Related code path also noted in #9055 and #9696.
Root Cause
_spawn_background_reviewpassesconversation_history=messages_snapshotto the review agent, which populates_session_messageswith the full prior history (including oldrole=toolmessages). The post-run scan then iterated over all of_session_messages— treating historical tool results as new review actions.Fix
Record the snapshot length before
run_conversation(), then slice to only the messages the review agent actually appended:This is a minimal, zero-risk change — it only affects which messages are summarised in the notification, not the review agent's behaviour or memory writes.
Tests
Added
tests/agent/test_bg_review_stale_tool_results.pywith 7 unit tests:test_stale_tool_result_not_surfacedtest_new_tool_result_is_surfacedtest_mixed_stale_and_new_only_new_surfacedtest_empty_snapshot_all_msgs_scannedtest_failed_tool_results_excludedtest_deduplication_preservedtest_regression_issue_14944All 7 pass ✅
Closes #14944