Skip to content

fix(agent): prevent silent tool result loss during context compression#1976

Closed
Gutslabs wants to merge 1 commit into
NousResearch:mainfrom
Gutslabs:fix/compression-tool-result-loss
Closed

fix(agent): prevent silent tool result loss during context compression#1976
Gutslabs wants to merge 1 commit into
NousResearch:mainfrom
Gutslabs:fix/compression-tool-result-loss

Conversation

@Gutslabs

Copy link
Copy Markdown
Contributor

Summary

Fix silent data loss in context compression when parallel tool calls (3+) span multiple messages and the compression boundary falls in the middle of a tool-result group.

The Bug

_align_boundary_backward only checked messages[idx-1] to decide if the compress-end boundary splits a tool_call/result group. When an assistant issues 3+ parallel tool calls, their results span multiple consecutive messages. If the tail protection boundary (n_messages - protect_last_n) fell in the middle of that group, the method saw another tool result at idx-1 (not the parent assistant) and did nothing.

This caused a silent data loss chain:

  1. The parent assistant message was summarized away (included in the middle region)
  2. The orphaned tool result(s) remained in the tail
  3. _sanitize_tool_pairs removed them as orphans
  4. The content was neither in the summary nor preserved — gone forever

No error, no warning, no indication that data was lost. The agent continued operating with incomplete information.

Triggering scenario

Any conversation where the model issues >= 3 parallel tool calls AND (total_messages - protect_last_n) falls in the middle of the resulting tool result messages. With default protect_last_n=4, this happens whenever the last 4 messages start inside a tool result group.

Concrete example with 15 messages (protect_first_n=3, protect_last_n=4):

tc_G's result at [11] is orphaned: parent assistant [4] was summarized, result [11] is in the tail. _sanitize_tool_pairs deletes it. Content lost forever.

The Fix

_align_boundary_backward now walks backward through all consecutive tool results to find the parent assistant, then pulls the boundary before the entire group so it gets summarized together.

Test Plan

6 regression tests added (tests/test_compression_boundary.py):

  • Clean boundary (no adjustment needed)
  • Original case: assistant at idx-1
  • NEW: boundary in middle of tool result group (the bug)
  • NEW: boundary after last tool result in a group
  • NEW: consecutive tool groups (only walks to nearest parent)
  • End-to-end: compress() with 7 parallel tool calls — verifies no orphans and no stub results

Tests confirmed: 3/6 FAIL with old code, 6/6 PASS with fix.

_align_boundary_backward only checked messages[idx-1] when deciding
whether the compress-end boundary splits a tool_call/result group.
When an assistant issues 3+ parallel tool calls, the resulting tool
messages span multiple consecutive positions. If the tail protection
boundary (n_messages - protect_last_n) fell in the MIDDLE of that
group, the method saw another tool result at idx-1 (not the parent
assistant) and did nothing.

This caused a silent data loss chain:
1. The parent assistant message was summarized away (middle region)
2. The orphaned tool result(s) remained in the tail
3. _sanitize_tool_pairs removed them as orphans
4. The content was neither in the summary nor preserved — gone forever

The fix walks backward through all consecutive tool results to find
the parent assistant, then pulls the boundary before the entire group.

Includes 6 regression tests covering:
- Clean boundaries (no adjustment needed)
- Original case (assistant at idx-1)
- NEW: boundary in middle of tool result group
- NEW: boundary after last tool result in a group
- NEW: consecutive tool groups (only walks to nearest parent)
- End-to-end: compress() with 7 parallel tool calls
@teknium1

Copy link
Copy Markdown
Contributor

Merged via PR #1993. Your core fix for _align_boundary_backward was cherry-picked onto current main with authorship preserved. The api_key constructor addition and print→logger cleanup you included were already independently merged to main, so only the boundary fix and your tests were needed. Thank you for catching this silent data loss bug!

@teknium1 teknium1 closed this Mar 18, 2026
spiky02plateau added a commit to spiky02plateau/hermes-agent that referenced this pull request Jun 3, 2026
…xt override unset

Subprocess env builders (_sanitize_subprocess_env, _make_run_env) pin
HOME to the profile's home/ dir but only inject HERMES_HOME from the
_HERMES_HOME_OVERRIDE ContextVar, which is unset for background/PTY/cron
spawns. The child then has HOME={profile}/home but no HERMES_HOME, so
get_hermes_home() falls back to ~/.hermes and reads the default profile's
config/auth/memory instead of its own — cross-profile data corruption.

Add a single os.getenv("HERMES_HOME") fallback in the shared
_inject_context_hermes_home() so the common single-profile-gateway case
is covered. The ContextVar keeps precedence (per-session profile
mutation, NousResearch#1976); only one key (HERMES_HOME, a non-secret path) is
touched, so the secret-isolation invariant is intact.

Fixes NousResearch#4707

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants