fix(research): keep tool_call/tool_response pairs intact when compressing trajectories#40495
Closed
synapsesx wants to merge 1 commit into
Closed
fix(research): keep tool_call/tool_response pairs intact when compressing trajectories#40495synapsesx wants to merge 1 commit into
synapsesx wants to merge 1 commit into
Conversation
…sing trajectories ## What does this PR do? The trajectory compressor could corrupt training trajectories by cutting a conversation in the middle of a tool-call/tool-response pair. In the from/value trajectory format a `tool` turn (carrying `<tool_response>` markers) is always emitted immediately after the `gpt` turn whose `<tool_call>` it answers, so the two turns must stay together. The compressible region's end boundary, however, was chosen purely by token accumulation: the loop stopped at the first turn where the accumulated tokens met the savings target, with no regard for turn roles. For any over-budget trajectory whose savings boundary happened to land between a `gpt` turn and its `tool` turn, the `gpt` (with its `<tool_call>`) was summarised away into the replacement `human` message while the now-orphaned `tool` turn (with its `<tool_response>`) was kept verbatim in the tail — producing an unmatched marker and silently corrupting the training signal. The head boundary had the mirror problem when the first tool turn was not protected. This change snaps both compression boundaries to a clean turn boundary before the region is extracted and replaced, so the summary always covers whole gpt+tool blocks and a `tool` turn is never separated from the `gpt` turn that precedes it. The boundary is moved forward when possible (folding an orphaned tool turn into the region that already holds its gpt) and falls back to moving backward when no clean boundary exists ahead, such as when the protected tail itself begins on a tool turn. ## Related Issue N/A ## Type of Change - [x] 🐛 Bug fix (non-breaking change that fixes an issue) ## Changes Made - `trajectory_compressor.py`: added `_is_boundary_clean()` and `_snap_boundary()` helpers on `TrajectoryCompressor`, and applied them to both the head and tail compression boundaries in `compress_trajectory()` and `compress_trajectory_async()`. When snapping collapses the region to nothing safe to compress, the trajectory is returned unchanged and flagged as still over the limit rather than being corrupted. - `tests/test_trajectory_compressor.py`: added `TestCompressionToolPairIntegrity` covering the sync and async paths plus direct unit tests for the boundary snapping (forward skip and backward fallback). ## How to Test 1. Run the focused tests: `pytest tests/test_trajectory_compressor.py -q`. 2. The new sync/async cases build a trajectory of gpt/tool pairs with an oversized middle gpt turn and choose a token target that forces the accumulation boundary to stop between a `<tool_call>` and its `<tool_response>`. They assert that `<tool_call>` and `<tool_response>` markers stay balanced after compression and that every kept `tool` turn is immediately preceded by a `gpt` turn (never the inserted summary or another tool turn). ## Checklist ### Code - [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md) - [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.) - [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate - [x] My PR contains **only** changes related to this fix/feature (no unrelated commits) - [x] I've run `pytest tests/ -q` and all tests pass - [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features) - [x] I've tested on my platform: macOS 15 (Darwin 25.5) ### Documentation & Housekeeping - [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A - [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A - [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A - [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A - [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
Contributor
teknium1
added a commit
that referenced
this pull request
Jun 7, 2026
agogo233
added a commit
to agogo233/hermes-agent
that referenced
this pull request
Jun 8, 2026
* upstream/main: (430 commits) fix(yuanbao): bound ws.close() so an idle server can't stall shutdown ~5s (NousResearch#40607) docs: add Urdu translation of README (NousResearch#40578) fix(hindsight): send only new-turn delta on append retains instead of whole session (NousResearch#40605) feat(gateway): render terminal tool calls as native bash code blocks on markdown platforms (NousResearch#41215) feat(desktop): stop the chat viewport from following streaming output (NousResearch#41414) chore(release): map AlchemistChaos co-author email for NousResearch#40135 salvage fix(desktop): recover chat after sleep/wake by revalidating a stale remote backend fix(web): make _has_env config-aware so SEARXNG_URL auto-detect honors Hermes config fix(web): honor Hermes config-aware SEARXNG_URL lookup install.sh: hint at root-owned npm cache when desktop npm install fails (NousResearch#39688) fix(tools): percent-encode non-ascii URL components fix(skills): browse shows full catalog, not first 5000 (NousResearch#41413) feat(desktop+gateway): remote media relay — attach images/PDFs and display gateway images over the network feat(desktop): full tool-backend config (pickers + per-backend settings) in Settings (NousResearch#41232) hardening(api-server): scan cron prompts on REST create/update for parity with the agent tool fix: skip MCP preflight content-type probe on reconnect when already ready (NousResearch#40604) fix(kanban): sweep deferred scratch parent on non-scratch child completion + tests fix: defer scratch workspace cleanup when task has active children (NousResearch#33774) feat(onboarding): opt-in structured profile-build path on first contact (NousResearch#41114) feat(compression): temporal anchoring in compaction summaries (NousResearch#41102) test(discord): align clarify/model-picker tests with fail-closed component auth (NousResearch#41338) chore(release): map Dusk1e and LaPhilosophie for approval fail-closed salvage (NousResearch#33844, NousResearch#33866, NousResearch#30964) fix(discord): fail closed for component button auth when no allowlist set fix(feishu): fail closed for update prompt card actions fix(slack): re-check gateway auth on approval and slash-confirm buttons fix: guard int(os.getenv()) casts against malformed env vars (NousResearch#40598) fix: respect Honcho env var fallback in doctor and honcho status chore(release): add synapsesx to AUTHOR_MAP for NousResearch#40495 salvage fix(research): keep tool_call/tool_response pairs intact when compressing trajectories fix(simplex): accept display name in SIMPLEX_ALLOWED_USERS fix(desktop): make the running-turn timer per-session (NousResearch#41182) test(approval): regression for shell-escape denylist bypass (NousResearch#36846, NousResearch#36847) fix(security): strip shell escapes in denylist normalizer; fail-closed on missing approval module fix(stream+output-cap): guard empty streams and parse OpenRouter output-cap errors (NousResearch#40589) fix(desktop): bootstrap falls back to installed agent install.sh on GitHub 404 feat(dashboard): change UI font from the theme picker, independent of theme (NousResearch#41145) fix(cli): return bool (not None) when a destructive-slash confirmation is cancelled (NousResearch#40583) fix(desktop): preserve configured base_url on same-provider model switch (NousResearch#41121) fix(desktop): stop bare-URL autolinker swallowing trailing emphasis asterisks (NousResearch#41093) fix(cron): bound the desktop run-history query to one job (NousResearch#41088) fix(desktop): scope in-session /model switch per-session, stop process-env leak (NousResearch#41120) chore: map bmoore210 author email for PR NousResearch#40550 salvage fix(desktop): scope session list to active profile + longer timeout fix: harden gateway startup and turn persistence fix(computer_use): honor custom vision routing fix(aux): honor model.default_headers on auxiliary client too (NousResearch#40033) fix(agent): honor model.default_headers for custom OpenAI-compatible providers (NousResearch#40033) docs(i18n): port deep-audit corrections to zh-Hans mirror (NousResearch#41104) fix(compression): don't overwrite the -1 post-compression sentinel in preflight seed (NousResearch#36718) chore(release): map singhsanidhya741@gmail.com to sanidhyasin (NousResearch#41094) ...
changman
pushed a commit
to changman/hermes-agent
that referenced
this pull request
Jun 10, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
The trajectory compressor could corrupt training trajectories by cutting a
conversation in the middle of a tool-call/tool-response pair. In the from/value
trajectory format a
toolturn (carrying<tool_response>markers) is alwaysemitted immediately after the
gptturn whose<tool_call>it answers, so thetwo turns must stay together. The compressible region's end boundary, however,
was chosen purely by token accumulation: the loop stopped at the first turn where
the accumulated tokens met the savings target, with no regard for turn roles. For
any over-budget trajectory whose savings boundary happened to land between a
gptturn and its
toolturn, thegpt(with its<tool_call>) was summarised awayinto the replacement
humanmessage while the now-orphanedtoolturn (with its<tool_response>) was kept verbatim in the tail — producing an unmatched markerand silently corrupting the training signal. The head boundary had the mirror
problem when the first tool turn was not protected.
This change snaps both compression boundaries to a clean turn boundary before the
region is extracted and replaced, so the summary always covers whole gpt+tool
blocks and a
toolturn is never separated from thegptturn that precedes it.The boundary is moved forward when possible (folding an orphaned tool turn into
the region that already holds its gpt) and falls back to moving backward when no
clean boundary exists ahead, such as when the protected tail itself begins on a
tool turn.
Related Issue
N/A
Type of Change
Changes Made
trajectory_compressor.py: added_is_boundary_clean()and_snap_boundary()helpers on
TrajectoryCompressor, and applied them to both the head and tailcompression boundaries in
compress_trajectory()andcompress_trajectory_async(). When snapping collapses the region to nothingsafe to compress, the trajectory is returned unchanged and flagged as still
over the limit rather than being corrupted.
tests/test_trajectory_compressor.py: addedTestCompressionToolPairIntegritycovering the sync and async paths plus direct unit tests for the boundary
snapping (forward skip and backward fallback).
How to Test
pytest tests/test_trajectory_compressor.py -q.middle gpt turn and choose a token target that forces the accumulation
boundary to stop between a
<tool_call>and its<tool_response>. They assertthat
<tool_call>and<tool_response>markers stay balanced aftercompression and that every kept
toolturn is immediately preceded by agptturn (never the inserted summary or another tool turn).
Checklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/A