fix(research): keep tool_call/tool_response pairs intact when compressing trajectories by synapsesx · Pull Request #40495 · NousResearch/hermes-agent

synapsesx · 2026-06-06T12:52:26Z

What does this PR do?

The trajectory compressor could corrupt training trajectories by cutting a
conversation in the middle of a tool-call/tool-response pair. In the from/value
trajectory format a tool turn (carrying <tool_response> markers) is always
emitted immediately after the gpt turn whose <tool_call> it answers, so the
two turns must stay together. The compressible region's end boundary, however,
was chosen purely by token accumulation: the loop stopped at the first turn where
the accumulated tokens met the savings target, with no regard for turn roles. For
any over-budget trajectory whose savings boundary happened to land between a gpt
turn and its tool turn, the gpt (with its <tool_call>) was summarised away
into the replacement human message while the now-orphaned tool turn (with its
<tool_response>) was kept verbatim in the tail — producing an unmatched marker
and silently corrupting the training signal. The head boundary had the mirror
problem when the first tool turn was not protected.

This change snaps both compression boundaries to a clean turn boundary before the
region is extracted and replaced, so the summary always covers whole gpt+tool
blocks and a tool turn is never separated from the gpt turn that precedes it.
The boundary is moved forward when possible (folding an orphaned tool turn into
the region that already holds its gpt) and falls back to moving backward when no
clean boundary exists ahead, such as when the protected tail itself begins on a
tool turn.

Related Issue

N/A

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

trajectory_compressor.py: added _is_boundary_clean() and _snap_boundary()
helpers on TrajectoryCompressor, and applied them to both the head and tail
compression boundaries in compress_trajectory() and
compress_trajectory_async(). When snapping collapses the region to nothing
safe to compress, the trajectory is returned unchanged and flagged as still
over the limit rather than being corrupted.
tests/test_trajectory_compressor.py: added TestCompressionToolPairIntegrity
covering the sync and async paths plus direct unit tests for the boundary
snapping (forward skip and backward fallback).

How to Test

Run the focused tests: pytest tests/test_trajectory_compressor.py -q.
The new sync/async cases build a trajectory of gpt/tool pairs with an oversized
middle gpt turn and choose a token target that forces the accumulation
boundary to stop between a <tool_call> and its <tool_response>. They assert
that <tool_call> and <tool_response> markers stay balanced after
compression and that every kept tool turn is immediately preceded by a gpt
turn (never the inserted summary or another tool turn).

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: macOS 15 (Darwin 25.5)

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

…sing trajectories ## What does this PR do? The trajectory compressor could corrupt training trajectories by cutting a conversation in the middle of a tool-call/tool-response pair. In the from/value trajectory format a `tool` turn (carrying `<tool_response>` markers) is always emitted immediately after the `gpt` turn whose `<tool_call>` it answers, so the two turns must stay together. The compressible region's end boundary, however, was chosen purely by token accumulation: the loop stopped at the first turn where the accumulated tokens met the savings target, with no regard for turn roles. For any over-budget trajectory whose savings boundary happened to land between a `gpt` turn and its `tool` turn, the `gpt` (with its `<tool_call>`) was summarised away into the replacement `human` message while the now-orphaned `tool` turn (with its `<tool_response>`) was kept verbatim in the tail — producing an unmatched marker and silently corrupting the training signal. The head boundary had the mirror problem when the first tool turn was not protected. This change snaps both compression boundaries to a clean turn boundary before the region is extracted and replaced, so the summary always covers whole gpt+tool blocks and a `tool` turn is never separated from the `gpt` turn that precedes it. The boundary is moved forward when possible (folding an orphaned tool turn into the region that already holds its gpt) and falls back to moving backward when no clean boundary exists ahead, such as when the protected tail itself begins on a tool turn. ## Related Issue N/A ## Type of Change - [x] 🐛 Bug fix (non-breaking change that fixes an issue) ## Changes Made - `trajectory_compressor.py`: added `_is_boundary_clean()` and `_snap_boundary()` helpers on `TrajectoryCompressor`, and applied them to both the head and tail compression boundaries in `compress_trajectory()` and `compress_trajectory_async()`. When snapping collapses the region to nothing safe to compress, the trajectory is returned unchanged and flagged as still over the limit rather than being corrupted. - `tests/test_trajectory_compressor.py`: added `TestCompressionToolPairIntegrity` covering the sync and async paths plus direct unit tests for the boundary snapping (forward skip and backward fallback). ## How to Test 1. Run the focused tests: `pytest tests/test_trajectory_compressor.py -q`. 2. The new sync/async cases build a trajectory of gpt/tool pairs with an oversized middle gpt turn and choose a token target that forces the accumulation boundary to stop between a `<tool_call>` and its `<tool_response>`. They assert that `<tool_call>` and `<tool_response>` markers stay balanced after compression and that every kept `tool` turn is immediately preceded by a `gpt` turn (never the inserted summary or another tool turn). ## Checklist ### Code - [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md) - [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.) - [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate - [x] My PR contains **only** changes related to this fix/feature (no unrelated commits) - [x] I've run `pytest tests/ -q` and all tests pass - [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features) - [x] I've tested on my platform: macOS 15 (Darwin 25.5) ### Documentation & Housekeeping - [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A - [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A - [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A - [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A - [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A

teknium1 · 2026-06-06T15:29:26Z

Salvaged into #40593 — cherry-picked your commit, authorship preserved, added you to the AUTHOR_MAP. Confirmed it's the offline (training-data) compressor path so no live prompt-cache implications. Thanks!

#40593

* upstream/main: (430 commits) fix(yuanbao): bound ws.close() so an idle server can't stall shutdown ~5s (NousResearch#40607) docs: add Urdu translation of README (NousResearch#40578) fix(hindsight): send only new-turn delta on append retains instead of whole session (NousResearch#40605) feat(gateway): render terminal tool calls as native bash code blocks on markdown platforms (NousResearch#41215) feat(desktop): stop the chat viewport from following streaming output (NousResearch#41414) chore(release): map AlchemistChaos co-author email for NousResearch#40135 salvage fix(desktop): recover chat after sleep/wake by revalidating a stale remote backend fix(web): make _has_env config-aware so SEARXNG_URL auto-detect honors Hermes config fix(web): honor Hermes config-aware SEARXNG_URL lookup install.sh: hint at root-owned npm cache when desktop npm install fails (NousResearch#39688) fix(tools): percent-encode non-ascii URL components fix(skills): browse shows full catalog, not first 5000 (NousResearch#41413) feat(desktop+gateway): remote media relay — attach images/PDFs and display gateway images over the network feat(desktop): full tool-backend config (pickers + per-backend settings) in Settings (NousResearch#41232) hardening(api-server): scan cron prompts on REST create/update for parity with the agent tool fix: skip MCP preflight content-type probe on reconnect when already ready (NousResearch#40604) fix(kanban): sweep deferred scratch parent on non-scratch child completion + tests fix: defer scratch workspace cleanup when task has active children (NousResearch#33774) feat(onboarding): opt-in structured profile-build path on first contact (NousResearch#41114) feat(compression): temporal anchoring in compaction summaries (NousResearch#41102) test(discord): align clarify/model-picker tests with fail-closed component auth (NousResearch#41338) chore(release): map Dusk1e and LaPhilosophie for approval fail-closed salvage (NousResearch#33844, NousResearch#33866, NousResearch#30964) fix(discord): fail closed for component button auth when no allowlist set fix(feishu): fail closed for update prompt card actions fix(slack): re-check gateway auth on approval and slash-confirm buttons fix: guard int(os.getenv()) casts against malformed env vars (NousResearch#40598) fix: respect Honcho env var fallback in doctor and honcho status chore(release): add synapsesx to AUTHOR_MAP for NousResearch#40495 salvage fix(research): keep tool_call/tool_response pairs intact when compressing trajectories fix(simplex): accept display name in SIMPLEX_ALLOWED_USERS fix(desktop): make the running-turn timer per-session (NousResearch#41182) test(approval): regression for shell-escape denylist bypass (NousResearch#36846, NousResearch#36847) fix(security): strip shell escapes in denylist normalizer; fail-closed on missing approval module fix(stream+output-cap): guard empty streams and parse OpenRouter output-cap errors (NousResearch#40589) fix(desktop): bootstrap falls back to installed agent install.sh on GitHub 404 feat(dashboard): change UI font from the theme picker, independent of theme (NousResearch#41145) fix(cli): return bool (not None) when a destructive-slash confirmation is cancelled (NousResearch#40583) fix(desktop): preserve configured base_url on same-provider model switch (NousResearch#41121) fix(desktop): stop bare-URL autolinker swallowing trailing emphasis asterisks (NousResearch#41093) fix(cron): bound the desktop run-history query to one job (NousResearch#41088) fix(desktop): scope in-session /model switch per-session, stop process-env leak (NousResearch#41120) chore: map bmoore210 author email for PR NousResearch#40550 salvage fix(desktop): scope session list to active profile + longer timeout fix: harden gateway startup and turn persistence fix(computer_use): honor custom vision routing fix(aux): honor model.default_headers on auxiliary client too (NousResearch#40033) fix(agent): honor model.default_headers for custom OpenAI-compatible providers (NousResearch#40033) docs(i18n): port deep-audit corrections to zh-Hans mirror (NousResearch#41104) fix(compression): don't overwrite the -1 post-compression sentinel in preflight seed (NousResearch#36718) chore(release): map singhsanidhya741@gmail.com to sanidhyasin (NousResearch#41094) ...

…lvage

alt-glitch added type/bug Something isn't working P3 Low — cosmetic, nice to have labels Jun 6, 2026

teknium1 mentioned this pull request Jun 6, 2026

fix(research): keep tool_call/tool_response pairs intact when compressing trajectories #40593

Merged

teknium1 closed this Jun 6, 2026

teknium1 added a commit that referenced this pull request Jun 7, 2026

chore(release): add synapsesx to AUTHOR_MAP for #40495 salvage

fa8fd51

changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026

chore(release): add synapsesx to AUTHOR_MAP for NousResearch#40495 sa…

2e496e2

…lvage

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(research): keep tool_call/tool_response pairs intact when compressing trajectories#40495

fix(research): keep tool_call/tool_response pairs intact when compressing trajectories#40495
synapsesx wants to merge 1 commit into
NousResearch:mainfrom
synapsesx:fix/compression-tool-pair-boundary

synapsesx commented Jun 6, 2026

Uh oh!

teknium1 commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

synapsesx commented Jun 6, 2026

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Uh oh!

teknium1 commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants