feat(tool-result-compaction): add opt-in large tool result compaction by pinguarmy · Pull Request #29454 · NousResearch/hermes-agent

pinguarmy · 2026-05-20T20:17:13Z

Hermes Tool Result Compaction

Opt-in string-tool-result compaction with disk-backed raw storage, quota cleanup, documentation, benchmark/replay scripts, and 51 tests passing.

What this PR does

When tool_result_compaction.enabled: true is set in ~/.hermes/config.yaml, large string tool results above the configured threshold are:

Saved to disk at ~/.hermes/raw_results/<session_id>/<tool_call_id>_<tool_name>_<timestamp>.json
Replaced in message history with a compact JSON reference containing tool name, call id, original char/token count, first+last preview, raw result path, and a recovery note
Managed by a disk quota: the oldest raw result files are purged when the quota (500 MB by default) is exceeded; the file just written is protected from immediate cleanup
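
The replacement step above can be sketched roughly as follows. This is a minimal illustration, not the code in `agent/tool_result_compaction.py`: the function name `build_compact_reference` and every JSON key are hypothetical, and the default values are taken from the Config section. The token estimate uses the chars / 4 rule mentioned under Benchmarks.

```python
import json


def estimate_tokens(text: str) -> int:
    """Rough estimate: one token per four characters."""
    return len(text) // 4


def build_compact_reference(tool_name, tool_call_id, content, raw_result_path,
                            threshold_tokens=5000, preview_chars=1000):
    """Return content unchanged, or a compact JSON reference if it is large.

    Hypothetical sketch: the key names below are assumptions, not the PR's schema.
    """
    # Non-string (e.g. multimodal) results are skipped entirely.
    if not isinstance(content, str):
        return content
    tokens = estimate_tokens(content)
    # Only results above the threshold are compacted.
    if tokens <= threshold_tokens:
        return content
    reference = {
        "tool_name": tool_name,
        "tool_call_id": tool_call_id,
        "original_chars": len(content),
        "original_tokens_estimate": tokens,
        "preview_head": content[:preview_chars],
        # Guard against preview_chars == 0, where content[-0:] would be everything.
        "preview_tail": content[-preview_chars:] if preview_chars else "",
        "raw_result_path": raw_result_path,
        "note": "Full output was saved to raw_result_path; read that file to recover it.",
    }
    return json.dumps(reference)
```

The preview is deterministic (plain slicing), so no model call is involved, and a 60,000-character result shrinks to roughly 2,000 preview characters plus metadata.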

Disabled by default. Behavior is unchanged unless the feature is explicitly enabled in config.yaml.

Design

Opt-in: DEFAULT_ENABLED = False
String-only: non-string / multimodal tool results are skipped by the existing isinstance(..., str) guard
Fail-open: any config, filesystem, permission, or compaction error returns the original content unchanged (tested explicitly)
LLM-free: deterministic first+last preview; no summarization model call
Private storage: raw result directories use 0o700; raw JSON files use 0o600
Existing pipeline compatible: runs after maybe_persist_tool_result() and _tool_result_content_for_active_model()
Race-safe: symlinks excluded from iteration; stat failures in sort/cleanup fall back gracefully
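
To make the private-storage and fail-open points concrete, here is a minimal sketch assuming a POSIX filesystem. The helper names `save_raw_result` and `compact_or_passthrough`, the integer timestamp format, and the JSON payload shape are all assumptions for illustration; only the path layout and the 0o700 / 0o600 modes come from the description above.

```python
import json
import os
import time
from pathlib import Path


def save_raw_result(root: Path, session_id: str, tool_call_id: str,
                    tool_name: str, content: str) -> Path:
    """Write content under root/<session_id>/ with owner-only permissions."""
    session_dir = root / session_id
    # mode applies only to the final directory; parents get default permissions.
    session_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(session_dir, 0o700)  # enforce even if the directory already existed
    # Assumed timestamp format: integer seconds since the epoch.
    filename = f"{tool_call_id}_{tool_name}_{int(time.time())}.json"
    path = session_dir / filename
    # Create with 0o600 from the start so the file is never briefly readable by others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"tool_name": tool_name, "content": content}, handle)
    os.chmod(path, 0o600)  # also covers a pre-existing file with wider permissions
    return path


def compact_or_passthrough(content, compact):
    """Fail-open wrapper: any error inside compact() returns the original content."""
    try:
        return compact(content)
    except Exception:
        return content
```

Wrapping the whole compaction call, rather than individual filesystem operations, means a full disk or a permission error degrades to the pre-existing behavior instead of dropping a tool result.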

Files

File	Purpose
`agent/tool_result_compaction.py`	Config schema, token estimate, compaction core, raw storage, permissions, quota cleanup, fail-open handling
`agent/tool_executor.py`	Integration in sequential and concurrent tool-result paths before `messages.append()`
`scripts/benchmark_tool_result_compaction.py`	Synthetic LLM-free benchmark
`scripts/replay_tool_result_compaction.py`	JSONL replay benchmark
`docs/tool-result-compaction.md`	Full documentation: config, behavior, storage, recovery, benchmarks, limitations
`docs/tool-result-compaction-review-checklist.md`	Pre-merge review checklist
`examples/tool-result-compaction.config.yaml`	Copyable config snippet
5 test files under `tests/agent/`	Core, quota, tool message shape, config coercion, edge cases
2 test files under `tests/scripts/`	Benchmark and replay script tests

Config

tool_result_compaction:
  enabled: false
  threshold_tokens: 5000
  raw_result_dir: ""          # empty => ~/.hermes/raw_results
  max_disk_mb: 500
  preview_chars: 1000         # first 1000 + last 1000 chars
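
As an illustration of how `max_disk_mb` might drive cleanup, the sketch below deletes the oldest files first, skips symlinks, tolerates `stat` failures, and never removes a protected path. It is an assumption-laden sketch, not the PR's implementation: `enforce_quota` is a hypothetical name, and treating 1 MB as 1024 * 1024 bytes is a guess.

```python
from __future__ import annotations

from pathlib import Path


def _safe_mtime(path: Path) -> float:
    """Modification time, or 0.0 if the file vanished or cannot be read."""
    try:
        return path.stat().st_mtime
    except OSError:
        return 0.0


def _safe_size(path: Path) -> int:
    """File size in bytes, or 0 if the file vanished or cannot be read."""
    try:
        return path.stat().st_size
    except OSError:
        return 0


def enforce_quota(root: Path, max_disk_mb: int = 500,
                  protect: Path | None = None) -> list[Path]:
    """Delete oldest *.json files under root until total size fits the quota."""
    limit = max_disk_mb * 1024 * 1024  # assumption: binary megabytes
    # Symlinks are excluded so cleanup never follows a link out of the tree.
    files = [p for p in root.rglob("*.json") if not p.is_symlink() and p.is_file()]
    files.sort(key=_safe_mtime)  # oldest first
    total = sum(_safe_size(p) for p in files)
    removed: list[Path] = []
    for path in files:
        if total <= limit:
            break
        if protect is not None and path == protect:
            continue  # the file just written is never purged here
        size = _safe_size(path)
        try:
            path.unlink()
        except OSError:
            continue  # another process may have removed it already
        total -= size
        removed.append(path)
    return removed
```

A caller would typically pass the path returned by the write step as `protect`, so a single oversized result is not deleted in the same pass that stored it.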

Benchmarks

Synthetic (10 results @ 50K chars each)

before_tokens_estimate: 125,000
after_tokens_estimate:    6,450
saved_percent_estimate:  94.84%

Replay (2 JSONL objects: 50K + small string)

saved_percent_estimate:   94.94%
compacted_count:          1

Both are LLM-free, using the same chars / 4 estimate as the compaction module.
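
The synthetic figures can be reproduced from the chars / 4 estimate. The short check below is only arithmetic on the reported numbers; `after_tokens` is taken from the benchmark output above rather than derived, and the function names are illustrative.

```python
def estimate_tokens(chars: int) -> int:
    """chars / 4 token estimate, as used by the benchmarks."""
    return chars // 4


def saved_percent(before_tokens: int, after_tokens: int) -> float:
    """Percentage of estimated tokens removed by compaction."""
    return (before_tokens - after_tokens) / before_tokens * 100


before_tokens = 10 * estimate_tokens(50_000)  # 10 results at 50K chars each
after_tokens = 6_450                          # reported by the synthetic benchmark
print(before_tokens)                                        # 125000
print(f"{saved_percent(before_tokens, after_tokens):.2f}%")  # 94.84%
```

The reported 6,450 tokens works out to 645 tokens, or about 2,580 characters, per compacted result, which is consistent with 2,000 preview characters plus a few hundred characters of metadata.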

Verification

py_compile: OK
pytest: 51 passed in 1.99s
ruff: All checks passed (11 files)
benchmark: 94.84% estimated savings

Commits (squashed to 4)

e2134a280 docs(tool-result-compaction): document configuration and tradeoffs
a60f96ef5 bench(tool-result-compaction): add synthetic and replay benchmarks
a5d301235 test(tool-result-compaction): cover compaction behavior and edge cases
74e14b732 feat(tool-result-compaction): add opt-in tool result compaction

Known tradeoffs

raw_result_path is a local filesystem path embedded in the compacted JSON so the model can recover the full output. This works locally; remote or Docker modes would need a reference-ID resolver. (Documented in Limitations.)
Docs and benchmark scripts are in the same PR. Could be split into a follow-up if upstream prefers a minimal diff.

Status

Ready for review. 4 squashed commits, 51 tests passing, all checks green.

pinguarmy · 2026-05-20T21:40:29Z

This PR is ready for review now.

Final cleanup is complete:

squashed to 4 logical commits
py_compile: OK
pytest: 51 passed
ruff: all checks passed
synthetic benchmark: ~94.84% estimated token reduction
replay smoke: ~94.94% estimated token reduction

The feature is default-off, string-only, LLM-free, and fail-open. The main known tradeoff is that v1 recovery exposes a local raw_result_path; this is documented in the PR body and docs, with a future reference-ID resolver noted as a possible follow-up.

I'm not asking to change the P3 priority; just flagging that the branch is now cleaned up and ready whenever someone has review bandwidth.

pinguarmy · 2026-05-20T21:48:05Z

Noting related work for reviewer context: I found #28098 and #6339 after preparing this PR.

#28098 is closest in motivation: it adds plugin-based, terminal-only tool-result compaction. This PR takes a different path: core agent-side compaction for any string tool result, with disk-backed raw-result recovery, quota cleanup, private permissions, and fail-open behavior.

#6339 is also related, but focuses on context-window overflow / dynamic budgeting. This PR focuses on reducing repeated tool-result token pressure in long tool-heavy sessions, even before the context window is near overflow.

If maintainers prefer the plugin/hook direction from #28098, I'm happy to adapt this PR or split out reusable pieces.

alt-glitch added the labels type/feature (New feature or request), comp/agent (Core agent loop, run_agent.py, prompt builder), and P3 (Low: cosmetic, nice to have) on May 20, 2026

pinguarmy changed the title from "feat(tool-result-compaction): add config schema and passthrough stub (PR 1)" to "feat(tool-result-compaction): add opt-in large tool result compaction" on May 20, 2026

郝鹏宇 added 4 commits May 20, 2026 22:34

74e14b7 feat(tool-result-compaction): add opt-in tool result compaction
a5d3012 test(tool-result-compaction): cover compaction behavior and edge cases
a60f96e bench(tool-result-compaction): add synthetic and replay benchmarks
e2134a2 docs(tool-result-compaction): document configuration and tradeoffs

pinguarmy force-pushed the feat/tool-result-compaction branch from 8170741 to e2134a2 May 20, 2026 21:34

pinguarmy marked this pull request as ready for review May 20, 2026 21:40

alt-glitch mentioned this pull request May 23, 2026 in #30980, "feat(tool-context): compact large tool results before replay" (open, 19 tasks)

pinguarmy wants to merge 4 commits into NousResearch:main from pinguarmy:feat/tool-result-compaction
