feat(tool-result-compaction): add opt-in large tool result compaction #29454

Open
pinguarmy wants to merge 4 commits into NousResearch:main from pinguarmy:feat/tool-result-compaction

pinguarmy commented May 20, 2026

Hermes Tool Result Compaction

Opt-in string-tool-result compaction with disk-backed raw storage, quota cleanup, documentation, benchmark/replay scripts, and 51 tests passing.

What this PR does

When tool_result_compaction.enabled: true is set in ~/.hermes/config.yaml, large string tool results above the configured threshold are:

  1. Saved to disk at ~/.hermes/raw_results/<session_id>/<tool_call_id>_<tool_name>_<timestamp>.json
  2. Replaced in message history with a compact JSON reference containing tool name, call id, original char/token count, first+last preview, raw result path, and a recovery note
  3. Managed by a disk quota: the oldest raw result files are purged when the quota (500 MB by default) is exceeded, and the file just written is protected from immediate cleanup
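
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the code in agent/tool_result_compaction.py: the function name `compact_tool_result`, its parameters, and the JSON field names are assumptions made for the example. Only the path layout, the chars / 4 estimate, and the first+last preview come from this description.

```python
import json
import time
from pathlib import Path


def estimate_tokens(text: str) -> int:
    """Rough token estimate: characters divided by four."""
    return len(text) // 4


def compact_tool_result(content, *, tool_name, tool_call_id, session_id,
                        raw_root, threshold_tokens=5000, preview_chars=1000):
    """Hypothetical helper: return content unchanged, or a compact JSON reference."""
    # Non-string (e.g. multimodal) results pass through untouched.
    if not isinstance(content, str):
        return content
    tokens = estimate_tokens(content)
    if tokens <= threshold_tokens:
        return content

    # Step 1: persist the raw result under <raw_root>/<session_id>/.
    session_dir = Path(raw_root) / session_id
    session_dir.mkdir(parents=True, exist_ok=True)
    raw_path = session_dir / f"{tool_call_id}_{tool_name}_{int(time.time())}.json"
    raw_path.write_text(json.dumps({"content": content}), encoding="utf-8")

    # Step 2: build the reference that replaces the result in message history.
    # Field names here are illustrative assumptions.
    return json.dumps({
        "compacted": True,
        "tool_name": tool_name,
        "tool_call_id": tool_call_id,
        "original_chars": len(content),
        "original_tokens_estimate": tokens,
        "preview_first": content[:preview_chars],
        "preview_last": content[-preview_chars:] if preview_chars > 0 else "",
        "raw_result_path": str(raw_path),
        "note": "Full output was saved to raw_result_path; read that file to recover it.",
    })
```

With the default threshold of 5000 tokens, a string of 20,000 characters or fewer is returned as-is under this sketch; anything longer is written to disk and replaced by the reference.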

Disabled by default. Behavior is unchanged unless the feature is explicitly enabled in config.yaml.

Design

  • Opt-in: DEFAULT_ENABLED = False
  • String-only: non-string / multimodal tool results are skipped by the existing isinstance(..., str) guard
  • Fail-open: any config, filesystem, permission, or compaction error returns the original content unchanged (tested explicitly)
  • LLM-free: deterministic first+last preview; no summarization model call
  • Private storage: raw result directories use 0o700; raw JSON files use 0o600
  • Existing pipeline compatible: runs after maybe_persist_tool_result() and _tool_result_content_for_active_model()
  • Race-safe: symlinks excluded from iteration; stat failures in sort/cleanup fall back gracefully
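
To make the quota and race-safety points concrete, here is a hedged sketch of oldest-first cleanup. The helper name `enforce_quota` and its signature are hypothetical and not taken from the PR; the sketch only assumes what is stated above: symlinks are skipped, stat or unlink failures are tolerated, and the file just written can be protected.

```python
import os
from pathlib import Path


def enforce_quota(raw_root, max_disk_mb=500, protect=None):
    """Hypothetical helper: delete the oldest raw result files until under quota."""
    limit = max_disk_mb * 1024 * 1024
    protected = Path(protect).resolve() if protect else None

    entries = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(raw_root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.is_symlink():
                    continue  # never follow or delete through symlinks
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable: skip it
            entries.append((st.st_mtime, st.st_size, path))

    total = sum(size for _mtime, size, _path in entries)
    removed = []
    # Oldest files (smallest mtime) are considered first.
    for _mtime, size, path in sorted(entries, key=lambda e: e[0]):
        if total <= limit:
            break
        if protected is not None and path.resolve() == protected:
            continue  # keep the file that was just written
        try:
            path.unlink()
        except OSError:
            continue  # another process may have removed it already
        total -= size
        removed.append(path)
    return removed
```

Returning the list of removed paths is a choice made for this example so the behavior is easy to inspect; the actual module may report cleanup differently.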

Files

| File | Purpose |
| --- | --- |
| agent/tool_result_compaction.py | Config schema, token estimate, compaction core, raw storage, permissions, quota cleanup, fail-open handling |
| agent/tool_executor.py | Integration in sequential and concurrent tool-result paths before messages.append() |
| scripts/benchmark_tool_result_compaction.py | Synthetic LLM-free benchmark |
| scripts/replay_tool_result_compaction.py | JSONL replay benchmark |
| docs/tool-result-compaction.md | Full documentation: config, behavior, storage, recovery, benchmarks, limitations |
| docs/tool-result-compaction-review-checklist.md | Pre-merge review checklist |
| examples/tool-result-compaction.config.yaml | Copyable config snippet |
| 5 test files under tests/agent/ | Core, quota, tool message shape, config coercion, edge cases |
| 2 test files under tests/scripts/ | Benchmark and replay script tests |

Config

tool_result_compaction:
  enabled: false
  threshold_tokens: 5000
  raw_result_dir: ""          # empty => ~/.hermes/raw_results
  max_disk_mb: 500
  preview_chars: 1000         # first 1000 + last 1000 chars

Benchmarks

Synthetic (10 results @ 50K chars each)

before_tokens_estimate: 125,000
after_tokens_estimate:    6,450
saved_percent_estimate:  94.84%

Replay (2 JSONL objects: 50K + small string)

saved_percent_estimate:   94.94%
compacted_count:          1

Both are LLM-free, using the same chars / 4 estimate as the compaction module.
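
For reference, the synthetic figures follow directly from the chars / 4 estimate. The short calculation below reproduces them; the `after` value is simply the reported estimate, not something recomputed here.

```python
def estimate_tokens(chars: int) -> int:
    """chars / 4 heuristic used for all estimates in this PR."""
    return chars // 4


before = 10 * estimate_tokens(50_000)  # 10 results at 50K chars each
after = 6_450                          # reported after_tokens_estimate
per_reference = after // 10            # tokens per compacted reference
saved_percent = round((1 - after / before) * 100, 2)

print(before, per_reference, saved_percent)  # 125000 645 94.84
```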

Verification

py_compile: OK
pytest: 51 passed in 1.99s
ruff: All checks passed (11 files)
benchmark: 94.84% estimated savings

Commits (squashed to 4)

e2134a280 docs(tool-result-compaction): document configuration and tradeoffs
a60f96ef5 bench(tool-result-compaction): add synthetic and replay benchmarks
a5d301235 test(tool-result-compaction): cover compaction behavior and edge cases
74e14b732 feat(tool-result-compaction): add opt-in tool result compaction

Known tradeoffs

  • raw_result_path is a local filesystem path embedded in the compacted JSON for model recovery. Works locally; remote/docker modes would need a reference-ID resolver. (Documented in Limitations.)
  • Docs and benchmark scripts are in the same PR. Could be split into a follow-up if upstream prefers a minimal diff.

Status

Ready for review. 4 squashed commits, 51 tests passing, all checks green.

alt-glitch added the labels type/feature (New feature or request), comp/agent (Core agent loop, run_agent.py, prompt builder), and P3 (Low: cosmetic, nice to have) May 20, 2026
pinguarmy changed the title from "feat(tool-result-compaction): add config schema and passthrough stub (PR 1)" to "feat(tool-result-compaction): add opt-in large tool result compaction" May 20, 2026
pinguarmy force-pushed the feat/tool-result-compaction branch from 8170741 to e2134a2 May 20, 2026 21:34
@pinguarmy

Author

This PR is ready for review now.

Final cleanup is complete:

  • squashed to 4 logical commits
  • py_compile: OK
  • pytest: 51 passed
  • ruff: all checks passed
  • synthetic benchmark: ~94.84% estimated token reduction
  • replay smoke: ~94.94% estimated token reduction

The feature is default-off, string-only, LLM-free, and fail-open. The main known tradeoff is that v1 recovery exposes a local raw_result_path; this is documented in the PR body and docs, with a future reference-ID resolver noted as a possible follow-up.

I'm not asking to change the P3 priority; just flagging that the branch is now cleaned up and ready whenever someone has review bandwidth.

pinguarmy marked this pull request as ready for review May 20, 2026 21:40
@pinguarmy

Author

Noting related work for reviewer context: I found #28098 and #6339 after preparing this PR.

#28098 is closest in motivation: it adds plugin-based, terminal-only tool-result compaction. This PR takes a different path: core agent-side compaction for any string tool result, with disk-backed raw-result recovery, quota cleanup, private permissions, and fail-open behavior.

#6339 is also related, but focuses on context-window overflow / dynamic budgeting. This PR focuses on reducing repeated tool-result token pressure in long tool-heavy sessions, even before the context window is near overflow.

If maintainers prefer the plugin/hook direction from #28098, I'm happy to adapt this PR or split out reusable pieces.
