feat(compressor): implement preflight compression check by kshitijk4poor · Pull Request #9675 · NousResearch/hermes-agent

kshitijk4poor · 2026-04-14T13:28:23Z

Summary

Override should_compress_preflight() on ContextCompressor to enable pre-API-call compression detection.

Problem

The ContextEngine ABC defines should_compress_preflight() but the default ContextCompressor inherits the base return False. This means preflight compression never fires — a conversation that's already over the limit still makes a full API call (paying for all those tokens) before compression triggers.

The preflight path exists in run_agent.py:7941-7997 and calls estimate_request_tokens_rough(), but the compressor itself never reports "yes, compress now" via its own preflight method.

Fix

3-line override that uses the existing estimate_messages_tokens_rough() (already imported) to do a cheap O(n) estimate and compare against threshold_tokens.

False negatives (under-estimate) just delay compression by one turn — current behavior. False positives (over-estimate) trigger an early compress that's harmless.

Test plan

All 63 existing compressor/engine tests pass
The ABC tests already verify the interface contract

Part of #9666.

…rade, hardening Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).

teknium1 · 2026-04-15T05:21:50Z

Closing as part of the compression PR triage (#9666). #9665: the user message preservation concept is good but appending verbatim messages after the LLM summary risks duplication — would need to be injected into the summarizer prompt instead. #9675: should_compress_preflight() is never called by run_agent.py (the preflight loop does its own estimation directly). See #10088 for the merged improvements.

…rade, hardening Combined salvage of PRs NousResearch#9661, NousResearch#9663, NousResearch#9674, NousResearch#9677, NousResearch#9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs NousResearch#9665 (user msg preservation — duplication risk) and NousResearch#9675 (dead code).

ContextEngine.should_compress_preflight() is documented as the per-turn ingest entry for plugin engines, but run_agent.py never calls it. PR NousResearch#10088 explicitly noted this as dead code when skipping NousResearch#9675: > NousResearch#9675 (preflight check) — dead code, run_agent.py never calls > should_compress_preflight() This breaks plugin context engines that rely on the hook for per-turn message ingest. hermes-lcm overrides should_compress_preflight() to persist messages each turn into its DAG store, but with the hook never called, the lossless message store stays empty until compress() fires at the threshold (typically ~96K tokens). Reproducible: $ hermes chat -q "test" -Q $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;" 0 (Verified on hermes-agent v0.11.0 with hermes-lcm v0.7.0.) Add two calls to should_compress_preflight(messages): 1. Top of the main loop, right after api_call_count is incremented — per-turn ingest before each API call. 2. End of run_conversation(), before the on_session_end plugin hook — final flush so the last assistant message reaches the engine when the turn exited via the no-tool-calls branch and skipped the per-turn hook above. The return value is discarded; compression is still decided by the later should_compress(_real_tokens) call which uses the provider- reported token count. Both calls are wrapped in try/except so a misbehaving plugin engine cannot break the conversation loop. Default ContextEngine.should_compress_preflight() returns False with no work, so this is zero overhead for the built-in ContextCompressor and any engine that does not override the hook. After this fix: $ hermes chat -q "test" -Q $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;" 2 Refs: - NousResearch#9675 (closed: feat(compressor): implement preflight compression check) - NousResearch#10088 (merged body: skipped NousResearch#9675 as dead code) - stephenschoettler/hermes-lcm#68 (LCM author flagged host integration issue but could not file upstream because GitHub Issues was off on a different fork) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rade, hardening Combined salvage of PRs NousResearch#9661, NousResearch#9663, NousResearch#9674, NousResearch#9677, NousResearch#9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs NousResearch#9665 (user msg preservation — duplication risk) and NousResearch#9675 (dead code).

feat(compressor): implement preflight compression check

c9b5985

kshitijk4poor mentioned this pull request Apr 14, 2026

tracking: context compression improvements #9666

Closed

7 tasks

teknium1 mentioned this pull request Apr 15, 2026

feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening #10088

Merged

teknium1 mentioned this pull request Apr 15, 2026

feat(compressor): preserve user messages verbatim during compression #9665

Closed

teknium1 closed this Apr 15, 2026

catgodtw mentioned this pull request Apr 25, 2026

fix(run_agent): wire up should_compress_preflight() per-turn ingest hook #15806

Open

23 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(compressor): implement preflight compression check#9675

feat(compressor): implement preflight compression check#9675
kshitijk4poor wants to merge 1 commit into
NousResearch:mainfrom
kshitijk4poor:feat/compression-preflight

kshitijk4poor commented Apr 14, 2026

Uh oh!

teknium1 commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kshitijk4poor commented Apr 14, 2026

Summary

Problem

Fix

Test plan

Uh oh!

teknium1 commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants