feat(compressor): implement preflight compression check#9675
Closed
kshitijk4poor wants to merge 1 commit into
Closed
feat(compressor): implement preflight compression check#9675kshitijk4poor wants to merge 1 commit into
kshitijk4poor wants to merge 1 commit into
Conversation
7 tasks
teknium1
pushed a commit
that referenced
this pull request
Apr 15, 2026
…rade, hardening Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
teknium1
pushed a commit
that referenced
this pull request
Apr 15, 2026
…rade, hardening Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
Contributor
|
Closing as part of the compression PR triage (#9666). #9665: the user message preservation concept is good but appending verbatim messages after the LLM summary risks duplication — would need to be injected into the summarizer prompt instead. #9675: |
23 tasks
ulasbilgen
pushed a commit
to ulasbilgen/hermes-adhd-agent
that referenced
this pull request
May 1, 2026
…rade, hardening Combined salvage of PRs NousResearch#9661, NousResearch#9663, NousResearch#9674, NousResearch#9677, NousResearch#9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs NousResearch#9665 (user msg preservation — duplication risk) and NousResearch#9675 (dead code).
aj-nt
pushed a commit
to aj-nt/hermes-agent
that referenced
this pull request
May 1, 2026
…rade, hardening Combined salvage of PRs NousResearch#9661, NousResearch#9663, NousResearch#9674, NousResearch#9677, NousResearch#9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs NousResearch#9665 (user msg preservation — duplication risk) and NousResearch#9675 (dead code).
Ataraksea
pushed a commit
to Ataraksea/hermes-agent
that referenced
this pull request
May 6, 2026
ContextEngine.should_compress_preflight() is documented as the per-turn ingest entry for plugin engines, but run_agent.py never calls it. PR NousResearch#10088 explicitly noted this as dead code when skipping NousResearch#9675: > NousResearch#9675 (preflight check) — dead code, run_agent.py never calls > should_compress_preflight() This breaks plugin context engines that rely on the hook for per-turn message ingest. hermes-lcm overrides should_compress_preflight() to persist messages each turn into its DAG store, but with the hook never called, the lossless message store stays empty until compress() fires at the threshold (typically ~96K tokens). Reproducible: $ hermes chat -q "test" -Q $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;" 0 (Verified on hermes-agent v0.11.0 with hermes-lcm v0.7.0.) Add two calls to should_compress_preflight(messages): 1. Top of the main loop, right after api_call_count is incremented — per-turn ingest before each API call. 2. End of run_conversation(), before the on_session_end plugin hook — final flush so the last assistant message reaches the engine when the turn exited via the no-tool-calls branch and skipped the per-turn hook above. The return value is discarded; compression is still decided by the later should_compress(_real_tokens) call which uses the provider- reported token count. Both calls are wrapped in try/except so a misbehaving plugin engine cannot break the conversation loop. Default ContextEngine.should_compress_preflight() returns False with no work, so this is zero overhead for the built-in ContextCompressor and any engine that does not override the hook. After this fix: $ hermes chat -q "test" -Q $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;" 2 Refs: - NousResearch#9675 (closed: feat(compressor): implement preflight compression check) - NousResearch#10088 (merged body: skipped NousResearch#9675 as dead code) - stephenschoettler/hermes-lcm#68 (LCM author flagged host integration issue but could not file upstream because GitHub Issues was off on a different fork) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…rade, hardening Combined salvage of PRs NousResearch#9661, NousResearch#9663, NousResearch#9674, NousResearch#9677, NousResearch#9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs NousResearch#9665 (user msg preservation — duplication risk) and NousResearch#9675 (dead code).
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…rade, hardening Combined salvage of PRs NousResearch#9661, NousResearch#9663, NousResearch#9674, NousResearch#9677, NousResearch#9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs NousResearch#9665 (user msg preservation — duplication risk) and NousResearch#9675 (dead code).
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…rade, hardening Combined salvage of PRs NousResearch#9661, NousResearch#9663, NousResearch#9674, NousResearch#9677, NousResearch#9678 by kshitijk4poor. - Smart tool output collapse: informative 1-line summaries replace generic placeholder - Dedup identical tool results via MD5 hash, truncate large tool_call arguments - Anti-thrashing: skip compression after 2 consecutive <10% savings passes - Structured action-log summary template with numbered actions and Active State - Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown Follow-up fixes applied during salvage: - web_extract: reads 'urls' (list) not 'url' (original PR bug) - Multimodal list content guards in dedup and prune passes - Kept 'Relevant Files' section in template (original PR removed it) Skipped PRs NousResearch#9665 (user msg preservation — duplication risk) and NousResearch#9675 (dead code).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Override
should_compress_preflight()onContextCompressorto enable pre-API-call compression detection.Problem
The
ContextEngineABC definesshould_compress_preflight()but the defaultContextCompressorinherits the basereturn False. This means preflight compression never fires — a conversation that's already over the limit still makes a full API call (paying for all those tokens) before compression triggers.The preflight path exists in
run_agent.py:7941-7997and callsestimate_request_tokens_rough(), but the compressor itself never reports "yes, compress now" via its own preflight method.Fix
3-line override that uses the existing
estimate_messages_tokens_rough()(already imported) to do a cheap O(n) estimate and compare againstthreshold_tokens.False negatives (under-estimate) just delay compression by one turn — current behavior. False positives (over-estimate) trigger an early compress that's harmless.
Test plan
Part of #9666.