feat(compressor): implement preflight compression check #9675

Closed
kshitijk4poor wants to merge 1 commit into NousResearch:main from kshitijk4poor:feat/compression-preflight

Conversation

@kshitijk4poor

Collaborator

Summary

Override should_compress_preflight() on ContextCompressor to enable pre-API-call compression detection.

Problem

The ContextEngine ABC defines should_compress_preflight(), but the default ContextCompressor inherits the base implementation, which returns False. As a result, preflight compression never fires: a conversation that is already over the limit still makes a full API call (paying for all those tokens) before compression triggers.

The preflight path exists in run_agent.py:7941-7997 and calls estimate_request_tokens_rough(), but the compressor itself never reports "yes, compress now" via its own preflight method.

Fix

A 3-line override that uses the existing estimate_messages_tokens_rough() (already imported) to compute a cheap O(n) estimate and compare it against threshold_tokens.

False negatives (under-estimates) only delay compression by one turn, which matches current behavior. False positives (over-estimates) trigger an early compression, which is harmless.
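
For illustration, the override described above might look roughly like the sketch below. This is not the PR's actual diff: the `ContextEngine` base class and `estimate_messages_tokens_rough()` are simplified stand-ins written for this example (the 4-characters-per-token heuristic is an assumption), and whether the real comparison uses `>=` or `>` is also assumed.

```python
from typing import Any, Dict, List


def estimate_messages_tokens_rough(messages: List[Dict[str, Any]]) -> int:
    """Stand-in for the project's helper; assumes roughly 4 characters per token."""
    total_chars = sum(len(str(m.get("content") or "")) for m in messages)
    return total_chars // 4


class ContextEngine:
    """Stand-in for the ABC: the base preflight hook never requests compression."""

    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
        return False


class ContextCompressor(ContextEngine):
    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens

    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
        # One linear pass over the messages, compared against the threshold.
        # The >= comparison is an assumption made for this sketch.
        return estimate_messages_tokens_rough(messages) >= self.threshold_tokens
```

With `threshold_tokens=10`, a single message of 40 characters is estimated at 10 tokens and would request compression, while an 8-character message would not.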

Test plan

  • All 63 existing compressor/engine tests pass
  • The ABC tests already verify the interface contract

Part of #9666.

teknium1 pushed a commit that referenced this pull request Apr 15, 2026
…rade, hardening

Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.

- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown

Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)

Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
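
As a rough illustration of the anti-thrashing rule listed in the commit message above (skip compression after 2 consecutive passes that save less than 10%), one possible shape is sketched below. The class name `AntiThrashGuard`, its methods, and the reset-on-success behavior are hypothetical and not taken from the merged code.

```python
class AntiThrashGuard:
    """Hypothetical helper: tracks consecutive low-savings compression passes."""

    def __init__(self, min_savings: float = 0.10, max_ineffective: int = 2) -> None:
        self.min_savings = min_savings
        self.max_ineffective = max_ineffective
        self.ineffective_passes = 0

    def record(self, tokens_before: int, tokens_after: int) -> None:
        """Record one compression pass and update the ineffective-pass streak."""
        if tokens_before > 0:
            savings = (tokens_before - tokens_after) / tokens_before
        else:
            savings = 0.0
        if savings < self.min_savings:
            self.ineffective_passes += 1
        else:
            # Assumption: a pass that saves enough resets the streak.
            self.ineffective_passes = 0

    def should_skip(self) -> bool:
        """True once enough consecutive passes saved less than the minimum."""
        return self.ineffective_passes >= self.max_ineffective
```

A caller would check `should_skip()` before starting a compression pass and call `record()` after each pass completes.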
@teknium1

Contributor

Closing as part of the compression PR triage (#9666). #9665: the user message preservation concept is good, but appending verbatim messages after the LLM summary risks duplication; they would need to be injected into the summarizer prompt instead. #9675: should_compress_preflight() is never called by run_agent.py (the preflight loop does its own estimation directly). See #10088 for the merged improvements.

@teknium1 teknium1 closed this Apr 15, 2026
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
Ataraksea pushed a commit to Ataraksea/hermes-agent that referenced this pull request May 6, 2026
ContextEngine.should_compress_preflight() is documented as the per-turn
ingest entry for plugin engines, but run_agent.py never calls it. PR
NousResearch#10088 explicitly noted this as dead code when skipping NousResearch#9675:

> NousResearch#9675 (preflight check) — dead code, run_agent.py never calls
> should_compress_preflight()

This breaks plugin context engines that rely on the hook for per-turn
message ingest. hermes-lcm overrides should_compress_preflight() to
persist messages each turn into its DAG store, but with the hook never
called, the lossless message store stays empty until compress() fires
at the threshold (typically ~96K tokens). Reproducible:

  $ hermes chat -q "test" -Q
  $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;"
  0

(Verified on hermes-agent v0.11.0 with hermes-lcm v0.7.0.)

Add two calls to should_compress_preflight(messages):

1. Top of the main loop, right after api_call_count is incremented —
   per-turn ingest before each API call.
2. End of run_conversation(), before the on_session_end plugin hook —
   final flush so the last assistant message reaches the engine when
   the turn exited via the no-tool-calls branch and skipped the
   per-turn hook above.

The return value is discarded; compression is still decided by the
later should_compress(_real_tokens) call which uses the provider-
reported token count. Both calls are wrapped in try/except so a
misbehaving plugin engine cannot break the conversation loop.

Default ContextEngine.should_compress_preflight() returns False with no
work, so this is zero overhead for the built-in ContextCompressor and
any engine that does not override the hook.
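
The guarded call described above could be sketched as follows. This is an assumption-laden illustration, not the commit's actual code: `call_preflight_hook`, `RecordingEngine`, and `BrokenEngine` are hypothetical names, and the real change inlines the calls in run_agent.py rather than using a helper.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def call_preflight_hook(engine: Any, messages: List[Dict[str, Any]]) -> None:
    """Invoke the engine's preflight hook for its side effects only."""
    try:
        # The return value is discarded; the compression decision is made
        # elsewhere, from the provider-reported token count.
        engine.should_compress_preflight(messages)
    except Exception:
        # A misbehaving plugin engine must not break the conversation loop.
        logger.debug("should_compress_preflight() raised; ignoring", exc_info=True)


class RecordingEngine:
    """Hypothetical plugin engine that ingests messages on every call."""

    def __init__(self) -> None:
        self.seen: List[Dict[str, Any]] = []

    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
        self.seen.extend(messages)
        return False


class BrokenEngine:
    """Hypothetical plugin engine whose hook always fails."""

    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
        raise RuntimeError("plugin failure")
```

Calling the helper with `BrokenEngine()` returns normally, which is the property the try/except wrapper is meant to provide.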

After this fix:
  $ hermes chat -q "test" -Q
  $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;"
  2

Refs:
- NousResearch#9675 (closed: feat(compressor): implement preflight compression check)
- NousResearch#10088 (merged body: skipped NousResearch#9675 as dead code)
- stephenschoettler/hermes-lcm#68 (LCM author flagged host integration
  issue but could not file upstream because GitHub Issues was off on a
  different fork)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026