Problem
After context compression completes, the HUD shows -1/262.1K (negative tokens). Immediately after, a new user message triggers a second compression before any API call has returned real token data. The display then jumps to 121.5K/262.1K, cmp2 → cmp3.
Expected: After compression, last_prompt_tokens reflects the compressed size. A new message should NOT trigger another compression until the user actually adds enough tokens to exceed the threshold.
Reproduction
- Run a long session until context compression fires (threshold ~50%).
- Compression completes → HUD shows -1/262.1K, cmp1.
- Immediately send a new message (e.g., ask a question).
- Observe: compression triggers again → cmp2 (or cmp3).
- Eventually the API returns a real token count → display shows 121.5K/262.1K.
The extra compression is wasteful and confuses the user.
Root Cause
conversation_loop.py line 3907:
elif _compressor.last_prompt_tokens == -1:
    # Compression just ran and no API-reported prompt count
    # has arrived yet. Avoid treating a schema-heavy rough
    # post-compression estimate as real context pressure.
    _real_tokens = 0
The -1 sentinel is meant as a temporary flag to skip compression until real API data arrives. However:
- tui_gateway/server.py:1423 reads last_prompt_tokens directly for HUD display:
  ctx_used = getattr(comp, 'last_prompt_tokens', 0) or usage['total'] or 0
  In Python, -1 or ... evaluates to -1 because -1 is truthy, so the HUD shows -1/262.1K.
- More critically, the second compression happens because should_compress() and the _real_tokens check are separate paths. When last_prompt_tokens == -1 and the user sends a new message, the rough estimate path can still trigger compression if the message history is large enough.
- last_prompt_tokens = -1 is set after compression but never restored to the actual compressed token count. It waits for the next API response's prompt_tokens field, which doesn't arrive until the API call completes.
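The truthiness behaviour behind the -1 display can be checked in isolation. The snippet below is a minimal sketch, not the gateway code: FakeCompressor and the usage dict are hypothetical stand-ins, and only the shape of the `or` chain is taken from the line quoted above.

```python
# Hypothetical stand-ins for the objects read in tui_gateway/server.py.
class FakeCompressor:
    last_prompt_tokens = -1  # sentinel left behind right after compression


usage = {"total": 121_500}
comp = FakeCompressor()

# Same expression shape as the HUD line quoted above.
ctx_used = getattr(comp, "last_prompt_tokens", 0) or usage["total"] or 0
print(ctx_used)  # -1: a non-zero int is truthy, so `or` stops at the sentinel

# With 0 instead of -1, the chain falls through to usage["total"].
comp.last_prompt_tokens = 0
ctx_fallback = getattr(comp, "last_prompt_tokens", 0) or usage["total"] or 0
print(ctx_fallback)  # 121500
```

Only a falsy value (0 or None) lets the chain reach usage['total'], so any negative sentinel passes straight through to the display.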
Suggested Fix
After compression, estimate and set last_prompt_tokens to the compressed messages' token count:
# After _compress_context() in conversation_loop.py
_messages = agent._compress_context(...)
# Set last_prompt_tokens to a rough estimate of the compressed context
_compressor.last_prompt_tokens = estimate_request_tokens_rough(_messages, tools=agent.tools or None)
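To illustrate the intended effect, here is a toy simulation under stated assumptions. ToyCompressor, rough_estimate, and compress are all hypothetical and do not mirror the real Hermes classes; the 4-characters-per-token heuristic and the 50% threshold are assumptions chosen to match the numbers in this report.

```python
from dataclasses import dataclass


@dataclass
class ToyCompressor:
    """Hypothetical stand-in, not the real Hermes compressor."""
    context_limit: int = 262_100
    threshold_ratio: float = 0.5  # "~50%" from the reproduction steps
    last_prompt_tokens: int = 0

    def should_compress(self, prompt_tokens: int) -> bool:
        return prompt_tokens >= self.context_limit * self.threshold_ratio


def rough_estimate(messages):
    """Assumed heuristic: roughly 4 characters per token."""
    return sum(len(m) for m in messages) // 4


def compress(messages):
    """Toy compression: keep only the two most recent messages."""
    return messages[-2:]


comp = ToyCompressor()
history = ["x" * 40_000] * 15  # about 150,000 estimated tokens
over_before = comp.should_compress(rough_estimate(history))

compressed = compress(history)
# The proposed change: store an estimate instead of leaving the -1 sentinel.
comp.last_prompt_tokens = rough_estimate(compressed)

print(over_before)                                    # True
print(comp.last_prompt_tokens)                        # 20000
print(comp.should_compress(comp.last_prompt_tokens))  # False
```

With a non-negative estimate stored, both the HUD and the threshold check read a plausible value until the next API response replaces it with the real prompt_tokens.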
Alternatively, change the HUD display to clamp negative values:
ctx_used = max(0, getattr(comp, 'last_prompt_tokens', 0) or usage['total'] or 0)
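Note that the clamp above displays 0 while the sentinel is set, because the `or` chain still yields -1 before max() is applied. A possible variant treats any non-positive value as "unknown" and falls back to the usage total. hud_tokens is a hypothetical helper, and whether usage['total'] is a sensible fallback right after compression is a judgment call.

```python
def hud_tokens(last_prompt_tokens, usage_total):
    """Hypothetical helper: treat missing or non-positive counts as unknown."""
    if last_prompt_tokens is not None and last_prompt_tokens > 0:
        return last_prompt_tokens
    return max(0, usage_total or 0)


# The plain clamp: -1 short-circuits the `or` chain, then max() lifts it to 0.
clamped = max(0, -1 or 121_500 or 0)
print(clamped)  # 0

print(hud_tokens(-1, 121_500))      # 121500
print(hud_tokens(-1, 0))            # 0
print(hud_tokens(48_000, 121_500))  # 48000
```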
But the core fix should be updating last_prompt_tokens after compression rather than leaving it at -1.
Environment
- Hermes Agent v2026.5.29 (commit 79f7e7a)
- Model: Qwen/Qwen3.6-27B-FP8 via SGLang (192.168.14.32:8000)
- macOS 26.5, TUI mode