Skip to content

Bug: Context compression triggers repeatedly after fresh compress — last_prompt_tokens=-1 not updated until next API call #36718

@laoli-no1

Description

@laoli-no1

Problem

After context compression completes, the HUD shows -1/262.1K (negative tokens). Immediately after, a new user message triggers a second compression before any API call has returned real token data. The display then jumps to 121.5K/262.1K, cmp2 → cmp3.

Expected: After compression, last_prompt_tokens reflects the compressed size. A new message should NOT trigger another compression until the user actually adds enough tokens to exceed the threshold.

Reproduction

  1. Run a long session until context compression fires (threshold ~50%).
  2. Compression completes → HUD shows -1/262.1K, cmp1.
  3. Immediately send a new message (e.g., ask a question).
  4. Observe: compression triggers again → cmp2 (or cmp3).
  5. Eventually API returns real token count → display shows 121.5K/262.1K.

The extra compression is wasteful and confuses the user.

Root Cause

conversation_loop.py line 3907:

elif _compressor.last_prompt_tokens == -1:
    # Compression just ran and no API-reported prompt count
    # has arrived yet. Avoid treating a schema-heavy rough
    # post-compression estimate as real context pressure.
    _real_tokens = 0

The -1 sentinel is meant as a temporary flag to skip compression until real API data arrives. However:

  1. tui_gateway/server.py:1423 reads last_prompt_tokens directly for HUD display:

    ctx_used = getattr(comp, 'last_prompt_tokens', 0) or usage['total'] or 0

    Python's -1 or ... returns -1 (truthy), so the HUD shows -1/262K.

  2. More critically, the second compression happens because should_compress() and the *_real_tokenscheck are separate paths. Whenlast_prompt_tokens == -1` and the user sends a new message, the rough estimate path can still trigger compression if the message history is large enough.

  3. last_prompt_tokens = -1 is set after compression but never restored to the actual compressed token count. It waits for the next API response's prompt_tokens field — but that doesn't arrive until the API call completes.

Suggested Fix

After compression, estimate and set last_prompt_tokens to the compressed messages' token count:

# After _compress_context() in conversation_loop.py
_messages = agent._compress_context(...)
# Set last_prompt_tokens to a rough estimate of the compressed context
compressor.last_prompt_tokens = estimate_request_tokens_rough(_messages, tools=agent.tools or None)

Alternatively, change the HUD display to clamp negative values:

ctx_used = max(0, getattr(comp, 'last_prompt_tokens', 0) or usage['total'] or 0)

But the core fix should be updating last_prompt_tokens after compression rather than leaving it at -1.

Environment

  • Hermes Agent v2026.5.29 (commit 79f7e7a)
  • Model: Qwen/Qwen3.6-27B-FP8 via SGLang (192.168.14.32:8000)
  • macOS 26.5, TUI mode

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1High — major feature broken, no workaroundcomp/agentCore agent loop, run_agent.py, prompt buildertype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions