Bug: Context compression triggers repeatedly after fresh compress — last_prompt_tokens=-1 not updated until next API call

## Problem

After context compression completes, the HUD shows `-1/262.1K` (negative tokens). Immediately after, a new user message triggers a **second compression** before any API call has returned real token data. The display then jumps to `121.5K/262.1K`, cmp2 → cmp3.

**Expected**: After compression, `last_prompt_tokens` reflects the compressed size. A new message should NOT trigger another compression until the user actually adds enough tokens to exceed the threshold.

## Reproduction

1. Run a long session until context compression fires (threshold ~50%).
2. Compression completes → HUD shows `-1/262.1K, cmp1`.
3. Immediately send a new message (e.g., ask a question).
4. Observe: compression triggers again → `cmp2` (or `cmp3`).
5. Eventually API returns real token count → display shows `121.5K/262.1K`.

The extra compression is wasteful and confuses the user.

## Root Cause

`conversation_loop.py` line 3907:

```python
elif _compressor.last_prompt_tokens == -1:
    # Compression just ran and no API-reported prompt count
    # has arrived yet. Avoid treating a schema-heavy rough
    # post-compression estimate as real context pressure.
    _real_tokens = 0
```

The `-1` sentinel is meant as a temporary flag to skip compression until real API data arrives. However:

1. `tui_gateway/server.py:1423` reads `last_prompt_tokens` directly for HUD display:
   ```python
   ctx_used = getattr(comp, 'last_prompt_tokens', 0) or usage['total'] or 0
   ```
   Python's `-1 or ... ` returns `-1` (truthy), so the HUD shows `-1/262K`.

2. More critically, the second compression happens because `should_compress()` and the \*_real_tokens` check are separate paths. When `last_prompt_tokens == -1` and the user sends a new message, the rough estimate path can still trigger compression if the message history is large enough.

3. `last_prompt_tokens = -1` is set after compression but never restored to the actual compressed token count. It waits for the next API response's `prompt_tokens` field — but that doesn't arrive until the API call completes.

## Suggested Fix

After compression, estimate and set `last_prompt_tokens` to the compressed messages' token count:

```python
# After _compress_context() in conversation_loop.py
_messages = agent._compress_context(...)
# Set last_prompt_tokens to a rough estimate of the compressed context
compressor.last_prompt_tokens = estimate_request_tokens_rough(_messages, tools=agent.tools or None)
```

Alternatively, change the HUD display to clamp negative values:

```python
ctx_used = max(0, getattr(comp, 'last_prompt_tokens', 0) or usage['total'] or 0)
```

But the core fix should be updating `last_prompt_tokens` after compression rather than leaving it at `-1`.

## Environment

- Hermes Agent v2026.5.29 (commit 79f7e7a1e)
- Model: Qwen/Qwen3.6-27B-FP8 via SGLang (192.168.14.32:8000)
- macOS 26.5, TUI mode

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: Context compression triggers repeatedly after fresh compress — last_prompt_tokens=-1 not updated until next API call #36718

Problem

Reproduction

Root Cause

Suggested Fix

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Bug: Context compression triggers repeatedly after fresh compress — last_prompt_tokens=-1 not updated until next API call #36718

Description

Problem

Reproduction

Root Cause

Suggested Fix

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions