[Bug]: Large tool results consume entire tail token budget — conversation messages lost to summary on compression

## Bug Description

When context compression triggers during a session that contains large tool results (terminal output, git logs, file diffs, search results), the tail protection mechanism correctly preserves recent messages by token budget — but the budget is almost entirely consumed by the large tool outputs. The user's actual conversation messages (questions, instructions, task context) get pushed out of the tail and into the compressed summary region.

The result feels like "the conversation disappeared": after compression, the agent sees tool call history but loses the conversational context of what was being discussed. The user's most recent messages are buried in the LLM-generated summary rather than being in the active context window.

## Technical Details

### How the tail budget works

In `agent/context_compressor.py`, `_find_tail_cut_by_tokens()` walks backward from the end of the message list, accumulating tokens until `tail_token_budget` is reached:

```python
# Derived budgets (128K context model example):
tail_token_budget = threshold_tokens × summary_target_ratio
                 = 64000 × 0.20
                 = ~12,800 tokens

# protect_last_n = 20 (hard minimum floor)
```

### The problem

1. A single large `terminal` tool result (e.g. `npm test` output, build log, `git diff`) can easily be 3,000-8,000+ tokens
2. The backward walk accumulates these large tool results first (they're at the end)
3. After 2-3 large tool results, the entire ~12.8K token budget is exhausted
4. The boundary (`cut_idx`) is placed such that user/assistant conversation messages just before those tool results fall into the "middle" region — which gets summarized
5. `_ensure_last_user_message_in_tail()` only protects the **single** most recent user message; earlier but still-recent user messages and assistant responses are lost to the summary

### Concrete scenario

```
messages[40]: user: "Now run the test suite and fix any failures"     ← pushed to summary
messages[41]: assistant: "Running tests..."                            ← pushed to summary  
messages[42]: tool: [terminal] npm test → 5000 lines of output        ← in tail (8K tokens)
messages[43]: assistant: "3 tests failed, fixing..."                   ← pushed to summary
messages[44]: tool: [terminal] npm test → 5000 lines of output        ← in tail (6K tokens)
messages[45]: user: "Also check the lint warnings"                     ← in tail (barely)
```

After compression, the agent sees ~14K tokens of test output but has lost the conversational thread about *why* tests were being run and what was being fixed.

## Code References

- `agent/context_compressor.py` — `_find_tail_cut_by_tokens()` (~line 420): backward walk with token budget
- `agent/context_compressor.py` — `_prune_old_tool_results()`: pre-pass pruning only affects messages **outside** the tail boundary
- `agent/context_compressor.py` — `_ensure_last_user_message_in_tail()`: only anchors the **last** user message
- `tail_token_budget` derived in `__init__()`: `int(threshold_tokens * summary_target_ratio)`

## Proposed Solution: Pre-truncate Tool Results in Tail Before Budget Calculation

**Option A — Conversation message floor (complementary):**
Add a guarantee that the last N user/assistant text messages (excluding tool results) are always preserved in the tail, regardless of tool result sizes. This acts as a safety net:

```python
# In _find_tail_cut_by_tokens(), after the backward token walk:
# Count conversation messages (user + assistant without tool_calls) in the tail
conv_msgs_in_tail = sum(
    1 for m in messages[cut_idx:]
    if m["role"] in ("user", "assistant") and not m.get("tool_calls")
)
# If fewer than CONVERSATION_FLOOR, expand the tail backward
CONVERSATION_FLOOR = 6  # guarantee at least 6 conversational turns
while conv_msgs_in_tail < CONVERSATION_FLOOR and cut_idx > head_end + 1:
    cut_idx -= 1
    m = messages[cut_idx]
    if m["role"] in ("user", "assistant") and not m.get("tool_calls"):
        conv_msgs_in_tail += 1
```

**Option B — Truncate tool results in tail before budget calculation (primary fix):**
Before calculating the tail boundary, cap tool results in the tail region to a reasonable size. This ensures the budget is spent on a *mix* of conversation + tool context:

```python
# Before _find_tail_cut_by_tokens():
MAX_TOOL_RESULT_TAIL_TOKENS = 2000  # per tool result

# Create a temporary view where tool results are truncated
# Use this truncated view for tail boundary calculation
# Then apply the boundary to the ORIGINAL messages (keeping full tool results in the tail)
```

This way:
- Full tool results are still sent to the model (they're in the tail)
- But the boundary calculation isn't skewed by oversized outputs
- Conversation messages are more likely to be included in the tail

**Recommended:** Implement both — Option B as the primary fix, Option A as a safety net.

## Impact

- **Severity:** High — causes task amnesia in long sessions with heavy tool use
- **Frequency:** Common during SWE/coding workflows (build-fix-test loops, git operations, large file reads)
- **User impact:** Agent appears to "forget" what it was doing and needs task re-explanation after every compression cycle

## Environment

- Any model with context compression enabled
- Most noticeable on 128K context models where `tail_token_budget` ≈ 12.8K tokens
- Exacerbated by tools that produce large outputs (terminal, search_files, read_file on large files)

## Related Issues

- #12131 — Context lost when summary generation fails (different root cause, similar symptom)
- #11588 — Preserve-on-failure principle (broader compression reliability)
- #10896 — Last user message lost to compression (partial fix via `_ensure_last_user_message_in_tail`, but doesn't cover the multi-message case described here)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Large tool results consume entire tail token budget — conversation messages lost to summary on compression #13164

Bug Description

Technical Details

How the tail budget works

The problem

Concrete scenario

Code References

Proposed Solution: Pre-truncate Tool Results in Tail Before Budget Calculation

Impact

Environment

Related Issues

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug]: Large tool results consume entire tail token budget — conversation messages lost to summary on compression #13164

Description

Bug Description

Technical Details

How the tail budget works

The problem

Concrete scenario

Code References

Proposed Solution: Pre-truncate Tool Results in Tail Before Budget Calculation

Impact

Environment

Related Issues

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions