tracking: context compression improvements #9666

## Context Compression Improvements

Tracking issue for seven independent changes to `agent/context_compressor.py` that improve information retention, efficiency, and robustness during context compression.

### PRs — Information Quality (merged from original batch)

- [ ] #9661 — **Smart tool output collapse**: Replace generic placeholder with informative 1-line summaries (`[terminal] ran \`npm test\` -> exit 0, 47 lines`)
- [ ] #9663 — **Structured action-log format**: Redesign summary template with numbered actions, `[tool: name]` tags, and Active State section
- [ ] #9665 — **Preserve user messages**: Append verbatim user messages from compressed turns to the summary
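
To make the one-line summary format from #9661 concrete, here is a minimal sketch. The function name `summarize_tool_result` and its parameters are hypothetical and chosen for illustration; the actual `_summarize_tool_result()` in the PR may take different inputs and handle more tool types.

```python
from typing import Optional


def summarize_tool_result(
    tool_name: str,
    command: str,
    output: str,
    exit_code: Optional[int] = None,
) -> str:
    """Collapse a tool output into a single informative line.

    Hypothetical helper: keeps the tool name, the command, the exit code
    (when known), and the size of the output instead of a generic placeholder.
    """
    line_count = len(output.splitlines())
    status = f"exit {exit_code}, " if exit_code is not None else ""
    return f"[{tool_name}] ran `{command}` -> {status}{line_count} lines"


if __name__ == "__main__":
    fake_output = "ok\n" * 47  # stand-in for 47 lines of test output
    print(summarize_tool_result("terminal", "npm test", fake_output, 0))
```

The point of the format is that the facts a later turn is likely to need (which tool, which command, whether it succeeded) survive even though the body of the output is dropped.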

### PRs — Robustness & Efficiency (new batch)

- [ ] #9674 — **Anti-thrashing protection**: Detect ineffective compression loops (<10% savings) and back off after 2 consecutive failures
- [ ] #9675 — **Preflight compression check**: Override `should_compress_preflight()` to avoid paying for a full API call when already over the limit
- [ ] #9677 — **Dedup + argument pruning**: Deduplicate identical tool results (same file read 5x) + truncate large tool_call arguments in assistant messages
- [ ] #9678 — **Compressor hardening**: Summary max_tokens cap (2x→1.3x), multimodal content safety, compression note fix, adaptive failure cooldown (600s→60s for transient errors)
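
The anti-thrashing rule in #9674 (savings below 10%, back off after 2 consecutive ineffective passes) can be sketched as a small state tracker. The class `ThrashGuard` and its method names are assumptions made for this example; the PR itself wires the logic into `should_compress()`, `compress()`, and `on_session_reset()`.

```python
class ThrashGuard:
    """Hypothetical tracker for ineffective compression passes."""

    def __init__(self, min_savings: float = 0.10, max_ineffective: int = 2) -> None:
        self.min_savings = min_savings          # threshold from the issue: 10%
        self.max_ineffective = max_ineffective  # threshold from the issue: 2 passes
        self.consecutive_ineffective = 0

    def record(self, tokens_before: int, tokens_after: int) -> float:
        """Record one compression pass and return its fractional savings."""
        if tokens_before <= 0:
            savings = 0.0
        else:
            savings = (tokens_before - tokens_after) / tokens_before
        if savings < self.min_savings:
            self.consecutive_ineffective += 1
        else:
            # An effective pass clears the streak.
            self.consecutive_ineffective = 0
        return savings

    def should_back_off(self) -> bool:
        """True once enough consecutive passes saved too little."""
        return self.consecutive_ineffective >= self.max_ineffective

    def reset(self) -> None:
        """Clear state, for example when a session is reset."""
        self.consecutive_ineffective = 0
```

A caller would check `should_back_off()` before starting another pass and skip compression while it returns `True`.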

### Benchmark Summary

**Smart tool collapse** (PR #9661):
| Metric | Before | After |
|--------|--------|-------|
| Space savings | 99.3% | 98.9% |
| Key facts preserved (tool name, file path, command, exit code) | 0/24 | 24/24 |

**User message preservation** (PR #9665):
| Metric | Before | After |
|--------|--------|-------|
| User preferences surviving compression | 0/6 | 6/6 |
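
As an illustration of why preferences survive, the sketch below appends user messages verbatim to a generated summary. It assumes messages are dicts with `role` and `content` keys, which the issue does not state, and the helper names are hypothetical stand-ins for `_extract_user_messages()` and the Phase 3b step in `compress()`.

```python
from typing import Any, Dict, List


def extract_user_messages(messages: List[Dict[str, Any]]) -> List[str]:
    """Return the text of every non-empty user message, unmodified."""
    collected = []
    for message in messages:
        if message.get("role") != "user":
            continue
        content = message.get("content")
        # Only plain-text content is handled in this sketch.
        if isinstance(content, str) and content.strip():
            collected.append(content)
    return collected


def append_user_messages(summary: str, messages: List[Dict[str, Any]]) -> str:
    """Append verbatim user messages from the compressed turns to the summary."""
    quoted = extract_user_messages(messages)
    if not quoted:
        return summary
    lines = [summary, "", "User messages (verbatim):"]
    lines.extend(f"- {text}" for text in quoted)
    return "\n".join(lines)
```

Because the text is copied rather than paraphrased, a stated preference cannot be dropped or reworded by the summary model.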

**Anti-thrashing** (PR #9674): Prevents compression loops that keep spending LLM summary calls while each pass saves less than 10% of context. The compressor backs off after 2 consecutive ineffective passes.

**Dedup** (PR #9677): Five reads of the same 10KB file previously put 50KB in context. After dedup, a single 10KB copy remains alongside 4 short stubs.
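
The dedup arithmetic can be reproduced with a short sketch. It assumes tool results are dicts with `role == "tool"` and string `content`, and it keeps the first copy while stubbing later ones. Both are assumptions for illustration; the PR may use a different message shape or keep the most recent copy instead.

```python
import hashlib
from typing import Any, Dict, List


def dedup_tool_results(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Replace repeated identical tool results with short stubs (hypothetical helper)."""
    first_seen: Dict[str, int] = {}  # content digest -> index of the first occurrence
    result = []
    for index, message in enumerate(messages):
        content = message.get("content")
        if message.get("role") == "tool" and isinstance(content, str):
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if digest in first_seen:
                stub = dict(message)  # copy so the caller's message is not mutated
                stub["content"] = (
                    f"[duplicate of tool result in message {first_seen[digest]}]"
                )
                result.append(stub)
                continue
            first_seen[digest] = index
        result.append(message)
    return result


if __name__ == "__main__":
    file_body = "x" * 10_240  # stand-in for a 10KB file
    history = [{"role": "tool", "content": file_body} for _ in range(5)]
    deduped = dedup_tool_results(history)
    print(sum(len(m["content"]) for m in deduped))
```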

**Hardening** (PR #9678): Summary model now gets 1.3x budget instead of 2x (cost saving). Transient errors cool down for 60s instead of 600s.
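
A minimal sketch of the two numeric changes follows. The 1.3x factor and the 60s and 600s cooldowns come from this issue; everything else is an assumption, including the function names, the choice of `TimeoutError` and `ConnectionError` as the "transient" errors, and the reading that non-transient failures keep the 600s cooldown.

```python
TRANSIENT_COOLDOWN_SECONDS = 60   # from the issue: transient errors
DEFAULT_COOLDOWN_SECONDS = 600    # from the issue: previous cooldown

# Assumption: which exception types count as transient is not stated in the issue.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def failure_cooldown_seconds(error: BaseException) -> int:
    """Pick a shorter cooldown when the failure is likely to clear on its own."""
    if isinstance(error, TRANSIENT_ERRORS):
        return TRANSIENT_COOLDOWN_SECONDS
    return DEFAULT_COOLDOWN_SECONDS


def summary_max_tokens(summary_budget: int) -> int:
    """Cap the summary call at 1.3x the budget, using integer arithmetic."""
    return summary_budget * 13 // 10
```

Integer arithmetic is used for the cap so the result does not depend on floating-point rounding.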

### Independence

All 7 PRs are independent — each can be merged separately. They touch different methods within `context_compressor.py`:
- #9661: `_prune_old_tool_results()` + new `_summarize_tool_result()`
- #9663: `_generate_summary()` prompt template
- #9665: New `_extract_user_messages()` + `compress()` Phase 3b
- #9674: `should_compress()` + `compress()` savings tracking + `on_session_reset()`
- #9675: New `should_compress_preflight()` override
- #9677: `_prune_old_tool_results()` dedup + arg pruning passes
- #9678: `_generate_summary()` max_tokens, `_prune_old_tool_results()` multimodal, `compress()` note, failure cooldown

### Related issues addressed

- #9561 — compression too abrupt / repeated ineffective passes → #9674
- #499 — tool details truncated, quality bottleneck, no dedup → #9661, #9663, #9677
- #9413, #9631 — iterative summary quality → #9663
- #7133 — incoherent after compression on small models → #9674 (prevents re-compression loops)

