Feature: /microcompact — Instant, LLM-Free Surgical Context Stripping (inspired by vicnaum's Claude Code RE)

## Overview

Inspired by [@vicnaum's reverse-engineering of Claude Code](https://x.com/vicnaum/status/2029579972688379928), which added surgical context management, this issue proposes a new `/microcompact` command for Hermes Agent. It removes heavy context artifacts (tool call/result pairs and reasoning/thinking blocks) from conversation history **without LLM summarization**. It is instant, free, and lossless for the actual conversational text.

The core insight from vicnaum's work: when context fills up, the current options (`/compress` for LLM summarization, `/clear` for nuclear reset) are too coarse. Context is often 70-90% tool call/result pairs and thinking blocks — stripping these selectively can recover enormous context space while preserving every actual user/assistant message exchange intact. As one commenter put it: "/compact is a grenade. This is a scalpel."

This complements the existing automatic compression system (#513, #499, #415) by giving users a **manual, surgical** option when they want precise control.

---

## Research Findings

### How Claude Code's Context Management Works

Claude Code uses a three-layer system, revealed through vicnaum's reverse engineering:

**Layer 1 — microcompact (silent, every turn):** Runs silently before each API call. Replaces OLD tool_result content with placeholder text like `[Previous: used {tool_name}]`. Only targets results > 100 chars. Keeps a "hot tail" of the N most recent tool results intact. Never removes `tool_use` blocks themselves.

**Layer 2 — auto-compact (threshold triggered):** At ~75-95% context capacity, runs full LLM summarization. Saves full transcript to `.transcripts/` before replacing. Structured summary replaces entire history.

**Layer 3 — /compact (user triggered):** Manual LLM summarization with optional focus hints.

Vicnaum identified the gap: even after Layer 1 runs, tons of tool artifacts remain. Thinking blocks inside messages with tool_use survive all cleanup. So he built `/microcompact` and `/clear-thinking` — instant commands with no LLM calls that surgically strip these artifacts.
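
To make the Layer 1 behavior concrete, here is a minimal sketch of placeholder replacement with a hot tail. It is illustrative only and is not Claude Code's actual code: it uses the OpenAI-style message shape shown later in this issue, and the function name `soften_old_tool_results` and the `hot_tail=3` default are assumptions. The 100-character threshold and the placeholder text come from the description above.

```python
PLACEHOLDER = "[Previous: used {tool_name}]"


def soften_old_tool_results(messages, hot_tail=3, min_chars=100):
    """Replace the content of old, large tool results with a short placeholder.

    Returns a new list. tool_calls entries and message order are untouched,
    and the input messages are not mutated.
    """
    # Map each tool_call id to its tool name so the placeholder can mention it.
    names = {}
    for m in messages:
        for tc in m.get("tool_calls") or []:
            names[tc["id"]] = tc.get("function", {}).get("name", "tool")

    tool_idx = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    # Everything except the most recent `hot_tail` tool results is a candidate.
    old = set(tool_idx[:-hot_tail] if hot_tail else tool_idx)

    out = []
    for i, m in enumerate(messages):
        content = m.get("content")
        if i in old and isinstance(content, str) and len(content) > min_chars:
            name = names.get(m.get("tool_call_id"), "tool")
            m = {**m, "content": PLACEHOLDER.format(tool_name=name)}
        out.append(m)
    return out
```

Note that this keeps every message in place, which is why tool artifacts and thinking blocks still accumulate after Layer 1 runs.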

### Key Design Decisions

1. **Surgical over summarization** — Stripping artifacts is lossless for conversational content. LLM summarization always loses detail and costs an API call.

2. **User-controlled scope** — A picker UI lets users choose how far back to strip, rather than all-or-nothing. This preserves recent tool results that may still be relevant.

3. **Prompt caching consideration** — Stripping elements from the message array destroys prompt cache prefixes. This is a real tradeoff: you save context space but may increase costs on the next API call due to cache miss. However, when you're at 90%+ context, the alternative (full compaction) destroys the cache anyway.
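
One way to reason about the cache tradeoff is to find where the stripped history first diverges from the original. The sketch below assumes the provider's cache matches on an exact message prefix (details vary by provider), and `first_changed_index` is a hypothetical helper, not an existing Hermes function.

```python
def first_changed_index(before, after):
    """Index of the first message that differs between two histories.

    Messages before this index form the shared prefix. Returns
    min(len(before), len(after)) when one list is a prefix of the other.
    """
    for i, (a, b) in enumerate(zip(before, after)):
        if a != b:
            return i
    return min(len(before), len(after))
```

Under that assumption, everything before the returned index could still hit the cache, and everything from it onward would be reprocessed on the next call. A larger keep count does not move this index, since the oldest stripped turn determines it.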

### Anthropic's Server-Side Context Editing API

Anthropic has released a server-side [Context Editing API](https://docs.anthropic.com/en/docs/build-with-claude/context-management) (beta) that handles this at the API level. See #526 for integration details. The server-side approach preserves prompt cache (edits applied after cache lookup). Anthropic reports 29-39% performance improvement. That issue covers the Anthropic-specific server-side approach; this issue covers the universal client-side approach that works with ALL models.

---

## Current State in Hermes Agent

### What We Have

1. **`/compress` command** (cli.py, gateway/run.py) — LLM-based summarization. Protects first 3 + last 4 messages, summarizes middle turns using auxiliary model (Gemini Flash). Costs an API call, loses detail.

2. **Automatic compression** (run_agent.py) — Triggers at 85% context capacity. Same LLM summarization as `/compress`. Also triggers on 413 context-length errors.

3. **100K char hard cap** on tool results (run_agent.py L2606) — Only caps individual results at insertion time.

4. **Reasoning storage** — Reasoning/thinking text stored in `msg["reasoning"]` field. `reasoning_details` preserved for multi-turn continuity. When building API messages, `reasoning` is converted to `reasoning_content` for API compatibility.

### What's Missing (the Gap)

- **No selective stripping** — Can't remove specific message types (tool artifacts, thinking) without full summarization
- **No instant cleanup option** — Every cleanup path requires an LLM call or nuclear reset
- **No thinking block management** — Thinking blocks accumulate across turns with no cleanup mechanism

### Related Open Issues

- **#513** — Two-Phase Context Management (automatic tool output pruning before compaction). Complementary: #513 is automatic, this is user-initiated.
- **#499** — Context Compaction Quality Overhaul (better summarization prompts). Complementary: improves the LLM path, this adds a non-LLM path.
- **#415** — Insertion-Time Tool Result Trimming (head+tail trimming at insertion). Complementary: trims at insertion, this removes retroactively.
- **#526** — Anthropic Context Editing API (server-side, Claude-only). Complementary: server-side automatic approach vs this client-side manual approach.

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **core codebase change**, not a skill or tool. Reasons:
- It modifies the conversation message array directly, requiring access to internal state (`conversation_history`, session transcripts)
- It needs integration with the CLI command system and gateway command dispatch
- It must coordinate with the existing compression system
- It's a fundamental context management capability, same layer as `/compress`

### The Command: `/microcompact`

A single command that strips BOTH tool artifacts AND thinking/reasoning blocks. One command, one action — no unnecessary complexity.

**Usage:**
```
/microcompact        # Strip all tool artifacts + thinking, keep last 3 turns intact
/microcompact 5      # Keep last 5 turns intact
/microcompact 0      # Strip everything (aggressive)
```
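
A minimal sketch of parsing the optional argument, assuming the handler receives everything after the command name as a string. The name `parse_keep_count` is hypothetical; the default of 3 matches the usage above.

```python
DEFAULT_KEEP_LAST_N = 3


def parse_keep_count(args: str, default: int = DEFAULT_KEEP_LAST_N) -> int:
    """Parse the optional N from '/microcompact [N]'.

    An empty argument string yields the default. Anything other than a
    non-negative ASCII integer raises ValueError so the caller can show
    a usage hint instead of stripping with a surprising value.
    """
    text = args.strip()
    if not text:
        return default
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected a non-negative integer, got {text!r}")
    return int(text)
```

Rejecting negative and non-numeric input up front matters here because a negative N would otherwise turn into a different list slice inside the stripping function.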

### What We'd Need

1. **`microcompact()` function** in `run_agent.py` or a new `context_stripper.py`:

```python
import copy


def microcompact(messages: list[dict], keep_last_n: int = 3) -> list[dict]:
    """Surgically strip tool call/result pairs and thinking blocks
    from all but the last N assistant turns. Preserves all actual
    user/assistant text content. Returns a new list; the input
    messages are not mutated."""
    if keep_last_n < 0:
        raise ValueError("keep_last_n must be >= 0")
    messages = copy.deepcopy(messages)

    def is_empty(msg: dict) -> bool:
        # content may be None (common on tool-calling turns), a string,
        # or a list of content parts
        content = msg.get("content")
        return not content.strip() if isinstance(content, str) else not content

    # Find assistant messages with tool_calls (these are the "turns" to count)
    tool_turns = [m for m in messages
                  if m.get("role") == "assistant" and m.get("tool_calls")]

    # Determine which turns to strip (all except last N)
    turns_to_strip = tool_turns[:-keep_last_n] if keep_last_n else tool_turns

    # Collect tool_call_ids to remove and drop tool_calls from those turns
    strip_call_ids = set()
    for msg in turns_to_strip:
        for tc in msg.pop("tool_calls"):
            strip_call_ids.add(tc["id"])

    # Remove corresponding tool result messages
    messages = [m for m in messages
                if not (m.get("role") == "tool" and
                        m.get("tool_call_id") in strip_call_ids)]

    # Remove empty assistant messages (had only tool_calls, no text content)
    messages = [m for m in messages
                if not (m.get("role") == "assistant" and
                        is_empty(m) and
                        not m.get("tool_calls"))]

    # Strip thinking/reasoning from all but last N assistant messages
    assistant_msgs = [m for m in messages if m.get("role") == "assistant"]
    old_msgs = assistant_msgs[:-keep_last_n] if keep_last_n else assistant_msgs
    for msg in old_msgs:
        msg.pop("reasoning", None)
        msg.pop("reasoning_content", None)
        msg.pop("reasoning_details", None)
        msg.pop("codex_reasoning_items", None)

    return messages
```

2. **CLI command** in `cli.py`:
   - Register `/microcompact` in the COMMANDS dict
   - Handler: parse optional N argument, call `microcompact()`, report savings

3. **Gateway command** in `gateway/run.py`:
   - Add `microcompact` to known commands
   - Handler mirrors `/compress` pattern: load transcript, strip, rewrite

4. **Session transcript rewrite** — After stripping, rewrite using `rewrite_transcript()` (same as `/compress` uses)

### Phased Rollout

**Phase 1: The Command**
- Implement `microcompact()` stripping function
- Add `/microcompact [N]` to CLI and gateway
- Report before/after token estimates and message counts
- Default: keep last 3 turns intact
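
The before/after report could be as simple as the sketch below. It uses the rough 4 chars/token heuristic raised in the open questions rather than a real tokenizer, and the names `estimate_tokens` and `savings_report`, as well as the output wording, are assumptions.

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic, not a tokenizer


def estimate_tokens(messages):
    """Rough token estimate: serialized message length / CHARS_PER_TOKEN."""
    chars = sum(len(json.dumps(m, ensure_ascii=False, default=str))
                for m in messages)
    return chars // CHARS_PER_TOKEN


def savings_report(before, after):
    """One-line summary of message and estimated token counts."""
    tb, ta = estimate_tokens(before), estimate_tokens(after)
    pct = 100 * (tb - ta) / tb if tb else 0.0
    return (f"Microcompact: {len(before)} -> {len(after)} messages, "
            f"~{tb:,} -> ~{ta:,} tokens ({pct:.0f}% freed)")
```

Serializing the whole message (not just `content`) means `tool_calls` arguments and reasoning fields are counted too, which is where most of the savings should show up.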

**Phase 2: Automatic Integration**
- Integrate with #513's two-phase approach — use `microcompact()` as Phase 1 pruner before LLM compaction
- Add config option for automatic microcompact (e.g., run before every `/compress`)
- Config: `microcompact.keep_last_n: 3` default

**Phase 3: Smart Defaults**
- Auto-run microcompact before LLM compaction kicks in (saves the LLM call when stripping alone frees enough space)
- Per-provider optimization: use Anthropic's server-side Context Editing API when available (#526)
- Token savings telemetry
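
The "strip first, summarize only if needed" flow from Phases 2 and 3 might look like the sketch below. All names here are hypothetical: `compact_context`, the injected `count_tokens`, `strip`, and `summarize` callables, and the `target_ratio` parameter are placeholders for whatever the real compression system exposes.

```python
from typing import Callable

Messages = list[dict]


def compact_context(
    messages: Messages,
    context_limit: int,
    count_tokens: Callable[[Messages], int],
    strip: Callable[[Messages], Messages],
    summarize: Callable[[Messages], Messages],
    target_ratio: float = 0.5,
) -> tuple[Messages, str]:
    """Try free stripping first; fall back to LLM summarization only if needed.

    Returns the new history and a label for the path that produced it.
    """
    budget = int(context_limit * target_ratio)
    if count_tokens(messages) <= budget:
        return messages, "none"
    stripped = strip(messages)
    if count_tokens(stripped) <= budget:
        # Stripping alone freed enough space, so the LLM call is skipped.
        return stripped, "microcompact"
    # Summarize the already-stripped history so the summarizer sees less noise.
    return summarize(stripped), "microcompact+summarize"
```

Passing the stripper and summarizer in as callables keeps the decision logic testable without an API key.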

---

## Technical Details

### Message Structure Reference

```python
# Assistant message with tool call + thinking
{"role": "assistant", "content": "", "reasoning": "...(thinking)...",
 "reasoning_details": [...],
 "tool_calls": [{"id": "call_abc", "function": {"name": "terminal", "arguments": "{...}"}}],
 "finish_reason": "tool_calls"}

# Tool result message  
{"role": "tool", "content": "...(potentially huge output)...", "tool_call_id": "call_abc"}

# Assistant message with actual text content
{"role": "assistant", "content": "Here's what I found...", "reasoning": "...",
 "finish_reason": "stop"}
```

### Orphan Prevention

When stripping tool_calls from an assistant message:
- If the message also has `content` text → keep the message, only remove `tool_calls`
- If the message has NO content (purely a tool-calling turn) → remove the entire message
- Always remove the corresponding `role: "tool"` result messages
- This prevents orphaned tool_call_id references
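
The rules above can be checked after stripping with a small validator. This is a sketch with a hypothetical name, `find_orphans`. It only checks that ids pair up and does not verify that each result follows its call in message order.

```python
def find_orphans(messages):
    """Return (calls_without_results, results_without_calls) as sets of ids."""
    call_ids, result_ids = set(), set()
    for m in messages:
        if m.get("role") == "assistant":
            call_ids.update(tc["id"] for tc in m.get("tool_calls") or [])
        elif m.get("role") == "tool":
            result_ids.add(m.get("tool_call_id"))
    return call_ids - result_ids, result_ids - call_ids
```

Both sets should be empty for a history that is safe to send, so this is a natural assertion for unit tests of the stripping function.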

### What Gets Stripped vs Preserved

| Component | Stripped? | Notes |
|-----------|----------|-------|
| `tool_calls` on assistant msgs | ✅ Yes (except last N turns) | The tool invocation metadata |
| `role: "tool"` result msgs | ✅ Yes (matching stripped calls) | The heavy tool output content |
| `reasoning` field | ✅ Yes (except last N turns) | Thinking/reasoning text |
| `reasoning_details` | ✅ Yes (except last N turns) | Opaque provider reasoning state |
| `reasoning_content` | ✅ Yes (except last N turns) | API-format reasoning |
| `codex_reasoning_items` | ✅ Yes (except last N turns) | Also popped by `microcompact()` |
| User messages | ❌ Never | All user text preserved |
| Assistant `content` text | ❌ Never | All assistant text preserved |
| System messages | ❌ Never | System prompt untouched |

---

## Pros & Cons

### Pros
- **Instant and free** — No LLM call, no API cost, sub-second execution
- **Lossless for conversation** — Every actual user/assistant text message preserved
- **Massive space recovery** — Tool results are typically 60-80% of context. Thinking blocks 10-20%. Combined: 70-90% recovery.
- **Dead simple** — One command, one function, ~50 lines of core logic
- **Universal** — Works with any model/provider
- **Interpretable** — Single command name, obvious behavior, clear output

### Cons / Risks
- **Prompt cache invalidation** — Modifying the message array destroys cached prefixes. Same tradeoff as `/compress`.
- **Loss of tool context** — Model loses knowledge of old tool results. May re-run tools. Mitigated by keeping last N turns.
- **Reasoning continuity** — Stripping `reasoning_details` may break multi-turn reasoning chains on providers using opaque reasoning state. Mitigated by keeping last N turns' reasoning.

---

## Open Questions

1. **Default keep count** — Keep last 3 turns? 5? Claude Code defaults to 3.
2. **Should `/compress` auto-microcompact first?** — Before LLM summarization, strip artifacts for free. Relates to #513.
3. **Token counting** — Show exact counts or rough estimates (4 chars/token)?

---

## References

- [@vicnaum's Twitter thread](https://x.com/vicnaum/status/2029579972688379928) — Reverse-engineering Claude Code to add surgical context management
- [bun-demincer](https://github.com/vicnaum/bun-demincer) — Toolkit for reverse-engineering Bun-compiled binaries
- [Anthropic Context Editing API](https://docs.anthropic.com/en/docs/build-with-claude/context-management) — Server-side approach (see #526)
- Related issues: #513, #499, #415, #526

