Feature: Anthropic Context Editing API Integration — Server-Side, Cache-Friendly Tool/Thinking Cleanup for Claude Models

## Overview

Anthropic has released a server-side [Context Editing API](https://docs.anthropic.com/en/docs/build-with-claude/context-management) (beta: `anthropic-beta: context-management-2025-06-27`) that automatically manages context by clearing old tool use/result pairs and thinking blocks **at the API level**, before token counting and after prompt cache lookup. This means context is managed without destroying prompt cache prefixes — a significant advantage over client-side stripping.

This was discovered while researching [@vicnaum's reverse-engineering of Claude Code](https://x.com/vicnaum/status/2029579972688379928), which revealed that Claude Code already uses internal microcompact logic. Anthropic has now exposed similar functionality as a public API parameter. Anthropic reports **29% performance improvement** with context editing alone, and **39% with context editing + memory tool** over baseline.

Since Hermes Agent already supports Anthropic/Claude models (via OpenRouter and direct API), integrating this API would provide zero-cost, cache-friendly context management for Claude users with minimal implementation effort.

---

## Research Findings

### How the Context Editing API Works

The API adds a `context_management` parameter to the Messages API request body, containing an ordered list of `edits`:

```json
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 8096,
  "context_management": {
    "edits": [
      {
        "type": "clear_thinking_20251015",
        "keep": {"type": "thinking_turns", "value": 2}
      },
      {
        "type": "clear_tool_uses_20250919",
        "trigger": {"type": "input_tokens", "value": 50000},
        "keep": {"type": "tool_uses", "value": 5},
        "clear_at_least": {"type": "input_tokens", "value": 5000},
        "exclude_tools": ["web_search"]
      }
    ]
  },
  "messages": [...]
}
```

**Edit Type 1: `clear_tool_uses_20250919`**
- Clears oldest tool_use/tool_result pairs when input tokens exceed a trigger threshold
- Configurable parameters:
  - `trigger` — Input token count to activate (default ~100K)
  - `keep` — Number of recent tool pairs to preserve (default 3)
  - `clear_at_least` — Minimum tokens to clear per activation
  - `exclude_tools` — Tool names never cleared (e.g., important tools like memory, skill)
  - `clear_tool_inputs` — Also clear tool input params, not just results (default false)

**Edit Type 2: `clear_thinking_20251015`**
- Manages extended thinking blocks
- Default: keeps only last assistant turn's thinking
- Config: `keep: {"type": "thinking_turns", "value": N}` or `keep: "all"` to maximize cache hits
- **Must be listed FIRST in the edits array**

### Key Advantages Over Client-Side Stripping

1. **Prompt cache preservation** — Edits are applied server-side AFTER cache lookup, so existing cached prefixes are reused. Client-side modifications to the message array invalidate the cache.
2. **Zero implementation complexity** — Just add parameters to the API request. No message manipulation logic needed.
3. **Anthropic-optimized** — Anthropic knows the exact tokenization, can make optimal clearing decisions.
4. **Automatic** — No user intervention needed. Works transparently on every API call.

### Limitations

- **Claude-only** — Only works with Anthropic's Messages API. Not available for OpenAI, Gemini, or other providers.
- **Beta status** — Requires beta header. May change before GA.
- **OpenRouter compatibility unknown** — Need to verify if OpenRouter passes through `context_management` and beta headers to Anthropic.
- **Less user control** — Automatic only, no manual trigger. For manual control, see the companion client-side stripping issue.

---

## Current State in Hermes Agent

### API Call Path

Hermes builds API kwargs in `_build_api_kwargs()` (run_agent.py L2061-2149):

```python
api_kwargs = {
    "model": self.model,
    "messages": api_messages,
    "tools": self.tools if self.tools else None,
    "timeout": 900.0,
}
extra_body = {}
# ... provider preferences, reasoning config ...
if extra_body:
    api_kwargs["extra_body"] = extra_body
```

The `extra_body` mechanism already exists and is used for OpenRouter provider preferences and reasoning config. The `context_management` parameter could be injected here for Anthropic models.

### Provider Detection

Hermes already detects OpenRouter (`"openrouter" in self.base_url.lower()`) and Nous Portal (`"nousresearch" in self.base_url.lower()`). It would need to also detect direct Anthropic API usage and determine model provider when going through OpenRouter (e.g., `anthropic/claude-*` model strings).

### No Existing Support

- No `context_management` parameter is currently sent
- No Anthropic beta headers are set
- No model-provider-specific API features beyond reasoning config

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **core codebase change** to `run_agent.py` (specifically `_build_api_kwargs()`). Reasons:
- Requires modification of API request parameters at the lowest level
- Needs provider detection logic already in the codebase
- Must integrate with existing context management (coordinate with compression thresholds)
- Not expressible as instructions + shell commands (skill) or a standalone callable (tool)

### What We'd Need

1. **Provider detection** — Determine if the target model is Claude/Anthropic:
   - Direct Anthropic API: base_url contains `anthropic`
   - OpenRouter: model string starts with `anthropic/`
   - Config flag: explicit `context_editing: true` in model config

2. **Context management injection** — In `_build_api_kwargs()`, conditionally add `context_management` to `extra_body` for Anthropic models

3. **Configuration** — User-configurable parameters:
   - Enable/disable context editing
   - Tool use trigger threshold
   - Keep count for tool uses and thinking turns
   - Excluded tools list

4. **Beta header** — For direct Anthropic API access, pass the beta header. For OpenRouter, verify passthrough behavior.

### Phased Rollout

**Phase 1: Basic Integration**
- Add `context_management` to `extra_body` for Anthropic models
- Default conservative settings: trigger at 60% of context window, keep last 5 tool uses, keep last 2 thinking turns
- Config option to enable/disable: `context_editing.enabled: true`
- Exclude critical tools by default: `memory`, `skill_manage`, `todo`

**Phase 2: Configuration & Tuning**
- Expose all parameters in config.yaml:
  ```yaml
  context_editing:
    enabled: true
    trigger_tokens: 100000
    keep_tool_uses: 5
    keep_thinking_turns: 2
    exclude_tools: [memory, skill_manage, todo]
    clear_tool_inputs: false
  ```
- Auto-scale trigger threshold based on model context window
- Coordinate with existing compression: if context editing is active, raise compression threshold (since context editing handles the first layer of cleanup)

**Phase 3: Smart Defaults & Monitoring**
- Auto-detect Anthropic models and enable context editing by default
- Log context editing activity (how many tokens cleared, how often it triggers)
- Surface context editing stats in `/usage` command
- Investigate OpenRouter passthrough behavior and document compatibility

---

## Pros & Cons

### Pros
- **Prompt cache friendly** — The #1 advantage. Server-side editing preserves cached prefixes, reducing costs.
- **Zero client complexity** — Just add a dict to the API request. ~20 lines of code.
- **Proven at scale** — Anthropic uses this internally in Claude Code. 29-39% performance improvement reported.
- **Automatic** — No user action needed. Works transparently.
- **Composable** — Works alongside client-side stripping and LLM compaction for defense-in-depth.

### Cons / Risks
- **Anthropic-only** — Does nothing for OpenAI, Gemini, open-source models. Client-side stripping (companion issue) covers those.
- **Beta API** — May change. Need to handle gracefully if the server rejects the parameter.
- **OpenRouter uncertainty** — OpenRouter may not pass through `context_management` or beta headers. Needs testing.
- **Reduced visibility** — Server-side clearing is invisible to the client. The model may behave differently than expected because old tool results are silently cleared. Need to log when this happens.
- **Coordination complexity** — Must coordinate with existing compression thresholds. If context editing clears 40K tokens server-side, the client's token counting won't reflect this, potentially causing premature compression triggers.

---

## Open Questions

1. **OpenRouter passthrough** — Does OpenRouter forward `context_management` and Anthropic beta headers to the Anthropic backend? This is critical since most Hermes users go through OpenRouter.
2. **Token counting sync** — After server-side editing, the client's token estimate diverges from reality. How do we handle this? Could use the API response's `usage` field to recalibrate.
3. **Coordination with compression** — If context editing is active, should we increase the compression threshold? E.g., from 85% to 95%, since context editing provides a buffer?
4. **Should this be opt-in or opt-out?** — Conservative: opt-in. Aggressive: auto-enable for Anthropic models. Recommended: opt-in in Phase 1, auto-enable in Phase 3 after validation.
5. **Other providers** — Will OpenAI or Google offer similar server-side context editing APIs? Should we design the abstraction to be provider-agnostic from the start?

---

## References

- [Anthropic Context Editing API docs](https://docs.anthropic.com/en/docs/build-with-claude/context-management) — Official documentation
- [@vicnaum's Twitter thread](https://x.com/vicnaum/status/2029579972688379928) — Reverse-engineering that revealed Claude Code's internal microcompact system
- [Anthropic beta headers](https://docs.anthropic.com/en/docs/about-claude/models#model-features-beta) — Beta feature activation
- Companion issue: Surgical Context Stripping Commands (client-side, model-agnostic approach)
- Related issues: #513 (Two-Phase Context Management), #499 (Context Compaction Quality), #415 (Insertion-Time Tool Result Trimming)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Anthropic Context Editing API Integration — Server-Side, Cache-Friendly Tool/Thinking Cleanup for Claude Models #526

Overview

Research Findings

How the Context Editing API Works

Key Advantages Over Client-Side Stripping

Limitations

Current State in Hermes Agent

API Call Path

Provider Detection

No Existing Support

Implementation Plan

Skill vs. Tool Classification

What We'd Need

Phased Rollout

Pros & Cons

Pros

Cons / Risks

Open Questions

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Feature: Anthropic Context Editing API Integration — Server-Side, Cache-Friendly Tool/Thinking Cleanup for Claude Models #526

Description

Overview

Research Findings

How the Context Editing API Works

Key Advantages Over Client-Side Stripping

Limitations

Current State in Hermes Agent

API Call Path

Provider Detection

No Existing Support

Implementation Plan

Skill vs. Tool Classification

What We'd Need

Phased Rollout

Pros & Cons

Pros

Cons / Risks

Open Questions

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions