You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Anthropic has released a server-side Context Editing API (beta: anthropic-beta: context-management-2025-06-27) that automatically manages context by clearing old tool use/result pairs and thinking blocks at the API level, before token counting and after prompt cache lookup. This means context is managed without destroying prompt cache prefixes — a significant advantage over client-side stripping.
This was discovered while researching @vicnaum's reverse-engineering of Claude Code, which revealed that Claude Code already uses internal microcompact logic. Anthropic has now exposed similar functionality as a public API parameter. Anthropic reports 29% performance improvement with context editing alone, and 39% with context editing + memory tool over baseline.
Since Hermes Agent already supports Anthropic/Claude models (via OpenRouter and direct API), integrating this API would provide zero-cost, cache-friendly context management for Claude users with minimal implementation effort.
Research Findings
How the Context Editing API Works
The API adds a context_management parameter to the Messages API request body, containing an ordered list of edits:
Clears oldest tool_use/tool_result pairs when input tokens exceed a trigger threshold
Configurable parameters:
trigger — Input token count to activate (default ~100K)
keep — Number of recent tool pairs to preserve (default 3)
clear_at_least — Minimum tokens to clear per activation
exclude_tools — Tool names never cleared (e.g., important tools like memory, skill)
clear_tool_inputs — Also clear tool input params, not just results (default false)
Edit Type 2: clear_thinking_20251015
Manages extended thinking blocks
Default: keeps only last assistant turn's thinking
Config: keep: {"type": "thinking_turns", "value": N} or keep: "all" to maximize cache hits
Must be listed FIRST in the edits array
Key Advantages Over Client-Side Stripping
Prompt cache preservation — Edits are applied server-side AFTER cache lookup, so existing cached prefixes are reused. Client-side modifications to the message array invalidate the cache.
Zero implementation complexity — Just add parameters to the API request. No message manipulation logic needed.
Anthropic-optimized — Anthropic knows the exact tokenization, can make optimal clearing decisions.
Automatic — No user intervention needed. Works transparently on every API call.
Limitations
Claude-only — Only works with Anthropic's Messages API. Not available for OpenAI, Gemini, or other providers.
Beta status — Requires beta header. May change before GA.
OpenRouter compatibility unknown — Need to verify if OpenRouter passes through context_management and beta headers to Anthropic.
Less user control — Automatic only, no manual trigger. For manual control, see the companion client-side stripping issue.
Current State in Hermes Agent
API Call Path
Hermes builds API kwargs in _build_api_kwargs() (run_agent.py L2061-2149):
The extra_body mechanism already exists and is used for OpenRouter provider preferences and reasoning config. The context_management parameter could be injected here for Anthropic models.
Provider Detection
Hermes already detects OpenRouter ("openrouter" in self.base_url.lower()) and Nous Portal ("nousresearch" in self.base_url.lower()). It would need to also detect direct Anthropic API usage and determine model provider when going through OpenRouter (e.g., anthropic/claude-* model strings).
No Existing Support
No context_management parameter is currently sent
No Anthropic beta headers are set
No model-provider-specific API features beyond reasoning config
Implementation Plan
Skill vs. Tool Classification
This should be a core codebase change to run_agent.py (specifically _build_api_kwargs()). Reasons:
Requires modification of API request parameters at the lowest level
Needs provider detection logic already in the codebase
Must integrate with existing context management (coordinate with compression thresholds)
Not expressible as instructions + shell commands (skill) or a standalone callable (tool)
What We'd Need
Provider detection — Determine if the target model is Claude/Anthropic:
Direct Anthropic API: base_url contains anthropic
OpenRouter: model string starts with anthropic/
Config flag: explicit context_editing: true in model config
Context management injection — In _build_api_kwargs(), conditionally add context_management to extra_body for Anthropic models
Configuration — User-configurable parameters:
Enable/disable context editing
Tool use trigger threshold
Keep count for tool uses and thinking turns
Excluded tools list
Beta header — For direct Anthropic API access, pass the beta header. For OpenRouter, verify passthrough behavior.
Phased Rollout
Phase 1: Basic Integration
Add context_management to extra_body for Anthropic models
Default conservative settings: trigger at 60% of context window, keep last 5 tool uses, keep last 2 thinking turns
Config option to enable/disable: context_editing.enabled: true
Exclude critical tools by default: memory, skill_manage, todo
Auto-scale trigger threshold based on model context window
Coordinate with existing compression: if context editing is active, raise compression threshold (since context editing handles the first layer of cleanup)
Phase 3: Smart Defaults & Monitoring
Auto-detect Anthropic models and enable context editing by default
Log context editing activity (how many tokens cleared, how often it triggers)
Surface context editing stats in /usage command
Investigate OpenRouter passthrough behavior and document compatibility
Zero client complexity — Just add a dict to the API request. ~20 lines of code.
Proven at scale — Anthropic uses this internally in Claude Code. 29-39% performance improvement reported.
Automatic — No user action needed. Works transparently.
Composable — Works alongside client-side stripping and LLM compaction for defense-in-depth.
Cons / Risks
Anthropic-only — Does nothing for OpenAI, Gemini, open-source models. Client-side stripping (companion issue) covers those.
Beta API — May change. Need to handle gracefully if the server rejects the parameter.
OpenRouter uncertainty — OpenRouter may not pass through context_management or beta headers. Needs testing.
Reduced visibility — Server-side clearing is invisible to the client. The model may behave differently than expected because old tool results are silently cleared. Need to log when this happens.
Coordination complexity — Must coordinate with existing compression thresholds. If context editing clears 40K tokens server-side, the client's token counting won't reflect this, potentially causing premature compression triggers.
Open Questions
OpenRouter passthrough — Does OpenRouter forward context_management and Anthropic beta headers to the Anthropic backend? This is critical since most Hermes users go through OpenRouter.
Token counting sync — After server-side editing, the client's token estimate diverges from reality. How do we handle this? Could use the API response's usage field to recalibrate.
Coordination with compression — If context editing is active, should we increase the compression threshold? E.g., from 85% to 95%, since context editing provides a buffer?
Should this be opt-in or opt-out? — Conservative: opt-in. Aggressive: auto-enable for Anthropic models. Recommended: opt-in in Phase 1, auto-enable in Phase 3 after validation.
Other providers — Will OpenAI or Google offer similar server-side context editing APIs? Should we design the abstraction to be provider-agnostic from the start?
Overview
Anthropic has released a server-side Context Editing API (beta:
anthropic-beta: context-management-2025-06-27) that automatically manages context by clearing old tool use/result pairs and thinking blocks at the API level, before token counting and after prompt cache lookup. This means context is managed without destroying prompt cache prefixes — a significant advantage over client-side stripping.This was discovered while researching @vicnaum's reverse-engineering of Claude Code, which revealed that Claude Code already uses internal microcompact logic. Anthropic has now exposed similar functionality as a public API parameter. Anthropic reports 29% performance improvement with context editing alone, and 39% with context editing + memory tool over baseline.
Since Hermes Agent already supports Anthropic/Claude models (via OpenRouter and direct API), integrating this API would provide zero-cost, cache-friendly context management for Claude users with minimal implementation effort.
Research Findings
How the Context Editing API Works
The API adds a
context_managementparameter to the Messages API request body, containing an ordered list ofedits:{ "model": "claude-sonnet-4-20250514", "max_tokens": 8096, "context_management": { "edits": [ { "type": "clear_thinking_20251015", "keep": {"type": "thinking_turns", "value": 2} }, { "type": "clear_tool_uses_20250919", "trigger": {"type": "input_tokens", "value": 50000}, "keep": {"type": "tool_uses", "value": 5}, "clear_at_least": {"type": "input_tokens", "value": 5000}, "exclude_tools": ["web_search"] } ] }, "messages": [...] }Edit Type 1:
clear_tool_uses_20250919trigger— Input token count to activate (default ~100K)keep— Number of recent tool pairs to preserve (default 3)clear_at_least— Minimum tokens to clear per activationexclude_tools— Tool names never cleared (e.g., important tools like memory, skill)clear_tool_inputs— Also clear tool input params, not just results (default false)Edit Type 2:
clear_thinking_20251015keep: {"type": "thinking_turns", "value": N}orkeep: "all"to maximize cache hitsKey Advantages Over Client-Side Stripping
Limitations
context_managementand beta headers to Anthropic.Current State in Hermes Agent
API Call Path
Hermes builds API kwargs in
_build_api_kwargs()(run_agent.py L2061-2149):The
extra_bodymechanism already exists and is used for OpenRouter provider preferences and reasoning config. Thecontext_managementparameter could be injected here for Anthropic models.Provider Detection
Hermes already detects OpenRouter (
"openrouter" in self.base_url.lower()) and Nous Portal ("nousresearch" in self.base_url.lower()). It would need to also detect direct Anthropic API usage and determine model provider when going through OpenRouter (e.g.,anthropic/claude-*model strings).No Existing Support
context_managementparameter is currently sentImplementation Plan
Skill vs. Tool Classification
This should be a core codebase change to
run_agent.py(specifically_build_api_kwargs()). Reasons:What We'd Need
Provider detection — Determine if the target model is Claude/Anthropic:
anthropicanthropic/context_editing: truein model configContext management injection — In
_build_api_kwargs(), conditionally addcontext_managementtoextra_bodyfor Anthropic modelsConfiguration — User-configurable parameters:
Beta header — For direct Anthropic API access, pass the beta header. For OpenRouter, verify passthrough behavior.
Phased Rollout
Phase 1: Basic Integration
context_managementtoextra_bodyfor Anthropic modelscontext_editing.enabled: truememory,skill_manage,todoPhase 2: Configuration & Tuning
Phase 3: Smart Defaults & Monitoring
/usagecommandPros & Cons
Pros
Cons / Risks
context_managementor beta headers. Needs testing.Open Questions
context_managementand Anthropic beta headers to the Anthropic backend? This is critical since most Hermes users go through OpenRouter.usagefield to recalibrate.References