Add optional Anthropic context editing support for Claude models #528
aydnOktay wants to merge 3 commits into
Thanks for putting this together @aydnOktay. The implementation is solid and well-structured, and this is definitely a feature we want to support (per #526). However, after reviewing the integration path, we've identified a fundamental issue: Hermes currently uses the OpenAI SDK (

This means there's no reliable working path today:

What we'd like to do: We're planning to add native Anthropic API support (using the

We're going to leave this PR open. If you're interested in updating it with the assumption that it will target the native Anthropic client (once available), that would be great. Otherwise we'll circle back to it once that foundation is in place.

A few code-level notes for whenever this gets revisited:

Thanks again for the contribution. The architecture and config design are good; it just needs the right transport layer underneath.
Integrate Anthropic's server-side context management (beta) for Claude models.
When enabled, the API automatically clears old tool use/result pairs and
thinking blocks AFTER prompt cache lookup but BEFORE token counting — this
preserves prompt cache prefixes while freeing context space, something
impossible with client-side stripping.
Implementation:
- anthropic_adapter: add context-management-2025-06-27 to beta headers;
  build context_management edits in build_anthropic_kwargs() via extra_body;
  only include clear_thinking edit when reasoning is enabled (API requires it)
- run_agent: pipe context_editing config through AIAgent to the adapter
- cli/gateway: load context_editing config from config.yaml and pass to agent
- config: add context_editing section to DEFAULT_CONFIG with conservative
  defaults (disabled, auto-scale triggers to 60%/10% of context window,
  keep 5 tool uses and 2 thinking turns, exclude memory/skill_manage/todo)
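
As a rough illustration of the adapter step, the sketch below builds a `context_management` payload from the `context_editing` config. The function name `build_context_management` and its parameters are hypothetical and not taken from this PR, and the nested field shapes (`trigger`, `keep`, `clear_at_least`) reflect one reading of Anthropic's beta documentation, so they should be checked against the current API reference.

```python
from typing import Any, Dict, List, Optional


def build_context_management(
    cfg: Dict[str, Any],
    trigger_tokens: int,
    clear_at_least_tokens: int,
    reasoning_enabled: bool,
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return a context_management dict, or None if disabled."""
    if not cfg.get("enabled", False):
        return None

    edits: List[Dict[str, Any]] = []

    # The clear_thinking edit is only sent when reasoning is enabled,
    # since the commit notes that the API requires it.
    if reasoning_enabled:
        edits.append({
            "type": "clear_thinking_20251015",
            "keep": {
                "type": "thinking_turns",
                "value": cfg.get("keep_thinking_turns", 2),
            },
        })

    # Field shapes below are assumptions based on the beta docs.
    edits.append({
        "type": "clear_tool_uses_20250919",
        "trigger": {"type": "input_tokens", "value": trigger_tokens},
        "keep": {"type": "tool_uses", "value": cfg.get("keep_tool_uses", 5)},
        "clear_at_least": {"type": "input_tokens", "value": clear_at_least_tokens},
        "exclude_tools": list(cfg.get("exclude_tools", [])),
        "clear_tool_inputs": bool(cfg.get("clear_tool_inputs", False)),
    })
    return {"edits": edits}
```

Returning `None` when the feature is disabled lets the caller skip the `extra_body` entry entirely, which keeps requests for non-opted-in users unchanged.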
Config (opt-in, add to config.yaml):
  context_editing:
    enabled: true
    trigger_tokens: null          # auto: 60% of context window
    keep_tool_uses: 5
    keep_thinking_turns: 2
    exclude_tools: [memory, skill_manage, todo]
    clear_tool_inputs: false
    clear_at_least_tokens: null   # auto: 10% of context window
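
The `null` values in this config mean "auto". A minimal sketch of how they might be resolved against the model's context window follows; `resolve_thresholds` is a hypothetical name, and only the 60% and 10% ratios come from the commit message.

```python
from typing import Any, Dict, Tuple


def resolve_thresholds(cfg: Dict[str, Any], context_window: int) -> Tuple[int, int]:
    """Hypothetical helper: resolve null (auto) thresholds to token counts."""
    trigger = cfg.get("trigger_tokens")
    if trigger is None:
        # auto: 60% of the context window (integer math avoids float rounding)
        trigger = context_window * 60 // 100

    clear_at_least = cfg.get("clear_at_least_tokens")
    if clear_at_least is None:
        # auto: 10% of the context window
        clear_at_least = context_window * 10 // 100

    return int(trigger), int(clear_at_least)


# With a 200,000-token window and both values left as null:
trigger, clear_at_least = resolve_thresholds(
    {"trigger_tokens": None, "clear_at_least_tokens": None}, 200_000
)
print(trigger, clear_at_least)  # 120000 20000
```

Explicit values in `config.yaml` pass through untouched, so a user can pin the trigger point independently of the model's window size.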
Live tested with Anthropic API:
- Single turn with context_management: accepted, response normal
- Multi-turn with tool calls + thinking + context_management: works
- clear_thinking correctly omitted when thinking is disabled
- Config plumbing verified through AIAgent._build_api_kwargs()
Refs: #526, supersedes #528
This PR implements a first-phase integration of Anthropic’s server-side Context Editing API for Claude models. It conditionally attaches a context_management.edits block to the Messages request body and, for direct Anthropic endpoints, automatically opts into the beta via the anthropic-beta: context-management-2025-06-27 header. The feature is fully opt-in and is controlled by a new context_editing section in the CLI config plus CONTEXT_EDITING_* environment variables. These configure two edits, clear_thinking_20251015 and clear_tool_uses_20250919, whose conservative defaults are derived from the model’s context window; they clear old thinking turns and tool use/result pairs while preserving prompt cache prefixes. Default behavior is unchanged for non-Anthropic models, Claude users get a simple switch for cache-friendly automatic context cleanup, and the change addresses the design and requirements described in Issue #526.
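
One way to picture the request-side behavior described here is the sketch below: it places `context_management` in `extra_body` and adds the beta flag to the `anthropic-beta` header only when the base URL points directly at Anthropic. The function name `attach_context_editing` and the hostname check are assumptions made for illustration; the PR may detect direct endpoints differently.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

BETA_FLAG = "context-management-2025-06-27"


def attach_context_editing(
    kwargs: Dict[str, Any],
    context_management: Optional[Dict[str, Any]],
    base_url: str,
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of kwargs with context editing attached."""
    out = dict(kwargs)
    if not context_management:
        return out  # feature disabled: request is left unchanged

    # The edits travel in the request body.
    extra_body = dict(out.get("extra_body") or {})
    extra_body["context_management"] = context_management
    out["extra_body"] = extra_body

    # Assumption: a direct endpoint is identified by its hostname.
    if urlparse(base_url).hostname == "api.anthropic.com":
        headers = dict(out.get("extra_headers") or {})
        current = headers.get("anthropic-beta", "")
        flags = [f.strip() for f in current.split(",") if f.strip()]
        if BETA_FLAG not in flags:
            flags.append(BETA_FLAG)
        # Multiple beta flags are sent as a comma-separated list.
        headers["anthropic-beta"] = ",".join(flags)
        out["extra_headers"] = headers

    return out
```

Copying `kwargs` rather than mutating it keeps the helper safe to call on a shared base configuration, and merging into any existing `anthropic-beta` value avoids dropping other beta flags already set by the adapter.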