
Add optional Anthropic context editing support for Claude models #528

Closed
aydnOktay wants to merge 3 commits into NousResearch:main from aydnOktay:feature/anthropic-context-editing



@aydnOktay

Contributor

This PR implements a first-phase integration of Anthropic's server-side Context Editing API for Claude models. It conditionally attaches a context_management.edits block to the Messages request body and, for direct Anthropic endpoints, automatically opts into the beta via the anthropic-beta: context-management-2025-06-27 header. The feature is fully opt-in, controlled by a new context_editing section in the CLI config plus CONTEXT_EDITING_* environment variables. These configure two edits, clear_thinking_20251015 and clear_tool_uses_20250919, with conservative defaults derived from the model's context window, so that old thinking turns and tool use/result pairs are cleared while prompt cache prefixes are preserved. Default behavior is unchanged for non-Anthropic models, Claude users get a simple switch to enable cache-friendly automatic context cleanup, and the change directly addresses the design and requirements described in Issue #526.
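A minimal sketch of the request shaping described above, assuming a plain dict request body and a plain dict of HTTP headers. The helper name attach_context_editing, the "claude" substring check on the model name, the host comparison against api.anthropic.com, and the comma-separated merge of beta values are illustrative assumptions, not the PR's actual code; only the header value and the context_management.edits key come from the description.

```python
from urllib.parse import urlparse

CONTEXT_MANAGEMENT_BETA = "context-management-2025-06-27"


def attach_context_editing(body, headers, base_url, edits, enabled):
    """Hypothetical helper: return copies of (body, headers) with context editing attached.

    Non-Claude models, disabled configs, and empty edit lists pass through unchanged.
    """
    model = str(body.get("model", "")).lower()
    if not enabled or not edits or "claude" not in model:  # assumed Claude-detection heuristic
        return body, headers

    new_body = dict(body)
    new_headers = dict(headers)
    new_body["context_management"] = {"edits": list(edits)}

    # Only direct Anthropic endpoints get the beta opt-in header.
    if (urlparse(base_url).hostname or "") == "api.anthropic.com":
        existing = new_headers.get("anthropic-beta", "")
        betas = [b.strip() for b in existing.split(",") if b.strip()]
        if CONTEXT_MANAGEMENT_BETA not in betas:
            betas.append(CONTEXT_MANAGEMENT_BETA)
        new_headers["anthropic-beta"] = ",".join(betas)  # assumes a comma-separated beta list
    return new_body, new_headers
```

Merging into an existing anthropic-beta value, rather than overwriting it, keeps any other beta opt-in already present on the request.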

@aydnOktay

Contributor Author

This resolves issue #526.

Could you take a look, @teknium1?

@teknium1

Contributor

Thanks for putting this together @aydnOktay — the implementation is solid and well-structured, and this is definitely a feature we want to support (per #526).

However, after reviewing the integration path, we've identified a fundamental issue: Hermes currently uses the OpenAI SDK (openai.OpenAI) for all providers, sending requests to /v1/chat/completions. The context_management parameter is specific to Anthropic's native Messages API (/v1/messages).

This means there's no reliable working path today:

  • OpenRouter (our primary path): Their docs list supported Anthropic beta headers, but context-management-2025-06-27 isn't among them. It's unclear whether context_management in extra_body would be forwarded to Anthropic's backend.
  • Direct Anthropic: Anthropic does have an OpenAI-compatible endpoint, but it's pretty niche and unlikely to support Anthropic-specific parameters like context_management.
  • LiteLLM: Does support context_management passthrough, but that's a niche deployment.

What we'd like to do: We're planning to add native Anthropic API support (using the anthropic SDK directly) to Hermes. Once that's in place, this feature would have a clean, reliable path — sending context_management directly via the Anthropic Messages API where it's natively supported.
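Once a native client exists, the request could look roughly like the sketch below. It only builds keyword arguments for the anthropic SDK's messages.create call and relies on the SDK's generic extra_headers and extra_body request options rather than any dedicated context_management parameter; the function name and the choice to attach the beta header per request are assumptions, not a planned Hermes API.

```python
def build_native_messages_kwargs(model, messages, max_tokens, edits):
    """Hypothetical: kwargs for anthropic.Anthropic().messages.create(**kwargs)."""
    kwargs = {"model": model, "messages": messages, "max_tokens": max_tokens}
    if edits:
        # The beta opt-in and the Anthropic-specific body field travel as generic extras,
        # so this does not depend on the SDK version exposing a typed parameter for them.
        kwargs["extra_headers"] = {"anthropic-beta": "context-management-2025-06-27"}
        kwargs["extra_body"] = {"context_management": {"edits": list(edits)}}
    return kwargs
```

With an API key configured, the result would be passed as client.messages.create(**kwargs) on an anthropic.Anthropic() client, which targets /v1/messages instead of the OpenAI-compatible /v1/chat/completions path.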

We're going to leave this PR open. If you're interested in updating it with the assumption that it will target the native Anthropic client (once available), that would be great. Otherwise we'll circle back to it once that foundation is in place.

A few code-level notes for whenever this gets revisited:

  1. Bug: exclude_tools list serialization. str(["memory", "skill_manage", "todo"]) produces the Python repr ("['memory', 'skill_manage', 'todo']"), but run_agent.py parses it with .split(","). It needs ",".join() instead.
  2. Missing config key: clear_at_least_tokens is read from env var but has no corresponding key in cli.py defaults.
  3. The unused instance attributes _is_openrouter / _is_nous_portal can be dropped (only _is_anthropic_model is used).
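Note 1 can be reproduced in a few lines. parse_exclude_tools below is a hypothetical stand-in for the .split(",") reader in run_agent.py, not its actual code.

```python
def parse_exclude_tools(raw):
    """Stand-in for the reader side: split a comma-separated string into tool names."""
    return [part.strip() for part in raw.split(",") if part.strip()]


tools = ["memory", "skill_manage", "todo"]

# Bug: str() writes the Python repr, so brackets and quotes leak into the names.
buggy = parse_exclude_tools(str(tools))
print(buggy)  # ["['memory'", "'skill_manage'", "'todo']"]

# Fix: serialize with ",".join() so the split round-trips.
fixed = parse_exclude_tools(",".join(tools))
print(fixed)  # ['memory', 'skill_manage', 'todo']
```

None of the buggy names match a real tool, so the exclusion list would silently exclude nothing.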

Thanks again for the contribution — the architecture and config design are good, it just needs the right transport layer underneath.
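Note 2 is the kind of drift a single defaults table prevents: if every key the reader consumes lives in one dict, a missing key such as clear_at_least_tokens shows up immediately. The sketch below is an assumption about how that could look. DEFAULT_CONTEXT_EDITING and load_context_editing are hypothetical names, the values mirror the defaults listed in the commit that later superseded this PR, and rejecting unknown keys (to catch typos) is a design choice rather than existing Hermes behavior.

```python
import copy

# Hypothetical defaults table; values mirror the superseding commit's DEFAULT_CONFIG description.
DEFAULT_CONTEXT_EDITING = {
    "enabled": False,
    "trigger_tokens": None,          # None means auto-scale from the context window
    "keep_tool_uses": 5,
    "keep_thinking_turns": 2,
    "exclude_tools": ["memory", "skill_manage", "todo"],
    "clear_tool_inputs": False,
    "clear_at_least_tokens": None,   # None means auto-scale from the context window
}


def load_context_editing(user_section=None):
    """Overlay a user's context_editing section on the defaults."""
    merged = copy.deepcopy(DEFAULT_CONTEXT_EDITING)  # deep copy so list defaults are never shared
    for key, value in (user_section or {}).items():
        if key not in merged:
            raise KeyError(f"unknown context_editing key: {key!r}")
        merged[key] = value
    return merged
```

A user who only writes enabled: true still receives every other key, including clear_at_least_tokens.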

teknium1 added a commit that referenced this pull request Mar 13, 2026
Integrate Anthropic's server-side context management (beta) for Claude models.
When enabled, the API automatically clears old tool use/result pairs and
thinking blocks AFTER prompt cache lookup but BEFORE token counting — this
preserves prompt cache prefixes while freeing context space, something
impossible with client-side stripping.

Implementation:
- anthropic_adapter: add context-management-2025-06-27 to beta headers;
  build context_management edits in build_anthropic_kwargs() via extra_body;
  only include clear_thinking edit when reasoning is enabled (API requires it)
- run_agent: pipe context_editing config through AIAgent to the adapter
- cli/gateway: load context_editing config from config.yaml and pass to agent
- config: add context_editing section to DEFAULT_CONFIG with conservative
  defaults (disabled, auto-scale triggers to 60%/10% of context window,
  keep 5 tool uses and 2 thinking turns, exclude memory/skill_manage/todo)

Config (opt-in, add to config.yaml):
  context_editing:
    enabled: true
    trigger_tokens: null       # auto: 60% of context window
    keep_tool_uses: 5
    keep_thinking_turns: 2
    exclude_tools: [memory, skill_manage, todo]
    clear_tool_inputs: false
    clear_at_least_tokens: null  # auto: 10% of context window

Live tested with Anthropic API:
- Single turn with context_management: accepted, response normal
- Multi-turn with tool calls + thinking + context_management: works
- clear_thinking correctly omitted when thinking is disabled
- Config plumbing verified through AIAgent._build_api_kwargs()

Refs: #526, supersedes #528
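A sketch of the edit construction this commit describes: resolve null triggers to 60% and 10% of the context window, and emit clear_thinking only when reasoning is enabled. The function name and config dict shape are assumptions. The edit type strings come from the discussion above, while the nested trigger/keep/clear_at_least objects follow our reading of Anthropic's context editing documentation and should be checked against the current API reference; placing clear_thinking first reflects our understanding that the API expects it ahead of other edits.

```python
def build_context_edits(cfg, context_window, thinking_enabled):
    """Hypothetical: turn a context_editing config dict into context_management edits."""
    trigger = cfg.get("trigger_tokens")
    if trigger is None:
        trigger = context_window * 60 // 100  # auto: 60% of context window
    clear_at_least = cfg.get("clear_at_least_tokens")
    if clear_at_least is None:
        clear_at_least = context_window * 10 // 100  # auto: 10% of context window

    edits = []
    if thinking_enabled:
        # Only when reasoning is on; per the commit, the API requires it for this edit.
        edits.append({
            "type": "clear_thinking_20251015",
            "keep": {"type": "thinking_turns", "value": cfg.get("keep_thinking_turns", 2)},
        })
    edits.append({
        "type": "clear_tool_uses_20250919",
        "trigger": {"type": "input_tokens", "value": trigger},
        "keep": {"type": "tool_uses", "value": cfg.get("keep_tool_uses", 5)},
        "clear_at_least": {"type": "input_tokens", "value": clear_at_least},
        "exclude_tools": list(cfg.get("exclude_tools", [])),
        "clear_tool_inputs": bool(cfg.get("clear_tool_inputs", False)),
    })
    return edits
```

For example, with a 200,000-token window and both triggers left null, the tool-use edit triggers at 120,000 input tokens and asks to clear at least 20,000. Integer arithmetic avoids float rounding in the token thresholds.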
teknium1 closed this Mar 13, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
2 participants