fix(agent): enable prompt caching on Anthropic-compatible endpoints by sgaofen · Pull Request #8636 · NousResearch/hermes-agent

sgaofen · 2026-04-12T22:24:08Z

Summary

make Anthropic prompt caching transport-aware instead of relying on provider-name or /anthropic URL heuristics alone
enable cache-control markers for custom api_mode: anthropic_messages providers that expose Claude models on plain /v1/messages endpoints
keep tool-level cache layout native-Anthropic-only so third-party proxies are not over-classified
add regression coverage for init, switch_model, and the non-native cache-layout guard

Root Cause

Prompt caching was still tied to provider-name and URL-shape heuristics. That missed custom Anthropic-compatible providers using api_mode: anthropic_messages when their endpoint did not contain /anthropic, and it also conflated “eligible for cache markers” with “native Anthropic tool-result cache layout.”

Closes #8294.

Validation

python3 -m py_compile run_agent.py tests/run_agent/test_run_agent.py tests/run_agent/test_switch_model_context.py tests/run_agent/test_primary_runtime_restore.py
uv run --extra dev pytest tests/run_agent/test_run_agent.py tests/run_agent/test_switch_model_context.py tests/run_agent/test_primary_runtime_restore.py -q

Platform Tested

macOS 15.x (Apple Silicon)

Contribution Guide Notes

Reviewed CONTRIBUTING.md and checked for existing open PRs before submitting this scoped change.
Ran the targeted verification commands listed above for this PR. I have not claimed a full repo-wide pytest tests/ -q pass unless explicitly noted.

wuweiplus · 2026-04-13T09:55:57Z

@sgaofen

I noticed that the function used to determine Anthropic compatibility relies on hardcoded providers and specific URL paths:

https://github.com/NousResearch/hermes-agent/blame/397eae5d93dc549350e1bc9eb1368b3b7510df5d/agent/auxiliary_client.py#L2124

def _is_anthropic_compat_endpoint(provider: str, base_url: str) -> bool:
    """Detect if an endpoint expects Anthropic-format content blocks.
    Returns True for known Anthropic-compatible providers (MiniMax) and
    any endpoint whose URL contains ``/anthropic`` in the path.
    """
    if provider in _ANTHROPIC_COMPAT_PROVIDERS:
        return True
    url_lower = (base_url or "").lower()
    return "/anthropic" in url_lower

Since this logic is somewhat restrictive, how can we support custom providers (like self-hosted LiteLLM deployment) that doesn't include /anthropic in the URL?
For example, a standard endpoint might just look like this: https://api.example.com/v1/messages

sgaofen · 2026-04-17T19:13:09Z

Follow-up pushed for the custom-provider case mentioned above. The PR now gates prompt-caching eligibility off instead of requiring in the URL, while still keeping the native-Anthropic cache-layout handling separate. I also added targeted regression coverage for init-time activation, , and the non-native cache-layout guard.

sgaofen · 2026-04-17T19:13:15Z

Follow-up pushed for the custom-provider case mentioned above. The PR now gates prompt-caching eligibility off api_mode: anthropic_messages instead of requiring /anthropic in the URL, while still keeping the native-Anthropic cache-layout handling separate. I also added targeted regression coverage for init-time activation, switch_model(...), and the non-native cache-layout guard.

@nocoo

…pport Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change. Five fixes: - URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three. - OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded. - Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets envelope layout. Layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice. - Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed. - Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session. - Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path. Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed). Synthesized from (credits preserved via Co-authored-by trailers): #7410 @nocoo — URL detection helper #7393 @keyuyuan — OAuth 5-site guard #7367 @n-WN — OAuth guard (narrower cousin, kept comment) #8636 @sgaofen — caching helper + native-vs-proxy layout split #10954 @Only-Code-A — caching on anthropic_messages+Claude #7648 @zhongyueming1121 — aux client anthropic_messages branch #6096 @hansnow — /model switch clears stale api_mode #9691 @TroyMitchell911 — anthropic_messages truncation continuation Closes: #7366, #8294 (third-party Anthropic identity + caching). Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691. Rejects: #9621 (OpenAI-wire caching with incomplete blocklist — risky), #7242 (superseded by #9691, stale branch), #8321 (targets smart_model_routing which was removed in #12732). Co-authored-by: nocoo <nocoo@users.noreply.github.com> Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com> Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com> Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com> Co-authored-by: Only-Code-A <bxzt2006@163.com> Co-authored-by: zhongyueming <mygamez@163.com> Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com> Co-authored-by: Troy Mitchell <i@troy-y.org>

@nocoo

…pport (#12846) Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change. Five fixes: - URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three. - OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded. - Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets envelope layout. Layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice. - Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed. - Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session. - Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path. Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed). Synthesized from (credits preserved via Co-authored-by trailers): #7410 @nocoo — URL detection helper #7393 @keyuyuan — OAuth 5-site guard #7367 @n-WN — OAuth guard (narrower cousin, kept comment) #8636 @sgaofen — caching helper + native-vs-proxy layout split #10954 @Only-Code-A — caching on anthropic_messages+Claude #7648 @zhongyueming1121 — aux client anthropic_messages branch #6096 @hansnow — /model switch clears stale api_mode #9691 @TroyMitchell911 — anthropic_messages truncation continuation Closes: #7366, #8294 (third-party Anthropic identity + caching). Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691. Rejects: #9621 (OpenAI-wire caching with incomplete blocklist — risky), #7242 (superseded by #9691, stale branch), #8321 (targets smart_model_routing which was removed in #12732). Co-authored-by: nocoo <nocoo@users.noreply.github.com> Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com> Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com> Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com> Co-authored-by: Only-Code-A <bxzt2006@163.com> Co-authored-by: zhongyueming <mygamez@163.com> Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com> Co-authored-by: Troy Mitchell <i@troy-y.org>

teknium1 · 2026-04-20T05:43:49Z

Superseded by PR #12846 (merged as commit 65a31ee on main).

Your contribution: Prompt caching helper extraction + native-vs-proxy layout split — simplified to one helper returning (should_cache, use_native_layout) instead of two separate methods.

Your authorship is preserved via a Co-authored-by: trailer on the merge commit so the AUTHOR_MAP attribution script picks you up. Thanks for raising this!

#12846

@nocoo

…pport (NousResearch#12846) Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change. Five fixes: - URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three. - OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded. - Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets envelope layout. Layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice. - Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed. - Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session. - Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path. Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed). Synthesized from (credits preserved via Co-authored-by trailers): NousResearch#7410 @nocoo — URL detection helper NousResearch#7393 @keyuyuan — OAuth 5-site guard NousResearch#7367 @n-WN — OAuth guard (narrower cousin, kept comment) NousResearch#8636 @sgaofen — caching helper + native-vs-proxy layout split NousResearch#10954 @Only-Code-A — caching on anthropic_messages+Claude NousResearch#7648 @zhongyueming1121 — aux client anthropic_messages branch NousResearch#6096 @hansnow — /model switch clears stale api_mode NousResearch#9691 @TroyMitchell911 — anthropic_messages truncation continuation Closes: NousResearch#7366, NousResearch#8294 (third-party Anthropic identity + caching). Supersedes: NousResearch#7410, NousResearch#7367, NousResearch#7393, NousResearch#8636, NousResearch#10954, NousResearch#7648, NousResearch#6096, NousResearch#9691. Rejects: NousResearch#9621 (OpenAI-wire caching with incomplete blocklist — risky), NousResearch#7242 (superseded by NousResearch#9691, stale branch), NousResearch#8321 (targets smart_model_routing which was removed in NousResearch#12732). Co-authored-by: nocoo <nocoo@users.noreply.github.com> Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com> Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com> Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com> Co-authored-by: Only-Code-A <bxzt2006@163.com> Co-authored-by: zhongyueming <mygamez@163.com> Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com> Co-authored-by: Troy Mitchell <i@troy-y.org>

@nocoo

…pport (NousResearch#12846) Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change. Five fixes: - URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three. - OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded. - Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets envelope layout. Layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice. - Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed. - Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session. - Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path. Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed). Synthesized from (credits preserved via Co-authored-by trailers): NousResearch#7410 @nocoo — URL detection helper NousResearch#7393 @keyuyuan — OAuth 5-site guard NousResearch#7367 @n-WN — OAuth guard (narrower cousin, kept comment) NousResearch#8636 @sgaofen — caching helper + native-vs-proxy layout split NousResearch#10954 @Only-Code-A — caching on anthropic_messages+Claude NousResearch#7648 @zhongyueming1121 — aux client anthropic_messages branch NousResearch#6096 @hansnow — /model switch clears stale api_mode NousResearch#9691 @TroyMitchell911 — anthropic_messages truncation continuation Closes: NousResearch#7366, NousResearch#8294 (third-party Anthropic identity + caching). Supersedes: NousResearch#7410, NousResearch#7367, NousResearch#7393, NousResearch#8636, NousResearch#10954, NousResearch#7648, NousResearch#6096, NousResearch#9691. Rejects: NousResearch#9621 (OpenAI-wire caching with incomplete blocklist — risky), NousResearch#7242 (superseded by NousResearch#9691, stale branch), NousResearch#8321 (targets smart_model_routing which was removed in NousResearch#12732). Co-authored-by: nocoo <nocoo@users.noreply.github.com> Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com> Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com> Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com> Co-authored-by: Only-Code-A <bxzt2006@163.com> Co-authored-by: zhongyueming <mygamez@163.com> Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com> Co-authored-by: Troy Mitchell <i@troy-y.org>

@nocoo

…pport (NousResearch#12846) Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change. Five fixes: - URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three. - OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded. - Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets envelope layout. Layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice. - Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed. - Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session. - Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path. Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed). Synthesized from (credits preserved via Co-authored-by trailers): NousResearch#7410 @nocoo — URL detection helper NousResearch#7393 @keyuyuan — OAuth 5-site guard NousResearch#7367 @n-WN — OAuth guard (narrower cousin, kept comment) NousResearch#8636 @sgaofen — caching helper + native-vs-proxy layout split NousResearch#10954 @Only-Code-A — caching on anthropic_messages+Claude NousResearch#7648 @zhongyueming1121 — aux client anthropic_messages branch NousResearch#6096 @hansnow — /model switch clears stale api_mode NousResearch#9691 @TroyMitchell911 — anthropic_messages truncation continuation Closes: NousResearch#7366, NousResearch#8294 (third-party Anthropic identity + caching). Supersedes: NousResearch#7410, NousResearch#7367, NousResearch#7393, NousResearch#8636, NousResearch#10954, NousResearch#7648, NousResearch#6096, NousResearch#9691. Rejects: NousResearch#9621 (OpenAI-wire caching with incomplete blocklist — risky), NousResearch#7242 (superseded by NousResearch#9691, stale branch), NousResearch#8321 (targets smart_model_routing which was removed in NousResearch#12732). Co-authored-by: nocoo <nocoo@users.noreply.github.com> Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com> Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com> Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com> Co-authored-by: Only-Code-A <bxzt2006@163.com> Co-authored-by: zhongyueming <mygamez@163.com> Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com> Co-authored-by: Troy Mitchell <i@troy-y.org>

@nocoo

…pport (NousResearch#12846) Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change. Five fixes: - URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three. - OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded. - Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets envelope layout. Layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice. - Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed. - Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session. - Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path. Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed). Synthesized from (credits preserved via Co-authored-by trailers): NousResearch#7410 @nocoo — URL detection helper NousResearch#7393 @keyuyuan — OAuth 5-site guard NousResearch#7367 @n-WN — OAuth guard (narrower cousin, kept comment) NousResearch#8636 @sgaofen — caching helper + native-vs-proxy layout split NousResearch#10954 @Only-Code-A — caching on anthropic_messages+Claude NousResearch#7648 @zhongyueming1121 — aux client anthropic_messages branch NousResearch#6096 @hansnow — /model switch clears stale api_mode NousResearch#9691 @TroyMitchell911 — anthropic_messages truncation continuation Closes: NousResearch#7366, NousResearch#8294 (third-party Anthropic identity + caching). Supersedes: NousResearch#7410, NousResearch#7367, NousResearch#7393, NousResearch#8636, NousResearch#10954, NousResearch#7648, NousResearch#6096, NousResearch#9691. Rejects: NousResearch#9621 (OpenAI-wire caching with incomplete blocklist — risky), NousResearch#7242 (superseded by NousResearch#9691, stale branch), NousResearch#8321 (targets smart_model_routing which was removed in NousResearch#12732). Co-authored-by: nocoo <nocoo@users.noreply.github.com> Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com> Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com> Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com> Co-authored-by: Only-Code-A <bxzt2006@163.com> Co-authored-by: zhongyueming <mygamez@163.com> Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com> Co-authored-by: Troy Mitchell <i@troy-y.org>

@nocoo

…pport (NousResearch#12846) Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change. Five fixes: - URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three. - OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded. - Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets envelope layout. Layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice. - Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed. - Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session. - Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path. Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed). Synthesized from (credits preserved via Co-authored-by trailers): NousResearch#7410 @nocoo — URL detection helper NousResearch#7393 @keyuyuan — OAuth 5-site guard NousResearch#7367 @n-WN — OAuth guard (narrower cousin, kept comment) NousResearch#8636 @sgaofen — caching helper + native-vs-proxy layout split NousResearch#10954 @Only-Code-A — caching on anthropic_messages+Claude NousResearch#7648 @zhongyueming1121 — aux client anthropic_messages branch NousResearch#6096 @hansnow — /model switch clears stale api_mode NousResearch#9691 @TroyMitchell911 — anthropic_messages truncation continuation Closes: NousResearch#7366, NousResearch#8294 (third-party Anthropic identity + caching). Supersedes: NousResearch#7410, NousResearch#7367, NousResearch#7393, NousResearch#8636, NousResearch#10954, NousResearch#7648, NousResearch#6096, NousResearch#9691. Rejects: NousResearch#9621 (OpenAI-wire caching with incomplete blocklist — risky), NousResearch#7242 (superseded by NousResearch#9691, stale branch), NousResearch#8321 (targets smart_model_routing which was removed in NousResearch#12732). Co-authored-by: nocoo <nocoo@users.noreply.github.com> Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com> Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com> Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com> Co-authored-by: Only-Code-A <bxzt2006@163.com> Co-authored-by: zhongyueming <mygamez@163.com> Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com> Co-authored-by: Troy Mitchell <i@troy-y.org>

@nocoo

…pport (NousResearch#12846) Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change. Five fixes: - URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three. - OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded. - Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets envelope layout. Layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice. - Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed. - Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session. - Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path. Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed). Synthesized from (credits preserved via Co-authored-by trailers): NousResearch#7410 @nocoo — URL detection helper NousResearch#7393 @keyuyuan — OAuth 5-site guard NousResearch#7367 @n-WN — OAuth guard (narrower cousin, kept comment) NousResearch#8636 @sgaofen — caching helper + native-vs-proxy layout split NousResearch#10954 @Only-Code-A — caching on anthropic_messages+Claude NousResearch#7648 @zhongyueming1121 — aux client anthropic_messages branch NousResearch#6096 @hansnow — /model switch clears stale api_mode NousResearch#9691 @TroyMitchell911 — anthropic_messages truncation continuation Closes: NousResearch#7366, NousResearch#8294 (third-party Anthropic identity + caching). Supersedes: NousResearch#7410, NousResearch#7367, NousResearch#7393, NousResearch#8636, NousResearch#10954, NousResearch#7648, NousResearch#6096, NousResearch#9691. Rejects: NousResearch#9621 (OpenAI-wire caching with incomplete blocklist — risky), NousResearch#7242 (superseded by NousResearch#9691, stale branch), NousResearch#8321 (targets smart_model_routing which was removed in NousResearch#12732). Co-authored-by: nocoo <nocoo@users.noreply.github.com> Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com> Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com> Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com> Co-authored-by: Only-Code-A <bxzt2006@163.com> Co-authored-by: zhongyueming <mygamez@163.com> Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com> Co-authored-by: Troy Mitchell <i@troy-y.org>

fix(run_agent): enable anthropic cache on compat endpoints

777f9ca

sgaofen changed the title ~~[codex] enable prompt caching on Anthropic-compatible endpoints~~ fix(agent): enable prompt caching on Anthropic-compatible endpoints Apr 12, 2026

sgaofen marked this pull request as ready for review April 13, 2026 00:53

fix(agent): support custom anthropic_messages prompt caching

03c3547

teknium1 mentioned this pull request Apr 20, 2026

fix(anthropic): complete third-party Anthropic-compatible provider support #12846

Merged

teknium1 closed this Apr 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agent): enable prompt caching on Anthropic-compatible endpoints#8636

fix(agent): enable prompt caching on Anthropic-compatible endpoints#8636
sgaofen wants to merge 2 commits into
NousResearch:mainfrom
sgaofen:codex/anthropic-cache-compat

sgaofen commented Apr 12, 2026 •

edited

Loading

Uh oh!

wuweiplus commented Apr 13, 2026

Uh oh!

sgaofen commented Apr 17, 2026

Uh oh!

sgaofen commented Apr 17, 2026

Uh oh!

teknium1 commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

sgaofen commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Validation

Platform Tested

Contribution Guide Notes

Uh oh!

wuweiplus commented Apr 13, 2026

Uh oh!

sgaofen commented Apr 17, 2026

Uh oh!

sgaofen commented Apr 17, 2026

Uh oh!

teknium1 commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sgaofen commented Apr 12, 2026 •

edited

Loading