
fix(agent): enable prompt caching for MiniMax own models on anthropic_messages transport#17333

Closed
nicoechaniz wants to merge 3 commits into NousResearch:main from nicoechaniz:feat/minimax-prompt-caching

Conversation

@nicoechaniz

Contributor

Closes #17332

PR #12846 enabled Anthropic prompt caching for third-party gateways,
but gated it on is_claude, which excluded providers like MiniMax
that serve their own model families (MiniMax-M2.7, etc.) through the
native Anthropic protocol.

MiniMax documents full cache_control support on its
/anthropic endpoints (global and China). This patch adds MiniMax
detection to _anthropic_prompt_cache_policy() using:

  • Built-in provider id (minimax, minimax-cn), or
  • Known Anthropic-compatible hostname (api.minimax.io,
    api.minimaxi.com)

Both paths receive the native cache_control layout, consistent with
how Hermes already treats native Anthropic and Claude-on-third-party
gateways.
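
The two detection paths above can be sketched roughly as follows. This is a simplified illustration, not the actual body of _anthropic_prompt_cache_policy(): the helper names, their signatures, and the boolean return value are assumptions, and only the provider ids, hostnames, and the anthropic_messages transport name come from the description.

```python
from urllib.parse import urlparse

# Values quoted in the PR description.
MINIMAX_PROVIDER_IDS = {"minimax", "minimax-cn"}
MINIMAX_ANTHROPIC_HOSTS = {"api.minimax.io", "api.minimaxi.com"}


def is_minimax_anthropic(provider: str, base_url: str) -> bool:
    """Return True when the provider id or the base_url host identifies MiniMax."""
    if (provider or "").strip().lower() in MINIMAX_PROVIDER_IDS:
        return True
    host = (urlparse(base_url or "").hostname or "").lower()
    return host in MINIMAX_ANTHROPIC_HOSTS


def should_use_native_cache_control(api_mode: str, provider: str,
                                    base_url: str, model: str) -> bool:
    """Hypothetical simplification of the gate: Claude-named models or MiniMax,
    and only on the anthropic_messages transport."""
    if api_mode != "anthropic_messages":
        return False
    is_claude = "claude" in (model or "").lower()
    return is_claude or is_minimax_anthropic(provider, base_url)
```

With this shape, a custom provider pointed at https://api.minimaxi.com/anthropic is detected through the hostname path even though its provider id is not minimax-cn.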

Tests: 3 new cases, 19 total in the policy suite.

Refs: #8294 (related, but only covered Claude-named models on
third-party gateways).

…_messages transport

PR NousResearch#12846 enabled Anthropic prompt caching for third-party gateways,
but gated it on is_claude, which excluded providers like MiniMax
that serve their own model families (MiniMax-M2.7, etc.) through the
native Anthropic protocol.

MiniMax documents full cache_control support on its /anthropic
endpoints (global and China). This patch adds MiniMax detection to
_anthropic_prompt_cache_policy() using:

- Built-in provider id (minimax, minimax-cn), or
- Known Anthropic-compatible hostname (api.minimax.io,
  api.minimaxi.com)

Both paths receive the native cache_control layout.

Refs: NousResearch#8294 (related, but only covered Claude-named models on
third-party gateways).
Closes NousResearch#17332
AIAgent.__init__ now detects provider=minimax/minimax-cn and defaults to:
- api_mode='anthropic_messages' (was 'chat_completions')
- base_url='https://api.minimax.io/anthropic' or 'https://api.minimaxi.com/anthropic'

This ensures prompt caching (and all other Anthropic-protocol features)
work out of the box for AIAgent users, not just CLI users.

Previously, AIAgent(provider='minimax') fell through to chat_completions
because base_url was empty and there was no provider-name detection for
MiniMax in the api_mode resolution logic. The CLI already resolved this
correctly via runtime_provider.py; this change mirrors that behaviour in
the low-level agent constructor.
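
A minimal sketch of the default resolution described above, assuming it can be isolated into a standalone function. resolve_minimax_defaults is a hypothetical name, not something in AIAgent; the two URLs and the anthropic_messages mode are the ones quoted in this commit message.

```python
from typing import Optional, Tuple

# Endpoints quoted in the commit message.
MINIMAX_DEFAULT_BASE_URLS = {
    "minimax": "https://api.minimax.io/anthropic",
    "minimax-cn": "https://api.minimaxi.com/anthropic",
}


def resolve_minimax_defaults(
    provider: Optional[str],
    base_url: Optional[str] = None,
    api_mode: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: fill in base_url/api_mode for MiniMax providers.

    Explicit values passed by the caller are always kept; only unset
    values are replaced with the MiniMax defaults.
    """
    key = (provider or "").strip().lower()
    if key not in MINIMAX_DEFAULT_BASE_URLS:
        return base_url, api_mode  # not MiniMax: leave everything untouched
    resolved_url = base_url or MINIMAX_DEFAULT_BASE_URLS[key]
    resolved_mode = api_mode or "anthropic_messages"
    return resolved_url, resolved_mode
```

The sketch only fills values the caller left unset. How the real constructor treats an explicit non-/anthropic base_url combined with an unset api_mode is not stated here, so that case should be checked against the actual tests.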

Tests added:
- 5 new tests in test_minimax_provider.py covering defaults, cn variant,
  explicit base_url preservation, explicit api_mode override, and
  prompt caching enabled by default.
- 2 new tests in test_anthropic_prompt_cache_policy.py covering empty
  base_url with provider=minimax/minimax-cn.
@nicoechaniz nicoechaniz force-pushed the feat/minimax-prompt-caching branch from c481721 to 412d505 Compare April 29, 2026 08:38
…hing

The template was suggesting https://api.minimax.io/v1 as the override
example, which silently disables prompt caching since MiniMax only
supports caching on the /anthropic endpoint. Update both global and
China examples to /anthropic and add a comment explaining why.
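
For illustration, a small check along these lines could flag an override that points at the wrong path. The function name is hypothetical and nothing like it is claimed to exist in the repo; it only encodes the statement above that caching is available on the /anthropic endpoint.

```python
from urllib.parse import urlparse


def minimax_base_url_supports_caching(base_url: str) -> bool:
    """Return True when the URL path ends in /anthropic (trailing slash ignored)."""
    path = urlparse(base_url or "").path.rstrip("/")
    return path.endswith("/anthropic")
```

Under this check, https://api.minimax.io/v1 is rejected while https://api.minimax.io/anthropic and https://api.minimaxi.com/anthropic/ are accepted.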

Refs: NousResearch#17332
@nicoechaniz

Contributor Author

Update: also fixed the .env.example template, which was suggesting https://api.minimax.io/v1. That silently disables prompt caching because MiniMax only supports caching on the /anthropic endpoint. Updated both global and China examples to /anthropic with an explanatory comment.

Commit:

@nicoechaniz nicoechaniz closed this May 1, 2026
@nicoechaniz nicoechaniz deleted the feat/minimax-prompt-caching branch May 1, 2026 01:40
Sanjays2402 added a commit to Sanjays2402/hermes-agent that referenced this pull request May 2, 2026
…ousResearch#17332)

Pre-fix, the gate in _anthropic_prompt_cache_policy enabled caching on
api_mode=anthropic_messages only when the model name contained 'claude'.
That worked for third-party gateways serving Claude (LiteLLM proxy,
Zhipu GLM Anthropic-compat) but silently disabled caching for providers
that use the native Anthropic protocol for their *own* model families
(MiniMax-M2.x). PR NousResearch#17333 fixes the symptom by hardcoding 'minimax' /
'minimax-cn' / two specific hostnames.

The issue's 'Scope beyond MiniMax' section explicitly asks for a
'capability-based or provider-allowlist approach' that's future-proof.
This PR delivers exactly that:

1. ProviderConfig.extra carries an opt-in flag
   {anthropic_cache: True, anthropic_cache_hosts: (…)}
   for built-in providers that document cache_control support for
   their own models. Currently set on 'minimax' and 'minimax-cn'.
   Future entrants flip the flag in one place.

2. agent/anthropic_cache_capability.py exposes
   provider_supports_anthropic_cache(provider, base_url,
   user_configured_hosts=…)
   which resolves in this order:
     a. ProviderConfig.extra['anthropic_cache'] for the provider id
        (case-insensitive).
     b. Hostname match against the union of registry-declared hosts
        and operator-configured hosts, which catches provider=='custom'
        setups pointing at MiniMax's URL.

3. New user config knob agent.anthropic_cache_hosts: […]. Operators
   can opt-in their own private gateway without waiting for an
   upstream patch. Read once in __init__ into
   self._anthropic_cache_user_hosts_cached, threaded through the
   policy lookup. Malformed values are ignored: a typo never
   *disables* caching, only fails to *enable* it.

4. _anthropic_prompt_cache_policy gains a new branch that fires only
   when api_mode=='anthropic_messages' and the helper returns True.
   Pre-existing native Anthropic / OpenRouter / Claude-on-third-party
   branches are unchanged, so this is purely additive for the
   non-Claude case.
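
The resolution order in item 2 and the "malformed values are ignored" rule in item 3 might look roughly like this. It is a sketch under assumptions, not the contents of agent/anthropic_cache_capability.py: PROVIDER_EXTRA stands in for the real ProviderConfig registry, _normalize_hosts is a hypothetical helper, and the host-to-provider pairing is inferred from the default base URLs mentioned in this thread.

```python
from typing import Any, Dict, Set
from urllib.parse import urlparse

# Stand-in for the provider registry; only the `extra` payload is modelled.
PROVIDER_EXTRA: Dict[str, Dict[str, Any]] = {
    "minimax": {"anthropic_cache": True,
                "anthropic_cache_hosts": ("api.minimax.io",)},
    "minimax-cn": {"anthropic_cache": True,
                   "anthropic_cache_hosts": ("api.minimaxi.com",)},
}


def _normalize_hosts(hosts: Any) -> Set[str]:
    """Lower-case and strip host entries; malformed input yields an empty set."""
    if not isinstance(hosts, (list, tuple, set, frozenset)):
        return set()
    return {h.strip().lower() for h in hosts if isinstance(h, str) and h.strip()}


def provider_supports_anthropic_cache(provider, base_url,
                                      user_configured_hosts=None) -> bool:
    # (a) Opt-in flag on the provider id, matched case-insensitively.
    key = (provider or "").strip().lower()
    if PROVIDER_EXTRA.get(key, {}).get("anthropic_cache") is True:
        return True
    # (b) Hostname match against registry hosts plus operator-configured hosts.
    allowed = _normalize_hosts(user_configured_hosts)
    for extra in PROVIDER_EXTRA.values():
        allowed |= _normalize_hosts(extra.get("anthropic_cache_hosts"))
    host = (urlparse(base_url or "").hostname or "").lower()
    return bool(host) and host in allowed
```

Because a malformed anthropic_cache_hosts value normalizes to an empty set, it can only fail to enable caching for an extra host; it never removes a host that the registry already allows.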

Why this is better than NousResearch#17333
- One brand-name special-case per provider isn't going to age well.
  Capability registry + config opt-in handle MiniMax now and the
  next entrant for free.
- Operator escape hatch (anthropic_cache_hosts) lets users running
  private LiteLLM / vLLM-anthropic deployments enable caching today
  without forking the repo.
- Eliminates the pre-existing 'Claude substring matches non-claude'
  gotcha by encouraging callers to test against curated host/provider
  lists, not free-text model names.
- Single source of truth: built-in MiniMax support reads from the same
  ProviderConfig that already carries auth/base_url, so there is no parallel
  list to keep in sync.

Tests
- TestCapabilityRegistryAnthropicCache (9 cases): MiniMax via provider
  id, MiniMax via host (custom config), MiniMax-CN both ways, user-
  configured opt-in, case/whitespace normalization, transport gate,
  unlisted gateway stays off, case-insensitive provider id.
- TestCapabilityHelperUnit (5 cases): direct tests of the helper.
- Existing test test_third_party_without_claude_name_does_not_cache
  was using api.minimax.io as the 'unknown' host; updated to use a
  truly unknown host to reflect the new behavior, which is exactly
  what the issue calls for.

863 passed, 7 skipped on tests/run_agent/ (full sweep, no regressions).
30 passed on the policy file (15 pre-existing + 14 new + 1 updated).
Fede654 pushed a commit to Fede654/hermes-agent that referenced this pull request May 3, 2026
…base_url

PR NousResearch#17425 (merged) enabled prompt caching for MiniMax models on the
anthropic_messages transport, but users still had to manually configure
both api_mode and base_url to actually benefit from it.

This patch makes the defaults ergonomic:

- AIAgent.__init__ now auto-detects provider=minimax / minimax-cn and
  defaults to api_mode=anthropic_messages + the correct /anthropic base_url
  (global or China endpoint respectively).
- .env.example suggests the /anthropic endpoints instead of /v1.
- Explicit base_url or api_mode are preserved when the user sets them.

Tests: 5 new cases covering both providers, explicit overrides, and
prompt-caching flags.

Refs: NousResearch#17332, NousResearch#17333, NousResearch#17425
nicoechaniz added a commit to nicoechaniz/hermes-agent that referenced this pull request May 24, 2026
nicoechaniz added a commit to nicoechaniz/hermes-agent that referenced this pull request Jun 1, 2026

Labels

- comp/agent: Core agent loop, run_agent.py, prompt builder
- P3: Low (cosmetic, nice to have)
- provider/minimax: MiniMax (Anthropic transport)
- type/bug: Something isn't working


Development

Successfully merging this pull request may close these issues.

[Bug]: Prompt caching silently disabled for MiniMax's own models (MiniMax-M2.7 etc.) on anthropic_messages transport

2 participants