Skip to content

fix(minimax): enable Anthropic prompt caching for MiniMax's own models#17425

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-b07fb9a7
Apr 29, 2026
Merged

fix(minimax): enable Anthropic prompt caching for MiniMax's own models#17425
teknium1 merged 1 commit into
mainfrom
hermes/hermes-b07fb9a7

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

MiniMax on anthropic_messages transport now caches with the native layout — same cost reduction as Claude traffic through the same gateway.

Root cause

PR #12846 gated third-party Anthropic-wire caching on "claude" in model_lower. MiniMax serves its own model family (MiniMax-M2.7, M2.5, M2.1, M2) on api.minimax.io/anthropic — they don't match, fall through to (False, False), re-pay full input tokens every turn.

MiniMax documents cache_control support on this endpoint: https://platform.minimax.io/docs/api-reference/anthropic-api-compatible-cache (0.1× read, 1.25× write, 5-min TTL).

Changes

  • run_agent.py — new branch in _anthropic_prompt_cache_policy() after the Claude gateway check: if transport is anthropic_messages AND provider is minimax/minimax-cn OR host matches api.minimax.io/api.minimaxi.com, return (True, True).
  • tests/run_agent/test_anthropic_prompt_cache_policy.py — flipped the old pinned-negative test for minimax-m2.7 (which was pinning the bug) into a renamed unknown-host test, added a new TestMiniMaxAnthropicWire class with 5 cases: both provider ids, both hosts, and a chat_completions-transport negative.

Validation

Before After
provider=minimax + minimax-m2.7 on anthropic_messages (False, False) (True, True)
provider=minimax-cn + minimax-m2.5 on anthropic_messages (False, False) (True, True)
custom provider, host api.minimax.io/anthropic (False, False) (True, True)
provider=minimax on chat_completions (not /anthropic) (False, False) (False, False)
Unknown host + non-Claude model on anthropic_messages (False, False) (False, False)

tests/run_agent/test_anthropic_prompt_cache_policy.py — 21 passed (16 existing + 5 new).

Scope

Narrow allowlist mirroring the existing Qwen/Alibaba branch. If a third provider needs this later, a capability flag on ProviderConfig becomes the right factoring — for now the surface is small enough to inline.

Closes #17332

MiniMax's /anthropic endpoint documents cache_control support (0.1x read
pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR #12846 gated
third-party Anthropic-wire caching on 'claude' in model name, which left
MiniMax's own model family re-paying full input tokens every turn.

Opt in explicitly via provider id (minimax / minimax-cn) or host match
(api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the
existing Qwen/Alibaba branch below; leaves room for a capability-based
surface (ProviderConfig.supports_anthropic_cache) if a third provider
needs it.

Closes #17332
@teknium1 teknium1 merged commit df0e97a into main Apr 29, 2026
11 of 12 checks passed
@teknium1 teknium1 deleted the hermes/hermes-b07fb9a7 branch April 29, 2026 11:56
@alt-glitch alt-glitch added type/bug Something isn't working P3 Low — cosmetic, nice to have provider/minimax MiniMax (Anthropic transport) comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 29, 2026
@nicoechaniz

Copy link
Copy Markdown
Contributor

Hi @teknium1 — thanks for merging the policy fix in #17425. We had an earlier PR (#17333) that covered the same caching policy plus a few UX defaults, and I wanted to follow up with the parts that weren't included here.

#18147 extracts just the remaining pieces:

  • now defaults / to + the correct base URL, so users don't have to configure both manually to get caching.
  • updated to suggest the endpoints.
  • 5 tests for the new defaults.

The policy logic from #17425 is left untouched. Hope this is useful!

donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
NousResearch#17425)

MiniMax's /anthropic endpoint documents cache_control support (0.1x read
pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated
third-party Anthropic-wire caching on 'claude' in model name, which left
MiniMax's own model family re-paying full input tokens every turn.

Opt in explicitly via provider id (minimax / minimax-cn) or host match
(api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the
existing Qwen/Alibaba branch below; leaves room for a capability-based
surface (ProviderConfig.supports_anthropic_cache) if a third provider
needs it.

Closes NousResearch#17332
Fede654 pushed a commit to Fede654/hermes-agent that referenced this pull request May 3, 2026
…base_url

PR NousResearch#17425 (merged) enabled prompt caching for MiniMax models on the
anthropic_messages transport, but users still had to manually configure
both api_mode and base_url to actually benefit from it.

This patch makes the defaults ergonomic:

- AIAgent.__init__ now auto-detects provider=minimax / minimax-cn and
  defaults to api_mode=anthropic_messages + the correct /anthropic base_url
  (global or China endpoint respectively).
- .env.example suggests the /anthropic endpoints instead of /v1.
- Explicit base_url or api_mode are preserved when the user sets them.

Tests: 5 new cases covering both providers, explicit overrides, and
prompt-caching flags.

Refs: NousResearch#17332, NousResearch#17333, NousResearch#17425
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
NousResearch#17425)

MiniMax's /anthropic endpoint documents cache_control support (0.1x read
pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated
third-party Anthropic-wire caching on 'claude' in model name, which left
MiniMax's own model family re-paying full input tokens every turn.

Opt in explicitly via provider id (minimax / minimax-cn) or host match
(api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the
existing Qwen/Alibaba branch below; leaves room for a capability-based
surface (ProviderConfig.supports_anthropic_cache) if a third provider
needs it.

Closes NousResearch#17332
jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026
NousResearch#17425)

MiniMax's /anthropic endpoint documents cache_control support (0.1x read
pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated
third-party Anthropic-wire caching on 'claude' in model name, which left
MiniMax's own model family re-paying full input tokens every turn.

Opt in explicitly via provider id (minimax / minimax-cn) or host match
(api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the
existing Qwen/Alibaba branch below; leaves room for a capability-based
surface (ProviderConfig.supports_anthropic_cache) if a third provider
needs it.

Closes NousResearch#17332
dannyJ848 pushed a commit to dannyJ848/hermes-agent that referenced this pull request May 17, 2026
NousResearch#17425)

MiniMax's /anthropic endpoint documents cache_control support (0.1x read
pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated
third-party Anthropic-wire caching on 'claude' in model name, which left
MiniMax's own model family re-paying full input tokens every turn.

Opt in explicitly via provider id (minimax / minimax-cn) or host match
(api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the
existing Qwen/Alibaba branch below; leaves room for a capability-based
surface (ProviderConfig.supports_anthropic_cache) if a third provider
needs it.

Closes NousResearch#17332
nicoechaniz added a commit to nicoechaniz/hermes-agent that referenced this pull request May 24, 2026
…base_url

PR NousResearch#17425 (merged) enabled prompt caching for MiniMax models on the
anthropic_messages transport, but users still had to manually configure
both api_mode and base_url to actually benefit from it.

This patch makes the defaults ergonomic:

- AIAgent.__init__ now auto-detects provider=minimax / minimax-cn and
  defaults to api_mode=anthropic_messages + the correct /anthropic base_url
  (global or China endpoint respectively).
- .env.example suggests the /anthropic endpoints instead of /v1.
- Explicit base_url or api_mode are preserved when the user sets them.

Tests: 5 new cases covering both providers, explicit overrides, and
prompt-caching flags.

Refs: NousResearch#17332, NousResearch#17333, NousResearch#17425
nicoechaniz added a commit to nicoechaniz/hermes-agent that referenced this pull request Jun 1, 2026
…base_url

PR NousResearch#17425 (merged) enabled prompt caching for MiniMax models on the
anthropic_messages transport, but users still had to manually configure
both api_mode and base_url to actually benefit from it.

This patch makes the defaults ergonomic:

- AIAgent.__init__ now auto-detects provider=minimax / minimax-cn and
  defaults to api_mode=anthropic_messages + the correct /anthropic base_url
  (global or China endpoint respectively).
- .env.example suggests the /anthropic endpoints instead of /v1.
- Explicit base_url or api_mode are preserved when the user sets them.

Tests: 5 new cases covering both providers, explicit overrides, and
prompt-caching flags.

Refs: NousResearch#17332, NousResearch#17333, NousResearch#17425
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
NousResearch#17425)

MiniMax's /anthropic endpoint documents cache_control support (0.1x read
pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated
third-party Anthropic-wire caching on 'claude' in model name, which left
MiniMax's own model family re-paying full input tokens every turn.

Opt in explicitly via provider id (minimax / minimax-cn) or host match
(api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the
existing Qwen/Alibaba branch below; leaves room for a capability-based
surface (ProviderConfig.supports_anthropic_cache) if a third provider
needs it.

Closes NousResearch#17332
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
NousResearch#17425)

MiniMax's /anthropic endpoint documents cache_control support (0.1x read
pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated
third-party Anthropic-wire caching on 'claude' in model name, which left
MiniMax's own model family re-paying full input tokens every turn.

Opt in explicitly via provider id (minimax / minimax-cn) or host match
(api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the
existing Qwen/Alibaba branch below; leaves room for a capability-based
surface (ProviderConfig.supports_anthropic_cache) if a third provider
needs it.

Closes NousResearch#17332
Seven74AI pushed a commit to Seven74AI/hermes-agent that referenced this pull request Jun 13, 2026
NousResearch#17425)

MiniMax's /anthropic endpoint documents cache_control support (0.1x read
pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated
third-party Anthropic-wire caching on 'claude' in model name, which left
MiniMax's own model family re-paying full input tokens every turn.

Opt in explicitly via provider id (minimax / minimax-cn) or host match
(api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the
existing Qwen/Alibaba branch below; leaves room for a capability-based
surface (ProviderConfig.supports_anthropic_cache) if a third provider
needs it.

Closes NousResearch#17332
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P3 Low — cosmetic, nice to have provider/minimax MiniMax (Anthropic transport) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Prompt caching silently disabled for MiniMax's own models (MiniMax-M2.7 etc.) on anthropic_messages transport

3 participants