fix(minimax): enable Anthropic prompt caching for MiniMax's own models#17425
Merged
Conversation
MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR #12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes #17332
Contributor
|
Hi @teknium1 — thanks for merging the policy fix in #17425. We had an earlier PR (#17333) that covered the same caching policy plus a few UX defaults, and I wanted to follow up with the parts that weren't included here. #18147 extracts just the remaining pieces:
The policy logic from #17425 is left untouched. Hope this is useful! |
donald131
pushed a commit
to donald131/hermes-agent
that referenced
this pull request
May 2, 2026
NousResearch#17425) MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes NousResearch#17332
Fede654
pushed a commit
to Fede654/hermes-agent
that referenced
this pull request
May 3, 2026
…base_url PR NousResearch#17425 (merged) enabled prompt caching for MiniMax models on the anthropic_messages transport, but users still had to manually configure both api_mode and base_url to actually benefit from it. This patch makes the defaults ergonomic: - AIAgent.__init__ now auto-detects provider=minimax / minimax-cn and defaults to api_mode=anthropic_messages + the correct /anthropic base_url (global or China endpoint respectively). - .env.example suggests the /anthropic endpoints instead of /v1. - Explicit base_url or api_mode are preserved when the user sets them. Tests: 5 new cases covering both providers, explicit overrides, and prompt-caching flags. Refs: NousResearch#17332, NousResearch#17333, NousResearch#17425
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
NousResearch#17425) MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes NousResearch#17332
jsboige
pushed a commit
to jsboige/hermes-agent
that referenced
this pull request
May 14, 2026
NousResearch#17425) MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes NousResearch#17332
dannyJ848
pushed a commit
to dannyJ848/hermes-agent
that referenced
this pull request
May 17, 2026
NousResearch#17425) MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes NousResearch#17332
nicoechaniz
added a commit
to nicoechaniz/hermes-agent
that referenced
this pull request
May 24, 2026
…base_url PR NousResearch#17425 (merged) enabled prompt caching for MiniMax models on the anthropic_messages transport, but users still had to manually configure both api_mode and base_url to actually benefit from it. This patch makes the defaults ergonomic: - AIAgent.__init__ now auto-detects provider=minimax / minimax-cn and defaults to api_mode=anthropic_messages + the correct /anthropic base_url (global or China endpoint respectively). - .env.example suggests the /anthropic endpoints instead of /v1. - Explicit base_url or api_mode are preserved when the user sets them. Tests: 5 new cases covering both providers, explicit overrides, and prompt-caching flags. Refs: NousResearch#17332, NousResearch#17333, NousResearch#17425
nicoechaniz
added a commit
to nicoechaniz/hermes-agent
that referenced
this pull request
Jun 1, 2026
…base_url PR NousResearch#17425 (merged) enabled prompt caching for MiniMax models on the anthropic_messages transport, but users still had to manually configure both api_mode and base_url to actually benefit from it. This patch makes the defaults ergonomic: - AIAgent.__init__ now auto-detects provider=minimax / minimax-cn and defaults to api_mode=anthropic_messages + the correct /anthropic base_url (global or China endpoint respectively). - .env.example suggests the /anthropic endpoints instead of /v1. - Explicit base_url or api_mode are preserved when the user sets them. Tests: 5 new cases covering both providers, explicit overrides, and prompt-caching flags. Refs: NousResearch#17332, NousResearch#17333, NousResearch#17425
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
NousResearch#17425) MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes NousResearch#17332
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
NousResearch#17425) MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes NousResearch#17332
Seven74AI
pushed a commit
to Seven74AI/hermes-agent
that referenced
this pull request
Jun 13, 2026
NousResearch#17425) MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes NousResearch#17332
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
MiniMax on
anthropic_messagestransport now caches with the native layout — same cost reduction as Claude traffic through the same gateway.Root cause
PR #12846 gated third-party Anthropic-wire caching on
"claude" in model_lower. MiniMax serves its own model family (MiniMax-M2.7, M2.5, M2.1, M2) onapi.minimax.io/anthropic— they don't match, fall through to(False, False), re-pay full input tokens every turn.MiniMax documents
cache_controlsupport on this endpoint: https://platform.minimax.io/docs/api-reference/anthropic-api-compatible-cache (0.1× read, 1.25× write, 5-min TTL).Changes
run_agent.py— new branch in_anthropic_prompt_cache_policy()after the Claude gateway check: if transport isanthropic_messagesAND provider isminimax/minimax-cnOR host matchesapi.minimax.io/api.minimaxi.com, return(True, True).tests/run_agent/test_anthropic_prompt_cache_policy.py— flipped the old pinned-negative test forminimax-m2.7(which was pinning the bug) into a renamed unknown-host test, added a newTestMiniMaxAnthropicWireclass with 5 cases: both provider ids, both hosts, and a chat_completions-transport negative.Validation
provider=minimax+minimax-m2.7on anthropic_messages(False, False)(True, True)provider=minimax-cn+minimax-m2.5on anthropic_messages(False, False)(True, True)api.minimax.io/anthropic(False, False)(True, True)provider=minimaxon chat_completions (not /anthropic)(False, False)(False, False)(False, False)(False, False)tests/run_agent/test_anthropic_prompt_cache_policy.py— 21 passed (16 existing + 5 new).Scope
Narrow allowlist mirroring the existing Qwen/Alibaba branch. If a third provider needs this later, a capability flag on
ProviderConfigbecomes the right factoring — for now the surface is small enough to inline.Closes #17332