fix(minimax): enable Anthropic prompt caching for MiniMax's own models by teknium1 · Pull Request #17425 · NousResearch/hermes-agent

teknium1 · 2026-04-29T11:54:08Z

MiniMax on anthropic_messages transport now caches with the native layout — same cost reduction as Claude traffic through the same gateway.

Root cause

PR #12846 gated third-party Anthropic-wire caching on "claude" in model_lower. MiniMax serves its own model family (MiniMax-M2.7, M2.5, M2.1, M2) on api.minimax.io/anthropic — they don't match, fall through to (False, False), re-pay full input tokens every turn.

MiniMax documents cache_control support on this endpoint: https://platform.minimax.io/docs/api-reference/anthropic-api-compatible-cache (0.1× read, 1.25× write, 5-min TTL).

Changes

run_agent.py — new branch in _anthropic_prompt_cache_policy() after the Claude gateway check: if transport is anthropic_messages AND provider is minimax/minimax-cn OR host matches api.minimax.io/api.minimaxi.com, return (True, True).
tests/run_agent/test_anthropic_prompt_cache_policy.py — flipped the old pinned-negative test for minimax-m2.7 (which was pinning the bug) into a renamed unknown-host test, added a new TestMiniMaxAnthropicWire class with 5 cases: both provider ids, both hosts, and a chat_completions-transport negative.

Validation

	Before	After
`provider=minimax` + `minimax-m2.7` on anthropic_messages	`(False, False)`	`(True, True)`
`provider=minimax-cn` + `minimax-m2.5` on anthropic_messages	`(False, False)`	`(True, True)`
custom provider, host `api.minimax.io/anthropic`	`(False, False)`	`(True, True)`
`provider=minimax` on chat_completions (not /anthropic)	`(False, False)`	`(False, False)`
Unknown host + non-Claude model on anthropic_messages	`(False, False)`	`(False, False)`

tests/run_agent/test_anthropic_prompt_cache_policy.py — 21 passed (16 existing + 5 new).

Scope

Narrow allowlist mirroring the existing Qwen/Alibaba branch. If a third provider needs this later, a capability flag on ProviderConfig becomes the right factoring — for now the surface is small enough to inline.

Closes #17332

MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR #12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes #17332

nicoechaniz · 2026-05-01T01:25:02Z

Hi @teknium1 — thanks for merging the policy fix in #17425. We had an earlier PR (#17333) that covered the same caching policy plus a few UX defaults, and I wanted to follow up with the parts that weren't included here.

#18147 extracts just the remaining pieces:

now defaults / to + the correct base URL, so users don't have to configure both manually to get caching.
updated to suggest the endpoints.
5 tests for the new defaults.

The policy logic from #17425 is left untouched. Hope this is useful!

NousResearch#17425) MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes NousResearch#17332

…base_url PR NousResearch#17425 (merged) enabled prompt caching for MiniMax models on the anthropic_messages transport, but users still had to manually configure both api_mode and base_url to actually benefit from it. This patch makes the defaults ergonomic: - AIAgent.__init__ now auto-detects provider=minimax / minimax-cn and defaults to api_mode=anthropic_messages + the correct /anthropic base_url (global or China endpoint respectively). - .env.example suggests the /anthropic endpoints instead of /v1. - Explicit base_url or api_mode are preserved when the user sets them. Tests: 5 new cases covering both providers, explicit overrides, and prompt-caching flags. Refs: NousResearch#17332, NousResearch#17333, NousResearch#17425

NousResearch#17425) MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes NousResearch#17332

…base_url PR NousResearch#17425 (merged) enabled prompt caching for MiniMax models on the anthropic_messages transport, but users still had to manually configure both api_mode and base_url to actually benefit from it. This patch makes the defaults ergonomic: - AIAgent.__init__ now auto-detects provider=minimax / minimax-cn and defaults to api_mode=anthropic_messages + the correct /anthropic base_url (global or China endpoint respectively). - .env.example suggests the /anthropic endpoints instead of /v1. - Explicit base_url or api_mode are preserved when the user sets them. Tests: 5 new cases covering both providers, explicit overrides, and prompt-caching flags. Refs: NousResearch#17332, NousResearch#17333, NousResearch#17425

NousResearch#17425) MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR NousResearch#12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes NousResearch#17332

teknium1 merged commit df0e97a into main Apr 29, 2026
11 of 12 checks passed

teknium1 deleted the hermes/hermes-b07fb9a7 branch April 29, 2026 11:56

alt-glitch added type/bug Something isn't working P3 Low — cosmetic, nice to have provider/minimax MiniMax (Anthropic transport) comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 29, 2026

nicoechaniz mentioned this pull request May 1, 2026

fix(agent): default MiniMax provider to anthropic_messages + correct base_url #18147

Open

teknium1 mentioned this pull request May 3, 2026

feat(cli): add minimax-oauth provider with PKCE browser flow (salvage #15203) #15419

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(minimax): enable Anthropic prompt caching for MiniMax's own models#17425

fix(minimax): enable Anthropic prompt caching for MiniMax's own models#17425
teknium1 merged 1 commit into
mainfrom
hermes/hermes-b07fb9a7

teknium1 commented Apr 29, 2026

Uh oh!

Uh oh!

nicoechaniz commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

teknium1 commented Apr 29, 2026

Root cause

Changes

Validation

Scope

Uh oh!

Uh oh!

nicoechaniz commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants