fix(agent): don't re-pin sessions to provider-default base_url on credential swap #44099

Open
AIalliAI wants to merge 1 commit into NousResearch:main from AIalliAI:fix/44070-credential-swap-base-url

@AIalliAI

Contributor

Fixes #44070

Root cause

Two compounding issues, both in the credential-pool recovery path:

1. _swap_credential discards the configured base_url override.

Pool entries seeded from env keys store only the provider's registry default endpoint (base_url = env_url or pconfig.inference_base_url in agent/credential_pool.py). Session resolution layers the user's model.base_url / <PROVIDER>_BASE_URL override on top (pool_url_is_default in hermes_cli/runtime_provider.py:371-382) — but that override only exists at resolution time; the pool entry on disk keeps the default URL.

AIAgent._swap_credential (run_agent.py) adopted the raw entry URL. So the first recoverable error (401/403/429) that routed through _recover_with_credential_pool silently re-pointed the session at the provider default — permanently, because the swap mutates _client_kwargs (which every per-request client is rebuilt from) and restore_primary_runtime is gated on _fallback_activated, which credential rotation doesn't set.
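
For illustration, a minimal sketch of this mechanism. All names here (PoolEntry, resolve_base_url, naive_swap_credential, the Agent class) are hypothetical stand-ins, not the actual code in agent/credential_pool.py or run_agent.py; only the two URLs come from the report.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical constants; the URLs are the two hosts named in the report.
REGISTRY_DEFAULT = "https://api.xiaomimimo.com/v1"
CONFIG_OVERRIDE = "https://token-plan-cn.xiaomimimo.com/v1"


@dataclass
class PoolEntry:
    api_key: str
    base_url: str  # seeded from the registry default, not the user's override


def resolve_base_url(entry: PoolEntry, override: Optional[str]) -> str:
    """Layer the configured override over an entry that holds only the default."""
    pool_url_is_default = entry.base_url == REGISTRY_DEFAULT
    if override and pool_url_is_default:
        return override
    return entry.base_url


@dataclass
class Agent:
    client_kwargs: dict = field(default_factory=dict)

    def naive_swap_credential(self, entry: PoolEntry) -> None:
        # Pre-fix behaviour as described above: adopt the raw entry URL.
        # Every later client is rebuilt from client_kwargs, so this sticks.
        self.client_kwargs["api_key"] = entry.api_key
        self.client_kwargs["base_url"] = entry.base_url


entry = PoolEntry(api_key="plan-key", base_url=REGISTRY_DEFAULT)
agent = Agent({
    "api_key": entry.api_key,
    "base_url": resolve_base_url(entry, CONFIG_OVERRIDE),
})
url_before = agent.client_kwargs["base_url"]  # the configured override
agent.naive_swap_credential(entry)            # e.g. recovery after a 429
url_after = agent.client_kwargs["base_url"]   # now the registry default
```

The override is applied once, at resolution time, and nothing in the swap path knows it ever existed.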

For the reporter's Xiaomi MiMo token-plan setup the sequence is:

  1. Turn 1 resolves https://token-plan-cn.xiaomimimo.com/v1 (config override over the pool entry's default) → works, including tool calls.
  2. Turn 2 replays the full history + tool schemas → bigger payload → trips the plan's rate limit (or an auth blip) → _recover_with_credential_pool → try_refresh_current() / mark_exhausted_and_rotate() → _swap_credential(entry) → base_url flips to https://api.xiaomimimo.com/v1 (the entry's stored default).
  3. The token-plan key can't route on the pay-per-use host → the Go backend answers 404 page not found → all 3 retries hit it → API call failed after 3 retries: HTTP 404: 404 page not found.
  4. Every later turn reuses the mutated _client_kwargs → the session never recovers.

This also explains "works intermittently via CLI" — it works until the first 429/401 of a session, then that session is wedged.

(I probed both hosts: …/v1/chat/completions exists on both; wrong-path requests return openresty HTML 404s and bad keys return JSON 401s, so the plain Go-style 404 page not found body the reporter saw is consistent with an authenticated request the backend can't route — i.e. a plan key on the wrong host — not with a malformed path.)
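
The triage above can be written down as a rough heuristic. This is only a sketch of the reasoning: classify_failure and its labels are hypothetical, and the sample bodies below are made up for illustration rather than captured from either host.

```python
def classify_failure(status: int, body: str) -> str:
    """Rough triage using only the three response shapes described above."""
    text = body.strip()
    if status == 401 and text.startswith("{"):
        return "bad-key"      # JSON 401: the key itself was rejected
    if status == 404 and "<html" in text.lower():
        return "wrong-path"   # HTML 404 served by the proxy in front
    if status == 404 and text == "404 page not found":
        return "unroutable"   # plain-text 404 from the backend itself
    return "unknown"


# Illustrative inputs only; these bodies are invented for the example.
plan_key_on_wrong_host = classify_failure(404, "404 page not found\n")
malformed_path = classify_failure(404, "<html><body>404 Not Found</body></html>")
rejected_key = classify_failure(401, '{"error": "invalid api key"}')
```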

2. switch_model() keeps the old provider's credential pool attached.

The fallback path got this exact fix in #33163 ("leaving it attached means downstream recovery calls _swap_credential with a primary entry which overwrites the agent's base_url back to the primary's endpoint — every fallback request then 404s against the wrong host"), but switch_model() never did. After switching to e.g. DeepSeek mid-conversation, the first recoverable error could swap the agent back onto the old provider's endpoint and credentials — matching the reported "switching models mid-conversation has no effect; the error persists".
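
A minimal sketch of the detach that the fallback path already has and switch_model() lacked. The Agent and CredentialPool classes, their fields, and the example hosts are hypothetical; the real switch_model() in run_agent.py does considerably more.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CredentialPool:
    provider: str  # provider this pool was seeded for


@dataclass
class Agent:
    provider: str
    base_url: str
    credential_pool: Optional[CredentialPool] = None

    def switch_model(self, provider: str, base_url: str) -> None:
        self.provider = provider
        self.base_url = base_url
        pool = self.credential_pool
        # Detach a pool seeded for a different provider so that a later
        # recovery cannot swap the old provider's endpoint and key back in.
        if pool is not None and pool.provider != provider:
            self.credential_pool = None


agent = Agent("xiaomi", "https://old.example/v1", CredentialPool("xiaomi"))
agent.switch_model("xiaomi", "https://old.example/v1")
same_provider_keeps_pool = agent.credential_pool is not None
agent.switch_model("deepseek", "https://new.example/v1")
cross_provider_detaches_pool = agent.credential_pool is None
```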

Fix

_swap_credential now keeps the agent's current base_url when the pool entry carries only the provider's registry default and the session is running on a configured override. Entries with genuinely per-credential endpoints (kimi/zai region resolution, custom pools seeded from config) are still adopted.

switch_model() now detaches a credential pool that was seeded for a different provider, mirroring the fallback-path fix from #33163. Same-provider switches keep the pool.

Tests

tests/run_agent/test_44070_credential_swap_base_url.py — 8 new tests covering: override preserved on default-URL entries, per-credential endpoints still adopted, missing-URL entries, no-override pass-through, unknown/custom providers, end-to-end _swap_credential, and cross-provider vs same-provider pool detach on switch_model.
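
The URL-selection cases listed above can be sketched as one small function. This is an approximation under stated assumptions, not the code in this PR: pick_swap_base_url, REGISTRY_DEFAULTS, and the trailing-slash normalisation are all hypothetical.

```python
from typing import Optional

# Hypothetical registry of provider defaults; the real registry differs.
REGISTRY_DEFAULTS = {"xiaomi": "https://api.xiaomimimo.com/v1"}


def _norm(url: Optional[str]) -> str:
    # Assumption: compare URLs ignoring whitespace and a trailing slash.
    return (url or "").strip().rstrip("/")


def pick_swap_base_url(provider: str, entry_url: Optional[str], current_url: str) -> str:
    """Choose the base_url to use after swapping to a pool entry."""
    if not _norm(entry_url):
        return current_url   # missing URL on the entry: keep what we have
    default = REGISTRY_DEFAULTS.get(provider)
    if default is None:
        return entry_url     # unknown/custom provider: adopt the entry URL
    if _norm(entry_url) != _norm(default):
        return entry_url     # genuinely per-credential endpoint wins
    # Entry holds only the registry default. Keeping current_url preserves a
    # configured override, and is a no-op when the agent is on the default.
    return current_url


DEFAULT = REGISTRY_DEFAULTS["xiaomi"]
OVERRIDE = "https://token-plan-cn.xiaomimimo.com/v1"
override_preserved = pick_swap_base_url("xiaomi", DEFAULT, OVERRIDE)
region_adopted = pick_swap_base_url("xiaomi", "https://region.example/v1", OVERRIDE)
```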

Regression suites pass: test_fallback_credential_isolation.py, test_credential_pool_interrupt.py, test_codex_xai_oauth_recovery.py, test_credential_pool_routing.py, test_switch_model_*.py, test_primary_runtime_restore.py, test_run_agent.py (482 tests).

🤖 Generated with Claude Code

…dential swap

Pool entries seeded from env keys store only the provider's registry
default endpoint (base_url = env_url or pconfig.inference_base_url in
agent/credential_pool.py), while session resolution layers the user's
model.base_url / <PROVIDER>_BASE_URL override on top (pool_url_is_default
in hermes_cli/runtime_provider.py). _swap_credential adopted the raw
entry URL, so the first 401/429 that triggered credential recovery
silently re-pointed the session at the default host — permanently,
since the swap mutates _client_kwargs and restore_primary_runtime is
gated on _fallback_activated.

For a Xiaomi MiMo token-plan setup this meant: turn 1 works against
token-plan-cn.xiaomimimo.com, turn 2's larger payload trips the plan's
rate limit, recovery swaps base_url to api.xiaomimimo.com where the
plan-only key cannot route, and every later attempt fails with
"HTTP 404: 404 page not found" for the rest of the session.

_swap_credential now keeps the agent's current base_url when the pool
entry carries only the registry default and the session is running on a
configured override. Entries with genuinely per-credential endpoints
(kimi/zai region resolution, custom pools seeded from config) still win.

Also detach a credential pool seeded for a different provider on
switch_model(), mirroring the fallback-path fix from NousResearch#33163 — without
this, a later recoverable error on the NEW provider could swap the
agent back onto the old provider's endpoint and credentials, which is
why switching models mid-conversation failed to recover.

Fixes NousResearch#44070

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@alt-glitch added the type/bug, P2, comp/agent, and provider/xiaomi labels on Jun 11, 2026
@liuhao1024

Contributor

Verification: reviewed diff — clean fix for credential-swap URL contamination.

The new _pool_entry_swap_base_url() method correctly solves the problem: pool entries seeded from env keys store only the provider's registry default endpoint, while session resolution layers the user's model.base_url override on top. Without this fix, a 401/429-triggered credential recovery adopts the raw entry URL and permanently pins the session to the wrong host.

Checked:

  • _pool_entry_swap_base_url compares entry.base_url against PROVIDER_REGISTRY[inference_base_url] — when they match (registry default), the agent's current URL is preserved. Entries with genuinely per-credential endpoints (region resolution, custom pools) still win.
  • switch_model now detaches the credential pool when switching to a different provider (mirrors the fallback-path fix from #33163, "Fallback to OpenRouter retains primary's base_url — requests go to ChatGPT Codex with openrouter/auto model → HTTP 404"). Same-provider switches keep the pool.
  • Test coverage is thorough: registry-default entry preserves override, per-credential endpoint wins, missing URL keeps current, agent-on-default passes through, unknown provider adopts entry URL, end-to-end _swap_credential preserves override, cross-provider switch detaches pool, same-provider switch keeps pool.

No issues found.

@AIalliAI

Contributor Author

Requesting maintainer review — this is ready to land from my side. Standalone fork CI is pending first-run approval here; the rollup branch in #44061 carrying this session's batch is fully green on upstream CI (all test shards, typecheck, e2e).

Development

Successfully merging this pull request may close these issues.

bug(desktop): mimo provider 404 after first turn; model switch fails to recover
