fix(agent): don't re-pin sessions to provider-default base_url on credential swap #44099
Open
AIalliAI wants to merge 1 commit into
…dential swap

Pool entries seeded from env keys store only the provider's registry default endpoint (base_url = env_url or pconfig.inference_base_url in agent/credential_pool.py), while session resolution layers the user's model.base_url / <PROVIDER>_BASE_URL override on top (pool_url_is_default in hermes_cli/runtime_provider.py). _swap_credential adopted the raw entry URL, so the first 401/429 that triggered credential recovery silently re-pointed the session at the default host. The re-pointing was permanent, since the swap mutates _client_kwargs and restore_primary_runtime is gated on _fallback_activated.

For a Xiaomi MiMo token-plan setup this meant: turn 1 works against token-plan-cn.xiaomimimo.com, turn 2's larger payload trips the plan's rate limit, recovery swaps base_url to api.xiaomimimo.com where the plan-only key cannot route, and every later attempt fails with "HTTP 404: 404 page not found" for the rest of the session.

_swap_credential now keeps the agent's current base_url when the pool entry carries only the registry default and the session is running on a configured override. Entries with genuinely per-credential endpoints (kimi/zai region resolution, custom pools seeded from config) still win.

Also detach a credential pool seeded for a different provider on switch_model(), mirroring the fallback-path fix from NousResearch#33163. Without this, a later recoverable error on the NEW provider could swap the agent back onto the old provider's endpoint and credentials, which is why switching models mid-conversation failed to recover.

Fixes NousResearch#44070

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
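The pool-detach step from the commit message can be illustrated with a minimal sketch. It is not the code in this PR: `SketchPool`, `SketchAgent`, and their fields are hypothetical stand-ins, and the only behavior modeled is "drop a pool that was seeded for a different provider when the model is switched".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchPool:
    """Hypothetical stand-in for a credential pool tied to one provider."""
    provider: str


@dataclass
class SketchAgent:
    """Hypothetical stand-in for the agent; only the fields the sketch needs."""
    provider: str
    credential_pool: Optional[SketchPool] = None

    def switch_model(self, new_provider: str) -> None:
        pool = self.credential_pool
        # Detach a pool seeded for another provider, so a later recoverable
        # error cannot rotate the agent back onto the old provider's
        # endpoint and credentials.
        if pool is not None and pool.provider != new_provider:
            self.credential_pool = None
        self.provider = new_provider
```

A same-provider switch leaves the pool attached, so key rotation keeps working; only a cross-provider switch drops it.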
Contributor
Verification: reviewed diff — clean fix for credential-swap URL contamination. The new Checked:
No issues found.
This was referenced Jun 11, 2026
Contributor
Author
Requesting maintainer review; this is ready to land from my side. Standalone fork CI is pending first-run approval here; the rollup branch in #44061 carrying this session's batch is fully green on upstream CI (all test shards, typecheck, e2e).
Fixes #44070
Root cause
Two compounding issues, both in the credential-pool recovery path:
1. `_swap_credential` discards the configured base_url override.

Pool entries seeded from env keys store only the provider's registry default endpoint (`base_url = env_url or pconfig.inference_base_url` in `agent/credential_pool.py`). Session resolution layers the user's `model.base_url` / `<PROVIDER>_BASE_URL` override on top (`pool_url_is_default` in `hermes_cli/runtime_provider.py:371-382`), but that override only exists at resolution time; the pool entry on disk keeps the default URL.

`AIAgent._swap_credential` (`run_agent.py`) adopted the raw entry URL. So the first recoverable error (401/403/429) that routed through `_recover_with_credential_pool` silently re-pointed the session at the provider default. The change is permanent, because the swap mutates `_client_kwargs` (which every per-request client is rebuilt from) and `restore_primary_runtime` is gated on `_fallback_activated`, which credential rotation doesn't set.

For the reporter's Xiaomi MiMo token-plan setup the sequence is:

- Turn 1 runs against `https://token-plan-cn.xiaomimimo.com/v1` (config override over the pool entry's default) → works, including tool calls.
- Turn 2's larger payload trips the plan's rate limit → `recover_with_credential_pool` → `try_refresh_current()` / `mark_exhausted_and_rotate()` → `_swap_credential(entry)` → base_url flips to `https://api.xiaomimimo.com/v1` (the entry's stored default).
- The plan-only key cannot route on that host → `404 page not found` → all 3 retries hit it → `API call failed after 3 retries: HTTP 404: 404 page not found`.
- The swap has mutated `_client_kwargs` → the session never recovers.

This also explains "works intermittently via CLI": it works until the first 429/401 of a session, then that session is wedged.

(I probed both hosts: `…/v1/chat/completions` exists on both; wrong-path requests return openresty HTML 404s and bad keys return JSON 401s, so the plain Go-style `404 page not found` body the reporter saw is consistent with an authenticated request the backend can't route, i.e. a plan key on the wrong host, and not with a malformed path.)

2. `switch_model()` keeps the old provider's credential pool attached.

The fallback path got this exact fix in #33163 ("leaving it attached means downstream recovery calls `_swap_credential` with a primary entry which overwrites the agent's base_url back to the primary's endpoint; every fallback request then 404s against the wrong host"), but `switch_model()` never did. After switching to e.g. DeepSeek mid-conversation, the first recoverable error could swap the agent back onto the old provider's endpoint and credentials, matching the reported "switching models mid-conversation has no effect; the error persists".

Fix
- `_swap_credential` now resolves the URL via a new `_pool_entry_swap_base_url` helper: when the pool entry carries only the provider's registry-default endpoint and the session is running on a different (configured) URL, the session's URL is preserved; the key still rotates. Entries with genuinely per-credential endpoints (kimi/zai region resolution, custom pools seeded from config, Nous `inference_base_url`) keep winning, so multi-account pools with distinct endpoints are unaffected.
- `switch_model()` detaches a credential pool whose provider differs from the switch target, mirroring #33163 ("Fallback to OpenRouter retains primary's base_url: requests go to ChatGPT Codex with openrouter/auto model → HTTP 404") and the defensive guard in `recover_with_credential_pool` (#33088, "Fallback provider 429 can exhaust primary provider credential pool").

Tests

`tests/run_agent/test_44070_credential_swap_base_url.py` adds 8 new tests covering: override preserved on default-URL entries, per-credential endpoints still adopted, missing-URL entries, no-override pass-through, unknown/custom providers, end-to-end `_swap_credential`, and cross-provider vs same-provider pool detach on `switch_model`.

Regression suites pass: `test_fallback_credential_isolation.py`, `test_credential_pool_interrupt.py`, `test_codex_xai_oauth_recovery.py`, `test_credential_pool_routing.py`, `test_switch_model_*.py`, `test_primary_runtime_restore.py`, `test_run_agent.py` (482 tests).

🤖 Generated with Claude Code
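For reviewers, the URL-selection rule described under Fix can be approximated by the sketch below. It is an illustration under stated assumptions, not the `_pool_entry_swap_base_url` implementation: the function name `choose_swap_base_url`, its parameters, and the trailing-slash normalization are all hypothetical, and keeping the session URL for an entry with no URL is an assumption about the "missing-URL entries" case.

```python
from typing import Optional


def _norm(url: Optional[str]) -> str:
    """Normalize for comparison only: trim whitespace and trailing slashes."""
    return (url or "").strip().rstrip("/")


def choose_swap_base_url(
    entry_url: Optional[str],
    session_url: Optional[str],
    registry_default: Optional[str],
) -> Optional[str]:
    """Pick the base_url to use after rotating to a new pool entry.

    Hypothetical helper: all three arguments are plain strings here; the
    real code reads them from the pool entry, the agent, and the provider
    registry.
    """
    entry = _norm(entry_url)
    session = _norm(session_url)
    default = _norm(registry_default)

    if not entry:
        # Assumption: an entry without a URL leaves the session where it is.
        return session_url
    if default and entry == default and session and session != default:
        # The entry only carries the registry default, while the session
        # runs on a configured override: keep the override, rotate the key.
        return session_url
    # Per-credential endpoints, unknown providers (no registry default),
    # and sessions without an override all adopt the entry's URL.
    return entry_url
```

With the reporter's hosts, `choose_swap_base_url("https://api.xiaomimimo.com/v1", "https://token-plan-cn.xiaomimimo.com/v1", "https://api.xiaomimimo.com/v1")` keeps the token-plan URL, while an entry pointing at any non-default endpoint is adopted unchanged.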