fix: skip stale Nous pool entry when agent_key is expired#6856
fix: skip stale Nous pool entry when agent_key is expired#6856benbarclay wants to merge 1 commit into
Conversation
The credential pool intentionally does NOT refresh Nous entries during selection — that would trigger network calls in non-runtime contexts like 'hermes auth list'. But resolve_runtime_provider() was returning the pool entry's stale agent_key (~30 min TTL) without checking whether it had expired, causing the inference API to reject requests. Now, when the pool returns a Nous entry, we check _agent_key_is_usable() before using it. If the key is expired or missing, pool_api_key is cleared so the existing fallthrough to resolve_nous_runtime_credentials() handles the access_token refresh + agent_key mint cycle.
|
|
Merged via PR #6874. Your commit was cherry-picked onto current main with your authorship preserved in git log. Also fixed an interaction gap between your three PRs where the Codex retry paths bypassed the auth.json write-back. Thanks @benbarclay! |
`hermes auth add nous --type oauth` only wrote credential_pool.nous, leaving providers.nous empty. When the Nous agent_key's 24h TTL expired, run_agent.py's 401-recovery path called resolve_nous_runtime_credentials (which reads providers.nous), got AuthError "Hermes is not logged into Nous Portal", caught it as logger.debug (suppressed at INFO level), and the agent died with "Non-retryable client error" — no signal to the user that recovery even tried. Introduce persist_nous_credentials() as the single source of truth for Nous device-code login persistence. Both auth_commands (CLI) and web_server (dashboard) now route through it, so pool and providers stay in sync at write time. Why: CLI-provisioned profiles couldn't recover from agent_key expiry, producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869 addressed adjacent issues but assumed providers.nous was populated; this one wasn't being written. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`hermes auth add nous --type oauth` only wrote credential_pool.nous, leaving providers.nous empty. When the Nous agent_key's 24h TTL expired, run_agent.py's 401-recovery path called resolve_nous_runtime_credentials (which reads providers.nous), got AuthError "Hermes is not logged into Nous Portal", caught it as logger.debug (suppressed at INFO level), and the agent died with "Non-retryable client error" — no signal to the user that recovery even tried. Introduce persist_nous_credentials() as the single source of truth for Nous device-code login persistence. Both auth_commands (CLI) and web_server (dashboard) now route through it, so pool and providers stay in sync at write time. Why: CLI-provisioned profiles couldn't recover from agent_key expiry, producing silent daily outages 24h after first login. PR #6856/#6869 addressed adjacent issues but assumed providers.nous was populated; this one wasn't being written. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`hermes auth add nous --type oauth` only wrote credential_pool.nous, leaving providers.nous empty. When the Nous agent_key's 24h TTL expired, run_agent.py's 401-recovery path called resolve_nous_runtime_credentials (which reads providers.nous), got AuthError "Hermes is not logged into Nous Portal", caught it as logger.debug (suppressed at INFO level), and the agent died with "Non-retryable client error" — no signal to the user that recovery even tried. Introduce persist_nous_credentials() as the single source of truth for Nous device-code login persistence. Both auth_commands (CLI) and web_server (dashboard) now route through it, so pool and providers stay in sync at write time. Why: CLI-provisioned profiles couldn't recover from agent_key expiry, producing silent daily outages 24h after first login. PR #6856/#6869 addressed adjacent issues but assumed providers.nous was populated; this one wasn't being written. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`hermes auth add nous --type oauth` only wrote credential_pool.nous, leaving providers.nous empty. When the Nous agent_key's 24h TTL expired, run_agent.py's 401-recovery path called resolve_nous_runtime_credentials (which reads providers.nous), got AuthError "Hermes is not logged into Nous Portal", caught it as logger.debug (suppressed at INFO level), and the agent died with "Non-retryable client error" — no signal to the user that recovery even tried. Introduce persist_nous_credentials() as the single source of truth for Nous device-code login persistence. Both auth_commands (CLI) and web_server (dashboard) now route through it, so pool and providers stay in sync at write time. Why: CLI-provisioned profiles couldn't recover from agent_key expiry, producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869 addressed adjacent issues but assumed providers.nous was populated; this one wasn't being written. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`hermes auth add nous --type oauth` only wrote credential_pool.nous, leaving providers.nous empty. When the Nous agent_key's 24h TTL expired, run_agent.py's 401-recovery path called resolve_nous_runtime_credentials (which reads providers.nous), got AuthError "Hermes is not logged into Nous Portal", caught it as logger.debug (suppressed at INFO level), and the agent died with "Non-retryable client error" — no signal to the user that recovery even tried. Introduce persist_nous_credentials() as the single source of truth for Nous device-code login persistence. Both auth_commands (CLI) and web_server (dashboard) now route through it, so pool and providers stay in sync at write time. Why: CLI-provisioned profiles couldn't recover from agent_key expiry, producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869 addressed adjacent issues but assumed providers.nous was populated; this one wasn't being written. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`hermes auth add nous --type oauth` only wrote credential_pool.nous, leaving providers.nous empty. When the Nous agent_key's 24h TTL expired, run_agent.py's 401-recovery path called resolve_nous_runtime_credentials (which reads providers.nous), got AuthError "Hermes is not logged into Nous Portal", caught it as logger.debug (suppressed at INFO level), and the agent died with "Non-retryable client error" — no signal to the user that recovery even tried. Introduce persist_nous_credentials() as the single source of truth for Nous device-code login persistence. Both auth_commands (CLI) and web_server (dashboard) now route through it, so pool and providers stay in sync at write time. Why: CLI-provisioned profiles couldn't recover from agent_key expiry, producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869 addressed adjacent issues but assumed providers.nous was populated; this one wasn't being written. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`hermes auth add nous --type oauth` only wrote credential_pool.nous, leaving providers.nous empty. When the Nous agent_key's 24h TTL expired, run_agent.py's 401-recovery path called resolve_nous_runtime_credentials (which reads providers.nous), got AuthError "Hermes is not logged into Nous Portal", caught it as logger.debug (suppressed at INFO level), and the agent died with "Non-retryable client error" — no signal to the user that recovery even tried. Introduce persist_nous_credentials() as the single source of truth for Nous device-code login persistence. Both auth_commands (CLI) and web_server (dashboard) now route through it, so pool and providers stay in sync at write time. Why: CLI-provisioned profiles couldn't recover from agent_key expiry, producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869 addressed adjacent issues but assumed providers.nous was populated; this one wasn't being written. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`hermes auth add nous --type oauth` only wrote credential_pool.nous, leaving providers.nous empty. When the Nous agent_key's 24h TTL expired, run_agent.py's 401-recovery path called resolve_nous_runtime_credentials (which reads providers.nous), got AuthError "Hermes is not logged into Nous Portal", caught it as logger.debug (suppressed at INFO level), and the agent died with "Non-retryable client error" — no signal to the user that recovery even tried. Introduce persist_nous_credentials() as the single source of truth for Nous device-code login persistence. Both auth_commands (CLI) and web_server (dashboard) now route through it, so pool and providers stay in sync at write time. Why: CLI-provisioned profiles couldn't recover from agent_key expiry, producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869 addressed adjacent issues but assumed providers.nous was populated; this one wasn't being written. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The credential pool intentionally does NOT refresh Nous entries during selection — that would trigger network calls in non-runtime contexts like 'hermes auth list'. But resolve_runtime_provider() was returning the pool entry's stale agent_key (~30 min TTL) without checking whether it had expired, causing the inference API to reject requests.
Now, when the pool returns a Nous entry, we check _agent_key_is_usable() before using it. If the key is expired or missing, pool_api_key is cleared so the existing fallthrough to resolve_nous_runtime_credentials() handles the access_token refresh + agent_key mint cycle.
What does this PR do?
Related Issue
Fixes #
Type of Change
Changes Made
How to Test
Checklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/AFor New Skills
hermes --toolsets skills -q "Use the X skill to do Y"Screenshots / Logs