Skip to content

fix: skip stale Nous pool entry when agent_key is expired#6856

Closed
benbarclay wants to merge 1 commit into
mainfrom
fix/oauth-issue1-nous-entry-needs-refresh
Closed

fix: skip stale Nous pool entry when agent_key is expired#6856
benbarclay wants to merge 1 commit into
mainfrom
fix/oauth-issue1-nous-entry-needs-refresh

Conversation

@benbarclay

@benbarclay benbarclay commented Apr 9, 2026

Copy link
Copy Markdown
Collaborator

The credential pool intentionally does NOT refresh Nous entries during selection — that would trigger network calls in non-runtime contexts like 'hermes auth list'. But resolve_runtime_provider() was returning the pool entry's stale agent_key (~30 min TTL) without checking whether it had expired, causing the inference API to reject requests.

Now, when the pool returns a Nous entry, we check _agent_key_is_usable() before using it. If the key is expired or missing, pool_api_key is cleared so the existing fallthrough to resolve_nous_runtime_credentials() handles the access_token refresh + agent_key mint cycle.

What does this PR do?

Related Issue

Fixes #

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

How to Test

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform:

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

  • This skill is broadly useful to most users (if bundled) — see Contributing Guide
  • SKILL.md follows the standard format (frontmatter, trigger conditions, steps, pitfalls)
  • No external dependencies that aren't already available (prefer stdlib, curl, existing Hermes tools)
  • I've tested the skill end-to-end: hermes --toolsets skills -q "Use the X skill to do Y"

Screenshots / Logs

The credential pool intentionally does NOT refresh Nous entries during
selection — that would trigger network calls in non-runtime contexts
like 'hermes auth list'. But resolve_runtime_provider() was returning
the pool entry's stale agent_key (~30 min TTL) without checking
whether it had expired, causing the inference API to reject requests.

Now, when the pool returns a Nous entry, we check _agent_key_is_usable()
before using it. If the key is expired or missing, pool_api_key is
cleared so the existing fallthrough to resolve_nous_runtime_credentials()
handles the access_token refresh + agent_key mint cycle.
@github-actions

Copy link
Copy Markdown
Contributor

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: base64 encoding/decoding detected

Base64 has legitimate uses (images, JWT, etc.) but is also commonly used to obfuscate malicious payloads. Verify the usage is appropriate.

Matches (first 20):

4166:+        body = base64.urlsafe_b64decode(payload["body"]["data"]).decode("utf-8", errors="replace")
4170:+                body = base64.urlsafe_b64decode(part["body"]["data"]).decode("utf-8", errors="replace")
4175:+                    body = base64.urlsafe_b64decode(part["body"]["data"]).decode("utf-8", errors="replace")

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

skills/productivity/google-workspace/scripts/setup.py
tests/skills/test_google_oauth_setup.py

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

@teknium1

Copy link
Copy Markdown
Contributor

Merged via PR #6874. Your commit was cherry-picked onto current main with your authorship preserved in git log. Also fixed an interaction gap between your three PRs where the Codex retry paths bypassed the auth.json write-back. Thanks @benbarclay!

akhater pushed a commit to akhater/hermes-agent that referenced this pull request Apr 17, 2026
`hermes auth add nous --type oauth` only wrote credential_pool.nous,
leaving providers.nous empty. When the Nous agent_key's 24h TTL expired,
run_agent.py's 401-recovery path called resolve_nous_runtime_credentials
(which reads providers.nous), got AuthError "Hermes is not logged into
Nous Portal", caught it as logger.debug (suppressed at INFO level), and
the agent died with "Non-retryable client error" — no signal to the
user that recovery even tried.

Introduce persist_nous_credentials() as the single source of truth for
Nous device-code login persistence. Both auth_commands (CLI) and
web_server (dashboard) now route through it, so pool and providers
stay in sync at write time.

Why: CLI-provisioned profiles couldn't recover from agent_key expiry,
producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869
addressed adjacent issues but assumed providers.nous was populated;
this one wasn't being written.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
teknium1 pushed a commit that referenced this pull request Apr 18, 2026
`hermes auth add nous --type oauth` only wrote credential_pool.nous,
leaving providers.nous empty. When the Nous agent_key's 24h TTL expired,
run_agent.py's 401-recovery path called resolve_nous_runtime_credentials
(which reads providers.nous), got AuthError "Hermes is not logged into
Nous Portal", caught it as logger.debug (suppressed at INFO level), and
the agent died with "Non-retryable client error" — no signal to the
user that recovery even tried.

Introduce persist_nous_credentials() as the single source of truth for
Nous device-code login persistence. Both auth_commands (CLI) and
web_server (dashboard) now route through it, so pool and providers
stay in sync at write time.

Why: CLI-provisioned profiles couldn't recover from agent_key expiry,
producing silent daily outages 24h after first login. PR #6856/#6869
addressed adjacent issues but assumed providers.nous was populated;
this one wasn't being written.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
teknium1 pushed a commit that referenced this pull request Apr 18, 2026
`hermes auth add nous --type oauth` only wrote credential_pool.nous,
leaving providers.nous empty. When the Nous agent_key's 24h TTL expired,
run_agent.py's 401-recovery path called resolve_nous_runtime_credentials
(which reads providers.nous), got AuthError "Hermes is not logged into
Nous Portal", caught it as logger.debug (suppressed at INFO level), and
the agent died with "Non-retryable client error" — no signal to the
user that recovery even tried.

Introduce persist_nous_credentials() as the single source of truth for
Nous device-code login persistence. Both auth_commands (CLI) and
web_server (dashboard) now route through it, so pool and providers
stay in sync at write time.

Why: CLI-provisioned profiles couldn't recover from agent_key expiry,
producing silent daily outages 24h after first login. PR #6856/#6869
addressed adjacent issues but assumed providers.nous was populated;
this one wasn't being written.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
`hermes auth add nous --type oauth` only wrote credential_pool.nous,
leaving providers.nous empty. When the Nous agent_key's 24h TTL expired,
run_agent.py's 401-recovery path called resolve_nous_runtime_credentials
(which reads providers.nous), got AuthError "Hermes is not logged into
Nous Portal", caught it as logger.debug (suppressed at INFO level), and
the agent died with "Non-retryable client error" — no signal to the
user that recovery even tried.

Introduce persist_nous_credentials() as the single source of truth for
Nous device-code login persistence. Both auth_commands (CLI) and
web_server (dashboard) now route through it, so pool and providers
stay in sync at write time.

Why: CLI-provisioned profiles couldn't recover from agent_key expiry,
producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869
addressed adjacent issues but assumed providers.nous was populated;
this one wasn't being written.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
`hermes auth add nous --type oauth` only wrote credential_pool.nous,
leaving providers.nous empty. When the Nous agent_key's 24h TTL expired,
run_agent.py's 401-recovery path called resolve_nous_runtime_credentials
(which reads providers.nous), got AuthError "Hermes is not logged into
Nous Portal", caught it as logger.debug (suppressed at INFO level), and
the agent died with "Non-retryable client error" — no signal to the
user that recovery even tried.

Introduce persist_nous_credentials() as the single source of truth for
Nous device-code login persistence. Both auth_commands (CLI) and
web_server (dashboard) now route through it, so pool and providers
stay in sync at write time.

Why: CLI-provisioned profiles couldn't recover from agent_key expiry,
producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869
addressed adjacent issues but assumed providers.nous was populated;
this one wasn't being written.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
`hermes auth add nous --type oauth` only wrote credential_pool.nous,
leaving providers.nous empty. When the Nous agent_key's 24h TTL expired,
run_agent.py's 401-recovery path called resolve_nous_runtime_credentials
(which reads providers.nous), got AuthError "Hermes is not logged into
Nous Portal", caught it as logger.debug (suppressed at INFO level), and
the agent died with "Non-retryable client error" — no signal to the
user that recovery even tried.

Introduce persist_nous_credentials() as the single source of truth for
Nous device-code login persistence. Both auth_commands (CLI) and
web_server (dashboard) now route through it, so pool and providers
stay in sync at write time.

Why: CLI-provisioned profiles couldn't recover from agent_key expiry,
producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869
addressed adjacent issues but assumed providers.nous was populated;
this one wasn't being written.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
`hermes auth add nous --type oauth` only wrote credential_pool.nous,
leaving providers.nous empty. When the Nous agent_key's 24h TTL expired,
run_agent.py's 401-recovery path called resolve_nous_runtime_credentials
(which reads providers.nous), got AuthError "Hermes is not logged into
Nous Portal", caught it as logger.debug (suppressed at INFO level), and
the agent died with "Non-retryable client error" — no signal to the
user that recovery even tried.

Introduce persist_nous_credentials() as the single source of truth for
Nous device-code login persistence. Both auth_commands (CLI) and
web_server (dashboard) now route through it, so pool and providers
stay in sync at write time.

Why: CLI-provisioned profiles couldn't recover from agent_key expiry,
producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869
addressed adjacent issues but assumed providers.nous was populated;
this one wasn't being written.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
`hermes auth add nous --type oauth` only wrote credential_pool.nous,
leaving providers.nous empty. When the Nous agent_key's 24h TTL expired,
run_agent.py's 401-recovery path called resolve_nous_runtime_credentials
(which reads providers.nous), got AuthError "Hermes is not logged into
Nous Portal", caught it as logger.debug (suppressed at INFO level), and
the agent died with "Non-retryable client error" — no signal to the
user that recovery even tried.

Introduce persist_nous_credentials() as the single source of truth for
Nous device-code login persistence. Both auth_commands (CLI) and
web_server (dashboard) now route through it, so pool and providers
stay in sync at write time.

Why: CLI-provisioned profiles couldn't recover from agent_key expiry,
producing silent daily outages 24h after first login. PR NousResearch#6856/NousResearch#6869
addressed adjacent issues but assumed providers.nous was populated;
this one wasn't being written.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants