[Bug]: Nous OAuth refresh in credential pool lacks cross-process sync — concurrent crons revoke session #10147

@banozz0

Bug Description

When multiple cron jobs (or the gateway plus a cron job) refresh the Nous OAuth token concurrently via the credential pool path, the second process sends an
already-consumed single-use refresh token. The Nous Portal detects "Refresh token reuse", revokes the entire session, and all Nous access fails until
a manual re-auth via hermes model.

This happens 2-3 times per day at normal cron density (e.g. ace-exits and gateway_health_monitor, each running every 30 minutes).

Related: #8040 (general pool locking). This issue is the Nous-specific consequence: Nous lacks the pre-refresh sync that Anthropic and Codex both have.

Steps to Reproduce

  1. Set up two or more cron jobs that use Nous as the provider (e.g. each running every 30 minutes)
  2. Wait for the 15-minute access token to expire
  3. Both crons trigger a refresh simultaneously
  4. The second process sends the already-consumed refresh token
  5. Nous Portal logs: "Refresh token reuse detected" → "Refresh session has been revoked"
  6. credential_pool marks its single entry as exhausted (1-hour cooldown)
  7. After the cooldown, the refresh fails again because the session has been permanently revoked
  8. All Nous access is dead until a manual re-auth via hermes model
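The interleaving in steps 3-5 can be reproduced without any network access. The sketch below is a toy model, not Hermes code: FakeNousPortal and refresh_without_sync are hypothetical names, and the rotation and reuse-detection rules are assumed from the log lines in this report rather than taken from Nous Portal documentation. The two calls run sequentially, which is equivalent to the racy case because both processes hold the same stale refresh token.

```python
class FakeNousPortal:
    """Hypothetical stand-in for the Nous Portal token endpoint.

    Models the behavior reported in this issue: refresh tokens are single-use
    and rotate on every successful refresh, and presenting an already-consumed
    token revokes the whole session.
    """

    def __init__(self, initial_refresh_token: str) -> None:
        self.current = initial_refresh_token
        self.consumed: set[str] = set()
        self.generation = 0
        self.revoked = False
        self.log: list[str] = []

    def refresh(self, refresh_token: str) -> dict:
        if self.revoked:
            self.log.append("Refresh session has been revoked")
            raise PermissionError("401: session revoked")
        if refresh_token in self.consumed:
            # Reuse of a rotated token is treated as theft: kill the session.
            self.revoked = True
            self.log.append("Refresh token reuse detected")
            raise PermissionError("401: refresh token reuse")
        if refresh_token != self.current:
            raise PermissionError("401: unknown refresh token")
        self.consumed.add(refresh_token)
        self.generation += 1
        self.current = f"rt-{self.generation}"
        return {"access_token": f"at-{self.generation}", "refresh_token": self.current}


def refresh_without_sync(portal: FakeNousPortal, cached_refresh_token: str) -> dict:
    # What each process does today: refresh with the token it loaded earlier,
    # without re-reading auth.json and without holding a cross-process lock.
    try:
        return portal.refresh(cached_refresh_token)
    except PermissionError as exc:
        return {"error": str(exc)}


portal = FakeNousPortal("rt-0")
# Steps 3-4: both crons loaded "rt-0" before either refresh finished.
first = refresh_without_sync(portal, "rt-0")
second = refresh_without_sync(portal, "rt-0")
# Steps 7-8: even the freshly rotated token is now useless.
later = refresh_without_sync(portal, first["refresh_token"])

print(first)   # {'access_token': 'at-1', 'refresh_token': 'rt-1'}
print(second)  # {'error': '401: refresh token reuse'}
print(later)   # {'error': '401: session revoked'}
print(portal.log)  # ['Refresh token reuse detected', 'Refresh session has been revoked']
```

The first caller wins and the second caller's stale token poisons the session for everyone, which matches the log sequence below.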

Expected Behavior

Only one process should refresh the token. The second process should detect that the token was already refreshed (by re-reading the updated auth.json) and
use the new token, the same way Anthropic and Codex already do via _sync_anthropic_entry_from_credentials_file() and _sync_codex_entry_from_cli().

Actual Behavior

Both processes call refresh_nous_oauth_from_state() simultaneously with the same refresh token. The HTTP call happens outside _auth_store_lock, so there's
no cross-process coordination. The Nous Portal rotates the token on the first call, then rejects and revokes the session on the second.

Logs show:
[22:00:59] "Refresh token reuse detected"
[22:01:01] "marking device_code exhausted (status=401)"
[22:05:24] "no available entries (all exhausted or empty)"
[22:30:03] "Refresh session has been revoked"

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

Debug report uploaded:
  Report       https://paste.rs/Oph3A
  agent.log    https://paste.rs/7N7v9
  gateway.log  https://paste.rs/X8guj

Operating System

macOS 15.3 (Darwin 24.3.0)

Python Version

3.11

Hermes Version

0.9.0

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

_refresh_entry() in credential_pool.py performs the Nous HTTP refresh outside _auth_store_lock:

  refreshed = auth_mod.refresh_nous_oauth_from_state(
      nous_state, ...
  )

The write-back (_sync_device_code_entry_to_auth_store) acquires the lock only after the HTTP call, but by then another process may already have used the old
refresh token.

Compare with resolve_nous_runtime_credentials() in auth.py, which holds _auth_store_lock() for the entire load → refresh → persist cycle. That path is
safe. The credential pool path is not.
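For comparison, a load → refresh → persist cycle held entirely under one cross-process lock (the pattern this report attributes to resolve_nous_runtime_credentials()) can be sketched as follows. This is an illustration, not the Hermes implementation: auth_store_lock() is a hypothetical flock-based stand-in for _auth_store_lock(), the auth.json layout (providers.nous with access_token, refresh_token and a Unix-timestamp expires_at) is assumed, and fcntl makes it Unix-only (fine for macOS).

```python
import fcntl
import json
import os
import tempfile
import threading
import time
from contextlib import contextmanager


@contextmanager
def auth_store_lock(lock_path: str):
    # Hypothetical stand-in for _auth_store_lock(): an exclusive flock on a
    # sidecar file. flock locks belong to the open file description, so a
    # flock() through a separately opened descriptor (another process or
    # thread) blocks until this one is released.
    with open(lock_path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def refresh_under_lock(auth_path, lock_path, refresh_fn, skew=60.0):
    """Load -> refresh -> persist, all while holding the lock."""
    with auth_store_lock(lock_path):
        with open(auth_path) as fh:
            store = json.load(fh)
        nous = store["providers"]["nous"]
        # Re-read under the lock: another process may already have refreshed.
        if nous["expires_at"] - skew > time.time():
            return nous
        nous.update(refresh_fn(nous["refresh_token"]))
        tmp = auth_path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(store, fh)
        os.replace(tmp, auth_path)  # atomic swap: readers never see a half-written file
        return nous


# Demo: two concurrent "crons" against an expired token.
used = []


def fake_refresh(refresh_token):
    used.append(refresh_token)
    time.sleep(0.1)  # widen the window in which a second caller could race
    n = len(used)
    return {"access_token": f"at-{n}", "refresh_token": f"rt-{n}",
            "expires_at": time.time() + 900}


tmpdir = tempfile.mkdtemp()
auth_path = os.path.join(tmpdir, "auth.json")
lock_path = os.path.join(tmpdir, "auth.lock")
with open(auth_path, "w") as fh:
    json.dump({"providers": {"nous": {"access_token": "at-0", "refresh_token": "rt-0",
                                      "expires_at": 0}}}, fh)

workers = [threading.Thread(target=refresh_under_lock,
                            args=(auth_path, lock_path, fake_refresh)) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(used)  # ['rt-0']: the refresh token was sent exactly once
```

The expiry check after acquiring the lock is what lets the second caller adopt the first caller's tokens instead of refreshing again; the 60-second skew is an arbitrary choice for the sketch.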

Anthropic and Codex both have pre-refresh sync methods that check if another process already refreshed:

  • Anthropic → _sync_anthropic_entry_from_credentials_file() in credential_pool.py
  • Codex → _sync_codex_entry_from_cli() in credential_pool.py
  • Nous → nothing. It goes straight to the HTTP refresh with whatever (possibly stale) token the pool holds.

Proposed Fix (optional)

Add a _sync_nous_entry_from_auth_store() method mirroring the Anthropic/Codex pattern:

  1. Read providers.nous from auth.json under _auth_store_lock
  2. If access_token or refresh_token differs from the pool entry → adopt the newer tokens and skip the HTTP refresh
  3. Call this before refresh_nous_oauth_from_state() in _refresh_entry()
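The three steps above can be sketched as a free function. This is a hedged illustration, not a patch: PoolEntry is a hypothetical, trimmed-down pool entry, sync_nous_entry_from_auth_store is a stand-in for the proposed _sync_nous_entry_from_auth_store() method, and the providers.nous keys are assumed from the issue text. Locking is left to the caller, since step 1 places the read under _auth_store_lock.

```python
import json
import os
import tempfile
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PoolEntry:
    # Hypothetical, trimmed-down credential-pool entry (not the real Hermes type).
    access_token: str
    refresh_token: str
    expires_at: float


def sync_nous_entry_from_auth_store(entry: PoolEntry, auth_path: str) -> tuple[PoolEntry, bool]:
    """Return (entry, adopted). The caller is expected to hold the auth-store lock."""
    try:
        with open(auth_path) as fh:
            nous = json.load(fh).get("providers", {}).get("nous") or {}
    except (FileNotFoundError, json.JSONDecodeError):
        return entry, False  # nothing usable on disk: fall through to HTTP refresh
    access, refresh = nous.get("access_token"), nous.get("refresh_token")
    if not access or not refresh:
        return entry, False
    if (access, refresh) == (entry.access_token, entry.refresh_token):
        return entry, False  # pool is current: caller performs the HTTP refresh
    # Step 2: another process already rotated the tokens; adopt them instead.
    adopted = replace(
        entry,
        access_token=access,
        refresh_token=refresh,
        expires_at=float(nous.get("expires_at", entry.expires_at)),
    )
    return adopted, True


# Usage: auth.json already holds tokens rotated by another process.
path = os.path.join(tempfile.mkdtemp(), "auth.json")
with open(path, "w") as fh:
    json.dump({"providers": {"nous": {"access_token": "at-1", "refresh_token": "rt-1",
                                      "expires_at": 1900.0}}}, fh)

stale = PoolEntry("at-0", "rt-0", 1000.0)
synced, adopted = sync_nous_entry_from_auth_store(stale, path)
print(adopted, synced.refresh_token)  # True rt-1

# Step 3 in _refresh_entry() would then look roughly like:
#   entry, adopted = sync_nous_entry_from_auth_store(entry, auth_path)
#   if not adopted: ...call refresh_nous_oauth_from_state() as today...
```

Treating "different" as "newer" assumes auth.json is only ever written with freshly rotated tokens, as the write-back after a refresh does. If the re-read and the HTTP refresh are not under the same lock, two processes can still both see unchanged tokens and both refresh; holding the lock across the whole cycle, as resolve_nous_runtime_credentials() does, closes that gap.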

The pool also holds only one entry (the device_code singleton), so there is no rotation resilience. Adding the sync would at least prevent the race from
killing the session.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Metadata

Assignees

No one assigned

    Labels

    type/bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests