Skip to content

[Bug]: Anthropic OAuth credentials desync between ~/.claude/.credentials.json and auth-profiles.json — silent subagent failures #44919

@IIIyban

Description

@IIIyban

Environment

  • Version: OpenClaw 2026.3.11 (29dc654)
  • OS: Ubuntu 24.04 (linux 6.8.0-101-generic x64)
  • Provider: Anthropic (claude-opus-4-6, claude-sonnet-4-6) via OAuth
  • Setup: Multi-topic Telegram bot with subagent orchestration

Describe the bug

OpenClaw maintains OAuth credentials for Anthropic in auth-profiles.json, but these credentials desync from ~/.claude/.credentials.json (the source of truth updated by openclaw models auth login and Claude CLI). When the token in auth-profiles.json expires and its refresh token has been invalidated (because ~/.claude/ already used a newer refresh), all subagent completions fail silently.

The gateway stays alive, openclaw status looks healthy, but subagent announce calls fail with:

OAuth token refresh failed for anthropic: Failed to refresh OAuth token for anthropic. Please try again or re-authenticate.

This is particularly insidious because:

  1. The main session may work fine (using a different auth path or cached token)
  2. Subagent spawns succeed (task is accepted)
  3. Only the completion announce fails — the result is computed but never delivered
  4. The subagent is then pruned as orphan: Subagent orphan run pruned ... reason=missing-session-entry
  5. From user perspective: subagent "silently disappears" with no error message

Steps to reproduce

  1. Authenticate Anthropic via OAuth (openclaw models auth login)
  2. Tokens saved in both ~/.claude/.credentials.json AND auth-profiles.json
  3. Wait for token expiry (~2-10 hours depending on grant)
  4. ~/.claude/.credentials.json gets refreshed (by CLI or auto-refresh)
  5. auth-profiles.json still has the OLD refresh token (now invalidated by Anthropic)
  6. Spawn a subagent via sessions_spawn
  7. Subagent completes the task successfully
  8. Announce back to parent session fails with OAuth 401
  9. After retries, system gives up: Subagent announce give up (retry-limit)
  10. No user-visible error. Result is lost.

Root cause

Two independent credential stores with no automatic sync:

  • ~/.claude/.credentials.json — updated on login/refresh
  • ~/.openclaw/agents/<id>/agent/auth-profiles.json — used by gateway for model calls

When Anthropic rotates the refresh token (standard OAuth behavior — old refresh is invalidated when a new one is issued), the auth-profiles.json copy becomes permanently broken until manual intervention.

Additional context

Multiple agents (main, hr-bp, etc.) each have their own auth-profiles.json, all needing sync. With N agents, the desync probability multiplies.

Workaround

We wrote a systemd timer that syncs ~/.claude/.credentials.json → all auth-profiles.json every 30 minutes and alerts when expiry < 1 hour. Happy to share the script.

Proposed fix

  1. Single source of truth: auth-profiles.json should be the canonical store, and refreshOAuthTokenWithLock should update it atomically on every refresh
  2. Cross-store sync: When ~/.claude/.credentials.json is updated, propagate to all agent auth-profiles.json (or read from a single location)
  3. Announce resilience: If OAuth fails during subagent announce, surface the error to the parent session instead of silently pruning
  4. Health alert: When OAuth refresh fails, emit a user-visible warning (relates to [Bug]: No notification when all models fail auth — gateway silently dead for hours #20036)

Related issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions