[Bug]: saveAuthProfileStore overwrites runtime auth-profile snapshot with external-CLI-filtered view, dropping OAuth credentials from in-process state #85521

@cobenrogers

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Any call to saveAuthProfileStore (in src/agents/auth-profiles/store.ts) overwrites the runtime auth-profile snapshot with an external-CLI-filtered view. This silently drops the anthropic:claude-cli OAuth credential from in-process state while leaving the on-disk file intact. Every subsequent embedded-agent run then fails with FailoverError: No credentials found for profile "anthropic:claude-cli" until the openclaw models ... CLI bootstrap re-runs.

Steps to reproduce

  1. Start OpenClaw with an anthropic:claude-cli OAuth profile active and the embedded agent responding normally.
  2. Trigger any call to saveAuthProfileStore — confirmed triggers include:
    • A channel-secrets reload after mutating openclaw.json (e.g., SecretRef migration of channels.discord.token)
    • Any openclaw update that rewrites config on boot
  3. Observe gateway logs: config change detected → channel reload → Secrets reloaded.
  4. Submit any embedded-agent request: FailoverError: No credentials found for profile "anthropic:claude-cli".
  5. Confirm ~/.openclaw/agents/main/agent/auth-profiles.json is still present on disk with valid credentials.
  6. Run openclaw models status (forces the auth-profiles bootstrap), then restart the gateway; the agent recovers.

Reproduced twice on v2026.5.18 (2026-05-19 and 2026-05-20) on Pop!_OS 24.04.

Expected behavior

Any saveAuthProfileStore call that writes auth-profiles.json should either preserve or immediately re-attach external-CLI OAuth profiles in the runtime snapshot. The overlayExternalAuthProfiles bootstrap path that openclaw models ... triggers should also run on the gateway's own write path.
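
The expected behavior can be phrased as a checkable invariant: every external-CLI profile present in the runtime snapshot before a save is still present afterwards. Below is a minimal TypeScript sketch of such a check, for example as the core of a regression test. The `ProfileMap` shape, the `external` flag, and the `lostExternalProfiles` helper are hypothetical and are not taken from store.ts.

```typescript
// Hypothetical shape: the real auth-profile types are not shown in this report.
type ProfileMap = Record<string, { external: boolean }>;

// Returns the ids of external-CLI profiles that exist in `before`
// but are missing from `after`, sorted for stable comparison.
function lostExternalProfiles(before: ProfileMap, after: ProfileMap): string[] {
  return Object.keys(before)
    .filter((id) => before[id].external && !(id in after))
    .sort();
}
```

An empty result after a save means the invariant held. A non-empty result names the profiles that were dropped from the snapshot.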

Actual behavior

After saveAuthProfileStore writes, the runtime credential reference for anthropic:claude-cli is empty. Subsequent calls to ensureAuthProfileStore → resolveRuntimeAuthProfileStore return the stripped snapshot (external profiles absent) without re-reading disk. overlayExternalAuthProfiles is bypassed on the runtime fast-path. Result: every embedded-agent run fails with No credentials found for profile "anthropic:claude-cli" until the CLI bootstrap re-runs.

Root cause (traced to source)

In src/agents/auth-profiles/store.ts, saveAuthProfileStore ends with:

const localStore = buildLocalAuthProfileStoreForSave({ store, agentDir, options });
saveJsonFile(authPath, buildPersistedAuthProfileSecretsStore(localStore, ...));
savePersistedAuthProfileState(localStore, agentDir);
writeCachedAuthProfileStore({ authPath, ..., store: localStore });
if (hasRuntimeAuthProfileStoreSnapshot(agentDir))
  setRuntimeAuthProfileStoreSnapshot(localStore, agentDir);  // ← bug

buildLocalAuthProfileStoreForSave intentionally filters out external-CLI profiles for disk persistence (correct — those credentials are owned by the external CLI). But the same filtered localStore is then written to the runtime snapshot via setRuntimeAuthProfileStoreSnapshot. Subsequent credential lookups go through ensureAuthProfileStore → resolveRuntimeAuthProfileStore, which short-circuits on the cached snapshot without re-reading disk. The overlayExternalAuthProfiles re-attachment is bypassed at runtime (suspected: runtime-fast-path-BLTCPu20.js bundle path — not yet fully traced).
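
To make the failure mode concrete, here is a simplified TypeScript model of the flow described above. It is a sketch under assumptions: the function names loosely mirror the report, but the `AuthProfile` shape, the `external` flag, and all function bodies are hypothetical stand-ins rather than the real store.ts code, and the disk write is omitted.

```typescript
// Simplified, hypothetical model; shapes and bodies are assumptions.
type AuthProfile = { external: boolean; token: string };
type AuthProfileStore = { profiles: Record<string, AuthProfile> };

// In-process state standing in for the runtime auth-profile snapshot.
let runtimeSnapshot: AuthProfileStore | undefined;

// Stand-in for buildLocalAuthProfileStoreForSave: drops external-CLI profiles.
function buildLocalStoreForSave(store: AuthProfileStore): AuthProfileStore {
  const profiles: Record<string, AuthProfile> = {};
  for (const id of Object.keys(store.profiles)) {
    if (!store.profiles[id].external) profiles[id] = store.profiles[id];
  }
  return { profiles };
}

// Stand-in for saveAuthProfileStore (disk write omitted).
function saveStore(store: AuthProfileStore): void {
  const localStore = buildLocalStoreForSave(store);
  if (runtimeSnapshot !== undefined) {
    // The clobber: the disk-filtered view replaces the full snapshot.
    runtimeSnapshot = localStore;
  }
}

// Stand-in for the lookup path: trusts the snapshot and never re-reads disk.
function resolveCredential(profileId: string): AuthProfile {
  const profile = runtimeSnapshot?.profiles[profileId];
  if (profile === undefined) {
    throw new Error(`No credentials found for profile "${profileId}"`);
  }
  return profile;
}
```

In this model, a lookup for an external profile succeeds before `saveStore` runs and throws afterwards, while non-external profiles keep resolving. This matches the reported symptom of a gateway that looks healthy but cannot find its credentials.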

Fix direction: setRuntimeAuthProfileStoreSnapshot should receive the full store (with external profiles preserved) rather than the disk-filtered localStore. Alternatively, resolveRuntimeAuthProfileStore should always call overlayExternalAuthProfiles before returning.
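
Both fix directions can be sketched in the same simplified style. This is a hedged illustration, not a patch: `RuntimeState`, `filterForDisk`, `saveKeepingFullSnapshot`, `resolveWithOverlay`, and the `readExternal` callback are hypothetical names, and letting overlaid external profiles take precedence over snapshot entries is an assumption.

```typescript
// Hypothetical, simplified model; not the real store.ts code.
type AuthProfile = { external: boolean; token: string };
type Profiles = Record<string, AuthProfile>;
type AuthProfileStore = { profiles: Profiles };

interface RuntimeState {
  snapshot?: AuthProfileStore;  // in-process runtime snapshot
  persisted?: AuthProfileStore; // stands in for auth-profiles.json on disk
}

// Disk view: external-CLI profiles are owned by the external CLI, so drop them.
function filterForDisk(store: AuthProfileStore): AuthProfileStore {
  const profiles: Profiles = {};
  for (const id of Object.keys(store.profiles)) {
    if (!store.profiles[id].external) profiles[id] = store.profiles[id];
  }
  return { profiles };
}

// Option 1: persist the filtered view, but keep the full store in the snapshot.
function saveKeepingFullSnapshot(state: RuntimeState, store: AuthProfileStore): void {
  state.persisted = filterForDisk(store);
  if (state.snapshot !== undefined) state.snapshot = store;
}

// Option 2: always overlay external profiles when resolving.
// `readExternal` is a hypothetical callback that loads external-CLI profiles.
function resolveWithOverlay(
  state: RuntimeState,
  readExternal: () => Profiles,
): AuthProfileStore {
  const base: AuthProfileStore =
    state.snapshot ?? state.persisted ?? { profiles: {} };
  return { profiles: { ...base.profiles, ...readExternal() } };
}
```

Option 1 keeps the lookup fast path unchanged and only touches the save path. Option 2 is more defensive because it also repairs a snapshot that has already been stripped, at the cost of doing overlay work on every resolve.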

OpenClaw version

2026.5.18

Operating system

Pop!_OS 24.04 (Linux 6.18.7-76061807-generic x64)

Install method

npm global

Model

anthropic/claude-sonnet-4-6

Provider / routing chain

openclaw → anthropic:claude-cli (OAuth, external-CLI profile)

Additional provider/model setup details

anthropic:claude-cli is the OAuth profile managed by the Claude CLI at ~/.claude/.credentials.json. OpenClaw uses it as an external-CLI profile overlay. On-disk: ~/.openclaw/agents/main/agent/auth-profiles.json. The openclaw security audit flags this profile as [LEGACY_RESIDUE] after the bug fires — the disk file survives but the in-memory reference is gone.

Logs, screenshots, and evidence

# Gateway log sequence (2026-05-19 observed):
config change detected; evaluating reload (channels.discord.token)
config change requires channel reload (discord)
Secrets reloaded.
# ... 8h later, first embedded-agent request:
FailoverError: No credentials found for profile "anthropic:claude-cli"

# Recovery sequence:
$ openclaw models status
# expiring expires in 8h  ← bootstrap re-runs from ~/.claude/.credentials.json
$ systemctl --user restart openclaw-gateway
# wait ~8s
$ curl -s http://localhost:18789/health
{"ok":true,"status":"live"}
# Agent recovers; responds normally to next message

Note on related issues: Upstream #85125 is a /models perf prewarm task and is not related to this bug. Commit a483f70a clears the prepared provider-auth map on auth failure — a separate in-memory cache from the runtime auth-profile snapshot this bug clobbers; its relevance to our specific failure is unverified.

v5.17 shipped "Auth: serialize provider login writes through the auth-profile lock so a live Gateway cannot overwrite freshly refreshed OAuth credentials with an expired in-memory snapshot". We are on 5.18, which includes that change, yet still see this bug. This indicates that the dominant trigger is the saveAuthProfileStore snapshot clobber on config-write, not an OAuth-refresh race.

v5.19 ships "Gateway/config: keep config writes from failing on unrelated unresolved auth-profile SecretRefs while preserving live auth-profile runtime snapshots", which is directly relevant to the config-write trigger. Upgrading to ≥5.20 is expected to close the most common trigger, but the root-cause saveAuthProfileStore snapshot bug should still be fixed at the source.

Impact and severity

  • Affected: Any OpenClaw instance using anthropic:claude-cli (or other external-CLI OAuth profile) when a config write triggers saveAuthProfileStore
  • Severity: High — blocks all embedded-agent runs; gateway appears healthy (/health returns ok) while agent is silently broken
  • Frequency: Triggered on every config-mutating event (channel reload, openclaw update); intermittent from user perspective because the gateway stays up
  • Consequence: Agent stops responding until a manual openclaw models status + gateway restart recovery sequence is performed; silent failure is particularly dangerous in unattended setups

Additional information

Tracking issue in our fleet: cobenrogers/mission-control#7

Local workaround (documented for our fleet):

  1. openclaw models status (forces auth-profiles bootstrap from ~/.claude/.credentials.json)
  2. systemctl --user restart openclaw-gateway
  3. Wait ~8s for HTTP rebind; verify curl -s http://localhost:18789/health returns {"ok":true,"status":"live"}
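
For unattended setups, the workaround steps above could be scripted. The TypeScript sketch below is one possible shape, not a tested operational tool: `recoverGateway`, `isGatewayLive`, and the injected `run` and `sleep` callbacks are hypothetical, and the retry loop is an assumption added on top of the single ~8s wait. The commands, URL, and expected health body are the ones listed in the workaround.

```typescript
// Runs a command and returns its stdout; injected so the sketch stays testable.
type Runner = (command: string, args: string[]) => string;

interface RecoveryOptions {
  run: Runner;
  sleep: (ms: number) => void;
  healthUrl?: string;   // defaults to the URL from the workaround
  waitMs?: number;      // defaults to ~8s, per the workaround
  maxAttempts?: number; // retrying is an assumption, not part of the workaround
}

// True only for the exact healthy shape shown in the workaround.
function isGatewayLive(body: string): boolean {
  try {
    const parsed = JSON.parse(body);
    return (
      parsed !== null &&
      typeof parsed === "object" &&
      parsed.ok === true &&
      parsed.status === "live"
    );
  } catch {
    return false; // empty or non-JSON body: gateway not ready yet
  }
}

function recoverGateway(opts: RecoveryOptions): boolean {
  const healthUrl = opts.healthUrl ?? "http://localhost:18789/health";
  const waitMs = opts.waitMs ?? 8000;
  const maxAttempts = opts.maxAttempts ?? 3;

  // Step 1: force the auth-profiles bootstrap.
  opts.run("openclaw", ["models", "status"]);
  // Step 2: restart the gateway.
  opts.run("systemctl", ["--user", "restart", "openclaw-gateway"]);
  // Step 3: wait for the HTTP rebind, then verify the health endpoint.
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    opts.sleep(waitMs);
    let body = "";
    try {
      body = opts.run("curl", ["-s", healthUrl]);
    } catch {
      body = "";
    }
    if (isGatewayLive(body)) return true;
  }
  return false;
}
```

In Node, `run` could wrap `execFileSync` from `node:child_process` with `encoding: "utf8"`. Note that a passing health check only confirms the HTTP rebind. As described under Impact and severity, /health returns ok even while credentials are missing, so a test request to the embedded agent is still the real verification.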

Metadata

Assignees

No one assigned

    Labels

    • P1: High-priority user-facing bug, regression, or broken workflow.
    • clawsweeper:linked-pr-open: ClawSweeper found an open linked pull request for this issue.
    • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
    • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
    • impact:auth-provider: Auth, provider routing, model choice, or SecretRef resolution may break.
    • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
    • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
