[Bug]: OAuth 403 misclassified as rate_limit → infinite cooldown loop, no CLI recovery #13909

@onurcinar92-stack

Environment

  • OpenClaw 2026.2.9 (33c75cb)
  • WSL2 Ubuntu, Node 22.22.0
  • Telegram channel, 1 agent (main)
  • Auth: 2 Anthropic OAuth tokens in auth-profiles.json
  • Default model: anthropic/claude-opus-4-6

Bug Description

An Anthropic OAuth token missing the user:profile scope returns HTTP 403. OpenClaw misclassifies this as rate_limit instead of auth_error, putting the entire Anthropic provider into cooldown. Since all fallback models (opus-4-6, opus-4-5, haiku-4-5) share the same provider, all models become unavailable simultaneously.

The heartbeat cron job and incoming Telegram messages continuously retry the failed provider, resetting the cooldown timer each time → infinite cooldown loop. The bot becomes completely unresponsive with no CLI path to recovery.
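The loop described above can be reproduced in a small simulation. This is only a sketch of the suspected mechanism, not OpenClaw's actual code: `CooldownState`, `recordFailureExtending`, `recordFailureGuarded`, `countRecoveryWindows`, and the 5-minute constant are all hypothetical names and values chosen for illustration. It compares a tracker that pushes the deadline forward on every failed attempt with one that leaves an active cooldown alone.

```typescript
// Hypothetical sketch: none of these names come from OpenClaw's source.
interface CooldownState {
  cooldownUntil: number; // epoch ms; 0 means "not in cooldown"
}

const COOLDOWN_MS = 5 * 60 * 1000; // example value: 5 minutes

// Suspected buggy pattern: every failed attempt pushes the deadline forward,
// so a caller that retries more often than COOLDOWN_MS never sees it expire.
function recordFailureExtending(state: CooldownState, now: number): void {
  state.cooldownUntil = now + COOLDOWN_MS;
}

// Guarded pattern: a failure observed while the cooldown is still active
// leaves the existing deadline untouched.
function recordFailureGuarded(state: CooldownState, now: number): void {
  if (now >= state.cooldownUntil) {
    state.cooldownUntil = now + COOLDOWN_MS;
  }
}

// Simulate a heartbeat that retries every `intervalMs` and fails every time.
// Returns how many ticks found the cooldown already expired, i.e. how many
// real chances the provider had to recover.
function countRecoveryWindows(
  record: (state: CooldownState, now: number) => void,
  intervalMs: number,
  ticks: number,
): number {
  const state: CooldownState = { cooldownUntil: 0 };
  let windows = 0;
  for (let i = 0; i < ticks; i++) {
    const now = i * intervalMs;
    if (now >= state.cooldownUntil) {
      windows++;
    }
    record(state, now); // every attempt fails in this simulation
  }
  return windows;
}
```

With a retry every 60 seconds, the extending tracker is only ever "open" on the very first tick, while the guarded tracker reopens each time the 5-minute window elapses. Even the guarded version keeps cooling down forever if the underlying problem is a bad token, which is why the classification matters as well.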

Expected Behavior

  1. An HTTP 403 caused by a missing scope should be classified as auth_error, not rate_limit
  2. Clear error message indicating the OAuth scope issue
  3. Cooldown should not block all models when the issue is token-specific
  4. A CLI command should exist to reset provider cooldowns
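Points 1 and 2 could look roughly like the classifier below. It is a minimal sketch under assumptions: `FailureKind`, `ClassifiedFailure`, and `classifyHttpFailure` are hypothetical names, and the regular expression assumes the error body is shaped like the message quoted in this report ("OAuth token does not meet scope requirement user:profile"). The only facts relied on are that HTTP 429 is the standard rate-limit status and that 401/403 signal authentication or permission problems.

```typescript
// Hypothetical sketch: these types and this function are not OpenClaw APIs.
type FailureKind = "rate_limit" | "auth_error" | "other";

interface ClassifiedFailure {
  kind: FailureKind;
  message: string;
}

function classifyHttpFailure(status: number, body: string): ClassifiedFailure {
  // 429 Too Many Requests is the standard rate-limit status.
  if (status === 429) {
    return { kind: "rate_limit", message: "Provider rate limit reached" };
  }
  // 401/403 mean the credential was missing, rejected, or insufficient;
  // waiting out a cooldown will not fix that.
  if (status === 401 || status === 403) {
    // Assumed message shape: "... scope requirement <scope-name>"
    const match = /scope requirement\s+(\S+)/i.exec(body);
    const message = match
      ? `Token missing required scope '${match[1]}'`
      : `Authentication or permission failure (HTTP ${status})`;
    return { kind: "auth_error", message };
  }
  return { kind: "other", message: `HTTP ${status}` };
}
```

Keeping the status check ahead of any body parsing means a 403 can never fall through to rate_limit, even if the provider changes its error wording.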

Actual Behavior

error Embedded agent failed before reply: All models failed (3):
  anthropic/claude-opus-4-6: Provider anthropic is in cooldown (all profiles unavailable) (rate_limit)
  anthropic/claude-opus-4-5: Provider anthropic is in cooldown (all profiles unavailable) (rate_limit)
  anthropic/claude-haiku-4-5: Provider anthropic is in cooldown (all profiles unavailable) (rate_limit)

openclaw status shows:

Usage: Claude: HTTP 403: OAuth token does not meet scope requirement user:profile

openclaw models status shows:

anthropic:default=token:sk-ant-o...V1piGgAA [cooldown 5m]

Recovery Attempts (all failed)

Command                                           Result
openclaw gateway restart                          Cooldown persisted (loaded from auth-profiles.json)
openclaw provider refresh anthropic               Command doesn't exist
openclaw system refresh --providers               Command doesn't exist
openclaw login anthropic                          Command doesn't exist
openclaw agent restart main                       Requires -m flag, then rejects it ("too many arguments")
openclaw config set agents.defaults.model "..."   Validation error: "expected object, received string"

Actual Fix (manual)

  1. Edit auth-profiles.json: remove cooldownUntil, set errorCount: 0, failureCounts: {}
  2. Switch model via openclaw models set anthropic/claude-3-5-sonnet-latest
  3. openclaw gateway restart
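Step 1 of the manual fix can be expressed as a small pure function. This is a sketch only: the three field names (`cooldownUntil`, `errorCount`, `failureCounts`) come from the step above, but the `ProfileState` type, the `clearCooldown` name, and the assumption that all three fields sit on the same object are hypothetical. The overall layout of auth-profiles.json is not described in this report, so the function deliberately works on one already-parsed entry and leaves file reading and writing to the caller.

```typescript
// Hypothetical shape: only the three named fields come from the issue text.
interface ProfileState {
  cooldownUntil?: number;
  errorCount?: number;
  failureCounts?: Record<string, number>;
  [key: string]: unknown; // any other fields are passed through untouched
}

// Returns a copy of the entry with the cooldown removed and counters reset.
// The input object is not mutated.
function clearCooldown(profile: ProfileState): ProfileState {
  const cleaned: ProfileState = {
    ...profile,
    errorCount: 0,
    failureCounts: {},
  };
  delete cleaned.cooldownUntil;
  return cleaned;
}
```

A hypothetical `openclaw provider reset <name>` command could apply this to each stored entry for the named provider and write the file back, which would replace the hand-editing step.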

DX Impact: AI assistants can't help users recover

Because no CLI recovery path exists, users who ask AI assistants (Gemini, ChatGPT, etc.) for help get stuck in a loop of non-existent commands. The AI correctly identifies what should exist (provider refresh, cooldown reset, login) but these commands don't exist in OpenClaw 2026.2.9.

In a real-world debugging session, an external AI assistant suggested the following commands — none of which exist:

  • openclaw provider refresh anthropic
  • openclaw system refresh --providers
  • openclaw login anthropic
  • openclaw gateway install --force
  • openclaw config --path

This resulted in ~45 minutes of troubleshooting with zero progress until the user manually edited auth-profiles.json. The AI also misdiagnosed the root cause repeatedly (blaming rate limits, security warnings, model tiers, WSL2 networking, and memory plugin status — none of which were the actual problem).

This highlights that the missing CLI commands aren't just nice-to-haves — they're essential for recoverability.

Suggestions

  1. Classify 403 correctly — Auth/scope errors are not rate limits. A 403 with "scope requirement" should surface as auth_error with a clear message like: Token missing required scope 'user:profile'. Re-authenticate with: openclaw auth refresh anthropic

  2. Add openclaw provider reset <name> — CLI command to clear cooldown state for a provider without editing JSON files manually

  3. Add openclaw auth refresh <provider> — Re-validate or re-authenticate tokens when scope/permission issues arise

  4. Per-profile cooldown, not per-provider — Don't take down all models when one token/profile is bad. If anthropic:default fails, anthropic:gohan should still be tried

  5. Circuit breaker for heartbeat/cron — Stop retrying (and extending cooldown) after N consecutive auth failures. Log a persistent warning instead

  6. openclaw agent restart fix — The -m flag handling is broken in 2026.2.9 (requires it, then rejects it)

  7. Cooldown should not persist across gateway restarts — Or at minimum, openclaw gateway restart should have a --clear-cooldowns flag

  8. openclaw doctor — A diagnostic command that checks for common issues (expired tokens, cooldown loops, scope errors) and suggests fixes
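Suggestion 4 (per-profile rather than per-provider cooldown) might be sketched as follows. Everything here is hypothetical: the `Profile` shape, `pickProfile`, `recordFailure`, and the 5-minute window are illustrative assumptions, and the profile IDs are the two examples mentioned above. The idea is that an auth failure parks only the offending profile, while a rate limit applies a timed cooldown to that profile alone.

```typescript
// Hypothetical sketch: not OpenClaw's data model.
interface Profile {
  id: string; // e.g. "anthropic:default"
  cooldownUntil: number; // epoch ms; 0 when not cooling down
  disabledReason?: string; // set when the credential itself is bad
}

const PROFILE_COOLDOWN_MS = 5 * 60 * 1000; // example value

// Pick the first profile that is neither disabled nor cooling down.
function pickProfile(profiles: Profile[], now: number): Profile | undefined {
  return profiles.find(
    (p) => p.disabledReason === undefined && now >= p.cooldownUntil,
  );
}

// Record a failure against one profile only; siblings are unaffected.
function recordFailure(
  profile: Profile,
  kind: "rate_limit" | "auth_error",
  now: number,
): void {
  if (kind === "auth_error") {
    // Waiting will not repair a bad token, so park the profile
    // until the user re-authenticates instead of starting a timer.
    profile.disabledReason = "auth_error";
  } else {
    profile.cooldownUntil = now + PROFILE_COOLDOWN_MS;
  }
}
```

With this shape, a scope error on anthropic:default would leave anthropic:gohan selectable immediately, and a heartbeat hitting the disabled profile has no timer to extend, which also covers part of suggestion 5.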

Labels: bug (Something isn't working)