fix(auth): classify permission_error as auth_permanent for profile fallback #31324

Merged
vincentkoc merged 3 commits into openclaw:main from Sid-Qin:fix/auth-profile-403-fallback-31306
Mar 2, 2026

Conversation

@Sid-Qin (Contributor) commented Mar 2, 2026

Summary

  • Problem: When an OAuth auth profile returns HTTP 403 with permission_error (e.g. expired organization plan), the error is classified as generic auth instead of auth_permanent. The auth classification applies only a short cooldown, so the gateway keeps retrying the same broken profile indefinitely (20+ consecutive 403s observed).
  • Why it matters: Users lose access to their AI assistant for extended periods because profile rotation never triggers effectively. Manual intervention is required.
  • What changed: Added "permission_error" and "not allowed for this organization" to the authPermanent error patterns in ERROR_PATTERNS. This causes these errors to receive the longer disabledUntil backoff window (same treatment as revoked keys and billing errors), enabling proper rotation to the next healthy profile.
  • What did NOT change: resolveFailoverReasonFromError still returns auth for 403 responses whose message does not match an authPermanent pattern. Only errors whose message contains permission_error (or "not allowed for this organization") now resolve to auth_permanent.
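
The classification described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/agents/pi-embedded-helpers/errors.ts: the function name classifyFailoverReason, the error shape, and the case-insensitive substring matching are assumptions, and only the two new patterns plus invalid_api_key are taken from this PR's description.

```typescript
// Hypothetical sketch: the real ERROR_PATTERNS and resolver may be shaped differently.
type FailoverReason = "auth" | "auth_permanent";

const ERROR_PATTERNS: { authPermanent: string[] } = {
  authPermanent: [
    // Patterns added by this PR:
    "permission_error",
    "not allowed for this organization",
    // Stand-in for the pre-existing patterns (the PR mentions invalid_api_key):
    "invalid_api_key",
  ],
};

// Assumption: patterns are matched as case-insensitive substrings.
function matchesAuthPermanent(message: string): boolean {
  const lower = message.toLowerCase();
  return ERROR_PATTERNS.authPermanent.some((pattern) => lower.includes(pattern));
}

// Simplified stand-in for resolveFailoverReasonFromError.
function classifyFailoverReason(err: {
  status?: number;
  message?: string;
}): FailoverReason | null {
  const message = err.message ?? "";
  // A permanent pattern wins over the status code.
  if (matchesAuthPermanent(message)) return "auth_permanent";
  // A plain 403 keeps the shorter-lived auth classification.
  if (err.status === 403) return "auth";
  return null;
}
```

Under these assumptions, a 403 carrying permission_error resolves to auth_permanent, while a 403 with an unrelated message still resolves to auth.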

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes openclaw#31306

User-visible / Behavior Changes

  1. OAuth profiles that return permission_error are now placed into a longer disable window (exponential backoff) instead of a short cooldown.
  2. The gateway automatically rotates to the next available auth profile when an OAuth profile's organization plan expires.
  3. lastGood is cleared for the failing profile, preventing immediate re-selection.
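
To make the difference between the two windows concrete, here is a hedged sketch of how a failure might be recorded. Everything except the names disabledUntil and lastGood is hypothetical: the store shape, the cooldownUntil field, the recordFailure helper, and the durations are illustrative assumptions, not the gateway's actual values.

```typescript
type Reason = "auth" | "auth_permanent";

interface ProfileState {
  failureCount: number;
  cooldownUntil?: number; // short pause for auth failures (assumed field name)
  disabledUntil?: number; // longer window for auth_permanent failures
}

interface AuthStore {
  profiles: Record<string, ProfileState>;
  lastGood: Record<string, string | undefined>;
}

// Assumed durations, chosen only for illustration.
const SHORT_COOLDOWN_MS = 60_000;
const BASE_DISABLE_MS = 60 * 60_000;
const MAX_DISABLE_MS = 24 * 60 * 60_000;

function recordFailure(
  store: AuthStore,
  provider: string,
  profileId: string,
  reason: Reason,
  now: number,
): void {
  let state = store.profiles[profileId];
  if (!state) {
    state = { failureCount: 0 };
    store.profiles[profileId] = state;
  }
  state.failureCount += 1;

  if (reason === "auth_permanent") {
    // Exponential backoff: the window doubles per consecutive failure, up to a cap.
    const backoff = Math.min(BASE_DISABLE_MS * 2 ** (state.failureCount - 1), MAX_DISABLE_MS);
    state.disabledUntil = now + backoff;
    // Clear lastGood so the failing profile is not immediately re-selected.
    if (store.lastGood[provider] === profileId) {
      delete store.lastGood[provider];
    }
  } else {
    state.cooldownUntil = now + SHORT_COOLDOWN_MS;
  }
}
```

In this sketch a plain auth failure leaves lastGood untouched, which is the situation that let the gateway keep returning to the same profile before this change.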

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS / Linux
  • Runtime: Node.js 22+
  • Provider: Anthropic (OAuth auth profile)

Steps

  1. Configure two auth profiles: anthropic:100 (token, valid) and anthropic:200 (OAuth, expired plan)
  2. Let lastGood.anthropic point to anthropic:200
  3. Send a message

Expected

  • Gateway tries :200, gets 403 permission_error, classifies as auth_permanent, disables profile, rotates to :100

Actual

  • Before: Gateway keeps retrying :200 with short cooldown, never effectively rotating
  • After: Gateway disables :200 with exponential backoff and rotates to :100
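
The rotation in this repro can be illustrated with a small selection sketch. The Profile shape and the pickProfile helper are hypothetical and the real gateway's ordering rules may differ. The assumption here is that a profile is skipped while its disabledUntil or cooldownUntil lies in the future, and that lastGood is preferred when it is usable.

```typescript
interface Profile {
  id: string;
  disabledUntil?: number;
  cooldownUntil?: number;
}

// A profile is usable once both of its windows (if any) have elapsed.
function isUsable(profile: Profile, now: number): boolean {
  return (profile.disabledUntil ?? 0) <= now && (profile.cooldownUntil ?? 0) <= now;
}

// Prefer the last known good profile, otherwise take the first usable one.
function pickProfile(
  profiles: Profile[],
  lastGood: string | undefined,
  now: number,
): Profile | undefined {
  const usable = profiles.filter((profile) => isUsable(profile, now));
  return usable.find((profile) => profile.id === lastGood) ?? usable[0];
}
```

With anthropic:100 and anthropic:200 configured and lastGood pointing at anthropic:200, this sketch selects anthropic:200 first. Once that profile carries a future disabledUntil, selection falls through to anthropic:100 until the window expires.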

Evidence

  • Failing test/log before + passing after

Three new test cases in failover-error.test.ts:

  • 403 permission_error returns auth_permanent (via status + message)
  • permission_error in error message string classifies as auth_permanent (via coercion)
  • "not allowed for this organization" classifies as auth_permanent

Human Verification (required)

  • Verified scenarios: permission_error with status 403; permission_error in error message string; "not allowed for this organization" variant; regular 403 without permission_error still returns auth
  • Edge cases checked: existing authPermanent patterns still work (revoked key, invalid_api_key, deactivated account)
  • What you did not verify: Live OAuth profile rotation with expired Anthropic plan

Compatibility / Migration

  • Backward compatible? Yes — only changes classification of a previously under-handled error
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert: Remove "permission_error" and "not allowed for this organization" from authPermanent patterns in src/agents/pi-embedded-helpers/errors.ts
  • Files to restore: src/agents/pi-embedded-helpers/errors.ts

Risks and Mitigations

  • Risk: Transient permission errors (not plan-related) might get classified as permanent
    • Mitigation: permission_error is a specific Anthropic error type for org-level permission issues, not transient API errors. Adding the more specific "not allowed for this organization" string narrows the match. Generic 403s still classify as auth (shorter cooldown).

…llback

When an OAuth auth profile returns HTTP 403 with permission_error
(e.g. expired plan), the error was not matched by the authPermanent
patterns. This caused the profile to receive only a short cooldown
instead of being disabled, so the gateway kept retrying the same
broken profile indefinitely.

Add "permission_error" and "not allowed for this organization" to
the authPermanent error patterns so these errors trigger the longer
billing/auth_permanent disable window and proper profile rotation.

Closes openclaw#31306

Made-with: Cursor
@greptile-apps (bot, Contributor) commented Mar 2, 2026

Greptile Summary

This PR fixes a critical failover bug where OAuth profiles returning HTTP 403 with permission_error (e.g., expired Anthropic organization plans) were classified as generic auth errors instead of auth_permanent, causing the gateway to retry broken profiles with only short cooldowns instead of rotating to healthy alternatives.

The fix adds two specific error patterns to the authPermanent classification in src/agents/pi-embedded-helpers/errors.ts:

  • "permission_error" - Anthropic's specific error type for org-level permission issues
  • "not allowed for this organization" - OAuth org restriction error text

These patterns now trigger the same exponential backoff as billing errors (via disabledUntil), enabling proper profile rotation.

Key changes:

  • Added 2 string patterns to ERROR_PATTERNS.authPermanent array
  • Added 3 comprehensive test cases covering both error classification paths
  • Maintained backward compatibility: regular 403s without these patterns still use short auth cooldown

Impact:

  • Prevents indefinite retries of broken OAuth profiles
  • Automatically rotates to healthy profiles when org plans expire
  • Applies appropriate backoff window for org-level permission failures

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Score reflects a well-contained bug fix with comprehensive test coverage, clear backward compatibility, and straightforward rollback path. The change is minimal (2 patterns added), addresses a real production issue (20+ consecutive 403s), and has no security or breaking change implications.
  • No files require special attention

Last reviewed commit: 571d29f

@vincentkoc vincentkoc merged commit 40e078a into openclaw:main Mar 2, 2026
5 checks passed
robertchang-ga pushed a commit to robertchang-ga/openclaw that referenced this pull request Mar 2, 2026
…llback (openclaw#31324)

When an OAuth auth profile returns HTTP 403 with permission_error
(e.g. expired plan), the error was not matched by the authPermanent
patterns. This caused the profile to receive only a short cooldown
instead of being disabled, so the gateway kept retrying the same
broken profile indefinitely.

Add "permission_error" and "not allowed for this organization" to
the authPermanent error patterns so these errors trigger the longer
billing/auth_permanent disable window and proper profile rotation.

Closes openclaw#31306

Made-with: Cursor

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
hanqizheng pushed a commit to hanqizheng/openclaw that referenced this pull request Mar 2, 2026
execute008 pushed a commit to execute008/openclaw that referenced this pull request Mar 2, 2026
dawi369 pushed a commit to dawi369/davis that referenced this pull request Mar 3, 2026
OWALabuy pushed a commit to kcinzgg/openclaw that referenced this pull request Mar 4, 2026
sachinkundu pushed a commit to sachinkundu/openclaw that referenced this pull request Mar 6, 2026
zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026
atlastacticalbot pushed a commit to tensakulabs/atlasbot that referenced this pull request Mar 6, 2026
(cherry picked from commit 40e078a)
Labels

agents Agent runtime and tooling size: XS

Development

Successfully merging this pull request may close these issues.

Auth profile fallback doesn't trigger on OAuth 403 permission_error (expired plan)

2 participants