Skip to content

fix(gateway): prevent browser rate-limit death spiral on missing credentials#36138

Closed
ademczuk wants to merge 5 commits intoopenclaw:mainfrom
ademczuk:fix/browser-ratelimit-death-spiral
Closed

fix(gateway): prevent browser rate-limit death spiral on missing credentials#36138
ademczuk wants to merge 5 commits intoopenclaw:mainfrom
ademczuk:fix/browser-ratelimit-death-spiral

Conversation

@ademczuk
Copy link
Copy Markdown
Contributor

@ademczuk ademczuk commented Mar 5, 2026

Summary

  • Problem: Opening the web UI without ?token= triggers a self-DoS: missing credentials count as rate-limit failures, and the browser auto-reconnects, exhausting the limit in ~40 seconds → 5-minute lockout for ALL browser connections from that IP.
  • Why it matters: Any misconfigured or first-time browser open locks out the entire IP, including valid authenticated connections.
  • What changed: (1) auth.ts — stop calling recordFailure() for token_missing/password_missing; only actual mismatches burn rate-limit slots. (2) gateway.ts — stop auto-reconnecting on non-recoverable auth errors (AUTH_TOKEN_MISSING, AUTH_PASSWORD_MISSING, AUTH_RATE_LIMITED).
  • What did NOT change (scope boundary): browserRateLimiter config (including exemptLoopback: false), 198.18.0.1 synthetic IP mapping, rate limiter internals, device-token fallback flow (AUTH_TOKEN_MISMATCH deliberately NOT in the non-recoverable list).

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Browser connections without a token now show an auth error immediately without triggering rate-limiting or auto-reconnect.
  • Properly-configured clients (token present) see no behavioral change.
  • Rate-limited status no longer triggers auto-reconnect (previously accelerated lockout).

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? Yes — narrowed what counts as a rate-limit failure. Missing credentials are no longer penalized; only wrong credentials (mismatch) burn slots.
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • Risk + mitigation: Removing recordFailure() for missing tokens means a client that repeatedly connects without any token won't be rate-limited. This is acceptable because: (a) missing-token connections fail immediately with no server-side resource consumption, (b) actual brute-force probes use wrong tokens (mismatch), which are still rate-limited, (c) the gateway should not be exposed without auth in production.

Repro + Verification

Environment

  • OS: Any (reproduced on Windows + Docker)
  • Runtime/container: Node 22+, standard gateway
  • Model/provider: N/A (auth layer, pre-model)
  • Integration/channel: Web UI (browser WebSocket)
  • Relevant config: gateway.auth.rateLimit.maxAttempts: 10 (default)

Steps

  1. Start gateway with token auth enabled
  2. Open web UI without ?token= parameter
  3. Observe WebSocket reconnection behavior in browser DevTools

Expected

Browser shows auth error, does NOT auto-reconnect, no rate-limit penalty.

Actual

Before: 10 auto-reconnects in ~40s → rate-limit lockout for 5 minutes → all browser connections rejected.
After: Single connection attempt → auth error displayed → no reconnect → no rate-limit impact.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

50 tests pass: auth.test.ts (31), auth-rate-limit.test.ts (18), reconnect gating test (1).

Human Verification (required)

  • Verified scenarios: Missing token → no reconnect; wrong token → reconnect (device-token fallback preserved); rate-limited → no reconnect; correct token → normal flow unchanged.
  • Edge cases checked: AUTH_TOKEN_MISMATCH deliberately excluded from non-recoverable list to preserve device-token → shared-token fallback flow (stale device token → mismatch → sendConnect() clears → reconnect with shared token succeeds).
  • What I did NOT verify: Behavior under concurrent multi-tab rate-limit pressure (same IP, mixed valid/invalid tokens). Rate-limiter internals are unchanged so this should be unaffected.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to revert: git revert <sha> on the merge commit, or restore these two files:
    • src/gateway/auth.ts — re-add recordFailure() calls for token_missing/password_missing
    • ui/src/ui/gateway.ts — remove isNonRecoverableAuthError() check before scheduleReconnect()
  • Known bad symptoms: If the fix is reverted, the original cascade returns: browser without token → 5-min lockout. If the fix is wrong, watch for: brute-force attempts not being rate-limited (check token_mismatch still triggers recordFailure()).

Risks and Mitigations

  • Risk: No-token connections are no longer rate-limited, theoretically allowing connection flooding without penalty.

    • Mitigation: Missing-token connections are rejected immediately with minimal server cost (no model calls, no session creation). Real brute-force attacks use wrong tokens (mismatch), which remain rate-limited. The browserRateLimiter with exemptLoopback: false still protects against remote attackers.
  • Risk: Non-recoverable error list may need updating if new auth error codes are added upstream.

    • Mitigation: The list (AUTH_TOKEN_MISSING, AUTH_PASSWORD_MISSING, AUTH_RATE_LIMITED) is conservative — only errors that provably cannot resolve without user action. New error codes default to allowing reconnect (safe default).

@openclaw-barnacle openclaw-barnacle Bot added app: web-ui App: web-ui gateway Gateway runtime size: XS labels Mar 5, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e818b1537

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ui/src/ui/gateway.ts
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 5, 2026

Greptile Summary

This PR fixes a self-DoS cascade where opening the browser UI without credentials could trigger the rate-limiter and lock out all connections from that IP for 5 minutes. The fix has two parts: on the server, recordFailure() is no longer called for missing credentials (token_missing, password_missing), since a bare browser open is not an attack; on the client, a new isNonRecoverableAuthError() guard stops the WebSocket from auto-reconnecting on errors that cannot self-resolve without user action.

Key design decisions are sound:

  • The four non-recoverable codes chosen (AUTH_TOKEN_MISSING, AUTH_PASSWORD_MISSING, AUTH_PASSWORD_MISMATCH, AUTH_RATE_LIMITED) are all conditions that only a human can fix.
  • AUTH_TOKEN_MISMATCH is intentionally excluded: the browser client has a device-token fallback flow where a stale cached device token triggers a mismatch, sendConnect() clears it, and the next reconnect retries with the shared gateway token. This is well-documented in the new code comment.
  • A residual reconnect storm is still theoretically possible for persistently wrong tokens (not missing — wrong), but it is bounded: the rate-limiter's failure counter still increments on token_mismatch / password_mismatch, and once AUTH_RATE_LIMITED is returned the new client-side guard halts all further reconnects.

No issues found in the changed files.

Confidence Score: 4/5

  • This PR is safe to merge; the targeted changes are minimal, well-reasoned, and backed by passing test suites.
  • Both changes are narrow and correct. The server-side fix simply removes two recordFailure() calls for credentials that were never provided — the rate-limit protection for actually wrong credentials is untouched. The client-side guard correctly identifies the four error codes that cannot self-resolve and blocks the reconnect loop for them. The code comments explain the intentional AUTH_TOKEN_MISMATCH exclusion in full. The only remaining edge case (reconnect storm for persistently wrong tokens) is bounded by the rate-limiter and was the subject of prior review discussion; the final design is documented and reasonable. Score stops at 4 rather than 5 because the bounded wrong-credential storm is still present and the manual test items in the test plan are not yet checked off.
  • No files require special attention.

Last reviewed commit: 19f68ad

Comment thread ui/src/ui/gateway.ts
@ademczuk
Copy link
Copy Markdown
Contributor Author

ademczuk commented Mar 5, 2026

Updates (d123c9b)

CI fix: The check job failed on lint:plugins:no-monolithic-plugin-sdk-entry-imports — a pre-existing violation in extensions/mattermost/ introduced by #27160. This check only runs on PRs (skipped on main pushes), so it was invisible on main.

Fixed in a separate commit (d123c9b):

  • Added resolveChannelGroupRequireMention export to src/plugin-sdk/mattermost.ts
  • Migrated 3 mattermost files from monolithic openclaw/plugin-sdk → scoped openclaw/plugin-sdk/mattermost

Bot feedback (f91a757): Added AUTH_TOKEN_MISMATCH and AUTH_PASSWORD_MISMATCH to the non-recoverable auth error gate per greptile's suggestion — wrong credentials won't self-correct without user action.

All checks pass locally: pnpm check (0 errors), 37 tests green (auth + mattermost).

@openclaw-barnacle openclaw-barnacle Bot added the channel: mattermost Channel integration: mattermost label Mar 5, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d123c9b173

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ui/src/ui/gateway.ts Outdated
@ademczuk ademczuk force-pushed the fix/browser-ratelimit-death-spiral branch from 9796adb to c5667de Compare March 5, 2026 15:35
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c5667def48

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/auth.ts
@ademczuk ademczuk force-pushed the fix/browser-ratelimit-death-spiral branch from c5667de to 19f68ad Compare March 5, 2026 18:02
@ademczuk ademczuk closed this Mar 5, 2026
@ademczuk ademczuk reopened this Mar 5, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the channel: mattermost Channel integration: mattermost label Mar 5, 2026
@ademczuk
Copy link
Copy Markdown
Contributor Author

ademczuk commented Mar 5, 2026

@Takhoffman — This is a focused gateway auth fix (2 files, ~30 lines).

Summary: The browser WebSocket client can trigger a rate-limit death spiral when connecting without a pre-configured token:

  1. token_missing calls recordFailure() — burns rate-limit slots for a misconfigured client (not a brute-force attack)
  2. Client auto-reconnects unconditionally, even after non-recoverable auth errors
  3. 10 failures in ~40s → ALL browser connections locked out for 5 minutes

Fix:

  • src/gateway/auth.ts: Stop calling recordFailure() for token_missing/password_missing (missing credentials ≠ wrong credentials)
  • ui/src/ui/gateway.ts: Gate scheduleReconnect() — don't auto-reconnect on non-recoverable auth errors (AUTH_TOKEN_MISSING, AUTH_PASSWORD_MISSING, AUTH_PASSWORD_MISMATCH, AUTH_RATE_LIMITED)

Note: AUTH_TOKEN_MISMATCH is intentionally NOT in the non-recoverable set — the device-token fallback flow in sendConnect() needs reconnect to retry with the shared gateway token after clearing a stale device token.

Rebased cleanly onto current main (dropped the mattermost import commit since your #36480 already landed).

@ademczuk
Copy link
Copy Markdown
Contributor Author

ademczuk commented Mar 5, 2026

@joshavant — would you mind reviewing this? Small 2-file fix for a browser rate-limiter self-DoS. CI all green.

The bug: a single browser tab without a pre-configured token triggers an unrecoverable cascade — token_missing burns rate-limit slots via recordFailure(), unconditional scheduleReconnect() amplifies it, 10 failures in ~40s locks out ALL browser connections for 5 minutes.

Fix 1 (auth.ts): Don't count token_missing/password_missing as rate-limit failures (missing creds != wrong creds)
Fix 2 (gateway.ts): Stop auto-reconnecting on non-recoverable auth errors (rate-limited, token missing)

@ademczuk ademczuk force-pushed the fix/browser-ratelimit-death-spiral branch from 19f68ad to 417deb5 Compare March 6, 2026 06:34
@ademczuk
Copy link
Copy Markdown
Contributor Author

ademczuk commented Mar 6, 2026

Test Coverage Added (417deb5)

Added unit tests verifying both fixes:

Fix 1 — auth.ts (4 new tests):

  • token_missing does NOT call recordFailure() — misconfigured client, not brute-force
  • password_missing does NOT call recordFailure() — same reasoning
  • token_mismatch DOES call recordFailure() — actual wrong credentials
  • password_mismatch DOES call recordFailure() — actual wrong credentials

Fix 2 — gateway.ts reconnect gating (8 new tests):

  • AUTH_TOKEN_MISSING blocks reconnect (won't resolve without user action)
  • AUTH_PASSWORD_MISSING blocks reconnect
  • AUTH_PASSWORD_MISMATCH blocks reconnect (wrong password won't self-correct)
  • AUTH_RATE_LIMITED blocks reconnect (reconnecting burns more rate-limit slots)
  • AUTH_TOKEN_MISMATCH allows reconnect (device-token fallback flow needs retry)
  • Undefined error allows reconnect (normal disconnect)
  • No detail code allows reconnect (network issue)
  • Unknown code allows reconnect (future-proof)

All 44 gateway tests pass (36 auth + 8 reconnect gating).

@ademczuk
Copy link
Copy Markdown
Contributor Author

ademczuk commented Mar 6, 2026

@vincentkoc — small gateway auth fix (2 files, +139/-4, size:S). CI all green, all bot comments addressed.

The bug: A single browser tab without a pre-configured token triggers a rate-limit death spiral — token_missing burns rate-limit slots via recordFailure(), the browser auto-reconnects unconditionally, and within ~40 seconds the entire IP is locked out for 5 minutes.

The fix: (1) Stop counting missing credentials as failures — only actual mismatches burn slots. (2) Stop auto-reconnecting on non-recoverable auth errors.

Would appreciate a quick look when you have a moment.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 66f4360692

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/auth.ts
ademczuk and others added 5 commits March 6, 2026 21:28
…entials

Don't count missing credentials (no token/password provided) as auth
failures in the rate limiter — only actual mismatches should burn slots.
Previously, opening the web UI without a token triggered recordFailure()
on every connect attempt, exhausting the rate limit in ~40 seconds and
locking the user out for 5 minutes.

Also stop the browser client from auto-reconnecting on non-recoverable
auth errors (missing token, missing password, rate-limited), since these
won't resolve without user action and the reconnect loop accelerates the
lockout cascade.

Fixes openclaw#28997

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- auth.test.ts: verify token_missing and password_missing do NOT call
  recordFailure (misconfigured client ≠ brute-force)
- auth.test.ts: verify token_mismatch and password_mismatch still DO
  call recordFailure (actual brute-force attempts)
- reconnect-gating.test.ts: verify isNonRecoverableAuthError blocks
  reconnect for missing/mismatch/rate-limited but allows reconnect for
  token_mismatch (device-token fallback flow) and unknown codes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ademczuk ademczuk force-pushed the fix/browser-ratelimit-death-spiral branch from 66f4360 to 56eda54 Compare March 6, 2026 20:30
@openclaw-barnacle openclaw-barnacle Bot added the channel: feishu Channel integration: feishu label Mar 6, 2026
@vincentkoc
Copy link
Copy Markdown
Member

Unrelated changes in this PR

@vincentkoc vincentkoc closed this Mar 7, 2026
@ademczuk ademczuk deleted the fix/browser-ratelimit-death-spiral branch March 7, 2026 08:58
ademczuk added a commit to ademczuk/openclaw that referenced this pull request Mar 7, 2026
Stop counting missing credentials (token_missing, password_missing) as
rate-limit failures — a misconfigured browser is not a brute-force attack.
Stop auto-reconnecting on non-recoverable auth errors (missing token,
missing password, wrong password, rate-limited). Preserve device-token
fallback by allowing reconnect on token mismatch.

Includes unit tests for both fixes (12 new tests, all passing).

Continuation of openclaw#36138 (closed due to unrelated changes drifting in).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

app: web-ui App: web-ui channel: feishu Channel integration: feishu gateway Gateway runtime size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Control UI webchat loses auth token on WebSocket reconnect, triggers rate limiter

2 participants