Description
Summary
After a WebSocket client successfully authenticates (handshake completes), there is no per-connection rate limiting or circuit breaker for subsequent method calls. If the client repeatedly calls a method it is not authorized for, the gateway processes and logs every single rejection without throttling or disconnecting the client. This can render the gateway completely unresponsive.
Reproduction
A macOS app connected with role: "node" and repeatedly called the health method, which is not in NODE_ROLE_METHODS. The gateway rejected every call via authorizeGatewayMethod() but kept the connection open. The client had no backoff and retried immediately, creating an infinite loop.
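The loop needed two things: a server that never disconnects and a client that never backs off. On the client side, the usual mitigation is a capped exponential backoff. The sketch below is in TypeScript for consistency with the gateway code. The macOS app's actual retry logic isn't shown in this issue, so `nextRetryDelay` and `BackoffOptions` are hypothetical names used only for illustration:

```typescript
// Hypothetical client-side retry policy (not taken from the OpenClaw.app code):
// capped exponential backoff with "full jitter".

interface BackoffOptions {
  baseMs: number; // ceiling for the first retry's delay
  maxMs: number; // upper bound on any single delay
  maxAttempts: number; // stop retrying after this many consecutive failures
}

/**
 * Returns the delay in ms before retry number `attempt` (0-based),
 * or null once the client should give up.
 * `random` is injectable so the function can be tested deterministically.
 */
function nextRetryDelay(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number | null {
  if (attempt >= opts.maxAttempts) return null;
  // Double the ceiling on each attempt, but never exceed maxMs.
  const ceiling = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  // Full jitter: pick uniformly in [0, ceiling) so many clients don't retry in lockstep.
  return Math.floor(random() * ceiling);
}
```

An authorization rejection is not a transient error, so the client arguably should not retry that method at all. Backoff only limits the damage when retries do happen.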
Observed impact
- 8.9 million "unauthorized role: node" rejections accumulated
- Gateway log grew to 1.4 GB
- Gateway CPU: 90% (processing rejections)
- macOS app CPU: 172% (retrying)
- CLI gateway status (RPC probe): timed out because the gateway had no capacity to respond
- Gateway was effectively down even though the process was still running
Recovery
Killing the misbehaving client immediately restored gateway responsiveness. CPU dropped from 90% to <1%, RPC probe succeeded.
Root Cause
src/gateway/server-methods.ts — authorizeGatewayMethod() returns an error response for unauthorized calls, but the connection remains open. There is no:
- Per-connection error counter: no tracking of how many times a connection has been rejected
- Automatic disconnection: the client can fail authorization indefinitely without being kicked
- Log deduplication: each rejection is logged individually (no sampling or rate-based suppression)
- Post-handshake message rate limiting: the existing auth-rate-limit.ts only covers the handshake phase
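The first two gaps can be closed together by counting consecutive rejections per connection and telling the caller when to close the socket. Here is a minimal sketch. It assumes connections can be keyed by a string id, which this issue does not describe, and `UnauthorizedGuard`, `onResult`, and `forget` are hypothetical names, not existing OpenClaw APIs:

```typescript
// Hypothetical per-connection guard for the unauthorized path.

/** RFC 6455 close code 1008: policy violation. */
const WS_POLICY_VIOLATION = 1008;

type GuardDecision =
  | { action: "continue" }
  | { action: "close"; code: number; reason: string };

class UnauthorizedGuard {
  // Consecutive unauthorized calls, keyed by connection id.
  private readonly counts = new Map<string, number>();
  private readonly maxConsecutive: number;

  constructor(maxConsecutive: number) {
    this.maxConsecutive = maxConsecutive;
  }

  /** Records the outcome of one method call and decides what to do with the socket. */
  onResult(connId: string, authorized: boolean): GuardDecision {
    if (authorized) {
      // Any authorized call resets the streak, so only consecutive failures count.
      this.counts.delete(connId);
      return { action: "continue" };
    }
    const n = (this.counts.get(connId) ?? 0) + 1;
    this.counts.set(connId, n);
    if (n >= this.maxConsecutive) {
      this.counts.delete(connId); // the connection is about to be closed
      return {
        action: "close",
        code: WS_POLICY_VIOLATION,
        reason: `${n} consecutive unauthorized calls`,
      };
    }
    return { action: "continue" };
  }

  /** Call from the socket's close handler so the map does not accumulate stale entries. */
  forget(connId: string): void {
    this.counts.delete(connId);
  }
}
```

In the message handler, a `close` decision would map to something like `socket.close(decision.code, decision.reason)`. Because the count resets on success, a client that occasionally probes an unsupported method stays connected, while a tight retry loop like the one in this report would be disconnected after `maxConsecutive` calls.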
Note: The handshake-phase rate limiter also exempts localhost connections (isLoopback check), so local clients bypass even that layer.
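The handshake limiter stops at the handshake and skips loopback clients. As a result, a local client can issue post-handshake calls as fast as the event loop accepts them. A per-connection limiter applied to every method call, regardless of source address, would cap that rate. The sketch below uses a sliding-window log. The algorithm and interface of the existing auth-rate-limit.ts are not shown here, so this is not a drop-in extension of it, and `SlidingWindowLimiter` and `tryAcquire` are hypothetical names:

```typescript
// Hypothetical per-connection sliding-window (log-based) rate limiter for
// post-handshake method calls. It deliberately has no loopback exemption.

class SlidingWindowLimiter {
  // Timestamps (ms) of accepted calls per connection, oldest first.
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true if the call may proceed; `now` is injectable for testing. */
  tryAcquire(connId: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only calls that are still inside the window.
    const times = (this.hits.get(connId) ?? []).filter((t) => t > cutoff);
    if (times.length >= this.limit) {
      this.hits.set(connId, times);
      return false; // over budget: reject without doing any further work
    }
    times.push(now);
    this.hits.set(connId, times);
    return true;
  }

  /** Drops state when the connection closes. */
  forget(connId: string): void {
    this.hits.delete(connId);
  }
}
```

If `tryAcquire` runs before authorizeGatewayMethod, an over-budget call is dropped before any authorization or logging work happens. A flooding client then costs a small, bounded amount of work per call.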
Suggested Fix
Minimal (addresses the DoS):
Add a per-connection counter in the unauthorized path. After N consecutive unauthorized calls (e.g., 10), close the WebSocket with code 1008:
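```typescript
// In authorizeGatewayMethod or the message handler, on the unauthorized path
const MAX_UNAUTHORIZED_ATTEMPTS = 10;
connection.unauthorizedCount = (connection.unauthorizedCount ?? 0) + 1;
if (connection.unauthorizedCount >= MAX_UNAUTHORIZED_ATTEMPTS) {
  // 1008 = policy violation (RFC 6455)
  socket.close(1008, "repeated unauthorized calls");
  return;
}
// On the authorized path, reset so only consecutive failures count:
// connection.unauthorizedCount = 0;
```
Additional hardening: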
- Log deduplication: For repeated identical errors from the same connection, log once plus a count (e.g., "unauthorized 'health' from connection X, suppressed 8.9M repeats")
- Post-handshake rate limiting: Extend the sliding-window rate limiter to cover all method calls, not just the auth phase
- Localhost exemption review: Consider whether loopback connections should bypass rate limiting, since local clients (macOS/iOS apps) can also misbehave
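A deduplicating wrapper keeps the first rejection visible and turns identical repeats into a periodic summary line. Here is a minimal sketch. It assumes a hypothetical `DedupLogger` that wraps whatever log function the gateway uses and is flushed on a timer. The key format and message wording are illustrative only:

```typescript
// Hypothetical deduplicating logger; names and message format are illustrative.

type LogFn = (line: string) => void;

interface Entry {
  connId: string;
  message: string;
  repeats: number; // occurrences after the first one that were not logged
}

class DedupLogger {
  private readonly entries = new Map<string, Entry>();
  private readonly log: LogFn;

  constructor(log: LogFn) {
    this.log = log;
  }

  /** Logs the first occurrence of (connId, message); later repeats are only counted. */
  warn(connId: string, message: string): void {
    const key = JSON.stringify([connId, message]);
    const entry = this.entries.get(key);
    if (entry) {
      entry.repeats += 1;
      return;
    }
    this.entries.set(key, { connId, message, repeats: 0 });
    this.log(`${message} from connection ${connId}`);
  }

  /**
   * Emits one summary line per key that had repeats, then forgets all keys
   * so the next occurrence is logged again. Intended to run on a timer and
   * when a connection closes.
   */
  flush(): void {
    for (const { connId, message, repeats } of this.entries.values()) {
      if (repeats > 0) {
        this.log(`${message} from connection ${connId}, suppressed ${repeats} repeats`);
      }
    }
    this.entries.clear();
  }
}
```

Each (connection, message) pair then produces at most two lines per flush interval, no matter how many times the client repeats the call.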
Related Issues
- fix: iOS chat broken — node role unauthorized + session key mismatch causes messages to vanish #6767: node role missing health method authorization (the specific trigger for this flood, but fixing it alone doesn't prevent future floods from other method mismatches)
- [Security] Voice-Call Channel: Service Disruption via Webhook Flooding #12544: zero rate limiting on webhook endpoints (same class of vulnerability, different attack surface)
- Heartbeat flood: exec-event bypasses interval check, causing runaway heartbeat runs #17797: heartbeat flood via exec-event bypass (another instance of missing backoff/rate limiting)
Environment
- OpenClaw: v2026.2.17
- OS: macOS 15.3.2 (Darwin 25.3.0)
- Node.js: v22.20.0
- Gateway: loopback mode (127.0.0.1:18789), LaunchAgent
- Flood source: macOS OpenClaw.app (PID connecting as role: "node")