security: bootstrap token path exposes mutex-stall DoS β no rate limit, no lockout, no alert#76322
security: bootstrap token path exposes mutex-stall DoS β no rate limit, no lockout, no alert#76322fede-kamel wants to merge 4 commits into
Conversation
|
Codex review: needs changes before merge. Reviewed May 31, 2026, 8:22 AM ET / 12:22 UTC. Summary PR surface: Source +35, Tests +255. Total +290 across 3 files. Reproducibility: yes. for the review finding from source: an invalid bootstrap token sets pendingBootstrapFailure, a locked device-token bucket sets deviceTokenRateLimited, and the only bootstrap recordFailure calls are skipped on that path. I did not run tests because this was a read-only review. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Copy recommended automerge instructionNext step before merge
Security Review findings
Review detailsBest possible solution: Land one canonical bootstrap-token rate-limit fix that gates before verify, records invalid bootstrap attempts even when later fallback is rate-limited, preserves the no-charge invariant for a valid device-token fallback, and carries focused regression proof. Do we have a high-confidence way to reproduce the issue? Yes for the review finding from source: an invalid bootstrap token sets pendingBootstrapFailure, a locked device-token bucket sets deviceTokenRateLimited, and the only bootstrap recordFailure calls are skipped on that path. I did not run tests because this was a read-only review. Is this the best way to solve the issue? No as submitted. The pre-verify rate-limit gate is the right shape, but the deferred failure accounting must also charge the bootstrap bucket when fallback is rate-limited, or maintainers should choose the companion branch as the canonical fix. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 2e254005a016. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +35, Tests +255. Total +290 across 3 files. View PR surface stats
Security concerns:
Acceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
f0e1f7d to
e7377e4
Compare
|
This is ready for merge review. All maintainer feedback addressed:
The DoS vector is closed: attacker connections are blocked at |
|
@clawsweeper additional feedback? |
|
@clawsweeper additional feedback? |
7498236 to
fc20092
Compare
|
/clawsweeper re-review |
|
Friendly bump for re-review.
cc @vincentkoc β happy to reshape, split further, or rebase if you'd prefer a different surface; same footing as the trust-pinning rework. |
|
@steipete when you have a moment β small security fix (bootstrap mutex-stall DoS PoC), CI green and currently mergeable. Happy to address feedback. Thanks! |
|
cc @openclaw/openclaw-secops for visibility β this is the PoC that motivates the rate-limit fix in #77492. Mergeable. No rush. Thanks! |
adf0cdc to
f3b4859
Compare
b742e55 to
f3b4859
Compare
79f7a9f to
70178b5
Compare
|
π¦π§Ή I asked ClawSweeper to review this item again. Re-review progress:
|
|
ClawSweeper PR egg π₯ Warming up: real-behavior proof passed; findings, security review, or rank-up moves are still in progress. Hatch commandComment Hatchability rules:
What is this egg doing here?
|
Finding 1: verifyDeviceBootstrapToken() has no rate limiting unlike the
device-token path, and serializes all calls through a module-level mutex.
An attacker with N concurrent connections can stall legitimate device
pairing proportionally with no IP lockout or alert triggered.
poc-bootstrap-dos.mjs β in-process PoC (confirmed x74 stall)
poc-bootstrap-dos-real.mjs β full WS PoC with Ed25519-signed device
identities, real OpenClaw-protocol server
poc-polyfill-node18.mjs β Node 18 polyfill helper for toSorted
Record bootstrap rate-limit failure only after all fallback auth paths (device-token) have run and auth is still false. A device presenting a stale bootstrap token alongside a valid device token is a legitimate connection and must not accumulate bootstrap-scope failures. Adds a test covering the cross-path case: stale bootstrap + valid device token β authOk true, bootstrap recordFailure never called. Removes runnable PoC scripts (poc-bootstrap-dos*.mjs, .ts) ahead of maintainer review; the rate-limiting proof lives in the test suite.
70178b5 to
0b30833
Compare
|
Closing as superseded by The bootstrap-token rate limiting this PR adds already landed on
Rebasing this branch onto current Linked issue #77980 is fixed by the same commit and is being closed alongside. What would reopen this: a concrete bootstrap path on |
Closes #77980
Summary
rateLimiter.check()+rateLimiter.recordFailure()protectionverifyDeviceBootstrapTokenmutex, stalling legitimate device pairing with no IP lockout or alertrecordFailureis only flushed if the entire handshake fails. A device with a stale bootstrap token + valid device token is a legitimate connection β the pending failure is discarded when device-token succeedsChanges
src/gateway/auth-rate-limit.tsAUTH_RATE_LIMIT_SCOPE_BOOTSTRAP_TOKEN = "bootstrap-token"constantsrc/gateway/server/ws-connection/auth-context.tspendingBootstrapFailureflag set on rejection;recordFailurecalled only after all fallback paths (device-token) complete and auth is still falseTest plan
17/17 passing (
pnpm test src/gateway/server/ws-connection/auth-context.test.ts):Unit (mock limiter)
verifyBootstrapTokenwhen rate-limitedrecordFailurecalled on rejected bootstrap token (no device-token fallback)resetcalled on successful bootstrap tokencheckcalled with"bootstrap-token", never"device-token"recordFailureNOT called β covers the deferred-failure invariantIntegration (real
AuthRateLimiter)verifyBootstrapTokenruns (mutex never acquired)Real behavior proof
verifyDeviceBootstrapTokenhad no rate limit. An attacker hammering bogus bootstrap tokens from one IP serialised through the sharedwithLockmutex (disk read + crypto compare + disk write per attempt) and starved legitimate pairing handshakes. With this PR,rateLimiter.check()runs beforeverifyDeviceBootstrapToken, so attempts past the threshold short-circuit without acquiring the mutex.src/gateway/server/ws-connection/auth-context.tsandsrc/gateway/auth-rate-limit.tsfrom this branch (security/bootstrap-dos-pocHEADf3b4859b1b). No vitest, no mocks of the patched modules. The reproducer drivesresolveConnectAuthState+resolveConnectAuthDecisiondirectly with a realcreateAuthRateLimiterinstance and a syntheticverifyBootstrapTokenwhose body holds an async mutex for 5 ms to model the realwithLock-serialised disk I/O.node --import tsx <local-copy>.mtsfrom a worktree on this branch.verifyDeviceBootstrapTokenmutex. Phase 2 β while the attacker IP was hammering in a tight loop for 500 ms, only 5 verifier acquisitions occurred (post-lockout the IP cannot acquire the mutex at all); the victim from a separate IP completed bootstrap auth at avg 5.24 ms per attempt (one mutex hold), not NΓmutex_hold as the pre-patch code would have produced. Phase 3 β a third clean IP authenticated normally in 5.26 ms, confirming the lockout is per-IP and does not leak across IPs.wsserver upgrade and IP extraction). The handshake-decision rate-limit gate is the unit changed by this PR; transport handling is upstream ofresolveConnectAuthDecisionand unmodified.Notes for reviewer
withLockmutex insrc/infra/device-bootstrap.tsis unchanged; the fix is purely a pre-flight rate-limit check that prevents the mutex from being acquired by IPs that have exhausted their attempt budget.exemptLoopback: falsewas used in the reproducer because the policy default exempts127.0.0.1/::1so localhost CLI sessions are never locked out; the attack vector targets non-loopback callers reaching the gateway WS port.