Skip to content

fix(browser): use CDP command probe for cdpReady health#31421

Merged
vincentkoc merged 5 commits intoopenclaw:mainfrom
vincentkoc:vincentkoc-code/fix-cdp-ready-command-probe-23427
Mar 2, 2026
Merged

fix(browser): use CDP command probe for cdpReady health#31421
vincentkoc merged 5 commits intoopenclaw:mainfrom
vincentkoc:vincentkoc-code/fix-cdp-ready-command-probe-23427

Conversation

@vincentkoc
Copy link
Member

Summary

  • Problem: browser status could report cdpReady: true when the websocket was open but the CDP command channel was stale, matching idle-time failure reports.
  • Why it matters: operators see healthy status while act/snapshot fail, delaying recovery and diagnosis.
  • What changed: isChromeCdpReady now requires a successful Browser.getVersion command response over websocket, not just handshake/open success.
  • What did NOT change (scope boundary): no behavior changes to browser action execution paths and no automatic browser restart logic in this PR.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • browser status now reports cdpReady based on real command-channel health, reducing false healthy states after idle CDP degradation.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 22 + pnpm
  • Model/provider: n/a
  • Integration/channel (if any): Browser CDP helpers
  • Relevant config (redacted): default browser config

Steps

  1. Serve /json/version with a websocket endpoint.
  2. Verify isChromeCdpReady returns true only when websocket command response for Browser.getVersion is received.
  3. Verify websocket-open-without-command-response returns false.

Expected

  • cdpReady requires command-channel responsiveness.

Actual

  • Verified via new helper tests.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: pnpm vitest run src/browser/chrome.test.ts
  • Edge cases checked: websocket opens but command never responds (cdpReady=false).
  • What you did not verify: long-idle live macOS/browser sessions end-to-end.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert this PR.
  • Files/config to restore: src/browser/chrome.ts.
  • Known bad symptoms reviewers should watch for: false negatives if custom CDP endpoints intentionally block Browser domain commands.

Risks and Mitigations

  • Risk: some non-standard CDP endpoints could accept websocket handshakes but reject Browser.getVersion despite being partially usable.
    • Mitigation: this is a deliberate stricter health definition for status accuracy; extension relay path explicitly supports Browser.getVersion.

@openclaw-barnacle openclaw-barnacle bot added size: S maintainer Maintainer-authored PR labels Mar 2, 2026
@vincentkoc
Copy link
Member Author

Rebased this PR onto current main and refreshed the branch head.

Local verification on updated head:

  • pnpm exec oxfmt --check src/browser/chrome.ts src/browser/chrome.test.ts CHANGELOG.md
  • pnpm vitest run src/browser/chrome.test.ts

This keeps the same behavior: cdpReady now requires a successful CDP Browser.getVersion command response over websocket, not just socket-open.

@vincentkoc vincentkoc force-pushed the vincentkoc-code/fix-cdp-ready-command-probe-23427 branch from 564a0ce to 2424ebd Compare March 2, 2026 22:53
@vincentkoc vincentkoc marked this pull request as ready for review March 2, 2026 22:55
@aisle-research-bot
Copy link

aisle-research-bot bot commented Mar 2, 2026

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

# Severity Title
1 🟡 Medium SSRF / outbound WebSocket interaction via user-controlled CDP URL in readiness check
2 🔵 Low Unbounded WebSocket message parsing in CDP health check can cause memory/CPU DoS

1. 🟡 SSRF / outbound WebSocket interaction via user-controlled CDP URL in readiness check

Property Value
Severity Medium
CWE CWE-918
Location src/browser/chrome.ts:203-213

Description

The Chrome CDP readiness probe makes outbound HTTP and WebSocket connections based on cdpUrl and then sends a CDP JSON command over the established WebSocket.

If an attacker can influence cdpUrl (e.g., by creating a profile with an arbitrary cdpUrl), the server will:

  • Fetch GET {cdpUrl}/json/version (outbound HTTP request)
  • Extract webSocketDebuggerUrl from the response and normalize it
  • Open a WebSocket connection to that URL
  • Send a JSON command (Browser.getVersion) to the remote endpoint

This is an SSRF/outbound network interaction primitive (HTTP + WebSocket) that can be used for internal network reachability probing and for interacting with arbitrary WebSocket services.

Relevant code:

const wsUrl = await getChromeWebSocketUrl(cdpUrl, timeoutMs);
...
return await canRunCdpHealthCommand(wsUrl, handshakeTimeoutMs);

Source of attacker control: profiles can be created with arbitrary remote cdpUrl via the browser control API (validated only as http/https), e.g. POST /profiles/create passes user-provided cdpUrl into config/state and later into isChromeCdpReady.

Note: The browser control server typically binds to 127.0.0.1 and is protected by auth middleware, which reduces exposure in default deployments; however, in any scenario where an untrusted party can reach these endpoints (or a trusted token is compromised), this becomes a practical SSRF vector.

Recommendation

Treat cdpUrl (and the derived wsUrl) as SSRF-sensitive inputs.

Mitigations (choose based on intended product behavior):

  1. Default-deny remote CDP: only allow loopback cdpUrl unless an explicit config flag/allowlist enables remote CDP.
  2. Enforce an allowlist / SSRF policy for CDP endpoints:
    • Block private/special-use IPs unless explicitly allowed
    • Allow only specific hostnames/domains
  3. Validate that the websocket URL is same-origin with the CDP base URL (same hostname/port) before connecting, and reject mismatches.
  4. Restrict WebSocket schemes to ws:/wss: only.

Example (sketch):

const cdp = new URL(cdpUrl);
const ws = new URL(wsUrl);

if (!['ws:', 'wss:'].includes(ws.protocol)) throw new Error('invalid ws protocol');
if (ws.hostname !== cdp.hostname) throw new Error('wsUrl host mismatch');// optionally apply infra/net/ssrf policy checks here

This reduces the ability for attacker-controlled CDP endpoints (or injected responses) to redirect the probe to arbitrary internal/external websocket services.


2. 🔵 Unbounded WebSocket message parsing in CDP health check can cause memory/CPU DoS

Property Value
Severity Low
CWE CWE-400
Location src/browser/chrome.ts:137-145

Description

The new CDP readiness probe (canRunCdpHealthCommand) parses the first matching websocket message by converting raw frames to a string and calling JSON.parse without enforcing any message size limit.

Impact:

  • A malicious or compromised CDP websocket endpoint (remote profile, SSRF’d endpoint, etc.) can send an extremely large frame.
  • rawDataToString() will eagerly allocate memory (e.g., Buffer.concat(data) for Buffer[], then .toString()), and JSON.parse() can further amplify CPU/memory usage.
  • openCdpWebSocket() does not set the ws client's maxPayload option, so the effective limit is whatever the library default is (commonly very large), enabling resource-exhaustion.

Vulnerable code (new path):

parsed = JSON.parse(rawDataToString(raw))

Supporting references:

  • rawDataToString() concatenates buffer arrays with no length cap.
  • openCdpWebSocket() sets handshakeTimeout but no maxPayload.

Recommendation

Enforce a strict upper bound on CDP message sizes before converting to string / parsing, and configure the websocket client to reject oversized frames.

Recommended changes:

  1. Set a conservative maxPayload on the ws client (CDP responses for Browser.getVersion should be tiny):
// in openCdpWebSocket
return new WebSocket(wsUrl, {
  handshakeTimeout: handshakeTimeoutMs,
  maxPayload: 1024 * 1024, // 1 MiB (tune as needed)
  ...(Object.keys(headers).length ? { headers } : {}),
  ...(agent ? { agent } : {}),
});
  1. Add a defensive size check in canRunCdpHealthCommand prior to parsing (handle string, Buffer, Buffer[], ArrayBuffer):
const MAX_CDP_HEALTH_MSG_BYTES = 1024 * 1024;

function rawSize(raw: WebSocket.RawData): number {
  if (typeof raw === "string") return Buffer.byteLength(raw, "utf8");
  if (Buffer.isBuffer(raw)) return raw.length;
  if (Array.isArray(raw)) return raw.reduce((n, b) => n + b.length, 0);
  if (raw instanceof ArrayBuffer) return raw.byteLength;
  return Buffer.byteLength(String(raw), "utf8");
}

if (rawSize(raw) > MAX_CDP_HEALTH_MSG_BYTES) {
  finish(false);
  return;
}

This prevents large allocations in Buffer.concat(...).toString() and reduces the risk of CPU/memory exhaustion from JSON.parse() on attacker-controlled payloads.


Analyzed PR: #31421 at commit bc3f4b8

Last updated on: 2026-03-02T23:27:18Z

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

This PR fixes a false-positive health status bug where cdpReady was reported as true whenever the CDP WebSocket handshake succeeded, even if the command channel was stale. The fix replaces the "socket open" probe with a real Browser.getVersion CDP command round-trip inside the renamed canRunCdpHealthCommand function, ensuring that only a live, responsive command channel is reported as healthy.

Key changes:

  • canOpenWebSocketcanRunCdpHealthCommand: after a successful WS open, sends { id: 1, method: "Browser.getVersion" } and waits for a response with id === 1 and a non-null result object before resolving true.
  • A settled / finish guard pattern prevents the Promise from being resolved more than once across the open, message, error, close events and the outer safety timeout.
  • isChromeCdpReady gains a third parameter handshakeTimeoutMs (default CHROME_WS_READY_TIMEOUT_MS) giving callers fine-grained control over the command-response window independently of the HTTP reachability timeout.
  • Two new integration tests spin up real HTTP+WS servers to verify the happy path (responds to Browser.getVersiontrue) and the stale path (WS opens, never responds → false after timeout).
  • Minor: the persistent ws.on("message", …) listener is never explicitly removed inside finish. While the settled guard and socket teardown make this harmless, removing the listener in finish would be slightly cleaner.

Confidence Score: 4/5

  • This PR is safe to merge — the logic is correct, backward-compatible, and well-tested; one minor listener-cleanup style issue noted.
  • The core settled/finish guard correctly prevents double-resolution across all WebSocket event paths (open, message, error, close, timeout). The Browser.getVersion response check properly handles CDP error replies (missing result) and valid replies (non-null object result). The two new tests cover the happy path and the stale-channel path with a real local server. The change is backward-compatible (new handshakeTimeoutMs param has a default). The only minor concern is that the ws.on("message", …) listener is never explicitly removed in finish, but the settled guard and socket teardown make this benign and it is noted as a style point.
  • No files require special attention beyond the minor listener-cleanup style note in src/browser/chrome.ts.

Last reviewed commit: e1d7123

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@vincentkoc
Copy link
Member Author

Addressed the ws-listener hygiene point.

Change made in src/browser/chrome.ts:

  • message handler is now named (onMessage) and explicitly detached in finish() via ws.off("message", onMessage) before close.
  • Behavior is unchanged; this is cleanup hygiene to avoid lingering listener windows during teardown.

Validation:

  • pnpm exec oxfmt --check src/browser/chrome.ts
  • pnpm vitest run src/browser/chrome.test.ts

@vincentkoc vincentkoc merged commit 8f3eb0f into openclaw:main Mar 2, 2026
11 of 13 checks passed
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc3f4b890d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 171 to 177
} catch {
// ignore
}
resolve(false);
finish(false);
},
Math.max(50, timeoutMs + 25),
);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restart health timeout after CDP socket opens

The timeout window starts before the websocket handshake and is never reset on open, so once the connection is established the probe only has the leftover budget (often ~25ms when handshake is near handshakeTimeoutMs) to receive Browser.getVersion. In slow or high-latency environments this can mark a healthy CDP endpoint as unhealthy and trigger avoidable attach failures/restarts in isChromeCdpReady callers, even though command execution would succeed with a normal post-open timeout window.

Useful? React with 👍 / 👎.

dawi369 pushed a commit to dawi369/davis that referenced this pull request Mar 3, 2026
* fix(browser): validate cdp command channel health

* test(browser): cover stale cdp command channel readiness

* changelog: note cdp command-channel readiness check

* browser(cdp): detach ws message listener on health-probe cleanup
OWALabuy pushed a commit to kcinzgg/openclaw that referenced this pull request Mar 4, 2026
* fix(browser): validate cdp command channel health

* test(browser): cover stale cdp command channel readiness

* changelog: note cdp command-channel readiness check

* browser(cdp): detach ws message listener on health-probe cleanup
zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026
* fix(browser): validate cdp command channel health

* test(browser): cover stale cdp command channel readiness

* changelog: note cdp command-channel readiness check

* browser(cdp): detach ws message listener on health-probe cleanup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

maintainer Maintainer-authored PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Browser CDP connection silently dies after idle period — gateway reports cdpReady but act/snapshot times out

1 participant