
fix(browser): wait for CDP readiness after start to avoid PortInUseError races (#21149) #29538

Merged
steipete merged 3 commits into openclaw:main from AaronWander:fix/21149-browser-start-waits-cdp
Mar 2, 2026

Conversation

@AaronWander
Contributor

Summary

  • Problem: browser(action="start")/browser(action="open") can return before Chrome’s CDP HTTP+WS is actually ready, so a follow-up snapshot/screenshot races and tries to launch again on the same port, triggering PortInUseError ([Bug]: browser start/open blocks snapshot/screenshot with PortInUseError on Linux (port 18800 conflict) #21149).
  • Why it matters: After the race, browser tool calls can stay broken for the rest of the gateway session (until restart), making browser automation unreliable.
  • What changed: After launching local Chrome, ensureBrowserAvailable() now polls briefly for CDP readiness before returning; on timeout it cleans up and fails with a clear error. Added a regression test.
  • What did NOT change (scope boundary): No changes to port selection, SSRF policy, extension driver behavior, or snapshot/screenshot output.
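
The change described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `LaunchDeps`, `startAndWaitForCdp`, and the error text are hypothetical stand-ins, and the defaults (50 attempts, 100ms apart) mirror the first reviewed commit rather than the final merged behavior.

```typescript
// Hypothetical dependencies, injected so the sketch stays self-contained.
interface LaunchDeps {
  launch: () => Promise<void>;         // stand-in for spawning local Chrome
  isReachable: () => Promise<boolean>; // true once CDP HTTP + WS both answer
  stop: () => Promise<void>;           // stand-in for stopping Chrome and clearing running state
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Launch, then poll for readiness before returning.
// Resolves with the number of probes it took; on timeout it cleans up and throws.
async function startAndWaitForCdp(
  deps: LaunchDeps,
  maxAttempts = 50,
  intervalMs = 100,
): Promise<number> {
  await deps.launch();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await deps.isReachable()) {
      return attempt;
    }
    await sleep(intervalMs);
  }
  // Never became ready: tear down so a later call does not find a half-started browser.
  await deps.stop();
  throw new Error("CDP not reachable after start (placeholder message)");
}
```

Because the caller only gets a result once the probe succeeds, a follow-up snapshot/screenshot should no longer see a "not running" browser and try to launch a second Chrome on the same port.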

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • browser(action="start") and browser(action="open") are less likely to cause PortInUseError races with subsequent snapshot/screenshot calls because start/open now waits for CDP readiness before returning.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)

Repro + Verification

Environment

  • OS: macOS (dev)
  • Runtime/container: Node 22+
  • Model/provider: N/A
  • Integration/channel (if any): N/A
  • Relevant config (redacted): N/A

Steps

  1. Start browser via browser(action="start") (or browser(action="open"))
  2. Immediately call browser(action="snapshot") / browser(action="screenshot")
  3. Observe that the follow-up no longer races into a second launch attempt on the same port

Expected

  • No PortInUseError race; follow-up calls work without requiring a gateway restart.

Actual

  • Previously, follow-up calls could fail with "PortInUseError: Port 18800 is already in use" and stay broken until a gateway restart.

Evidence

  • Failing test/log before + passing after

Human Verification (required)

  • Verified scenarios: ran unit test + full repo pnpm check
  • Edge cases checked: CDP not ready immediately after launch (polling path)
  • What you did not verify: live Linux reproduction with real Chrome process timing / real gateway session

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)

Failure Recovery (if this breaks)

  • Revert this PR commit.
  • Known bad symptoms: browser start/open taking slightly longer, or start failing with “CDP websocket … not reachable after start”.

Risks and Mitigations

  • Risk: start/open waits slightly longer on slow machines.
    • Mitigation: bounded polling with a small max wait; only applies to the “we just launched Chrome” path.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e2bfddccc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// launch again on the same port. Poll briefly so browser(action="start"/"open") is stable.
const maxAttempts = 50;
for (let attempt = 0; attempt < maxAttempts; attempt++) {
if (await isReachable(1200)) {

[P2] Bound CDP post-launch wait by wall-clock time

The new post-launch loop can block browser(action="start"/"open") much longer than intended when Chrome never becomes CDP-ready: each iteration calls isReachable(1200), and for loopback profiles that translates to up to a 2s WS handshake timeout before the extra 100ms sleep, repeated 50 times. In cases where /json/version responds but the WS endpoint never opens, this path can stall tool calls for roughly 100s+ instead of failing quickly; consider enforcing a real elapsed-time deadline (or reducing per-attempt timeout) to keep startup failures bounded.
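
The "roughly 100s+" figure follows from the numbers in this comment. A small sketch of the arithmetic (the function name is illustrative only, not part of the codebase):

```typescript
// Worst case for an attempt-count-bounded loop: every probe runs to its full
// timeout, and each failed probe is followed by the sleep interval.
function worstCaseWaitMs(
  maxAttempts: number,
  perAttemptTimeoutMs: number,
  sleepMs: number,
): number {
  return maxAttempts * (perAttemptTimeoutMs + sleepMs);
}

// Figures from the review: 50 attempts, up to a 2s WS handshake timeout, 100ms sleep.
const worstCase = worstCaseWaitMs(50, 2000, 100);
console.log(worstCase / 1000); // 105 (seconds)
```

Counting attempts bounds the number of probes, not the elapsed time, which is why the review asks for a wall-clock deadline.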


Contributor Author

Thanks — good catch. The original post-launch loop was bounded by attempt count, but each attempt could block on isReachable(1200) (loopback WS handshake), so in the worst case it could stall browser(action="start"/"open") far longer than intended.
I updated the wait to be bounded by a wall-clock deadline (~8s) and reduced per-attempt timeouts, so startup failures fail fast while still avoiding the follow-up PortInUseError race. Added/adjusted the regression test to cover the post-launch readiness wait behavior.
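
A deadline-bounded wait of the kind described in this reply might look like the sketch below. It is an assumption-laden illustration rather than the merged implementation: `waitUntilReady`, `WaitOptions`, and the 500ms per-attempt timeout are hypothetical. Only the ~8s overall deadline comes from the reply, and the 100ms interval from the earlier loop.

```typescript
interface WaitOptions {
  deadlineMs: number;   // total wall-clock budget (~8s per the reply)
  perAttemptMs: number; // assumed value; the reply only says it was "reduced"
  intervalMs: number;   // pause between failed probes
}

const delay = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `probe` until it reports ready or the wall-clock deadline passes.
// Each probe is handed a timeout clamped to the remaining budget, so the total
// wait stays near deadlineMs as long as the probe honors that timeout.
async function waitUntilReady(
  probe: (timeoutMs: number) => Promise<boolean>,
  opts: WaitOptions = { deadlineMs: 8000, perAttemptMs: 500, intervalMs: 100 },
  now: () => number = Date.now,
): Promise<boolean> {
  const deadline = now() + opts.deadlineMs;
  while (true) {
    const remaining = deadline - now();
    if (remaining <= 0) {
      return false; // budget exhausted: the caller cleans up and reports the failure
    }
    if (await probe(Math.min(opts.perAttemptMs, remaining))) {
      return true;
    }
    // Never sleep past the deadline.
    await delay(Math.min(opts.intervalMs, Math.max(0, deadline - now())));
  }
}
```

The clamp on the per-attempt timeout is what keeps a slow probe from pushing the total wait far beyond the deadline, which was the failure mode of the attempt-counted loop.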

@greptile-apps
Contributor

greptile-apps bot commented Feb 28, 2026

Greptile Summary

Added polling for Chrome CDP readiness after launch to prevent race conditions where follow-up browser calls (snapshot/screenshot) attempt to launch Chrome again on an already-in-use port, causing PortInUseError.

  • Introduced waitForCdpReadyAfterLaunch() that polls isReachable() up to 50 times with 100ms intervals
  • On timeout, properly cleans up by stopping Chrome and clearing running state before throwing a clear error
  • Added regression test that verifies CDP readiness polling works correctly

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The fix directly addresses a documented race condition with a straightforward polling approach. Error handling includes proper cleanup (stopping Chrome, clearing state) if readiness check times out. The bounded retry loop (50 attempts × 100ms) prevents hanging. The regression test validates the happy path. No changes to port selection, SSRF policy, or other critical browser infrastructure.
  • No files require special attention

Last reviewed commit: 9e2bfdd

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3b2857199f

const launched = await launchOpenClawChrome(current.resolved, profile);
attachRunning(launched);
try {
await waitForCdpReadyAfterLaunch();

[P2] Reconcile start/open request timeout with extra CDP wait

ensureBrowserAvailable() now always waits for waitForCdpReadyAfterLaunch() after spawning Chrome, which can add up to 8s (plus cleanup) on top of launchOpenClawChrome()'s existing 15s startup wait in src/browser/chrome.ts, but the client-side browserStart/browserOpenTab calls still abort at 15s in src/browser/client.ts. In slow-start scenarios (for example, /json/version becomes reachable near the end of the 15s launch window and WebSocket readiness trails by a few seconds), the request will time out and surface a transport error even though the browser process may become ready shortly after, which is a user-visible regression for browser(action="start"|"open").
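
The mismatch can be made concrete with a small budget check. The names here (`StartBudget`, `serverWorstCaseMs`, `clientCanOutlastServer`) are illustrative only; the figures are the ones quoted in this comment (15s launch wait, up to 8s CDP readiness wait, 15s client timeout), and cleanup time is ignored.

```typescript
interface StartBudget {
  launchWaitMs: number;    // launchOpenClawChrome()'s startup wait
  cdpReadyWaitMs: number;  // extra post-launch readiness wait
  clientTimeoutMs: number; // client-side abort for start/open requests
}

// Longest the server side may legitimately take before answering.
function serverWorstCaseMs(b: StartBudget): number {
  return b.launchWaitMs + b.cdpReadyWaitMs;
}

// The client should wait at least as long as the server's worst case,
// otherwise a slow but successful start surfaces as a transport error.
function clientCanOutlastServer(b: StartBudget, marginMs = 0): boolean {
  return b.clientTimeoutMs >= serverWorstCaseMs(b) + marginMs;
}

const asReviewed: StartBudget = {
  launchWaitMs: 15_000,
  cdpReadyWaitMs: 8_000,
  clientTimeoutMs: 15_000,
};

console.log(serverWorstCaseMs(asReviewed));      // 23000
console.log(clientCanOutlastServer(asReviewed)); // false
```

With these numbers the server can take up to 23s while the client gives up at 15s, so the two limits would need to be reconciled, either by raising the client timeout or by sharing one deadline across launch and readiness.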


steipete force-pushed the fix/21149-browser-start-waits-cdp branch from 3b28571 to d12dfe0 on March 2, 2026 13:53
steipete merged commit 99ee26d into openclaw:main on Mar 2, 2026
@steipete
Contributor

steipete commented Mar 2, 2026

Landed via temp rebase onto main.

  • Gate:
    • pnpm check (fails on pre-existing TS errors in src/process/exec.windows.test.ts and src/slack/monitor/events/messages.ts)
    • pnpm test -- src/browser/server-context.ensure-browser-available.waits-for-cdp-ready.test.ts src/browser/server-context.remote-tab-ops.test.ts
  • Land commit: d12dfe0
  • Merge commit: 99ee26d

Thanks @AaronWander!

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d12dfe0ddb

const launched = await launchOpenClawChrome(current.resolved, profile);
attachRunning(launched);
try {
await waitForCdpReadyAfterLaunch();

[P2] Increase browser start/open request timeout for extra CDP wait

After launchOpenClawChrome() returns, this new waitForCdpReadyAfterLaunch() adds up to ~8s of extra blocking, but the local browser clients still time out at 15s (browserStart/browserOpenTab in src/browser/client.ts), while launchOpenClawChrome() itself already spends up to 15s waiting for /json/version (src/browser/chrome.ts). In slow-start cases where HTTP comes up near the end of that 15s window and WS readiness lags slightly, browser(action="start"|"open") now fails with a client timeout even though Chrome becomes ready moments later.


hanqizheng pushed a commit to hanqizheng/openclaw that referenced this pull request Mar 2, 2026
Linux2010 pushed a commit to Linux2010/openclaw that referenced this pull request Mar 2, 2026
execute008 pushed a commit to execute008/openclaw that referenced this pull request Mar 2, 2026
@AaronWander AaronWander deleted the fix/21149-browser-start-waits-cdp branch March 3, 2026 00:36
dawi369 pushed a commit to dawi369/davis that referenced this pull request Mar 3, 2026
OWALabuy pushed a commit to kcinzgg/openclaw that referenced this pull request Mar 4, 2026
zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: browser start/open blocks snapshot/screenshot with PortInUseError on Linux (port 18800 conflict)

2 participants