Skip to content

Chrome CDP launch fails despite diagnostic confirming readiness (isChromeReachable vs diagnoseChromeCdp disagreement) #82904

@kmanan

Description

@kmanan

Summary

launchOpenClawChrome throws Failed to start Chrome CDP on port <port> for profile "<name>" while its own attached diagnostic confirms Chrome is reachable. The error message literally contains CDP diagnostic: ready after 67ms; cdp=http://127.0.0.1:18800; websocket=...; browser=Chrome/146... — i.e. the failure path's diagnostic says the system is healthy.

Reproduces intermittently on cold-start. The next call to the browser plugin succeeds immediately without restarting Chrome.

Environment

  • OpenClaw 2026.5.12 (f066dd2)
  • macOS 26.0.0
  • Chrome 146.0.7680.165
  • Profile: managed openclaw profile (not host attach)

Reproduction

  1. Ensure no Chrome instance is running on port 18800.
  2. Run any browser command via the gateway, e.g. openclaw browser --timeout 25000 navigate "about:blank".
  3. The first invocation after a cold start (or after a recent gateway restart) intermittently fails with the error above.
  4. Re-running the exact same command within 1-2 seconds succeeds.

I hit this 5 times over 5 weeks (Apr 14, Apr 18, Apr 22, Apr 25, May 16), always on profile openclaw, always with the diagnostic reporting reachable. Reproduced again interactively just now while diagnosing it.

Root cause

In dist/chrome-CP2x5dZ8.js (built from whatever the source file is — likely src/browser/chrome.ts), launchOpenClawChrome uses two different readiness criteria for the same Chrome instance:

Function What it checks
isChromeReachable (line 1291) HTTP /json/version only, with a 500ms AbortController timeout. Silently catches every error → returns false.
diagnoseChromeCdp (line 877) HTTP /json/version + WebSocket handshake + Browser.getVersion round-trip, with detailed failure codes.

The launch path uses the weaker probe to decide success/failure, but the stronger probe to format the diagnostic:

// line 1407-1430 (built JS)
const readyDeadline = Date.now() + (resolved.localLaunchTimeoutMs ?? CHROME_LAUNCH_READY_WINDOW_MS);
while (Date.now() < readyDeadline) {
    if (await isChromeReachable(profile.cdpUrl)) break;
    await new Promise((r) => setTimeout(r, 200));
}
if (!await isChromeReachable(profile.cdpUrl)) {       // <-- weak check decides failure
    const diagnosticText = await diagnoseChromeCdp(profile.cdpUrl)... // <-- strong check confirms ready
    ...
    throw new Error(`Failed to start Chrome CDP on port ${profile.cdpPort} for profile "${profile.name}". ${diagnosticText}${launchHints}${stderrHint}`);
}

When isChromeReachable returns false on its last call near the deadline (transient HTTP probe error, AbortController firing, or just narrow timing margin), the code throws. The diagnostic, fired immediately after, succeeds in 60-200ms because Chrome is in fact ready. The user sees a self-contradicting error and the launched Chrome process is then SIGKILLed (line 1428), so the next call has to launch all over again.

Proposed fix

Use the same probe for both the decision and the diagnostic. isChromeCdpReady (defined at line 1322) already wraps diagnoseChromeCdp and returns its .ok — or capture the diagnostic once and use both its .ok and its formatted text:

const finalDiagnostic = await diagnoseChromeCdp(profile.cdpUrl).catch((err) => ({
    ok: false,
    cdpUrl: profile.cdpUrl,
    code: "diagnostic_threw",
    message: safeChromeCdpErrorMessage(err),
    elapsedMs: 0,
}));
if (!finalDiagnostic.ok) {
    const diagnosticText = formatChromeCdpDiagnostic(finalDiagnostic);
    const stderrOutput = normalizeOptionalString(Buffer.concat(stderrChunks).toString("utf8")) ?? "";
    if (allowSingletonRecovery && CHROME_SINGLETON_IN_USE_PATTERN.test(stderrOutput) && clearStaleChromeSingletonLocks(userDataDir)) {
        log.warn(`Removed stale Chromium Singleton* locks for profile "${profile.name}" and retrying launch.`);
        await terminateChromeForRetry(proc, userDataDir);
        return await launchOnceAndWait(false);
    }
    const stderrHint = stderrOutput ? `\nChrome stderr:\n${stderrOutput.slice(0, CHROME_STDERR_HINT_MAX_CHARS)}` : "";
    const launchHints = chromeLaunchHints({ stderrOutput, resolved, profile, launchOptions });
    try { proc.kill("SIGKILL"); } catch {}
    throw new Error(`Failed to start Chrome CDP on port ${profile.cdpPort} for profile "${profile.name}". ${diagnosticText}${launchHints}${stderrHint}`);
}
// Chrome is ready — proceed.

This:

  • Eliminates the contradiction (one probe makes the call).
  • Uses the stronger probe (HTTP + WS + Browser.getVersion), so we don't proceed if WS is broken even though /json/version answers.
  • Avoids the redundant second probe (we now call diagnose exactly once instead of isChromeReachable then diagnoseChromeCdp).

Suggested branch: fix/chrome-cdp-launch-readiness-uses-diagnostic.

Workaround in user code

Until fixed, catching Failed to start Chrome CDP in stderr and retrying once after ~3s clears it. Confirmed: the second attempt always succeeds because Chrome's already up.

Why this matters

Any cron-driven browser automation that fires once per day or per scheduled window cannot tolerate this failure — there's no human to retry. Mine sends an outbox alert and skips the day's run. Five missed runs in five weeks is from this single bug.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Normal backlog priority with limited blast radius.clawsweeper:fix-shape-clearClawSweeper found a clear likely implementation shape for this issue.clawsweeper:queueable-fixClawSweeper marked this issue as an existing queue_fix_pr work candidate.clawsweeper:source-reproClawSweeper found a high-confidence source-level issue reproduction.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions