Chrome CDP launch fails despite diagnostic confirming readiness (isChromeReachable vs diagnoseChromeCdp disagreement)

## Summary

`launchOpenClawChrome` throws `Failed to start Chrome CDP on port <port> for profile "<name>"` while its own attached diagnostic confirms Chrome is reachable. The error message literally contains `CDP diagnostic: ready after 67ms; cdp=http://127.0.0.1:18800; websocket=...; browser=Chrome/146...` — i.e. the failure path's diagnostic says the system is healthy.

Reproduces intermittently on cold-start. The next call to the browser plugin succeeds immediately without restarting Chrome.

## Environment

- OpenClaw 2026.5.12 (f066dd2)
- macOS 26.0.0
- Chrome 146.0.7680.165
- Profile: managed `openclaw` profile (not host attach)

## Reproduction

1. Ensure no Chrome instance is running on port 18800.
2. Run any browser command via the gateway, e.g. `openclaw browser --timeout 25000 navigate "about:blank"`.
3. The first invocation after a cold start (or after a recent gateway restart) intermittently fails with the error above.
4. Re-running the exact same command within 1-2 seconds succeeds.

I hit this 5 times over 5 weeks (Apr 14, Apr 18, Apr 22, Apr 25, May 16), always on profile `openclaw`, always with the diagnostic reporting reachable. Reproduced again interactively just now while diagnosing it.

## Root cause

In `dist/chrome-CP2x5dZ8.js` (built from whatever the source file is — likely `src/browser/chrome.ts`), `launchOpenClawChrome` uses **two different readiness criteria** for the same Chrome instance:

| Function | What it checks |
|---|---|
| `isChromeReachable` (line 1291) | HTTP `/json/version` only, with a 500ms `AbortController` timeout. Silently catches every error → returns `false`. |
| `diagnoseChromeCdp` (line 877) | HTTP `/json/version` + WebSocket handshake + `Browser.getVersion` round-trip, with detailed failure codes. |

The launch path uses the weaker probe to decide success/failure, but the stronger probe to format the diagnostic:

```js
// line 1407-1430 (built JS)
const readyDeadline = Date.now() + (resolved.localLaunchTimeoutMs ?? CHROME_LAUNCH_READY_WINDOW_MS);
while (Date.now() < readyDeadline) {
    if (await isChromeReachable(profile.cdpUrl)) break;
    await new Promise((r) => setTimeout(r, 200));
}
if (!await isChromeReachable(profile.cdpUrl)) {       // <-- weak check decides failure
    const diagnosticText = await diagnoseChromeCdp(profile.cdpUrl)... // <-- strong check confirms ready
    ...
    throw new Error(`Failed to start Chrome CDP on port ${profile.cdpPort} for profile "${profile.name}". ${diagnosticText}${launchHints}${stderrHint}`);
}
```

When `isChromeReachable` returns false on its last call near the deadline (transient HTTP probe error, AbortController firing, or just narrow timing margin), the code throws. The diagnostic, fired immediately after, succeeds in 60-200ms because Chrome is in fact ready. The user sees a self-contradicting error and the launched Chrome process is then SIGKILLed (line 1428), so the next call has to launch all over again.

## Proposed fix

Use the same probe for both the decision and the diagnostic. `isChromeCdpReady` (defined at line 1322) already wraps `diagnoseChromeCdp` and returns its `.ok` — or capture the diagnostic once and use both its `.ok` and its formatted text:

```js
const finalDiagnostic = await diagnoseChromeCdp(profile.cdpUrl).catch((err) => ({
    ok: false,
    cdpUrl: profile.cdpUrl,
    code: "diagnostic_threw",
    message: safeChromeCdpErrorMessage(err),
    elapsedMs: 0,
}));
if (!finalDiagnostic.ok) {
    const diagnosticText = formatChromeCdpDiagnostic(finalDiagnostic);
    const stderrOutput = normalizeOptionalString(Buffer.concat(stderrChunks).toString("utf8")) ?? "";
    if (allowSingletonRecovery && CHROME_SINGLETON_IN_USE_PATTERN.test(stderrOutput) && clearStaleChromeSingletonLocks(userDataDir)) {
        log.warn(`Removed stale Chromium Singleton* locks for profile "${profile.name}" and retrying launch.`);
        await terminateChromeForRetry(proc, userDataDir);
        return await launchOnceAndWait(false);
    }
    const stderrHint = stderrOutput ? `\nChrome stderr:\n${stderrOutput.slice(0, CHROME_STDERR_HINT_MAX_CHARS)}` : "";
    const launchHints = chromeLaunchHints({ stderrOutput, resolved, profile, launchOptions });
    try { proc.kill("SIGKILL"); } catch {}
    throw new Error(`Failed to start Chrome CDP on port ${profile.cdpPort} for profile "${profile.name}". ${diagnosticText}${launchHints}${stderrHint}`);
}
// Chrome is ready — proceed.
```

This:
- Eliminates the contradiction (one probe makes the call).
- Uses the stronger probe (HTTP + WS + Browser.getVersion), so we don't proceed if WS is broken even though `/json/version` answers.
- Avoids the redundant second probe (we now call diagnose exactly once instead of `isChromeReachable` then `diagnoseChromeCdp`).

Suggested branch: `fix/chrome-cdp-launch-readiness-uses-diagnostic`.

## Workaround in user code

Until fixed, catching `Failed to start Chrome CDP` in stderr and retrying once after ~3s clears it. Confirmed: the second attempt always succeeds because Chrome's already up.

## Why this matters

Any cron-driven browser automation that fires once per day or per scheduled window cannot tolerate this failure — there's no human to retry. Mine sends an outbox alert and skips the day's run. Five missed runs in five weeks is from this single bug.

Function	What it checks
`isChromeReachable` (line 1291)	HTTP `/json/version` only, with a 500ms `AbortController` timeout. Silently catches every error → returns `false`.
`diagnoseChromeCdp` (line 877)	HTTP `/json/version` + WebSocket handshake + `Browser.getVersion` round-trip, with detailed failure codes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Chrome CDP launch fails despite diagnostic confirming readiness (isChromeReachable vs diagnoseChromeCdp disagreement) #82904

Summary

Environment

Reproduction

Root cause

Proposed fix

Workaround in user code

Why this matters

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Chrome CDP launch fails despite diagnostic confirming readiness (isChromeReachable vs diagnoseChromeCdp disagreement) #82904

Description

Summary

Environment

Reproduction

Root cause

Proposed fix

Workaround in user code

Why this matters

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions