Headed browser "keeps crashing": HTTP-unresponsive daemon → orphaned Chromium → SingletonLock → restart crash-loop, and silent headless respawns #1781

@tonyjzhou

Summary

When driving the headed browser (/open-gstack-browser / $B connect) against a heavy page, the CLI repeatedly reports "Server crashed twice in a row — aborting", the user loses the visible window, and reconnects fail. Investigation shows Chromium never actually crashes (zero macOS .ips crash reports during the session). The real failure is a cascade of three independent defects in the daemon lifecycle and restart path.

Environment

  • gstack 1.51.0.0 (browse dist .version 19770ea)
  • macOS (Darwin arm64), bun 1.3.14
  • Chromium: Google Chrome for Testing 145.0.7632.6 (ms-playwright chromium-1208)
  • Reproduced on a Lever apply page (jobs.lever.co/.../apply) that fires POST .../li/track LinkedIn beacons every ~2s, plus the sidebar extension loaded.

Root causes (with evidence)

1. The crash detector can't distinguish "busy" from "dead"

sendCommand treats any ECONNREFUSED/ECONNRESET/fetch failed (and the 30s AbortSignal.timeout) as a crash and triggers a restart:

  • browse/src/cli.ts:486-509: connection error ⇒ killServer(old), then startServer() ⇒ retry; second failure ⇒ throw 'Server crashed twice in a row — aborting' (cli.ts:494).

Under the beacon load plus extension renderers, the single-threaded bun daemon briefly stops answering HTTP. It is alive but busy, yet the CLI kills and restarts it. Direct evidence: a later $B disconnect reported "Disconnected (server was unresponsive — force cleaned)" while 10 Chromium processes were still alive (orphaned children of the daemon PID).
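A minimal sketch of how "busy" could be told apart from "dead" before any restart. It assumes the daemon exposes the /health endpoint mentioned under Proposed fixes; the isProcessAlive shown here is a stand-in for the helper of that name, and probeHealth, classifyDaemon, and the Verdict type are hypothetical names, not code from cli.ts.

```typescript
// Hypothetical sketch: apart from isProcessAlive and /health, which the issue
// mentions, every name here is illustrative and not taken from cli.ts.
type Verdict = "healthy" | "busy" | "dead";

// Signal 0 checks that the process exists without delivering a signal.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but we are not allowed to signal it.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Probe /health a few times with exponential backoff; stop at the first success.
async function probeHealth(
  baseUrl: string,
  attempts = 3,
  timeoutMs = 2000,
): Promise<boolean[]> {
  const results: boolean[] = [];
  for (let i = 0; i < attempts; i++) {
    let ok = false;
    try {
      const res = await fetch(`${baseUrl}/health`, {
        signal: AbortSignal.timeout(timeoutMs),
      });
      ok = res.ok;
    } catch {
      ok = false; // refused, reset, or timed out
    }
    results.push(ok);
    if (ok) break;
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, 250 * 2 ** i));
    }
  }
  return results;
}

// "dead" only if the PID is gone or the last N probes all failed.
function classifyDaemon(
  pidAlive: boolean,
  probes: boolean[],
  maxConsecutiveFailures: number,
): Verdict {
  if (!pidAlive) return "dead";
  let trailingFailures = 0;
  for (let i = probes.length - 1; i >= 0 && !probes[i]; i--) trailingFailures++;
  if (trailingFailures >= maxConsecutiveFailures) return "dead";
  return probes.length > 0 && trailingFailures === 0 ? "healthy" : "busy";
}
```

With a classifier like this, the restart in sendCommand would run only on a "dead" verdict; a "busy" verdict would retry the original command instead of killing a live daemon.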

2. The restart path never clears the Chromium profile lock

startServer (browse/src/cli.ts:215-289) unlinks stateFile and browse-startup-error.log, but does not remove chromium-profile/Singleton{Lock,Socket,Cookie}. That cleanup only exists in the manual /open-gstack-browser Step 0 preamble, not in the in-CLI auto-restart path.

Sequence: daemon killed → its Chromium not fully exited → new Chromium can't acquire the --user-data-dir SingletonLock → launch fails → "crashed" → retry → still locked → fails → aborting. The crash-loop is self-inflicted by the restart, not by Chromium.
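The missing cleanup might look roughly like the sketch below. clearProfileLocks and waitForExit are hypothetical helpers, not functions that exist in cli.ts, and the sketch assumes the caller already knows the profile directory and the previous Chromium PIDs. The ordering matters: the locks should be removed only after the old Chromium has exited, since deleting the lock of a live instance could let two browsers share one profile.

```typescript
import { rmSync } from "node:fs";
import { join } from "node:path";

// The three singleton artifacts named in the issue.
const SINGLETON_FILES = ["SingletonLock", "SingletonSocket", "SingletonCookie"];

// Hypothetical helper: remove the singleton files from a profile directory.
// `remove` is injectable so the function can be exercised without touching disk.
function clearProfileLocks(
  profileDir: string,
  remove: (path: string) => void = (p) => rmSync(p, { force: true }),
): string[] {
  const paths = SINGLETON_FILES.map((name) => join(profileDir, name));
  for (const p of paths) remove(p); // force: true ignores files that are already gone
  return paths;
}

// Hypothetical helper: poll until every PID has exited or the timeout elapses.
// Resolves to true if all exited, false if at least one is still running.
async function waitForExit(
  pids: number[],
  timeoutMs = 5000,
  pollMs = 100,
): Promise<boolean> {
  const alive = (pid: number): boolean => {
    try {
      process.kill(pid, 0); // signal 0: existence check only
      return true;
    } catch (err) {
      return (err as { code?: string }).code === "EPERM";
    }
  };
  const deadline = Date.now() + timeoutMs;
  while (pids.some(alive)) {
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  return true;
}
```

In a restart path, one plausible order is: signal the old Chromium processes, await waitForExit, then call clearProfileLocks on chromium-profile, then launch.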

3. Auto-restarts silently come back headless (invisible)

Headed mode is only set when the invocation carries it:

  • cli.ts:503-508 (restart env) re-applies BROWSE_HEADED=1 only if _globalFlags?.headed is true.
  • connect sets BROWSE_HEADED=1 + BROWSE_PARENT_PID=0; the server gates the watchdog on BROWSE_HEADED (browse/src/server.ts:675-712).

So a restart triggered by a plain command (goto, status, fill, with no --headed flag) brings the daemon back in headless mode. The user's window is gone and never returns, even though status reports healthy. Observed: after the failure, status showed "Mode: launched / about:blank", and $B focus returned "focus requires headed mode".
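One way to keep a restart from downgrading the session is to derive the restart environment from saved state as well as from the current flags. In this sketch the `headed` field on the saved state is an assumption (the issue only proposes storing it in browse.json), and buildRestartEnv, SavedState, and GlobalFlags are hypothetical names. Setting BROWSE_PARENT_PID=0 alongside BROWSE_HEADED=1 mirrors what the issue says connect does.

```typescript
// Hypothetical shapes: `headed` in the saved state is a proposed field,
// not something the current state file is known to contain.
interface SavedState {
  pid?: number;
  headed?: boolean;
}

interface GlobalFlags {
  headed?: boolean;
}

type Env = Record<string, string | undefined>;

// Build the environment for a restarted daemon. Headed mode is re-applied
// if either this invocation asked for it or the previous session had it.
function buildRestartEnv(
  base: Env,
  flags: GlobalFlags | undefined,
  saved: SavedState | undefined,
): Env {
  const env: Env = { ...base };
  if (flags?.headed || saved?.headed) {
    env.BROWSE_HEADED = "1";
    env.BROWSE_PARENT_PID = "0"; // same values the issue reports for `connect`
  }
  return env;
}
```

For example, a plain `goto` that triggers a restart would pass flags without `headed`, but a saved state of `{ headed: true }` would still produce BROWSE_HEADED=1.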

Secondary: headed window placement

The headed app launches as "Google Chrome for Testing" and macOS often opens it on a different Space / behind the active window. $B focus activates the app but does not reliably pull it to the user's current Space. This is the first thing users perceive as "I can't see the browser" — before any actual daemon issue.

What is NOT the cause (ruled out)

  • Not a Chromium segfault — no ~/Library/Logs/DiagnosticReports/*.ips during the session window.
  • Not the parent-PID watchdog / shell-session reaping — process tree shows the daemon as PPID=1, PGID=self, STAT=Ss (proper setsid detachment). A survival test kept a headed daemon alive and responsive for 50s idle and through a full goto + 107KB upload cycle.
  • Not idle timeout — headed mode early-returns from the idle check.

Reproduction

  1. $B connect (headed).
  2. $B goto <a page with frequent background XHR/beacons + extension active>.
  3. Run a burst of commands (snapshot/fill/upload). During a brief unresponsive window, run any $B command.
  4. Observe: "Server crashed twice in a row — aborting"; subsequent status shows headless "Mode: launched"; orphaned Chromium processes persist, holding the profile lock.

Proposed fixes

  1. Distinguish busy from dead (cli.ts sendCommand): before declaring a crash on timeout, probe /health with a short retry/backoff; only restart if the process is truly gone (isProcessAlive(state.pid) false) or health fails N consecutive times. Consider raising the 30s timeout for known-chatty pages.
  2. Clean the profile lock + await old-Chromium exit on restart (startServer): unlink chromium-profile/Singleton{Lock,Socket,Cookie} and wait for the previous Chromium PIDs to exit before relaunch — mirror the /open-gstack-browser Step 0 cleanup in the automatic path.
  3. Persist and re-apply headed mode (cli.ts + state file): store headed/mode in browse.json and have startServer's restart path re-apply BROWSE_HEADED=1 from saved state, not just from _globalFlags.headed. A restart must never silently downgrade a headed session to headless.
  4. Don't leak Chromium on force-clean disconnect: when disconnect force-cleans an unresponsive server, also kill the daemon's Chromium child tree (by PPID) and clear the lock, so the next connect starts clean.
  5. (Nice-to-have) Surface the window reliably: on connect/focus, raise "Google Chrome for Testing" to the active macOS Space (AppleScript set frontmost), and/or print a hint to check Mission Control.
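For fix 4, the child tree could be collected from a `ps` snapshot before anything is killed. The sketch below is an assumption about how that might be done, not existing gstack code; descendantsOf and killDaemonTree are hypothetical names. The snapshot has to be taken while the daemon is still alive, because orphaned children are reparented (typically to PID 1) and can no longer be found by the daemon's PID afterwards.

```typescript
import { execFileSync } from "node:child_process";

// Collect all descendants of rootPid from `ps -axo pid=,ppid=` output,
// in breadth-first order (direct children first).
function descendantsOf(rootPid: number, psOutput: string): number[] {
  const childrenByParent = new Map<number, number[]>();
  for (const line of psOutput.split("\n")) {
    const [pidText, ppidText] = line.trim().split(/\s+/);
    const pid = Number(pidText);
    const ppid = Number(ppidText);
    if (!Number.isInteger(pid) || !Number.isInteger(ppid)) continue; // skip blank or malformed lines
    const siblings = childrenByParent.get(ppid) ?? [];
    siblings.push(pid);
    childrenByParent.set(ppid, siblings);
  }
  const found: number[] = [];
  const queue = [rootPid];
  while (queue.length > 0) {
    const parent = queue.shift() as number;
    for (const child of childrenByParent.get(parent) ?? []) {
      found.push(child);
      queue.push(child);
    }
  }
  return found;
}

// Hypothetical force-clean: snapshot the tree first, then signal the deepest
// processes first and the daemon last.
function killDaemonTree(daemonPid: number): number[] {
  const snapshot = execFileSync("ps", ["-axo", "pid=,ppid="], { encoding: "utf8" });
  const targets = [...descendantsOf(daemonPid, snapshot).reverse(), daemonPid];
  for (const pid of targets) {
    try {
      process.kill(pid, "SIGTERM");
    } catch {
      // Already gone or not ours to signal; nothing more to do for this PID.
    }
  }
  return targets;
}
```

If the daemon has already died by the time disconnect runs, a PPID walk finds nothing, so recording the Chromium PIDs in the state file at launch may be a more robust source for the kill list.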

Impact

Any headed session against a moderately active page (analytics beacons, polling, live extensions) can tip the daemon into a self-inflicted crash-loop, leak Chromium processes, and silently drop to an invisible headless browser — making interactive flows (form fills, logged-in QA) unreliable.
