Headed browser "keeps crashing": HTTP-unresponsive daemon → orphaned Chromium → SingletonLock → restart crash-loop, and silent headless respawns
Summary
When driving the headed browser (/open-gstack-browser / $B connect) against a heavy page, the CLI repeatedly reports "Server crashed twice in a row — aborting", the user loses the visible window, and reconnects fail. Investigation shows Chromium never actually crashes (zero macOS .ips crash reports during the session). The real failure is a cascade of three independent defects in the daemon lifecycle and restart path.
Environment
- gstack 1.51.0.0 (browse dist .version 19770ea)
- macOS (Darwin arm64), bun 1.3.14
- Chromium: Google Chrome for Testing 145.0.7632.6 (ms-playwright chromium-1208)
- Reproduced on a Lever apply page (jobs.lever.co/.../apply) that fires POST .../li/track LinkedIn beacons every ~2s, with the sidebar extension loaded.
Root causes (with evidence)
1. The crash detector can't distinguish "busy" from "dead"
sendCommand treats any ECONNREFUSED/ECONNRESET/fetch failed (and the 30s AbortSignal.timeout) as a crash and triggers a restart:
browse/src/cli.ts:486-509 — connection error ⇒ killServer(old) ⇒ startServer() ⇒ retry; second failure ⇒ throw 'Server crashed twice in a row — aborting' (cli.ts:494).
Under the beacon load plus extension renderers, the single-threaded bun daemon briefly stops answering HTTP. It is alive but busy, yet the CLI kills and restarts it. Direct evidence: a later $B disconnect reported "Disconnected (server was unresponsive — force cleaned)" while 10 Chromium processes were still alive (orphaned children of the daemon PID).
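One way to separate the two cases, as a minimal sketch rather than the actual cli.ts code: treat a failed request as a crash only when the daemon PID is gone or a health probe fails several times in a row. The names classifyDaemon, HealthProbe, Verdict and the retry defaults are hypothetical, isProcessAlive is a local stand-in for whatever the CLI already has, and the probe is injected so nothing is assumed about the real /health response.

```typescript
// Hypothetical sketch: classify an unresponsive daemon before restarting it.
type HealthProbe = () => Promise<boolean>; // true when the daemon answered
type Verdict = "responsive" | "unresponsive" | "gone";

// Signal 0 only checks that the process exists; no signal is delivered.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we are not allowed to signal it.
    return (err as { code?: string }).code === "EPERM";
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function classifyDaemon(
  pid: number,
  probe: HealthProbe,
  attempts = 3,
  baseDelayMs = 250,
): Promise<Verdict> {
  for (let i = 0; i < attempts; i++) {
    if (!isProcessAlive(pid)) return "gone"; // truly dead: restart is justified
    const ok = await probe().catch(() => false); // a rejected probe counts as a failure
    if (ok) return "responsive"; // was busy, answers now: retry the command
    if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i); // exponential backoff
  }
  return "unresponsive"; // alive, but failed every consecutive probe
}
```

A caller would retry the original command on "responsive" and restart only on "gone" or "unresponsive". The probe could be a short-timeout request against /health, but its exact shape is left open here.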
2. The restart path never clears the Chromium profile lock
startServer (browse/src/cli.ts:215-289) unlinks stateFile and browse-startup-error.log, but does not remove chromium-profile/Singleton{Lock,Socket,Cookie}. That cleanup only exists in the manual /open-gstack-browser Step 0 preamble, not in the in-CLI auto-restart path.
Sequence: daemon killed → its Chromium not fully exited → new Chromium can't acquire the --user-data-dir SingletonLock → launch fails → "crashed" → retry → still locked → fails → aborting. The crash-loop is self-inflicted by the restart, not by Chromium.
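The missing cleanup could look roughly like the sketch below. It is illustrative only: clearProfileLocks and waitForExit are hypothetical names, the timeouts are arbitrary, and the demo runs against a throwaway directory rather than a real chromium-profile. On macOS and Linux Chromium typically creates SingletonLock as a symlink, which is why the sketch uses unlink instead of checking for a regular file.

```typescript
import { mkdtempSync, readdirSync, symlinkSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const SINGLETON_FILES = ["SingletonLock", "SingletonSocket", "SingletonCookie"];

// Remove stale singleton entries from a profile dir; returns the names removed.
function clearProfileLocks(profileDir: string): string[] {
  const removed: string[] = [];
  for (const name of SINGLETON_FILES) {
    try {
      // unlink removes the directory entry itself, so a dangling symlink is fine.
      unlinkSync(join(profileDir, name));
      removed.push(name);
    } catch (err) {
      if ((err as { code?: string }).code !== "ENOENT") throw err; // missing is OK
    }
  }
  return removed;
}

// Signal 0 only checks that the process exists; no signal is delivered.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as { code?: string }).code === "EPERM";
  }
}

// Poll until every PID has exited; resolves false if the deadline passes first.
async function waitForExit(pids: number[], timeoutMs = 5000, pollMs = 100): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (pids.some((pid) => isProcessAlive(pid))) {
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  return true;
}

// Demo against a throwaway directory, not a real profile.
const demoDir = mkdtempSync(join(tmpdir(), "chromium-profile-"));
symlinkSync("host-12345", join(demoDir, "SingletonLock")); // dangling symlink
writeFileSync(join(demoDir, "SingletonCookie"), "");
const removed = clearProfileLocks(demoDir);
console.log(removed, readdirSync(demoDir));
```

In a restart path the order matters: wait for the old Chromium PIDs first, then clear the lock. Removing the lock while the old browser is still running would let two instances share one profile.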
3. Auto-restarts silently come back headless (invisible)
Headed mode is only set when the invocation carries it:
cli.ts:503-508 (restart env) re-applies BROWSE_HEADED=1 only if _globalFlags?.headed is true.
connect sets BROWSE_HEADED=1 + BROWSE_PARENT_PID=0; the server gates the watchdog on BROWSE_HEADED (browse/src/server.ts:675-712).
So a restart triggered by a plain command (goto, status, fill, with no --headed flag) brings the daemon back in headless mode. The user's window is gone and never returns, even though status reports healthy. Observed: after the failure, status showed Mode: launched / about:blank, and $B focus returned "focus requires headed mode".
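A sketch of how the restart environment could take the saved session mode into account, not the current cli.ts logic. SavedState, GlobalFlags and buildRestartEnv are hypothetical names, and the headed field in the state file is the proposed addition; only BROWSE_HEADED and BROWSE_PARENT_PID come from the existing code paths described above.

```typescript
// Hypothetical shapes; `headed` in the saved state is the proposed new field.
interface SavedState { pid: number; headed?: boolean }
interface GlobalFlags { headed?: boolean }
type Env = Record<string, string | undefined>;

function buildRestartEnv(base: Env, flags?: GlobalFlags, saved?: SavedState): Env {
  // Headed if either the current invocation or the saved session says so.
  const headed = flags?.headed === true || saved?.headed === true;
  if (!headed) return { ...base };
  // Same variables that `connect` sets for a headed daemon.
  return { ...base, BROWSE_HEADED: "1", BROWSE_PARENT_PID: "0" };
}

// A plain command (no --headed flag) restarting a session saved as headed:
const env = buildRestartEnv({ PATH: "/usr/bin" }, { headed: false }, { pid: 4242, headed: true });
console.log(env.BROWSE_HEADED); // prints "1"
```

Because startServer unlinks the state file, the saved mode would have to be read before that unlink happens.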
Secondary: headed window placement
The headed app launches as "Google Chrome for Testing" and macOS often opens it on a different Space / behind the active window. $B focus activates the app but does not reliably pull it to the user's current Space. This is the first thing users perceive as "I can't see the browser" — before any actual daemon issue.
What is NOT the cause (ruled out)
- Not a Chromium segfault: no ~/Library/Logs/DiagnosticReports/*.ips reports during the session window.
- Not the parent-PID watchdog / shell-session reaping: the process tree shows the daemon as PPID=1, PGID=self, STAT=Ss (proper setsid detachment). A survival test kept a headed daemon alive and responsive for 50s idle and through a full goto + 107KB upload cycle.
- Not idle timeout: headed mode early-returns from the idle check.
Reproduction
1. $B connect (headed).
2. $B goto <a page with frequent background XHR/beacons + extension active>.
3. Run a burst of commands (snapshot/fill/upload). During a brief unresponsive window, run any $B command.
4. Observe: "Server crashed twice in a row — aborting"; subsequent status shows headless Mode: launched; orphaned Chromium processes persist, holding the profile lock.
Proposed fixes
- Distinguish busy from dead (cli.ts sendCommand): before declaring a crash on timeout, probe /health with a short retry/backoff; only restart if the process is truly gone (isProcessAlive(state.pid) false) or health fails N consecutive times. Consider raising the 30s timeout for known-chatty pages.
- Clean the profile lock + await old-Chromium exit on restart (startServer): unlink chromium-profile/Singleton{Lock,Socket,Cookie} and wait for the previous Chromium PIDs to exit before relaunch, mirroring the /open-gstack-browser Step 0 cleanup in the automatic path.
- Persist and re-apply headed mode (cli.ts + state file): store headed/mode in browse.json and have startServer's restart path re-apply BROWSE_HEADED=1 from saved state, not just from _globalFlags.headed. A restart must never silently downgrade a headed session to headless.
- Don't leak Chromium on force-clean disconnect: when disconnect force-cleans an unresponsive server, also kill the daemon's Chromium child tree (by PPID) and clear the lock, so the next connect starts clean.
- (Nice-to-have) Surface the window reliably: on connect/focus, raise "Google Chrome for Testing" to the active macOS Space (AppleScript set frontmost), and/or print a hint to check Mission Control.
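For the force-clean fix, the child tree can be derived from a process listing. The sketch below is one possible approach, with descendantsOf as a hypothetical helper; it only parses text of the form "<pid> <ppid>" (for example from ps -ax -o pid= -o ppid=) and does not signal anything itself. The sample listing is made up for illustration.

```typescript
// Hypothetical helper: collect every descendant of rootPid from `ps` output.
// Expected input: one "<pid> <ppid>" pair per line.
function descendantsOf(rootPid: number, psOutput: string): number[] {
  const childrenOf = new Map<number, number[]>();
  for (const line of psOutput.split("\n")) {
    const match = line.trim().match(/^(\d+)\s+(\d+)$/);
    if (!match) continue; // skip blank or malformed lines
    const pid = Number(match[1]);
    const ppid = Number(match[2]);
    childrenOf.set(ppid, [...(childrenOf.get(ppid) ?? []), pid]);
  }

  const found: number[] = [];
  const seen = new Set<number>([rootPid]);
  const queue: number[] = [rootPid];
  while (queue.length > 0) {
    const current = queue.shift() as number;
    for (const child of childrenOf.get(current) ?? []) {
      if (seen.has(child)) continue; // guard against self-parented entries
      seen.add(child);
      found.push(child);
      queue.push(child);
    }
  }
  return found; // breadth-first: parents come before their own children
}

// Made-up listing: daemon 100 with two children and one grandchild.
const sample = `
  100     1
  200   100
  201   100
  300   200
  999     1
`;
console.log(descendantsOf(100, sample)); // logs the PIDs 200, 201, 300
```

The listing has to be captured while the daemon is still alive: once it exits, its children are reparented (typically to PID 1) and can no longer be found by PPID.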
Impact
Any headed session against a moderately active page (analytics beacons, polling, live extensions) can tip the daemon into a self-inflicted crash-loop, leak Chromium processes, and silently drop to an invisible headless browser — making interactive flows (form fills, logged-in QA) unreliable.