fix(desktop): v0.1.7 hotfix — restore session data & fix quit race#526
Replace close code 1008 (Policy Violation) with private code 4008 when closing WebSocket connections on error. The WebSocket API only lets a client pass 1000 or a code in the 3000–4999 range to close(), so Node.js 22's native WebSocket throws DOMException [InvalidAccessError] when the client attempts to use 1008, causing the controller process to crash on authentication failures.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
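A minimal sketch of the close-code rule, assuming a hypothetical helper named `toClientCloseCode` (it does not appear in the PR). The commit message only says that 1008 was replaced with 4008; the "4000 + last three digits" mapping below is an illustrative convention that happens to produce the same value.

```typescript
// Hypothetical helper, not taken from the PR: pick a close code that a
// client-side WebSocket.close() call will accept.
export function toClientCloseCode(code: number): number {
  // Clients may send 1000 (normal closure) or any code in 3000-4999.
  const allowed = code === 1000 || (code >= 3000 && code <= 4999);
  if (allowed) return code;
  // Assumed convention: move 1xxx protocol codes into the private 4xxx
  // range, so 1008 (Policy Violation) becomes 4008.
  if (code >= 1001 && code <= 1999) return 4000 + (code % 1000);
  // Anything else falls back to a generic private code (arbitrary choice).
  return 4000;
}
```

A caller would wrap the code at the call site, for example `ws.close(toClientCloseCode(1008), "auth failed")`, instead of passing 1008 directly.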
- Cherry-pick WebSocket close code fix from PR #365
- Change launchd namespace from com.nexu.* to io.nexu.*
- Add progress tracking directory with STATUS, DECISIONS, ISSUES
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 1-3 of launchd architecture refactor:
- LaunchdManager: wrapper for launchctl commands (install, start, stop, status, graceful shutdown)
- PlistGenerator: generates launchd plist XML for Controller and OpenClaw services with proper env vars and dependencies
- EmbeddedWebServer: serves static files and proxies API requests to Controller, replacing the web sidecar process
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- launchd-bootstrap.ts: complete bootstrap flow for launchd-based startup (install services, start controller, start openclaw, start embedded web server)
- Feature flag NEXU_USE_LAUNCHD=1 for gradual rollout
- Unified log directory at ~/.nexu/logs/
- Path resolution for packaged vs dev environments
- Index file exporting all services
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- quit-handler.ts: handles before-quit event with dialog
- Options: Quit Completely (stop services), Run in Background, Cancel
- Graceful shutdown of launchd services
- Exported via services/index.ts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
scripts/dev-launchd.sh provides:
- start: generate plists, bootstrap and start services
- stop: gracefully stop services
- restart: stop then start
- status: show launchd service status
- logs: tail all log files
Uses io.nexu.*.dev labels and ~/.nexu/logs/ for logging.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add launchd service imports
- Add runLaunchdColdStart function that uses bootstrapWithLaunchd
- Check NEXU_USE_LAUNCHD=1 flag to choose bootstrap mode
- Install launchd quit handler after successful launchd bootstrap
- Modify before-quit handler to skip orchestrator cleanup in launchd mode
- Derive openclaw paths from nexuHome config
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- launchd-manager.test.ts: tests for LaunchdManager class and SERVICE_LABELS
- plist-generator.test.ts: tests for generatePlist function
Tests cover:
- Platform check (darwin only)
- Default and custom plist directories
- UID-based domain construction
- Dev vs prod label generation
- Plist XML generation with correct structure
- XML character escaping
- Log path configuration
- Service dependencies
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
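To make the "XML character escaping" and "log path configuration" items concrete, here is a simplified sketch of plist generation. It is not the PR's generatePlist: the `PlistSpec` shape, the function names, and the example label are assumptions. The plist keys used (Label, ProgramArguments, EnvironmentVariables, StandardOutPath, StandardErrorPath, KeepAlive) are standard launchd keys.

```typescript
// Hypothetical, simplified stand-in for the PR's generatePlist.
export interface PlistSpec {
  label: string;
  programArguments: string[];
  env: Record<string, string>;
  stdoutPath: string;
  stderrPath: string;
}

// Escape the five XML special characters so paths and env values cannot
// break the document structure. "&" must be replaced first.
export function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

export function generatePlistSketch(spec: PlistSpec): string {
  const args = spec.programArguments
    .map((arg) => `    <string>${escapeXml(arg)}</string>`)
    .join("\n");
  const env = Object.entries(spec.env)
    .map(([key, value]) =>
      `    <key>${escapeXml(key)}</key>\n    <string>${escapeXml(value)}</string>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">',
    '<plist version="1.0">',
    "<dict>",
    "  <key>Label</key>",
    `  <string>${escapeXml(spec.label)}</string>`,
    "  <key>ProgramArguments</key>",
    "  <array>",
    args,
    "  </array>",
    "  <key>EnvironmentVariables</key>",
    "  <dict>",
    env,
    "  </dict>",
    "  <key>StandardOutPath</key>",
    `  <string>${escapeXml(spec.stdoutPath)}</string>`,
    "  <key>StandardErrorPath</key>",
    `  <string>${escapeXml(spec.stderrPath)}</string>`,
    "  <key>KeepAlive</key>",
    "  <true/>",
    "</dict>",
    "</plist>",
  ].join("\n");
}
```

Escaping at the point where each value is interpolated keeps the template itself free of special cases, which is what makes the escaping behaviour easy to unit-test in isolation.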
- Fix OpenClaw config paths to match controller defaults in env.ts (OPENCLAW_STATE_DIR=~/.nexu/runtime/openclaw/state)
- Add `gateway` subcommand to OpenClaw plist generation
- Use OPENCLAW_CONFIG_PATH env var instead of --config argument
- Add --auth none for dev mode to simplify local development
- Update tests to verify OPENCLAW_CONFIG_PATH env var presence
Tested with ./scripts/dev-launchd.sh: Controller and OpenClaw WebSocket connection verified working.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… workflow
- Add "starting" RuntimeStatus: when OpenClaw gateway is unreachable but process is alive, show "启动中" ("Starting") instead of "已离线" ("Offline")
- Parallelize launchd service install/start + web server (Promise.all)
- Use adaptive readiness polling (50ms→250ms) instead of fixed 250ms
- Fix dev-launchd.sh stop: use bootout directly instead of SIGTERM+bootout race with KeepAlive; use SIGKILL for Electron to bypass quit handler
- Dev quit handler keeps services running (run-in-background) so vite HMR restarts don't kill launchd services
- Add tool progress prompt to nexu-platform-bootstrap plugin
- Disable humanDelay in config compiler
- Cold start time reduced from ~5s to ~2s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
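The adaptive readiness polling (50ms→250ms) could look roughly like the schedule below. The commit only gives the two endpoints; the doubling factor and the function name `pollDelays` are assumptions made for illustration.

```typescript
// Hypothetical schedule: start at 50ms and grow towards a 250ms ceiling.
// The growth factor is an assumption; the commit only states 50ms -> 250ms.
export function pollDelays(
  startMs = 50,
  maxMs = 250,
  factor = 2,
  count = 6,
): number[] {
  const delays: number[] = [];
  let current = startMs;
  for (let i = 0; i < count; i++) {
    delays.push(Math.min(current, maxMs));
    current = Math.min(current * factor, maxMs);
  }
  return delays;
}
```

With the defaults this yields 50, 100, 200, 250, 250, 250: a service that comes up quickly is noticed within the first few short intervals, while a slow one is not polled more often than the original fixed 250ms.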
- Stop: wait for ports to free after bootout, SIGKILL orphans including chrome_crashpad_handler
- Fix resolveLaunchdPaths for packaged mode: OpenClaw is at runtime/openclaw/node_modules/openclaw/openclaw.mjs, not runtime/openclaw-runtime/openclaw.mjs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tartup When the WebSocket to OpenClaw gateway isn't connected yet (during startup), channels were shown as "disconnected" (red). Now they show as "connecting" (yellow pulse) when the runtime is still starting, giving users a much less alarming startup experience. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add explicit "booting" → "ready" lifecycle to ControllerRuntimeState. During boot, gateway-unreachable is always treated as "starting" (not "unhealthy"), regardless of whether the process manager owns the OpenClaw process (fixes launchd mode where processManager.isAlive() returns false). Channel live status also uses bootPhase to show "connecting" during startup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The embedded web server serves static files from apps/web/dist, so code changes to the web app require a build step before starting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bootPhase was set to "ready" immediately after wsClient.connect(), but the WS handshake hadn't completed yet. The health loop then saw gateway-unreachable + bootPhase=ready → "unhealthy" → UI showed "已离线" ("Offline") during startup. Now bootPhase transitions to "ready" inside the onConnected callback, so the entire startup shows "starting" → "active" cleanly. Also adds temporary debug logs to home.tsx for startup diagnostics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
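A small model of the boot-phase rule described above, written as a sketch rather than the controller's actual code. The class name `BootTracker` is hypothetical, and treating a reachable gateway as still "starting" until the WebSocket connects is an assumption.

```typescript
type BootPhase = "booting" | "ready";
type RuntimeStatus = "starting" | "active" | "unhealthy";

// Hypothetical model of the lifecycle; names are not taken from the PR.
export class BootTracker {
  phase: BootPhase = "booting";

  // Called from the WebSocket onConnected callback, i.e. only after the
  // handshake has completed, not right after connect() is invoked.
  onConnected(): void {
    this.phase = "ready";
  }

  // While booting, an unreachable gateway is expected, so report "starting".
  // Assumption: a reachable gateway also stays "starting" until WS connects.
  status(gatewayReachable: boolean): RuntimeStatus {
    if (this.phase === "booting") return "starting";
    return gatewayReachable ? "active" : "unhealthy";
  }
}
```

The point of the fix is visible in the ordering: `status(false)` can only return "unhealthy" after `onConnected()` has run at least once.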
- Channel error status now shows translated lastError (e.g. "会话已过期" ("Session expired") instead of generic "错误" ("Error"))
- Controller maps WeChat "not configured" + not running to "session expired" for better UX
- Add i18n keys for common channel errors (session expired, not configured, disabled)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use warning color (orange) instead of danger (red) for known recoverable errors like session expired, with actionable label "请重新连接" / "Reconnect required". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The exponential backoff for OpenClaw WebSocket reconnection could reach 16s+ during startup, causing the UI to stay in "starting" state for 20+ seconds. Cap at 4s so retry sequence is 1→2→4→4→4s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
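The capped retry sequence can be reproduced with a few lines. This is a sketch with a hypothetical function name, not the WebSocket client's real implementation.

```typescript
// Hypothetical helper: exponential backoff (doubling) with an upper bound.
export function backoffDelays(
  initialMs: number,
  capMs: number,
  attempts: number,
): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    // Double on each attempt, but never wait longer than capMs.
    delays.push(Math.min(initialMs * Math.pow(2, i), capMs));
  }
  return delays;
}
```

`backoffDelays(1000, 4000, 5)` gives 1000, 2000, 4000, 4000, 4000 ms, matching the 1→2→4→4→4s sequence in the commit message; without the cap the fifth delay would already be 16s.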
…way up When the health loop detects the gateway HTTP endpoint becomes reachable, it calls wsClient.retryNow() to cancel the backoff timer and connect immediately. This eliminates the 4-16s gap between gateway ready and WS connected during startup. Also replaces the ugly "Starting local services..." loading screen with a minimal Nexu logo pulse animation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- main.tsx had a duplicate SurfaceFrame with old loading text; replaced with Nexu logo pulse animation matching surface-frame.tsx - dev-launchd.sh now checks for dist/index.html and rebuilds desktop if missing, preventing blank screen after accidental dist deletion Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace plain loading screen with animated Nexu logo matching the design system prototype (NexuLoader.tsx). Four quadrants light up sequentially in brand colors: orange, green, pink, gold. Pure CSS animation, no framer-motion dependency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
main.tsx had its own SurfaceFrame copy with the old loading screen. Now imports from components/surface-frame.tsx so both Runtime Console and Desktop Shell views use the same 4-color Nexu loader. Background updated to dark radial gradient matching desktop theme. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Loader now overlays on top of the webview instead of replacing it. The webview loads silently in the background while the Nexu logo animation plays. When the webview fires dom-ready, the loader disappears — no blank frames, no intermediate Loader2 spinner. Background uses warm radial gradient for polished appearance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The spinning circle was briefly visible between the Nexu splash loader and the actual UI. Replace with an empty div since the desktop splash overlay already covers the loading period. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Run openclawProcess.prepare(), ensureRuntimeModelPlugin(), and prepareDesktopCloudModelsForBootstrap() in parallel (were sequential)
- Remove redundant compileCurrentConfig() call for preSeedConfigHash; doSync() already seeds the hash via noteConfigWritten()
- Reduce WS initial backoff from 1000ms to 500ms (sequence: 500→1000→2000→4000→4000...)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Quit dialog now shows Chinese or English based on app.getLocale(). Chinese users see "完全退出 / 后台运行 / 取消" (Quit Completely / Run in Background / Cancel).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
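A sketch of the locale switch, assuming the decision is made on the language prefix of the string returned by app.getLocale() (for example "zh-CN"). The function and interface names are hypothetical; the label strings come from the commit messages.

```typescript
export interface QuitLabels {
  quit: string;
  background: string;
  cancel: string;
}

// Hypothetical helper: choose dialog button labels from a locale string.
export function quitLabelsFor(locale: string): QuitLabels {
  // Assumption: any locale starting with "zh" gets the Chinese labels.
  const isChinese = locale.toLowerCase().indexOf("zh") === 0;
  return isChinese
    ? { quit: "完全退出", background: "后台运行", cancel: "取消" }
    : { quit: "Quit Completely", background: "Run in Background", cancel: "Cancel" };
}
```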
Packaged macOS builds now use launchd mode by default (no env var needed). This enables the quit dialog, crash recovery, and background service support in production. Can be explicitly disabled with NEXU_USE_LAUNCHD=0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
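The resulting mode selection can be summarised as a small pure function. This is a sketch under the assumptions that the flag is read as a plain string and that non-macOS platforms never use launchd; `shouldUseLaunchd` is a hypothetical name.

```typescript
// Hypothetical decision helper; the option names are illustrative.
export function shouldUseLaunchd(opts: {
  platform: string;          // e.g. process.platform
  isPackaged: boolean;       // e.g. app.isPackaged
  flag: string | undefined;  // value of NEXU_USE_LAUNCHD, if set
}): boolean {
  // launchd only exists on macOS.
  if (opts.platform !== "darwin") return false;
  // An explicit flag wins in either direction.
  if (opts.flag === "1") return true;
  if (opts.flag === "0") return false;
  // Otherwise packaged builds default to launchd, dev builds do not.
  return opts.isPackaged;
}
```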
- Fix path traversal vulnerability in embedded-web-server (sanitize URL pathname, reject paths outside webRoot)
- Install launchd quit handler after try/catch so it works even if auth bootstrap fails
- Add error handling to quitWithDecision (was missing try/catch)
- Fix isAlive() handling undefined pid from failed spawn
- Rename NODE_PATH to NODE_BIN in dev script to avoid Node.js env var conflict
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
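The path traversal fix is only described in one line, so the following is a string-level sketch of the idea rather than the embedded web server's code. The function name `resolveWithinRoot` is hypothetical, and a real implementation might instead resolve with node:path and check that the result still starts with webRoot.

```typescript
// Hypothetical sanitizer: returns a path under webRoot, or null when the
// request would escape it.
export function resolveWithinRoot(
  webRoot: string,
  urlPathname: string,
): string | null {
  let decoded: string;
  try {
    // Decode first so "%2e%2e" is seen as "..".
    decoded = decodeURIComponent(urlPathname);
  } catch {
    return null; // malformed percent-encoding
  }
  if (decoded.indexOf("\0") !== -1) return null; // reject NUL bytes

  const segments: string[] = [];
  for (const part of decoded.split(/[\\/]+/)) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      // Climbing above webRoot is rejected outright.
      if (segments.length === 0) return null;
      segments.pop();
      continue;
    }
    segments.push(part);
  }
  const root = webRoot.replace(/\/+$/, "");
  return segments.length === 0 ? root : `${root}/${segments.join("/")}`;
}
```

Returning null rather than clamping to webRoot lets the caller answer with an explicit error status instead of silently serving a different file.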
- openclaw-config-writer: additional config writing logic
- plist-generator: updated plist generation + tests
- package.json: script updates for launchd dev workflow
- openclaw-weixin accounts.ts: prior session changes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On startup, detect already-running launchd services and reuse them instead of cold-starting. This enables instant resume after "Run in Background" (packaged) or dev restart.
Attach flow:
1. Read runtime-ports.json for port metadata from previous session
2. Validate isDev mode and NEXU_HOME match
3. Extract env vars from running services via launchctl print
4. Probe controller /health and openclaw port
5. If all healthy, start embedded web server and attach
Fallback: if attach fails (stale services, env mismatch, unhealthy), tear down and cold start as before.
Also:
- LaunchdManager.getServiceStatus() now parses environment variables
- runtime-ports.json written on cold start, deleted on quit-completely
- Port occupier detection kills rogue processes on openclaw port
- index.ts overrides runtimeConfig with attached ports
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
esbuild doesn't support typeof import() expressions. Use static
import { createConnection } from "node:net" instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When Electron allocates a non-default port (e.g. 18790 because 18789 is occupied), the controller needs to know this port for both the config compiler and WS/health connections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Node.js rmSync with recursive+force can fail with ENOTEMPTY on macOS. Fall back to execFileSync rm -rf which handles this reliably. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Node.js rmSync can silently fail on macOS (ENOTEMPTY race). Now uses rm -rf exclusively with existence check, and retries up to 3 times with 1s pause between attempts. This ensures first-launch sidecar extraction succeeds reliably. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace binary attach-or-cold-start with unified per-service flow:
- Each service independently checked: running+healthy → keep, else restart
- Ports recovered from runtime-ports.json when any service is still running
- NEXU_HOME validated to prevent cross-environment attach
- Missing services started with correct recovered ports
- Unhealthy running services torn down and restarted
This enables partial attach: if only OpenClaw survived a crash, the next launch reuses its port and only cold-starts the controller.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
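The per-service decision reads naturally as a lookup from probe result to action. The sketch below uses hypothetical types and names (`ServiceProbe`, `planAll`) and leaves out port recovery and the NEXU_HOME check.

```typescript
type ServiceAction = "keep" | "restart" | "start";

interface ServiceProbe {
  running: boolean; // launchd reports the service as running
  healthy: boolean; // health probe succeeded
}

// Hypothetical planner: decide independently for each service.
export function planAction(probe: ServiceProbe): ServiceAction {
  if (!probe.running) return "start";          // missing: start it
  return probe.healthy ? "keep" : "restart";   // unhealthy: tear down, restart
}

export function planAll(
  probes: Record<string, ServiceProbe>,
): Record<string, ServiceAction> {
  const plan: Record<string, ServiceAction> = {};
  for (const name of Object.keys(probes)) {
    plan[name] = planAction(probes[name]);
  }
  return plan;
}
```

The partial-attach case from the commit message corresponds to `planAll({ openclaw: { running: true, healthy: true }, controller: { running: false, healthy: false } })`, which keeps OpenClaw and starts only the controller.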
getLogDir() now accepts nexuHome param so dev mode writes launchd service logs to .tmp/desktop/nexu-home/logs/ instead of ~/.nexu/logs/. Also updates AGENTS.md with correct directory layout for dev vs packaged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrites desktop-startup-flow.md with full implementation details:
- Directory layout for dev vs packaged (with tree diagrams)
- Label isolation between dev (.dev) and packaged modes
- Unified bootstrap flow (attach + cold start per-service)
- Port architecture and auto-allocation
- Attach mechanism (full, partial, fallback)
- Status display timeline
- File watch hot reload
- Exit behavior
- OpenClaw sidecar extraction
- Complete key files reference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflicts keeping our attach/isolation/hot-reload improvements while incorporating main's changes:
- Brand name: "Nexu" → "nexu" (lowercase, from bb1ddcc)
- UI polish from 2fb384c
- surface-frame: white background from main's design update
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix web port fallback: increment port instead of port 0, record actual port in effectivePorts for runtime-ports.json
- Fix quit handler: catch deleteRuntimePorts errors to prevent blocking quit
- Fix dev script: initialize watcher PIDs before trap to avoid set -u errors
- Fix docs: probe endpoint is /api/auth/get-session not /health
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two critical fixes for v0.1.7 hotfix:
1. State directory regression: v0.1.6 changed OPENCLAW_STATE_DIR from Electron userData (~/.../Application Support/@nexu/desktop/runtime/openclaw/state) to NEXU_HOME (~/.nexu/runtime/openclaw/state), silently losing all historical conversations. This restores userData as the canonical path for packaged builds and adds a one-time migration to merge any data created under ~/.nexu during the v0.1.6 window.
2. Quit race condition: bootoutService() sends an async unregister to launchd but doesn't wait for the process to exit. Relaunching immediately hits port conflicts / "open failed". Now calls stopServiceGracefully() (SIGTERM + wait + SIGKILL) before bootout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 Walkthrough
Mode-dependent OpenClaw state path selection was introduced (dev vs packaged). Packaged mode may migrate legacy state from
Changes
Sequence Diagram
sequenceDiagram
participant App as Desktop App (Cold Start)
participant Mode as Mode Detector
participant Path as Path Resolver
participant Migration as State Migration
participant Plist as Plist Generator
participant Launchd as Launchd Bootstrap
App->>Mode: determine mode (dev / packaged)
Mode-->>Path: provide mode
Path->>Path: compute openclawStateDir (dev: nexuHome/... | packaged: userData/...)
alt packaged && legacy differs
Path->>Migration: migrate legacy state (source: nexuHome -> target: userData)
Migration-->>Path: migration complete
end
Path-->>App: final openclawStateDir
App->>Plist: include OPENCLAW_STATE_DIR in env
Plist-->>Launchd: generate plist with env
App->>Launchd: bootstrap services with state dir
Launchd-->>App: services started
Deploying nexu-docs with
| Latest commit: | 0457102 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://432b4216.nexu-docs.pages.dev |
| Branch Preview URL: | https://refactor-launchd-process-arc.nexu-docs.pages.dev |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 04571023d1
    if (!existsSync(sourceStateDir)) {
      log(`source not found: ${sourceStateDir}, nothing to migrate`);
      writeStamp(stampPath);
Create target state directory before writing migration stamp
When sourceStateDir is missing, this branch writes .v016-migration-done immediately, but targetStateDir is only created later. On fresh packaged installs (no legacy ~/.nexu/runtime/openclaw/state yet), writeFileSync throws ENOENT because .../runtime/openclaw/state does not exist, causing cold-start failure before launchd bootstrap completes. Ensure mkdirSync(targetStateDir, { recursive: true }) runs before any stamp write path.
    if (!existsSync(targetFile)) {
      cpSync(sourceFile, targetFile);
Merge conflicting session files instead of skipping them
This migration only copies session files when the target file does not exist, so if the same session key exists in both locations (for example stable channel/session keys that map to the same *.jsonl filename), all conversation updates produced during v0.1.6 in sourceStateDir are silently dropped. The hotfix goal is to restore sessions, but this logic preserves the older target file and discards newer source history on filename collisions.
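One possible way to merge colliding session files, sketched purely as an illustration of the review comment. Neither the PR nor the reviewer specifies a merge strategy; the line-level union below and the name `mergeJsonl` are assumptions, and it presumes each JSONL line is unique (for example because it carries an id or timestamp), since identical lines are collapsed.

```typescript
// Hypothetical merge for two *.jsonl session files with the same name:
// keep every target line, then append source lines not already present.
export function mergeJsonl(target: string, source: string): string {
  const seen = new Set<string>();
  const merged: string[] = [];
  const lines = target.split("\n").concat(source.split("\n"));
  for (const line of lines) {
    const trimmed = line.trim();
    // Skip blank lines and exact duplicates (assumes lines are unique records).
    if (trimmed === "" || seen.has(trimmed)) continue;
    seen.add(trimmed);
    merged.push(trimmed);
  }
  return merged.length === 0 ? "" : merged.join("\n") + "\n";
}
```

This keeps the older target history first and appends whatever was only written during the v0.1.6 window; it does not attempt to reorder records by time.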
🧹 Nitpick comments (1)
apps/desktop/main/services/quit-handler.ts (1)
209-220: Consider extracting the shared shutdown logic.
The stop-and-bootout loop is duplicated from installLaunchdQuitHandler. Extracting to a helper function (e.g., stopAndBootoutServices) would reduce duplication and simplify future changes.
♻️ Proposed refactor
    async function stopAndBootoutServices(
      launchd: LaunchdManager,
      labels: string[],
    ): Promise<void> {
      for (const label of labels) {
        try {
          await launchd.stopServiceGracefully(label, 5000);
        } catch {
          // May already be stopped
        }
        try {
          await launchd.bootoutService(label);
        } catch (err) {
          console.error(`Error booting out ${label}:`, err);
        }
      }
    }
Then use it in both locations:
    await stopAndBootoutServices(opts.launchd, [opts.labels.openclaw, opts.labels.controller]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/services/quit-handler.ts` around lines 209 - 220, Extract the duplicated stop-and-bootout loop into a helper async function (e.g., stopAndBootoutServices) that accepts a LaunchdManager and an array of labels, then call that helper from both installLaunchdQuitHandler and the current quit-handler code; the helper should iterate labels and for each call launchd.stopServiceGracefully(label, 5000) with a try/catch (silent on failure) and then call launchd.bootoutService(label) with a try/catch that logs the error (retain the existing console.error message). Update the two call sites to invoke stopAndBootoutServices(opts.launchd, [opts.labels.openclaw, opts.labels.controller]) to remove duplication.
📒 Files selected for processing (5)
- apps/desktop/main/index.ts
- apps/desktop/main/services/plist-generator.ts
- apps/desktop/main/services/quit-handler.ts
- apps/desktop/main/services/state-migration.ts
- apps/desktop/package.json
- Fix quit flow: bootout first (unregister from launchd), then wait for process exit. Previous order (SIGTERM → bootout) caused launchd KeepAlive to respawn the process before bootout could unregister it.
- Add LaunchdManager.waitForExit(): polls status after bootout, falls back to SIGKILL by PID if process persists beyond timeout.
- Document packaged app directory layout in AGENTS.md: NEXU_HOME (~/.nexu) for user config, Electron userData for OpenClaw runtime state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/desktop/main/services/launchd-manager.ts`:
- Around line 251-256: The loop in waitForExit incorrectly treats a transient
getServiceStatus failure mapped to status "unknown" as a definite exit; update
waitForExit (and mirror the same fix in stopServiceGracefully) to distinguish
definite states ("stopped") from transient "unknown" by retrying on "unknown"
instead of immediately returning: keep polling until timeoutMs elapses, count
consecutive "unknown" responses (or allow retries for a short backoff window,
e.g., a few attempts/poll intervals) and only give up/return early if you
observe a definitive non-"running" state such as "stopped" or if the
unknown-count exceeds a small threshold, ensuring the existing SIGKILL/timeout
fallback still fires when appropriate; reference getServiceStatus, waitForExit,
stopServiceGracefully and the status values "running"/"stopped"/"unknown" when
making the change.
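The retry-on-"unknown" rule suggested above can be expressed as a pure classifier over the statuses seen so far. This is a sketch only: `classifyPolls`, the decision names, and the default threshold of three are assumptions, and the real waitForExit would wrap this in a timed polling loop with the SIGKILL fallback.

```typescript
type ServiceStatus = "running" | "stopped" | "unknown";
type ExitDecision = "exited" | "keep-polling" | "give-up";

// Hypothetical classifier: only "stopped" counts as a definite exit;
// "unknown" is treated as transient until it repeats too often in a row.
export function classifyPolls(
  statuses: ServiceStatus[],
  maxConsecutiveUnknown = 3,
): ExitDecision {
  let unknownStreak = 0;
  for (const status of statuses) {
    if (status === "stopped") return "exited";
    if (status === "unknown") {
      unknownStreak += 1;
      if (unknownStreak >= maxConsecutiveUnknown) return "give-up";
    } else {
      unknownStreak = 0; // a definite "running" resets the streak
    }
  }
  return "keep-polling";
}
```

A single failed status query between two "running" results therefore no longer ends the wait early, while a service whose status can never be read still terminates the loop before the timeout.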
📒 Files selected for processing (3)
- AGENTS.md
- apps/desktop/main/services/launchd-manager.ts
- apps/desktop/main/services/quit-handler.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/desktop/main/services/quit-handler.ts
After app.quit(), dangling handles (timers, sockets) can keep the Electron event loop alive indefinitely. Add a 3s safety net that calls process.exit(0) if the process hasn't exited by then. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 3
🧹 Nitpick comments (1)
apps/desktop/main/services/quit-handler.ts (1)
138-151: Extract shared shutdown flow to a helper to prevent drift.
The same bootout/wait/force-exit sequence now exists in two places; centralizing it will reduce divergence risk in future hotfixes.
Also applies to: 217-228, 234-239
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/services/quit-handler.ts` around lines 138 - 151, Extract the bootout/wait/ignore-failure sequence into a single helper (e.g., shutdownLaunchdService or bootoutAndAwaitExit) that accepts a launchd client and a service label, calls opts.launchd.bootoutService(label), awaits opts.launchd.waitForExit(label, 5000), catches and logs bootout errors and ignores wait errors (best-effort), and use that helper in place of the duplicated for-loops that currently call opts.launchd.bootoutService and opts.launchd.waitForExit (the blocks referencing opts.labels.openclaw, opts.labels.controller and the duplicated ranges noted) so both sites call the new helper instead of repeating the sequence.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/desktop/main/services/quit-handler.ts`:
- Around line 146-150: The empty catch around opts.launchd.waitForExit(label,
5000) swallows real errors; update both occurrences (the try/catch blocks around
opts.launchd.waitForExit at the two spots) to catch the error and log it with
context (include label and timeout) using the existing logger (e.g.,
processLogger or opts.logger) or rethrow if it indicates a real failure, rather
than leaving the catch empty. Ensure the log message clearly states the
operation ("waitForExit" for label) and includes the error details to aid
debugging.
- Around line 216-217: The code currently calls app.quit() and schedules
process.exit(0) even when quitWithDecision(decision) is called with
"run-in-background"; update the control flow in quitWithDecision so only the
"quit-completely" branch performs app.quit() and schedules process.exit(0) (and
sends shutdown messages to opts.labels.openclaw / opts.labels.controller), while
the "run-in-background" branch must avoid calling app.quit() or process.exit and
instead only hides/minimizes or releases resources as intended; locate and
remove or relocate any app.quit() and process.exit(0) calls outside the strict
decision === "quit-completely" check (including the similar block around the
code handling lines ~231-239) so the decision contract for "run-in-background"
is preserved.
- Around line 143-145: The catch blocks currently call console.error with
free-form strings and the raw err object (e.g., the block that logs `Error
booting out ${label}:`, err) — change these to structured JSON logs and avoid
passing raw error objects: replace the free-form console.error calls with a JSON
object that includes a descriptive message field, the label variable, and a
sanitized error object (only message, code, and truncated stack or sanitized
fields), ensuring no credentials are included; apply the same change to the
other catch sites referenced (the console.error uses around the label handling
and the ones at the other noted ranges).
📒 Files selected for processing (1)
apps/desktop/main/services/quit-handler.ts
    } catch (err) {
      console.error(`Error booting out ${label}:`, err);
    }
Use structured/sanitized logs in the new quit path.
New logging uses free-form strings and raw err objects; switch to structured JSON fields and avoid logging raw errors directly.
As per coding guidelines, "Logging: structured (pino or console JSON), never log credentials."
Also applies to: 166-168, 220-222, 235-237
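A sketch of what a structured, sanitized log record for these catch sites might look like. The helper name `toLogLine`, the field names, and the three-line stack truncation are assumptions; the repository's actual logger (for example pino) may shape records differently.

```typescript
// Hypothetical helper: build a JSON log line from an unknown error value,
// keeping only message, code, and a truncated stack.
export function toLogLine(message: string, label: string, err: unknown): string {
  const error = err instanceof Error ? err : new Error(String(err));
  const code = (error as Error & { code?: unknown }).code;
  return JSON.stringify({
    level: "error",
    message,
    label,
    error: {
      message: error.message,
      // Only string codes (e.g. "EPERM") are kept; undefined fields are
      // dropped by JSON.stringify.
      code: typeof code === "string" ? code : undefined,
      // Truncate the stack to its first three lines.
      stack: error.stack ? error.stack.split("\n").slice(0, 3).join("\n") : undefined,
    },
  });
}
```

At a catch site this would replace the free-form call with something like `console.error(toLogLine("bootout failed", label, err))`, so the raw error object is never passed to the logger.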
app.quit() only requests a quit — event handlers can delay or prevent
it, and dangling handles keep the process alive. This left stale
SingletonLock files that blocked relaunch ("app can no longer be opened").
Switch to app.exit(0) which terminates immediately after services have
been bootout'd. All cleanup (flush logs, stop services, delete ports
file) happens before this call, so nothing is lost.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
♻️ Duplicate comments (2)
apps/desktop/main/services/quit-handler.ts (2)
143-150: ⚠️ Potential issue | 🟠 Major
Use structured, sanitized logs for launchd shutdown failures.
These catch blocks still log raw err values for bootoutService() and suppress waitForExit() failures entirely. That violates the repo logging rules and leaves the quit-race path blind when launchctl or status polling fails.
As per coding guidelines, "Logging: structured (pino or console JSON), never log credentials."
Also applies to: 213-220
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/services/quit-handler.ts` around lines 143 - 150, Replace raw/unsanitized logging in the bootoutService() catch and the opts.launchd.waitForExit() catch with structured JSON logs (using the repo logger, e.g., processLogger or pino) that include a short context string, the service label, and only sanitized error fields (e.g., err.message and err.code) — do not log full stacks or raw err objects or any sensitive fields; for the waitForExit() catch, emit a best-effort level log (warn/info) noting that exit polling failed with the sanitized error and label so the quit-race path is observable; apply the same change to the similar block around lines 213-220.
197-225: ⚠️ Potential issue | 🟠 Major
quitWithDecision("run-in-background") still tears down and exits.
This path now runs quit-only cleanup before branching, then unconditionally calls app.exit(0). It no longer matches the interactive "run-in-background" behavior, which just hides the window and keeps the app alive.
Proposed fix
     export async function quitWithDecision(
       decision: "quit-completely" | "run-in-background",
       opts: QuitHandlerOptions,
     ): Promise<void> {
    +  if (decision === "run-in-background") {
    +    BrowserWindow.getAllWindows()[0]?.hide();
    +    return;
    +  }
    +
       try {
         await opts.onBeforeQuit?.();
       } catch (err) {
         console.error("Error in onBeforeQuit:", err);
       }
    @@
    -  if (decision === "quit-completely") {
    -    for (const label of [opts.labels.openclaw, opts.labels.controller]) {
    -      try {
    -        await opts.launchd.bootoutService(label);
    -      } catch (err) {
    -        console.error(`Error booting out ${label}:`, err);
    -      }
    -      try {
    -        await opts.launchd.waitForExit(label, 5000);
    -      } catch {
    -        // Best effort
    -      }
    -    }
    +  for (const label of [opts.labels.openclaw, opts.labels.controller]) {
    +    try {
    +      await opts.launchd.bootoutService(label);
    +    } catch (err) {
    +      console.error(`Error booting out ${label}:`, err);
    +    }
    +    try {
    +      await opts.launchd.waitForExit(label, 5000);
    +    } catch {
    +      // Best effort
    +    }
       }

       (app as unknown as Record<string, unknown>).__nexuForceQuit = true;
       app.exit(0);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/services/quit-handler.ts` around lines 197 - 225, The current flow always performs teardown and calls app.exit(0) regardless of decision; change it so only the quit-completely path performs teardown and exit while run-in-background just hides/keeps the app alive. Concretely: move the webServer closing, launchd bootout/waitForExit loop, the __nexuForceQuit assignment, and app.exit(0) into the branch that checks decision === "quit-completely"; for decision === "run-in-background" call the existing window-hide logic (or return early) and do not call app.exit; keep the existing try/catch blocks for opts.onBeforeQuit, opts.webServer, and launchd methods (bootoutService, waitForExit) but only invoke them when handling quit-completely so run-in-background does not tear down the process.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/desktop/main/services/quit-handler.ts`:
- Around line 138-139: waitForExit() can return immediately after sending
SIGKILL on timeout, allowing deleteRuntimePorts() and app.exit(0) to run while
the service is still tearing down and causing a relaunch race; update
waitForExit() (called from launchd-manager.ts paths around lines referenced) to,
on timeout->SIGKILL, re-poll the target process (with a short loop and backoff)
until it is actually gone or a second hard timeout elapses, and only then return
success, or alternatively treat the initial timeout as a hard failure and
propagate an error to prevent calling deleteRuntimePorts()/app.exit(0); modify
callers (where waitForExit() is used) to respect the propagated failure if you
choose the hard-fail approach.
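One way the re-poll after SIGKILL could look, as a sketch only. `confirmExit`, its options, and the injected `isAlive`/`sleep` hooks are hypothetical names; the only real API relied on is Node's `process.kill(pid, 0)`, which tests for existence without delivering a signal. This version takes the hard-fail route: it throws if the process is still alive when the second timeout elapses.

```typescript
// Hypothetical sketch: isAlive and sleep are injectable so the loop can be tested without real processes.
type IsAlive = (pid: number) => boolean;

// Default liveness probe: signal 0 only performs the existence/permission check.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but cannot be signalled. ESRCH: it is gone.
    return (err as { code?: string }).code === "EPERM";
  }
}

async function confirmExit(
  pid: number,
  opts: {
    hardTimeoutMs: number;
    initialDelayMs?: number;
    isAlive?: IsAlive;
    sleep?: (ms: number) => Promise<void>;
  },
): Promise<void> {
  const isAlive = opts.isAlive ?? isProcessAlive;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  let delay = opts.initialDelayMs ?? 50;
  let waited = 0;

  while (isAlive(pid)) {
    if (waited >= opts.hardTimeoutMs) {
      // Propagate the failure so callers can skip port-file deletion and exit.
      throw new Error(`pid ${pid} still alive after ${waited}ms`);
    }
    const step = Math.min(delay, opts.hardTimeoutMs - waited);
    await sleep(step);
    waited += step;
    delay = Math.min(delay * 2, 1000); // exponential backoff, capped at 1s
  }
}
```

A caller would send SIGKILL first, then `await confirmExit(pid, { hardTimeoutMs: 2000 })` and only proceed to cleanup if it resolves.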
📒 Files selected for processing (1)
apps/desktop/main/services/quit-handler.ts
- state-migration: mkdirSync targetStateDir before writing stamp on fresh installs where source doesn't exist (P0: prevents ENOENT)
- waitForExit: distinguish "stopped" from "unknown" status, retry on transient "unknown" reads (3 consecutive = treat as exited), re-poll after SIGKILL to confirm process is actually gone
- quitWithDecision: "run-in-background" no longer calls app.exit(), just hides window and keeps services running
- Replace empty catches with console.warn for waitForExit failures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
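The ENOENT fix in the first bullet can be illustrated with a short sketch. The function name `writeMigrationStamp` and the stamp file name are assumptions for illustration; only `mkdirSync` with `recursive: true` and `writeFileSync` are real Node APIs.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: function and stamp file names are placeholders.
function writeMigrationStamp(
  targetStateDir: string,
  stampName = ".migration-stamp",
): string {
  // recursive: true creates missing parents and does not fail if the directory exists.
  mkdirSync(targetStateDir, { recursive: true });
  const stampPath = join(targetStateDir, stampName);
  writeFileSync(stampPath, new Date().toISOString(), "utf8");
  return stampPath;
}

// Fresh-install case: the target directory (and its parent) do not exist yet.
const freshDir = join(mkdtempSync(join(tmpdir(), "nexu-demo-")), "state", "nested");
const stamp = writeMigrationStamp(freshDir);
console.log(existsSync(stamp));
```

Without the `mkdirSync` call, `writeFileSync` would throw ENOENT on the fresh-install path because the parent directory is missing.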
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/desktop/main/services/launchd-manager.ts`:
- Around line 277-285: The current waitForExit flow swallows failures: the empty
catch after attempting to kill the process and the silent success when the
recheck loop still finds the service running can hide failed shutdowns; update
the waitForExit implementation (the try/catch around the kill logic and the
re-poll loop that calls getServiceStatus(label)) to surface failures—either
throw a descriptive error or call the appropriate logger when the kill attempt
throws and when, after the 5 rechecks, recheck.status === "running" still
holds—include the label and any caught error details in the message so callers
can detect and handle failed force-kills.
📒 Files selected for processing (3)
apps/desktop/main/services/launchd-manager.ts
apps/desktop/main/services/quit-handler.ts
apps/desktop/main/services/state-migration.ts
✅ Files skipped from review due to trivial changes (1)
- apps/desktop/main/services/quit-handler.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/desktop/main/services/state-migration.ts
} catch {
  // Process may have exited between check and kill
}
// Re-poll briefly to confirm kill took effect
for (let i = 0; i < 5; i++) {
  await new Promise((r) => setTimeout(r, 200));
  const recheck = await this.getServiceStatus(label);
  if (recheck.status !== "running") return;
}
Don't silently succeed when force-kill fails.
On Line 277 and Line 284 paths, waitForExit can fail to terminate the process and still return with no warning. That makes shutdown failures invisible and can mask stale-lock/respawn regressions.
Proposed fix
if (status.pid) {
try {
process.kill(status.pid, "SIGKILL");
- } catch {
- // Process may have exited between check and kill
+ } catch (err) {
+ console.warn(
+ `Failed to SIGKILL ${label} (pid ${status.pid}):`,
+ err instanceof Error ? err.message : err,
+ );
}
// Re-poll briefly to confirm kill took effect
for (let i = 0; i < 5; i++) {
await new Promise((r) => setTimeout(r, 200));
const recheck = await this.getServiceStatus(label);
if (recheck.status !== "running") return;
}
+ console.warn(
+ `Service ${label} still appears running after SIGKILL fallback`,
+ );
}
… in launchd mode

Controller plist was missing 13 critical environment variables compared to the daemon-supervisor manifest path (OPENCLAW_CONFIG_PATH, OPENCLAW_SKILLS_DIR, NODE_PATH, WEB_URL, HOST, etc.), causing the launchd-managed controller to fail skill loading, config compilation, and module resolution.

Also skip module-level port probing when launchd mode is active: the bootstrap has its own port recovery via runtime-ports.json and handles leftover processes gracefully. This prevents startup crashes when residual services occupy the preferred ports.

Additionally includes orchestrator launchd integration: enableLaunchdMode(), refreshLaunchdUnits(), launchd log tailing, and start/stop delegation to LaunchdManager.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Superseded by #544, a consolidated hotfix targeting release/v0.1.6 with all fixes squashed into a single commit.
…, not path.resolve

Replace 70 trivial path.resolve assertions with 27 behavior-focused tests that call real functions and check real output:

generatePlist output (25 tests): Call the real function with production-realistic inputs, parse the XML, and verify every env var value matches expected paths. Key checks:
- NEXU_HOME → ~/.nexu (not under userData)
- OPENCLAW_STATE_DIR → under userData (not under NEXU_HOME)
- OPENCLAW_CONFIG_PATH consistent with OPENCLAW_STATE_DIR
- OPENCLAW_SKILLS_DIR consistent with OPENCLAW_STATE_DIR
- OpenClaw plist has NO NEXU_HOME (it doesn't use it)
These would have caught the #526 bug.

runtime-config resolution (2 tests): Call getDesktopRuntimeConfig with different env objects, verify the NEXU_HOME priority chain: env > buildConfig > ~/.nexu default.

Real launchd env verification (macOS, 2 tests): Start a real launchd service, read launchctl print output, parse the environment block, verify NEXU_HOME and OPENCLAW_STATE_DIR are what we set and are different directories.
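The NEXU_HOME priority chain mentioned above (env > buildConfig > ~/.nexu default) might be sketched as follows. `resolveNexuHome` and the `nexuHome` field on the build config are assumptions; this is not the real `getDesktopRuntimeConfig` signature.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: parameter shapes are assumptions, not the real runtime-config API.
function resolveNexuHome(
  env: Record<string, string | undefined>,
  buildConfig: { nexuHome?: string } = {},
  home: string = homedir(),
): string {
  // Priority: explicit env value, then build-time config, then the ~/.nexu default.
  // `||` is used on purpose so an empty string counts as "not set".
  return env.NEXU_HOME || buildConfig.nexuHome || join(home, ".nexu");
}

// Example: an externally set NEXU_HOME wins over everything else.
console.log(resolveNexuHome({ NEXU_HOME: "/tmp/nexu-home" }, { nexuHome: "/opt/nexu" }));
```

Passing `env` and `home` as parameters, rather than reading `process.env` inside the function, is what makes the chain testable with different env objects.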
…563)

* fix(desktop): robust lifecycle teardown for quit and update-install

The update-install path previously called orchestrator.dispose() without properly booting out launchd services, causing macOS to report "app is still running" when the installer tried to replace the .app bundle. The quit-completely path had a similar issue: after bootout, waitForExit could not SIGKILL processes whose launchd label was already unregistered.

Changes:
- LaunchdManager.bootoutAndWaitForExit: captures PID before bootout so the SIGKILL fallback works even after the label is unregistered.
- LaunchdManager.waitForExit: accepts optional knownPid parameter; uses process.kill(pid, 0) to verify death when launchctl print returns "unknown".
- teardownLaunchdServices: new shared function used by both quit-handler and update-manager. Bootouts each service with PID-aware waiting, deletes runtime-ports.json, and kills orphan processes via pgrep.
- ensureNexuProcessesDead: polling verification gate that loops pgrep + SIGKILL until all Nexu sidecar processes are confirmed dead (max 15s).
- quitAndInstall: now three-phase: (1) teardown + dispose wrapped in try/catch so failures never block the install, (2) ensureNexuProcessesDead as the hard verification gate, (3) autoUpdater.quitAndInstall.
- Bootstrap adds killOrphanNexuProcesses on cold start to clean up residual processes from a previously failed update.

Tests: 27 new tests across 3 files covering teardown, PID-aware shutdown, verification gate, and the full update-install sequence.

* chore(ci): add pnpm test to CI and launchd lifecycle e2e test

- ci.yml: add a `test` job that runs `pnpm test` on ubuntu-latest, covering all 247+ vitest unit tests that were previously not run in CI.
- desktop-ci-dist.yml: add real launchd lifecycle e2e test that runs on macOS CI runners before the packaged app build. The test exercises:
  1. Bootstrap: register plist → kickstart → verify running + port
  2. Teardown: bootout → verify label unregistered → verify process dead
  3. SIGKILL fallback: bootout → saved-PID SIGKILL → verify dead
  4. Orphan detection: spawn fake orphan → detect via lsof → SIGKILL
  5. Re-bootstrap: fresh cold start after full cleanup
- scripts/launchd-lifecycle-e2e.sh: standalone e2e test script (15 checks) that validates the launchd process management primitives used by the desktop app's quit and update-install paths.

* fix(desktop): prevent Dock icon proliferation + add comprehensive tests

Fixes:
- dev-env.sh: add Launch Services cache flush (lsregister) after patching LSUIElement=true on the dev Electron binary. Without this, macOS uses cached plist data and still shows Dock icons for child processes. Add verification logging on success/failure.
- daemon-supervisor: force ELECTRON_RUN_AS_NODE=1 on all spawn() calls that use process.execPath (Electron binary) as a safety net, even if the manifest env omits it.

Tests:
- daemon-supervisor.test.ts (10 tests): constructor, startAutoStart, stopUnit SIGTERM/SIGKILL escalation, 5s deadline, stopAll parallel, dispose, skip non-managed, ELECTRON_RUN_AS_NODE safety net, stoppedByUser suppresses restart, dependent stop order.
- quit-handler.test.ts (8 tests): quit-completely full sequence, __nexuForceQuit flag, app.exit(0), error resilience for onBeforeQuit and webServer.close, no-plistDir skip, run-in-background hide.
- desktop-stop-smoke.sh: post-stop verification script that checks no residual processes, free ports, no launchd labels, no stale state. Integrated into desktop-check-dev.sh for CI.

* fix(desktop): route pnpm start through dev-env.sh for LSUIElement patch

dev-launchd.sh (pnpm start) was launching Electron directly without going through dev-env.sh, bypassing the LSUIElement=true plist patch and Launch Services cache flush. This is the direct cause of the Dock icon proliferation reported by users running pnpm start.

Now pnpm start → dev-launchd.sh → dev-env.sh → electron, matching the same path that pnpm dev uses via dev.sh → dev-run.sh → dev-env.sh.

* test(desktop): add dev toolchain invariant tests

Static analysis tests that guard critical invariants across the launch, environment, and shutdown scripts. These catch regressions like the dev-launchd.sh bypass of dev-env.sh that caused Dock icon proliferation.

19 invariant checks covering:
- Launch paths: all Electron launch commands go through dev-env.sh
- LSUIElement: plist patch + LS cache flush present
- ELECTRON_RUN_AS_NODE: set in plists, manifests, daemon-supervisor, openclaw-process, and catalog-manager
- Shutdown: bootout, orphan kill, port wait, teardown via shared function, try/catch wrapping, verification gate ordering

* test(desktop): comprehensive coverage for lifecycle, quit, and update paths

Bring test coverage for the 6 core lifecycle files from 63.5% to 83.8% overall statements, with function coverage at 93.8%.

Per-file improvements:
- update-manager.ts: 63.1% → 98.5% (bindEvents, checkNow, downloadUpdate, periodicCheck, setChannel/setSource, send to webviews)
- quit-handler.ts: 36.1% → 97.0% (installLaunchdQuitHandler dialog flow, force-quit bypass, dev-mode bypass, Cmd+Q interception, dialogOpen guard)
- launchd-manager.ts: 72.7% → 94.9% (uninstallService, stopServiceGracefully SIGKILL escalation, restartService, rebootstrapFromPlist, hasPlistFile)
- daemon-supervisor.ts: 46.5% → 69.5% (startUnit port probe, auto-restart backoff, refreshDelegatedUnits pgrep, stdout/stderr capture, queryEvents)
- launchd-bootstrap.ts: 92.6% → 93.1% (isLaunchdBootstrapEnabled packaged heuristic, ensureNexuProcessesDead edge cases)
- plist-generator.ts: 100% (unchanged)

New test files: daemon-supervisor.test.ts (27), quit-handler-full.test.ts (13), update-manager-full.test.ts (41), launchd-manager-ops-extended.test.ts (11), launchd-bootstrap-edge.test.ts (12).

Total: 391 tests across 40 files, all passing.
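The three-phase quitAndInstall ordering described in the lifecycle teardown commit above can be sketched with injected steps. `InstallSteps` and `runUpdateInstall` are hypothetical names; the sketch assumes, as the commit text suggests, that teardown failures are swallowed while a failed verification gate propagates and blocks the install.

```typescript
// Hypothetical sketch: the three phases are injected so the ordering can be tested in isolation.
interface InstallSteps {
  teardown(): Promise<void>;            // phase 1: bootout + dispose (best effort)
  ensureProcessesDead(): Promise<void>; // phase 2: hard verification gate
  quitAndInstall(): void;               // phase 3: hand off to the updater
}

async function runUpdateInstall(
  steps: InstallSteps,
  log: (msg: string) => void = console.warn,
): Promise<void> {
  // Phase 1: a teardown failure is logged but must never block the install.
  try {
    await steps.teardown();
  } catch (err) {
    log(`teardown failed: ${err instanceof Error ? err.message : String(err)}`);
  }

  // Phase 2: if processes cannot be confirmed dead, the error propagates
  // and phase 3 is not reached (assumption: the gate is meant to be hard).
  await steps.ensureProcessesDead();

  // Phase 3: only now is it safe to replace the app bundle.
  steps.quitAndInstall();
}
```

Keeping the gate outside the try/catch is the design point: phase 1 may fail quietly, phase 2 may not.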
* fix(desktop): stopPeriodicCheck race + real launchd integration tests

Fixes:
- update-manager: save initial setTimeout ID so stopPeriodicCheck can cancel it during the initial delay window. Previously, calling stopPeriodicCheck before the initial delay expired was a no-op, allowing the interval to start during teardown.
- update-manager: call stopPeriodicCheck() at the start of quitAndInstall() to prevent periodic checks from firing mid-teardown.

Tests:
- launchd-integration.test.ts: 8 tests that run REAL launchd on macOS (skipped on other platforms). Covers:
  1. installService + startService → real service running on real port
  2. bootoutAndWaitForExit → real process confirmed dead
  3. teardownLaunchdServices → full sequence against real launchd
  4. ensureNexuProcessesDead → real orphan process spawned and killed
  5. getServiceStatus → real PID from launchctl print
  6. getServiceStatus → unknown for non-existent label
  7. installService → detects plist content change and re-bootstraps
  8. stopServiceGracefully → real SIGTERM stops service

CI:
- desktop-ci-dist.yml: add `pnpm test` step on macOS runners so real launchd integration tests run in CI alongside the shell e2e tests.

* test(desktop): comprehensive real launchd integration tests + macOS CI

Expand launchd-integration.test.ts from 8 to 16 real launchd tests:
  9. Full cycle: start → bootout → verify clean → cold re-start
  10. Attach: detect already-running service from previous session
  11. KeepAlive: service auto-restarts after SIGKILL (crash simulation)
  12. Rapid start/stop cycles leave no orphan processes
  13. Port conflict: occupied port → bootout still cleans up
  14. bootout on non-registered label is idempotent (no throw)
  15. teardownLaunchdServices with non-existent labels is safe
  16. waitForExit handles process dying during bootout (race condition)

CI: add pnpm test to desktop-ci-dev.yml (macOS-14) so real launchd integration tests also run in the dev CI path, not just dist CI.
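The stopPeriodicCheck race fix described above comes down to keeping a handle on the initial `setTimeout` as well as the interval. A minimal sketch, assuming a standalone `PeriodicChecker` class (a hypothetical name, not the update-manager's actual structure):

```typescript
// Hypothetical sketch of a periodic checker whose initial delay can be cancelled.
class PeriodicChecker {
  private initialTimer: ReturnType<typeof setTimeout> | null = null;
  private interval: ReturnType<typeof setInterval> | null = null;
  private readonly check: () => void;
  private readonly initialDelayMs: number;
  private readonly intervalMs: number;

  constructor(check: () => void, initialDelayMs: number, intervalMs: number) {
    this.check = check;
    this.initialDelayMs = initialDelayMs;
    this.intervalMs = intervalMs;
  }

  start(): void {
    this.stop(); // avoid stacking timers if start() is called twice
    // Save the timeout ID: without it, stop() cannot cancel the initial delay.
    this.initialTimer = setTimeout(() => {
      this.initialTimer = null;
      this.check();
      this.interval = setInterval(() => this.check(), this.intervalMs);
    }, this.initialDelayMs);
  }

  stop(): void {
    if (this.initialTimer !== null) {
      clearTimeout(this.initialTimer);
      this.initialTimer = null;
    }
    if (this.interval !== null) {
      clearInterval(this.interval);
      this.interval = null;
    }
  }
}
```

Calling `stop()` at the start of a teardown path then guarantees that neither the first check nor the interval can fire afterwards.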
* test(desktop): 33 real launchd integration tests covering all LaunchdManager methods

Expand from 16 to 33 real launchd tests, covering every LaunchdManager public method and critical lifecycle scenario against actual launchctl:

Methods:
- installService (fresh, idempotent, content-change re-bootstrap)
- startService, stopService (SIGTERM), restartService (kickstart -k)
- bootoutService, bootoutAndWaitForExit (with PID-aware fallback)
- uninstallService (bootout + delete plist, idempotent)
- stopServiceGracefully (SIGTERM → poll → SIGKILL escalation)
- rebootstrapFromPlist (re-register after bootout)
- getServiceStatus (running/unknown, PID parsing, env parsing)
- isServiceRegistered, hasPlistFile, isServiceInstalled
- waitForExit (with knownPid after bootout)
- getDomain, getPlistDir

Scenarios:
- Full start→stop→restart cycle
- Attach to already-running service from previous session
- KeepAlive auto-restart after SIGKILL (crash simulation)
- Rapid start/stop cycles with no orphans
- Port conflict: bootout cleans up even when port is blocked
- Double bootout is idempotent
- Process dying during bootout (race condition)
- Multiple services: start two, teardown both
- ensureNexuProcessesDead: no-op when clean, kills orphans
- teardownLaunchdServices: non-existent labels are safe

* test(desktop): update server integration tests with real HTTP server

Spin up a local HTTP server that mimics the desktop release CDN and verify the update feed resolution + YAML serving end-to-end:
  1. Feed URL resolves to valid fetchable URL (stable/arm64)
  2. Server serves latest-mac.yml at correct path
  3. YAML contains all electron-updater required fields
  4. All 3 channels × 2 architectures serve valid responses
  5. 404 for invalid paths
  6. Explicit feedUrl overrides default
  7. Custom feed URL pointed at local server is fetchable
  8. Version comparison: server version > current = update available
  9. Download artifact URL serves content
  10. Server request logging works
  11. GitHub source returns github:// URL
  12. NEXU_UPDATE_FEED_URL env takes highest priority

These catch real HTTP/YAML issues that mocked autoUpdater tests miss.

* fix(test): stabilize stopService test against KeepAlive race

* fix(desktop): respect externally-set NEXU_HOME in dev mode

bootstrap.ts configureLocalDevPaths() unconditionally overwrote process.env.NEXU_HOME with userData/.nexu, clobbering the value passed by dev-launchd.sh (pnpm start). This caused controller to read config from a fresh empty directory instead of the intended .tmp/desktop/nexu-home/, making every pnpm start feel like a first-time setup.

Fix: only set NEXU_HOME as fallback when not already provided.

Packaged users are unaffected: configureLocalDevPaths() returns early when app.isPackaged is true (line 47).

Also adds 27 data-directory-invariants tests that guard:
- bootstrap.ts path configuration guards (isPackaged, env respect)
- runtime-config.ts NEXU_HOME resolution order
- controller env.ts data file locations under NEXU_HOME
- plist NEXU_HOME and OPENCLAW_STATE_DIR presence
- dev-launchd.sh path consistency
- AGENTS.md directory layout contract
- Packaged mode backward compatibility with 0.1.7
- OpenClaw state directory separation from NEXU_HOME

* fix(desktop): respect NEXU_HOME + 70 data directory path tests

Fix: bootstrap.ts configureLocalDevPaths() now respects externally-set NEXU_HOME instead of unconditionally overwriting it. This fixes pnpm start creating a fresh config directory on every launch.
70 runtime tests verify every data path by calling real functions and checking real output:

Controller plist env vars (26 tests): NEXU_HOME, OPENCLAW_STATE_DIR, OPENCLAW_CONFIG_PATH, OPENCLAW_SKILLS_DIR, OPENCLAW_EXTENSIONS_DIR, SKILLHUB_STATIC_SKILLS_DIR, PLATFORM_TEMPLATES_DIR, OPENCLAW_BIN, OPENCLAW_ELECTRON_EXECUTABLE, NODE_PATH, TMPDIR, PORT, HOST, WEB_URL, OPENCLAW_GATEWAY_PORT, OPENCLAW_GATEWAY_TOKEN, ELECTRON_RUN_AS_NODE, RUNTIME_MANAGE_OPENCLAW_PROCESS, RUNTIME_GATEWAY_PROBE_ENABLED, OPENCLAW_DISABLE_BONJOUR, NODE_ENV (dev+prod), HOME, PATH, NEXU_HOME omission

OpenClaw plist env vars (12 tests): ELECTRON_RUN_AS_NODE, OPENCLAW_CONFIG, OPENCLAW_CONFIG_PATH, OPENCLAW_STATE_DIR, OPENCLAW_LAUNCHD_LABEL (dev+prod), OPENCLAW_SERVICE_MARKER, HOME, PATH, NODE_PATH, no NEXU_HOME, no PORT

Plist structure (11 tests): ProgramArguments, WorkingDirectory, StandardOutPath, StandardErrorPath, KeepAlive, RunAtLoad, Label (dev+prod), gateway run args, --auth none (dev only), OtherJobEnabled

Path resolution (12 tests): desktop-paths.ts helpers (6), resolveLaunchdPaths dev+packaged (4), getDefaultPlistDir dev+prod, getLogDir dev+prod

Directory separation (5 tests): packaged NEXU_HOME ≠ userData, dev NEXU_HOME ≠ userData, NEXU_HOME under home, userData under Application Support, dev state repo-scoped

Config resolution (4 tests): NEXU_HOME default, env override, runtime-config priority chain

* test(desktop): rewrite data directory tests: verify program behavior, not path.resolve

Replace 70 trivial path.resolve assertions with 27 behavior-focused tests that call real functions and check real output:

generatePlist output (25 tests): Call the real function with production-realistic inputs, parse the XML, and verify every env var value matches expected paths. Key checks:
- NEXU_HOME → ~/.nexu (not under userData)
- OPENCLAW_STATE_DIR → under userData (not under NEXU_HOME)
- OPENCLAW_CONFIG_PATH consistent with OPENCLAW_STATE_DIR
- OPENCLAW_SKILLS_DIR consistent with OPENCLAW_STATE_DIR
- OpenClaw plist has NO NEXU_HOME (it doesn't use it)
These would have caught the #526 bug.

runtime-config resolution (2 tests): Call getDesktopRuntimeConfig with different env objects, verify the NEXU_HOME priority chain: env > buildConfig > ~/.nexu default.

Real launchd env verification (macOS, 2 tests): Start a real launchd service, read launchctl print output, parse the environment block, verify NEXU_HOME and OPENCLAW_STATE_DIR are what we set and are different directories.

* test(desktop): quality overhaul: delete garbage, add real edge cases

Deleted:
- quit-handler.test.ts (8 tests superseded by quit-handler-full.test.ts)
- data-directory-invariants.test.ts (27 grep-source-code tests, worthless)
- launchd-manager-ops-extended.test.ts (11 tests duplicated in ops.test.ts)

Added edge cases:
- daemon-supervisor: partial failure (one unit hangs), double dispose, child.error event handling
- lifecycle-teardown: process.kill EPERM handling, PID deduplication across multiple pgrep patterns
- update-install: stopPeriodicCheck before teardown verification, ensureNexuProcessesDead throw propagation
- launchd-integration (real launchd): NEXU_HOME with spaces, NEXU_HOME with Chinese unicode, OtherJobEnabled cascading behavior
- launchd-lifecycle-e2e.sh: Phase 6 (spaces) + Phase 7 (unicode) using .cjs scripts for ESM-safe execution

Net: -878 lines of garbage, +537 lines of behavior-focused tests. 17/17 real launchd e2e checks passing.

* chore: add @vitest/coverage-v8 dev dependency

Required for running test coverage reports locally and in CI.

* chore(ci): comprehensive path filters for macOS CI

Add missing paths that affect launchd/lifecycle behavior:
- apps/controller/**: controller process management, env parsing, openclaw process spawning. Changes here can break launchd services.
- scripts/dev-launchd.sh: pnpm start/stop/restart entry point.
- scripts/kill-all.sh: global process cleanup.
- scripts/desktop-stop-smoke.sh: added to dist CI (was only in dev).
- scripts/launchd-lifecycle-e2e.sh: added to dev CI (was only in dist).
- tests/desktop/**: test changes should trigger CI to verify they pass.
- vitest.config.ts: test framework config changes could break all tests.

This ensures any change that could affect launchd, process lifecycle, or the test suite triggers the macOS CI runners.

* fix(desktop): P0-2 unified gracefulShutdown + P1-2 removeListener fix

P0-2: Extract single authoritative gracefulShutdown(reason) function.
- Idempotent: shutdownInProgress guard prevents double teardown.
- 8-second hard timeout: if teardown hangs, process.exit(1) fires.
- Handles both launchd mode (teardownLaunchdServices) and orchestrator mode (orchestrator.dispose) in one function.
- SIGTERM + SIGINT handlers registered on process, route to gracefulShutdown then app.exit(0). This covers:
  - External kill (Activity Monitor, scripts, systemd)
  - Ctrl+C in terminal
  - System shutdown sending SIGTERM
- dev-launchd.sh stop now sends SIGTERM first (triggers graceful shutdown in Electron), waits up to 10s, then SIGKILL as fallback. Previously it used SIGKILL immediately, bypassing all cleanup.

P1-2: Replace removeAllListeners("before-quit") with removeListener.
- Store the specific handler reference (beforeQuitHandler).
- Only remove that handler, not all before-quit listeners.
- Prevents future listeners (telemetry, flush, etc.) from being accidentally removed.

Also fixes: dev.sh kill_residual_processes patterns were outdated (referenced .tmp/sidecars/ paths that no longer exist), causing CI stop smoke test failures. Updated to match current process patterns.
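The idempotent gracefulShutdown with a hard timeout described in P0-2 could be shaped roughly like this. `createGracefulShutdown` and its options are hypothetical; in the real app the `onHardTimeout` hook would presumably call `process.exit(1)` and the caller would follow a successful shutdown with `app.exit(0)`.

```typescript
// Hypothetical sketch: teardown and the timeout action are injected; names are illustrative.
function createGracefulShutdown(opts: {
  teardown: (reason: string) => Promise<void>;
  onHardTimeout: () => void;
  hardTimeoutMs: number;
}): (reason: string) => Promise<void> {
  let inProgress: Promise<void> | null = null;

  return (reason: string): Promise<void> => {
    // Idempotent: later callers (e.g. SIGINT arriving after SIGTERM) share the first run.
    if (inProgress) return inProgress;

    inProgress = (async () => {
      // Hard timeout: fires only if teardown has not finished in time.
      const timer = setTimeout(opts.onHardTimeout, opts.hardTimeoutMs);
      try {
        await opts.teardown(reason);
      } catch {
        // Best effort: a failing teardown must not prevent the caller from exiting.
      } finally {
        clearTimeout(timer);
      }
    })();
    return inProgress;
  };
}

// Possible wiring (assumed, shown as a comment to keep the sketch side-effect free):
// const shutdown = createGracefulShutdown({ teardown, onHardTimeout: () => process.exit(1), hardTimeoutMs: 8000 });
// process.on("SIGTERM", () => { void shutdown("SIGTERM").then(() => app.exit(0)); });
```

Storing the in-flight promise, rather than a boolean flag, lets every caller await the same teardown instead of returning early while it is still running.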
* fix(desktop): address PR review comments + CODEOWNERS

PR review fixes:
- launchd-bootstrap: orphan kill now only runs when neither service is registered with launchd, preventing SIGKILL of healthy managed services during relaunch-after-crash scenarios.
- quit-handler: teardownLaunchdServices always runs on quit-completely, even if plistDir is absent (plistDir only affects runtime-ports cleanup).
- desktop-stop-smoke.sh: add web sidecar pattern to process checks.
- desktop-check-dev.sh: replace fixed 2s sleep with bounded polling (max 10s) to avoid teardown race flakes.
- dev-env.sh: lsregister success/failure logged accurately.
- dev.sh: kill_residual_processes patterns updated to match current process paths (was using stale .tmp/sidecars/ paths).

CODEOWNERS: require @lefarcen review for test changes only. Tests define the quality gate: source code anyone can change, but acceptance criteria changes need review.

* fix(desktop): kill tsc/web watchers on pnpm stop + smoke test

dev-launchd.sh stop_services now kills the tsc --watch and web watcher background processes. These were only cleaned by the EXIT trap (which fires when the start_services function's shell exits), but `pnpm stop` calls stop_services directly without triggering the trap, leaving watchers printing to the terminal after stop.

Also adds tsc watcher residual check to desktop-stop-smoke.sh.

* chore: revert CODEOWNERS + remove unused OPENCLAW_ENTRY gate in e2e

- Revert CODEOWNERS to original (only api/migrations).
- Remove OPENCLAW_ENTRY prerequisite from launchd-lifecycle-e2e.sh: the script only tests controller, never launches openclaw.
…reen (#597)
* test(desktop): 33 real launchd integration tests covering all LaunchdManager methods Expand from 16 to 33 real launchd tests, covering every LaunchdManager public method and critical lifecycle scenario against actual launchctl: Methods: - installService (fresh, idempotent, content-change re-bootstrap) - startService, stopService (SIGTERM), restartService (kickstart -k) - bootoutService, bootoutAndWaitForExit (with PID-aware fallback) - uninstallService (bootout + delete plist, idempotent) - stopServiceGracefully (SIGTERM → poll → SIGKILL escalation) - rebootstrapFromPlist (re-register after bootout) - getServiceStatus (running/unknown, PID parsing, env parsing) - isServiceRegistered, hasPlistFile, isServiceInstalled - waitForExit (with knownPid after bootout) - getDomain, getPlistDir Scenarios: - Full start→stop→restart cycle - Attach to already-running service from previous session - KeepAlive auto-restart after SIGKILL (crash simulation) - Rapid start/stop cycles with no orphans - Port conflict: bootout cleans up even when port is blocked - Double bootout is idempotent - Process dying during bootout (race condition) - Multiple services: start two, teardown both - ensureNexuProcessesDead: no-op when clean, kills orphans - teardownLaunchdServices: non-existent labels are safe * test(desktop): update server integration tests with real HTTP server Spin up a local HTTP server that mimics the desktop release CDN and verify the update feed resolution + YAML serving end-to-end: 1. Feed URL resolves to valid fetchable URL (stable/arm64) 2. Server serves latest-mac.yml at correct path 3. YAML contains all electron-updater required fields 4. All 3 channels × 2 architectures serve valid responses 5. 404 for invalid paths 6. Explicit feedUrl overrides default 7. Custom feed URL pointed at local server is fetchable 8. Version comparison: server version > current = update available 9. Download artifact URL serves content 10. Server request logging works 11. 
GitHub source returns github:// URL 12. NEXU_UPDATE_FEED_URL env takes highest priority These catch real HTTP/YAML issues that mocked autoUpdater tests miss. * fix(test): stabilize stopService test against KeepAlive race * fix(desktop): respect externally-set NEXU_HOME in dev mode bootstrap.ts configureLocalDevPaths() unconditionally overwrote process.env.NEXU_HOME with userData/.nexu, clobbering the value passed by dev-launchd.sh (pnpm start). This caused controller to read config from a fresh empty directory instead of the intended .tmp/desktop/nexu-home/, making every pnpm start feel like a first-time setup. Fix: only set NEXU_HOME as fallback when not already provided. Packaged users are unaffected — configureLocalDevPaths() returns early when app.isPackaged is true (line 47). Also adds 27 data-directory-invariants tests that guard: - bootstrap.ts path configuration guards (isPackaged, env respect) - runtime-config.ts NEXU_HOME resolution order - controller env.ts data file locations under NEXU_HOME - plist NEXU_HOME and OPENCLAW_STATE_DIR presence - dev-launchd.sh path consistency - AGENTS.md directory layout contract - Packaged mode backward compatibility with 0.1.7 - OpenClaw state directory separation from NEXU_HOME * fix(desktop): respect NEXU_HOME + 70 data directory path tests Fix: bootstrap.ts configureLocalDevPaths() now respects externally-set NEXU_HOME instead of unconditionally overwriting it. This fixes pnpm start creating a fresh config directory on every launch. 
70 runtime tests verify every data path by calling real functions and checking real output: Controller plist env vars (26 tests): NEXU_HOME, OPENCLAW_STATE_DIR, OPENCLAW_CONFIG_PATH, OPENCLAW_SKILLS_DIR, OPENCLAW_EXTENSIONS_DIR, SKILLHUB_STATIC_SKILLS_DIR, PLATFORM_TEMPLATES_DIR, OPENCLAW_BIN, OPENCLAW_ELECTRON_EXECUTABLE, NODE_PATH, TMPDIR, PORT, HOST, WEB_URL, OPENCLAW_GATEWAY_PORT, OPENCLAW_GATEWAY_TOKEN, ELECTRON_RUN_AS_NODE, RUNTIME_MANAGE_OPENCLAW_PROCESS, RUNTIME_GATEWAY_PROBE_ENABLED, OPENCLAW_DISABLE_BONJOUR, NODE_ENV (dev+prod), HOME, PATH, NEXU_HOME omission OpenClaw plist env vars (12 tests): ELECTRON_RUN_AS_NODE, OPENCLAW_CONFIG, OPENCLAW_CONFIG_PATH, OPENCLAW_STATE_DIR, OPENCLAW_LAUNCHD_LABEL (dev+prod), OPENCLAW_SERVICE_MARKER, HOME, PATH, NODE_PATH, no NEXU_HOME, no PORT Plist structure (11 tests): ProgramArguments, WorkingDirectory, StandardOutPath, StandardErrorPath, KeepAlive, RunAtLoad, Label (dev+prod), gateway run args, --auth none (dev only), OtherJobEnabled Path resolution (12 tests): desktop-paths.ts helpers (6), resolveLaunchdPaths dev+packaged (4), getDefaultPlistDir dev+prod, getLogDir dev+prod Directory separation (5 tests): packaged NEXU_HOME ≠ userData, dev NEXU_HOME ≠ userData, NEXU_HOME under home, userData under Application Support, dev state repo-scoped Config resolution (4 tests): NEXU_HOME default, env override, runtime-config priority chain * test(desktop): rewrite data directory tests — verify program behavior, not path.resolve Replace 70 trivial path.resolve assertions with 27 behavior-focused tests that call real functions and check real output: generatePlist output (25 tests): Call the real function with production-realistic inputs, parse the XML, and verify every env var value matches expected paths. 
Key checks: - NEXU_HOME → ~/.nexu (not under userData) - OPENCLAW_STATE_DIR → under userData (not under NEXU_HOME) - OPENCLAW_CONFIG_PATH consistent with OPENCLAW_STATE_DIR - OPENCLAW_SKILLS_DIR consistent with OPENCLAW_STATE_DIR - OpenClaw plist has NO NEXU_HOME (it doesn't use it) These would have caught the #526 bug. runtime-config resolution (2 tests): Call getDesktopRuntimeConfig with different env objects, verify the NEXU_HOME priority chain: env > buildConfig > ~/.nexu default. Real launchd env verification (macOS, 2 tests): Start a real launchd service, read launchctl print output, parse the environment block, verify NEXU_HOME and OPENCLAW_STATE_DIR are what we set and are different directories. * test(desktop): quality overhaul — delete garbage, add real edge cases Deleted: - quit-handler.test.ts (8 tests superseded by quit-handler-full.test.ts) - data-directory-invariants.test.ts (27 grep-source-code tests, worthless) - launchd-manager-ops-extended.test.ts (11 tests duplicated in ops.test.ts) Added edge cases: - daemon-supervisor: partial failure (one unit hangs), double dispose, child.error event handling - lifecycle-teardown: process.kill EPERM handling, PID deduplication across multiple pgrep patterns - update-install: stopPeriodicCheck before teardown verification, ensureNexuProcessesDead throw propagation - launchd-integration (real launchd): NEXU_HOME with spaces, NEXU_HOME with Chinese unicode, OtherJobEnabled cascading behavior - launchd-lifecycle-e2e.sh: Phase 6 (spaces) + Phase 7 (unicode) using .cjs scripts for ESM-safe execution Net: -878 lines of garbage, +537 lines of behavior-focused tests. 17/17 real launchd e2e checks passing. * chore: add @vitest/coverage-v8 dev dependency Required for running test coverage reports locally and in CI. * chore(ci): comprehensive path filters for macOS CI Add missing paths that affect launchd/lifecycle behavior: - apps/controller/** — controller process management, env parsing, openclaw process spawning. 
Changes here can break launchd services. - scripts/dev-launchd.sh — pnpm start/stop/restart entry point. - scripts/kill-all.sh — global process cleanup. - scripts/desktop-stop-smoke.sh — added to dist CI (was only in dev). - scripts/launchd-lifecycle-e2e.sh — added to dev CI (was only in dist). - tests/desktop/** — test changes should trigger CI to verify they pass. - vitest.config.ts — test framework config changes could break all tests. This ensures any change that could affect launchd, process lifecycle, or the test suite triggers the macOS CI runners. * fix(desktop): P0-2 unified gracefulShutdown + P1-2 removeListener fix P0-2: Extract single authoritative gracefulShutdown(reason) function. - Idempotent: shutdownInProgress guard prevents double teardown. - 8-second hard timeout: if teardown hangs, process.exit(1) fires. - Handles both launchd mode (teardownLaunchdServices) and orchestrator mode (orchestrator.dispose) in one function. - SIGTERM + SIGINT handlers registered on process, route to gracefulShutdown then app.exit(0). This covers: - External kill (Activity Monitor, scripts, systemd) - Ctrl+C in terminal - System shutdown sending SIGTERM - dev-launchd.sh stop now sends SIGTERM first (triggers graceful shutdown in Electron), waits up to 10s, then SIGKILL as fallback. Previously it used SIGKILL immediately, bypassing all cleanup. P1-2: Replace removeAllListeners("before-quit") with removeListener. - Store the specific handler reference (beforeQuitHandler). - Only remove that handler, not all before-quit listeners. - Prevents future listeners (telemetry, flush, etc.) from being accidentally removed. Also fixes: dev.sh kill_residual_processes patterns were outdated (referenced .tmp/sidecars/ paths that no longer exist), causing CI stop smoke test failures. Updated to match current process patterns. 
* fix(desktop): address PR review comments + CODEOWNERS PR review fixes: - launchd-bootstrap: orphan kill now only runs when neither service is registered with launchd, preventing SIGKILL of healthy managed services during relaunch-after-crash scenarios. - quit-handler: teardownLaunchdServices always runs on quit-completely, even if plistDir is absent (plistDir only affects runtime-ports cleanup). - desktop-stop-smoke.sh: add web sidecar pattern to process checks. - desktop-check-dev.sh: replace fixed 2s sleep with bounded polling (max 10s) to avoid teardown race flakes. - dev-env.sh: lsregister success/failure logged accurately. - dev.sh: kill_residual_processes patterns updated to match current process paths (was using stale .tmp/sidecars/ paths). CODEOWNERS: require @lefarcen review for test changes only. Tests define the quality gate — source code anyone can change, but acceptance criteria changes need review. * fix(desktop): kill tsc/web watchers on pnpm stop + smoke test dev-launchd.sh stop_services now kills the tsc --watch and web watcher background processes. These were only cleaned by the EXIT trap (which fires when the start_services function's shell exits), but `pnpm stop` calls stop_services directly without triggering the trap — leaving watchers printing to the terminal after stop. Also adds tsc watcher residual check to desktop-stop-smoke.sh. * chore: revert CODEOWNERS + remove unused OPENCLAW_ENTRY gate in e2e - Revert CODEOWNERS to original (only api/migrations). - Remove OPENCLAW_ENTRY prerequisite from launchd-lifecycle-e2e.sh — the script only tests controller, never launches openclaw. 
* fix(desktop): harden lifecycle robustness — external runner, evidence-based updates, unified teardown - Extract Electron binary + frameworks to ~/.nexu/runtime/nexu-runner.app/ via APFS clone so launchd services never reference the .app bundle, unblocking Finder drag-and-drop reinstalls - Extract controller sidecar to ~/.nexu/runtime/controller-sidecar/ for the same reason; openclaw sidecar already extracted by existing logic - All three extractions use staging dir + atomic rename to prevent half-copies - Version-aware attach: refuse to attach to services from a different app version, build source, userDataPath, or openclawStateDir - Evidence-based update install: after process sweeps, lsof-check critical paths; only abort if .app bundle or sidecar dirs are actually locked - Unified dev/packaged teardown: Cmd+Q, window close, and no-window exit all go through teardownLaunchdServices in both modes - daemon-supervisor circuit breaker: MAX_CONSECUTIVE_RESTARTS=10 with 120s window, emits max_restarts_exceeded reason code - bootoutService tolerates "already gone" errors - runtime-ports.json atomic write (tmp + rename) - Tighter orphan cleanup: prefer launchd label + runtime-ports metadata, fall back to pgrep with node.* prefix and process tree exclusion * test(desktop): add evidence-based update, attach mismatch, and authoritative PID tests - ensureNexuProcessesDead: test launchctl-label-only discovery path - update-install: test critical path lock check (abort vs proceed) - attach identity: test buildSource/userDataPath/appVersion mismatch teardown - quit-handler: test no-window packaged teardown, dev before-quit teardown * test(desktop): expand lifecycle regression coverage * fix(desktop): restore allow-jit in inherit entitlements + add regression guard tests Root cause of nightly white-screen: entitlements.mac.inherit.plist dropped allow-jit, causing V8 to fail mmap with MAP_JIT on macOS 14.7.4+ → renderer OOM → white screen. 
Fix: add back allow-jit and allow-unsigned-executable-memory alongside inherit. Tests: 8 static analysis tests guard against future entitlement regressions, verifying both parent and inherit plists have the required V8 JIT keys. * test(desktop): strengthen plist tests — value assertions, XML escaping, ProgramArguments order - Entitlements: verify <true/> values (not just key presence), no dangerous entitlements (disable-executable-page-protection, get-task-allow), no duplicate keys, hardenedRuntime enabled in electron-builder - ProgramArguments: exact ordering for controller [node, entry] and openclaw [node, openclaw.mjs, gateway, run], dev --auth none insertion - Openclaw completeness: WorkingDirectory, StandardErrorPath, KeepAlive SuccessfulExit, ThrottleInterval, RunAtLoad=false - XML escaping: &, <, >, ", ' in path fields correctly escaped * fix(desktop): unify teardown paths, fix lsof PID parsing, only retry EADDRINUSE - Extract runTeardownAndExit() helper: dev-close, dev-before-quit, packaged-quit, packaged-no-window all share one code path with try/finally to guarantee app.exit(0) even if teardown throws - Restore onForceQuit semantic: only fires on explicit "Quit Completely" user choice, not on dev-mode or no-window exits - Add step-level console.warn for onBeforeQuit/webServer.close failures - lsof: parse PID column by field position instead of substring matching - Web port retry: only retry on EADDRINUSE, re-throw other errors immediately
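The "parse PID column by field position" item in the last commit above can be illustrated with a minimal sketch. It assumes the default `lsof` column layout (`COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME`). The function name `parseLsofPids` is hypothetical and is not taken from the PR.

```typescript
// Hypothetical helper: extract PIDs from default `lsof` output by column
// position (second whitespace-separated field) instead of substring matching.
function parseLsofPids(output: string): number[] {
  const pids = new Set<number>();
  for (const line of output.split("\n")) {
    const fields = line.trim().split(/\s+/);
    // Skip blank lines and the header row.
    if (fields.length < 2 || fields[0] === "COMMAND") continue;
    const pid = Number(fields[1]);
    if (Number.isInteger(pid) && pid > 0) pids.add(pid);
  }
  return Array.from(pids);
}

// Made-up sample: the path of the first row contains "77", which a substring
// match could confuse with PID 77; positional parsing only reads column 2.
const sampleLsofOutput = [
  "COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME",
  "node    1234 me    cwd    DIR    1,4      640 9876 /tmp/run-77/state",
  "node    1234 me    txt    REG    1,4    12345 9877 /usr/local/bin/node",
  "node      77 me    cwd    DIR    1,4      640 9878 /tmp/other",
].join("\n");

const sampleLsofPids = parseLsofPids(sampleLsofOutput);
```

Deduplicating through a `Set` matters because `lsof` prints one row per open file, so the same PID normally appears several times.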
Problem
1. Session data loss (introduced in v0.1.6)
The launchd architecture migration (#405) in v0.1.6 changed how `OPENCLAW_STATE_DIR` is resolved (`manifests.ts`):
- Before: `~/Library/Application Support/@nexu/desktop/runtime/openclaw/state/`
- v0.1.6: `~/.nexu/runtime/openclaw/state/`
The launchd plist didn't pass `OPENCLAW_STATE_DIR` to the controller, so it fell back to `NEXU_HOME/runtime/openclaw/state` (`~/.nexu`). This silently broke session continuity: users lost all historical conversations, channel state, and other OpenClaw runtime data.
2. "App can no longer be opened" after quit
"Quit Completely" used `app.quit()`, which only requests a quit; event handlers could delay or prevent it. Dangling handles (timers, sockets) kept the Electron process alive, leaving stale `SingletonLock` files that blocked relaunch.
3. Quit caused launchd respawn loop
An earlier fix attempted `SIGTERM → bootout` ordering, but launchd's `KeepAlive.SuccessfulExit=false` automatically respawned the process after SIGTERM, causing a loop before bootout could unregister it.
Fix
Session data migration
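The merge step listed in this section (no overwrites, stamp file for idempotency) could look roughly like the sketch below. It is not the PR's implementation: a generic recursive copy-if-missing stands in for the per-agent session merge, and the names `migrateState`, `copyMissing`, and the stamp file name `.state-migration-done` are all hypothetical.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical stamp file name; the PR only states that a stamp file is used.
const STAMP_FILE = ".state-migration-done";

// Copy every file under `src` that does not already exist under `dest`.
// Existing files in `dest` are never overwritten. Returns the number copied.
function copyMissing(src: string, dest: string): number {
  fs.mkdirSync(dest, { recursive: true });
  let copied = 0;
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);
    if (entry.isDirectory()) {
      copied += copyMissing(from, to);
    } else if (entry.isFile() && !fs.existsSync(to)) {
      fs.copyFileSync(from, to);
      copied += 1;
    }
  }
  return copied;
}

// Merge a legacy state directory into the target once. The stamp file makes
// later calls a no-op, so the migration is safe to run on every startup.
function migrateState(legacyDir: string, targetDir: string): number {
  const stamp = path.join(targetDir, STAMP_FILE);
  if (!fs.existsSync(legacyDir) || fs.existsSync(stamp)) return 0;
  const migrated = copyMissing(legacyDir, targetDir);
  fs.writeFileSync(stamp, new Date().toISOString());
  return migrated;
}
```

Under these assumptions, `legacyDir` would be `~/.nexu/runtime/openclaw/state/` and `targetDir` the userData-based state path. Leaving the legacy directory in place keeps the step reversible.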
- Resolve `openclawStateDir` from `app.getPath("userData")` again (matching v0.1.5)
- Set `OPENCLAW_STATE_DIR` in the controller plist so it never falls back to the wrong default
- Detect any `~/.nexu/runtime/openclaw/state/` created during the v0.1.6 window and merge it back to the userData path (per-agent session merge, no overwrites, stamp file for idempotency)
Quit flow fix
- `bootout` first (unregisters from launchd, no more respawns), then `waitForExit` polls until the process exits (SIGKILL fallback after 5s timeout)
- `app.exit(0)` instead of `app.quit()` after all cleanup is done: it terminates immediately, and no event handler can block it
Documentation
- `NEXU_HOME` vs Electron `userData` split
Directory layout after migration
- `~/.nexu/` (`NEXU_HOME`): `config.json`, `cloud-profiles.json`, compiled snapshots, skill ledger, skillhub cache, logs, openclaw-sidecar, `nexu.db`
- `~/Library/Application Support/@nexu/desktop/` (Electron `userData`): `agents/` (conversations), `extensions/` (channel state), `skills/`, `openclaw.json`, plus Electron internals
The split is intentional: `NEXU_HOME` holds lightweight user preferences that persist across reinstalls; Electron `userData` holds heavy runtime state tied to the app lifecycle.
Test plan
- `pnpm typecheck` passes
- `pnpm test` 120/120 passes
- `pnpm lint` no new errors
- `state-migration: migration complete: 13 items migrated`
🤖 Generated with Claude Code
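As an illustration of the ordering described under "Quit flow fix" above (bootout first, then poll, SIGKILL fallback, then a hard exit), here is a minimal sketch. The `QuitDeps` interface and the names `quitCompletely`, `waitForExit`, and `isProcessAlive` as written here are hypothetical stand-ins, injected so the sequence can run without launchd or Electron. The PID is captured before bootout, as the commit log describes, and the 5s timeout comes from the PR text; the poll interval is an assumption.

```typescript
// Hypothetical dependency surface; none of these signatures come from the PR.
interface QuitDeps {
  getPid(label: string): number | undefined;
  bootout(label: string): Promise<void>;
  isAlive(pid: number): boolean;
  forceKill(pid: number): void;
  sleep(ms: number): Promise<void>;
  exit(code: number): void;
}

// Liveness probe via signal 0: nothing is delivered, only existence is checked.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but is owned by another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Poll until the process is gone or the timeout elapses. Returns true if it exited.
async function waitForExit(pid: number, deps: QuitDeps, timeoutMs: number, pollMs: number): Promise<boolean> {
  for (let waited = 0; waited < timeoutMs; waited += pollMs) {
    if (!deps.isAlive(pid)) return true;
    await deps.sleep(pollMs);
  }
  return !deps.isAlive(pid);
}

async function quitCompletely(labels: string[], deps: QuitDeps, timeoutMs = 5000, pollMs = 100): Promise<void> {
  try {
    for (const label of labels) {
      // Capture the PID first: once the label is unregistered it cannot be looked up.
      const pid = deps.getPid(label);
      // Unregister before any signal so KeepAlive cannot respawn the process.
      await deps.bootout(label);
      if (pid !== undefined && !(await waitForExit(pid, deps, timeoutMs, pollMs))) {
        deps.forceKill(pid);
      }
    }
  } finally {
    // Hard exit even if teardown throws; no event handler can veto it.
    deps.exit(0);
  }
}
```

In an Electron main process, `isAlive` could be wired to `isProcessAlive`, `forceKill` to `process.kill(pid, "SIGKILL")`, and `exit` to `app.exit`.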
Summary by CodeRabbit
v0.1.7 Release Notes
New Features
Bug Fixes
Documentation
`~/.nexu/`, while runtime state now resides in `~/Library/Application Support/@nexu/desktop/`.
Chores