fix(desktop): v0.1.7 hotfix — restore session data & fix quit race by lefarcen · Pull Request #526 · nexu-io/nexu

lefarcen · 2026-03-25T03:15:09Z

Problem

1. Session data loss (introduced in v0.1.6)

The launchd architecture migration (#405) in v0.1.6 changed how OPENCLAW_STATE_DIR is resolved:

	v0.1.5	v0.1.6
Path source	Desktop explicitly passes env var via `manifests.ts`	Controller falls back to default
Packaged path	`~/Library/Application Support/@nexu/desktop/runtime/openclaw/state/`	`~/.nexu/runtime/openclaw/state/`

The launchd plist didn't pass OPENCLAW_STATE_DIR to the controller, so it fell back to NEXU_HOME/runtime/openclaw/state (~/.nexu). This silently broke session continuity — users lost all historical conversations, channel state, and other OpenClaw runtime data.
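
The fallback described above can be sketched as follows. This is a minimal illustration only: `resolveStateDir` is a hypothetical helper, not the controller's actual function, and the real logic in the controller's `env.ts` may differ.

```typescript
import { join } from "node:path";

// Hypothetical helper (not the controller's real API): prefer an explicit
// OPENCLAW_STATE_DIR, otherwise fall back to NEXU_HOME/runtime/openclaw/state.
function resolveStateDir(
  env: Record<string, string | undefined>,
  nexuHome: string,
): string {
  const explicit = env.OPENCLAW_STATE_DIR;
  if (explicit !== undefined && explicit.trim() !== "") {
    return explicit;
  }
  // This is the branch v0.1.6 hit, because the plist did not pass the variable.
  return join(nexuHome, "runtime", "openclaw", "state");
}
```

With the variable unset, the result lands under `~/.nexu`, which is why the v0.1.5 data under Electron `userData` was no longer read.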

2. "App can no longer be opened" after quit

"Quit Completely" used app.quit() which only requests a quit — event handlers could delay or prevent it. Dangling handles (timers, sockets) kept the Electron process alive, leaving stale SingletonLock files that blocked relaunch.

3. Quit caused launchd respawn loop

An earlier fix attempted SIGTERM → bootout ordering, but launchd's KeepAlive.SuccessfulExit=false automatically respawned the process after SIGTERM, causing a loop before bootout could unregister it.

Fix

Session data migration

Restore canonical path: Packaged mode derives openclawStateDir from app.getPath("userData") again (matching v0.1.5)
Pass env to controller plist: Explicitly set OPENCLAW_STATE_DIR in the controller plist so it never falls back to the wrong default
One-time migration: On startup, detect data under ~/.nexu/runtime/openclaw/state/ created during the v0.1.6 window and merge it back to the userData path (per-agent session merge, no overwrites, stamp file for idempotency)
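
A rough sketch of the migration idea (merge without overwriting, then stamp for idempotency). The function names here are hypothetical; the PR's real implementation is `migrateOpenclawState()` in `state-migration.ts` and may handle more cases. The stamp file name is the one quoted in the review thread.

```typescript
import { cpSync, existsSync, mkdirSync, readdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Stamp name as quoted in the review comments on this PR.
const STAMP_NAME = ".v016-migration-done";

// Recursively copy entries that are missing in targetDir; never overwrite.
// Returns the number of files copied.
function mergeMissing(sourceDir: string, targetDir: string): number {
  mkdirSync(targetDir, { recursive: true });
  let copied = 0;
  for (const entry of readdirSync(sourceDir, { withFileTypes: true })) {
    const from = join(sourceDir, entry.name);
    const to = join(targetDir, entry.name);
    if (entry.isDirectory()) {
      copied += mergeMissing(from, to);
    } else if (!existsSync(to)) {
      cpSync(from, to);
      copied += 1;
    }
  }
  return copied;
}

// Hypothetical entry point: runs at most once per target directory.
function migrateOnce(sourceDir: string, targetDir: string): number {
  const stamp = join(targetDir, STAMP_NAME);
  if (existsSync(stamp)) return 0; // already migrated
  mkdirSync(targetDir, { recursive: true }); // target must exist before the stamp is written
  const copied = existsSync(sourceDir) ? mergeMissing(sourceDir, targetDir) : 0;
  writeFileSync(stamp, new Date().toISOString());
  return copied;
}
```

Skipping existing files keeps the migration safe to interrupt and re-run, at the cost of not merging files that exist on both sides.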

Quit flow fix

Correct order: bootout first (unregisters from launchd, no more respawns), then waitForExit polls until process exits (SIGKILL fallback after 5s timeout)
Immediate exit: Use app.exit(0) instead of app.quit() after all cleanup is done — terminates immediately, no event handler can block it
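
The ordering above can be sketched like this. `LaunchdLike` is a hypothetical interface modelled on the method names mentioned in this PR, and `exit` stands in for Electron's `app.exit`; the actual quit handler may be structured differently.

```typescript
// Hypothetical interface; method names follow those mentioned in this PR.
interface LaunchdLike {
  bootoutService(label: string): Promise<void>;
  waitForExit(label: string, timeoutMs: number): Promise<void>;
}

async function quitCompletely(
  launchd: LaunchdLike,
  labels: string[],
  exit: (code: number) => void, // in Electron this would be app.exit
): Promise<void> {
  for (const label of labels) {
    try {
      // Unregister first so launchd's KeepAlive cannot respawn the process.
      await launchd.bootoutService(label);
    } catch (err) {
      console.error(`bootout failed for ${label}:`, err);
    }
    try {
      // Then wait for the process to exit (the PR uses a 5s timeout).
      await launchd.waitForExit(label, 5000);
    } catch (err) {
      console.error(`waitForExit failed for ${label}:`, err);
    }
  }
  // Only after cleanup: terminate immediately, bypassing quit event handlers.
  exit(0);
}
```

Errors are logged rather than rethrown so that one failing service does not prevent the final exit.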

Documentation

Updated AGENTS.md with packaged app directory layout table explaining the NEXU_HOME vs Electron userData split

Directory layout after migration

Directory	Purpose	Survives uninstall
`~/.nexu/` (`NEXU_HOME`)	User config (`config.json`, `cloud-profiles.json`), compiled snapshots, skill ledger, skillhub cache, logs, openclaw-sidecar, `nexu.db`	Yes
`~/Library/Application Support/@nexu/desktop/` (Electron `userData`)	OpenClaw runtime state: `agents/` (conversations), `extensions/` (channel state), `skills/`, `openclaw.json`, plus Electron internals	No

The split is intentional: NEXU_HOME holds lightweight user preferences that persist across reinstalls; Electron userData holds heavy runtime state tied to the app lifecycle.

Test plan

pnpm typecheck passes
pnpm test 120/120 passes
pnpm lint no new errors
Migration verified: cold-start log shows state-migration: migration complete: 13 items migrated
Merged session data present under userData path after migration
"Quit Completely" exits immediately — all processes gone, no stale SingletonLock
Relaunch after quit works without "app can no longer be opened" error

🤖 Generated with Claude Code

Summary by CodeRabbit

v0.1.7 Release Notes

New Features
- Added automatic state migration for upgrading from v0.1.6, with idempotent safeguards to prevent re-running migrations.
Bug Fixes
- Improved application shutdown with sequential service cleanup and exit verification.
Documentation
- Clarified data storage locations: user config remains in ~/.nexu/, while runtime state now resides in ~/Library/Application Support/@nexu/desktop/.
Chores
- Version bump to 0.1.7.

Replace reserved close code 1008 (Policy Violation) with private code 4008 when closing WebSocket connections on error. Code 1008 is reserved for server use and Node.js 22's native WebSocket throws DOMException [InvalidAccessError] when clients attempt to use it, causing the controller process to crash on authentication failures. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
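
As an illustration of the constraint behind this commit: the WHATWG WebSocket API only lets application code pass 1000 or a value in 3000-4999 to `close()`. The sketch below uses a hypothetical `closeOnAuthFailure` helper and a minimal `Closable` interface; it is not the controller's actual code.

```typescript
// Minimal shape shared by WebSocket clients that expose close(code, reason).
interface Closable {
  close(code?: number, reason?: string): void;
}

// Codes that application code may pass to close(): 1000 or 3000-4999.
function isClientCloseCode(code: number): boolean {
  return code === 1000 || (code >= 3000 && code <= 4999);
}

// Private-use code chosen in this PR in place of 1008.
const AUTH_FAILURE_CLOSE_CODE = 4008;

// Hypothetical helper: close a socket after an authentication failure.
function closeOnAuthFailure(socket: Closable, reason: string): void {
  socket.close(AUTH_FAILURE_CLOSE_CODE, reason);
}
```

A check like `isClientCloseCode` could also be used in tests to guard against reintroducing a reserved code.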

- Cherry-pick WebSocket close code fix from PR #365
- Change launchd namespace from com.nexu.* to io.nexu.*
- Add progress tracking directory with STATUS, DECISIONS, ISSUES
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Phase 1-3 of launchd architecture refactor:
- LaunchdManager: wrapper for launchctl commands (install, start, stop, status, graceful shutdown)
- PlistGenerator: generates launchd plist XML for Controller and OpenClaw services with proper env vars and dependencies
- EmbeddedWebServer: serves static files and proxies API requests to Controller, replacing the web sidecar process
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- launchd-bootstrap.ts: complete bootstrap flow for launchd-based startup (install services, start controller, start openclaw, start embedded web server)
- Feature flag NEXU_USE_LAUNCHD=1 for gradual rollout
- Unified log directory at ~/.nexu/logs/
- Path resolution for packaged vs dev environments
- Index file exporting all services
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- quit-handler.ts: handles before-quit event with dialog
- Options: Quit Completely (stop services), Run in Background, Cancel
- Graceful shutdown of launchd services
- Exported via services/index.ts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

scripts/dev-launchd.sh provides:
- start: generate plists, bootstrap and start services
- stop: gracefully stop services
- restart: stop then start
- status: show launchd service status
- logs: tail all log files
Uses io.nexu.*.dev labels and ~/.nexu/logs/ for logging.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add launchd service imports
- Add runLaunchdColdStart function that uses bootstrapWithLaunchd
- Check NEXU_USE_LAUNCHD=1 flag to choose bootstrap mode
- Install launchd quit handler after successful launchd bootstrap
- Modify before-quit handler to skip orchestrator cleanup in launchd mode
- Derive openclaw paths from nexuHome config
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- launchd-manager.test.ts: tests for LaunchdManager class and SERVICE_LABELS
- plist-generator.test.ts: tests for generatePlist function
Tests cover:
- Platform check (darwin only)
- Default and custom plist directories
- UID-based domain construction
- Dev vs prod label generation
- Plist XML generation with correct structure
- XML character escaping
- Log path configuration
- Service dependencies
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Fix OpenClaw config paths to match controller defaults in env.ts (OPENCLAW_STATE_DIR=~/.nexu/runtime/openclaw/state)
- Add `gateway` subcommand to OpenClaw plist generation
- Use OPENCLAW_CONFIG_PATH env var instead of --config argument
- Add --auth none for dev mode to simplify local development
- Update tests to verify OPENCLAW_CONFIG_PATH env var presence
Tested with ./scripts/dev-launchd.sh - Controller and OpenClaw WebSocket connection verified working.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

… workflow
- Add "starting" RuntimeStatus: when OpenClaw gateway is unreachable but process is alive, show "启动中" ("Starting") instead of "已离线" ("Offline")
- Parallelize launchd service install/start + web server (Promise.all)
- Use adaptive readiness polling (50ms→250ms) instead of fixed 250ms
- Fix dev-launchd.sh stop: use bootout directly instead of SIGTERM+bootout race with KeepAlive; use SIGKILL for Electron to bypass quit handler
- Dev quit handler keeps services running (run-in-background) so vite HMR restarts don't kill launchd services
- Add tool progress prompt to nexu-platform-bootstrap plugin
- Disable humanDelay in config compiler
- Cold start time reduced from ~5s to ~2s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Stop: wait for ports to free after bootout, SIGKILL orphans including chrome_crashpad_handler
- Fix resolveLaunchdPaths for packaged mode: OpenClaw is at runtime/openclaw/node_modules/openclaw/openclaw.mjs, not runtime/openclaw-runtime/openclaw.mjs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tartup When the WebSocket to OpenClaw gateway isn't connected yet (during startup), channels were shown as "disconnected" (red). Now they show as "connecting" (yellow pulse) when the runtime is still starting, giving users a much less alarming startup experience. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add explicit "booting" → "ready" lifecycle to ControllerRuntimeState. During boot, gateway-unreachable is always treated as "starting" (not "unhealthy"), regardless of whether the process manager owns the OpenClaw process (fixes launchd mode where processManager.isAlive() returns false). Channel live status also uses bootPhase to show "connecting" during startup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The embedded web server serves static files from apps/web/dist, so code changes to the web app require a build step before starting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bootPhase was set to "ready" immediately after wsClient.connect(), but the WS handshake hadn't completed yet. Health loop then saw gateway-unreachable + bootPhase=ready → "unhealthy" → UI showed "已离线" ("Offline") during startup. Now bootPhase transitions to "ready" inside the onConnected callback, so the entire startup shows "starting" → "active" cleanly. Also adds temporary debug logs to home.tsx for startup diagnostics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
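
A simplified sketch of the status rule these two commits describe. The type and function names are illustrative assumptions; the real `ControllerRuntimeState` likely tracks more inputs (for example process liveness and WS connection state).

```typescript
type BootPhase = "booting" | "ready";
type RuntimeStatus = "starting" | "active" | "unhealthy";

// Hypothetical reduction of the health-loop decision to two inputs.
function deriveStatus(bootPhase: BootPhase, gatewayReachable: boolean): RuntimeStatus {
  if (gatewayReachable) return "active";
  // While booting, an unreachable gateway is expected, so report "starting".
  return bootPhase === "booting" ? "starting" : "unhealthy";
}
```

Flipping `bootPhase` to `"ready"` only once the connection callback fires keeps the unreachable-during-handshake window in the `"starting"` branch.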

- Channel error status now shows translated lastError (e.g. "会话已过期" ("Session expired") instead of generic "错误" ("Error"))
- Controller maps WeChat "not configured" + not running to "session expired" for better UX
- Add i18n keys for common channel errors (session expired, not configured, disabled)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use warning color (orange) instead of danger (red) for known recoverable errors like session expired, with actionable label "请重新连接" / "Reconnect required". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The exponential backoff for OpenClaw WebSocket reconnection could reach 16s+ during startup, causing the UI to stay in "starting" state for 20+ seconds. Cap at 4s so retry sequence is 1→2→4→4→4s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
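
The capped sequence can be reproduced with a small helper. `backoffDelays` is a hypothetical name used only for illustration; the WS client's actual implementation is not shown in this PR.

```typescript
// Doubling backoff with an upper bound, returned as a list of delays in ms.
function backoffDelays(initialMs: number, capMs: number, attempts: number): number[] {
  const delays: number[] = [];
  let delay = initialMs;
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(delay, capMs));
    delay *= 2;
  }
  return delays;
}

// backoffDelays(1000, 4000, 5) yields [1000, 2000, 4000, 4000, 4000],
// matching the 1→2→4→4→4s sequence described in the commit.
```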

…way up When the health loop detects the gateway HTTP endpoint becomes reachable, it calls wsClient.retryNow() to cancel the backoff timer and connect immediately. This eliminates the 4-16s gap between gateway ready and WS connected during startup. Also replaces the ugly "Starting local services..." loading screen with a minimal Nexu logo pulse animation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- main.tsx had a duplicate SurfaceFrame with old loading text; replaced with Nexu logo pulse animation matching surface-frame.tsx
- dev-launchd.sh now checks for dist/index.html and rebuilds desktop if missing, preventing blank screen after accidental dist deletion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace plain loading screen with animated Nexu logo matching the design system prototype (NexuLoader.tsx). Four quadrants light up sequentially in brand colors: orange, green, pink, gold. Pure CSS animation, no framer-motion dependency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

main.tsx had its own SurfaceFrame copy with the old loading screen. Now imports from components/surface-frame.tsx so both Runtime Console and Desktop Shell views use the same 4-color Nexu loader. Background updated to dark radial gradient matching desktop theme. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Loader now overlays on top of the webview instead of replacing it. The webview loads silently in the background while the Nexu logo animation plays. When the webview fires dom-ready, the loader disappears — no blank frames, no intermediate Loader2 spinner. Background uses warm radial gradient for polished appearance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The spinning circle was briefly visible between the Nexu splash loader and the actual UI. Replace with an empty div since the desktop splash overlay already covers the loading period. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Run openclawProcess.prepare(), ensureRuntimeModelPlugin(), and prepareDesktopCloudModelsForBootstrap() in parallel (were sequential)
- Remove redundant compileCurrentConfig() call for preSeedConfigHash; doSync() already seeds the hash via noteConfigWritten()
- Reduce WS initial backoff from 1000ms to 500ms (sequence: 500→1000→2000→4000→4000...)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Quit dialog now shows Chinese or English based on app.getLocale(). Chinese users see "完全退出 / 后台运行 / 取消" ("Quit Completely / Run in Background / Cancel"). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Packaged macOS builds now use launchd mode by default (no env var needed). This enables the quit dialog, crash recovery, and background service support in production. Can be explicitly disabled with NEXU_USE_LAUNCHD=0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix path traversal vulnerability in embedded-web-server (sanitize URL pathname, reject paths outside webRoot)
- Install launchd quit handler after try/catch so it works even if auth bootstrap fails
- Add error handling to quitWithDecision (was missing try/catch)
- Fix isAlive() handling undefined pid from failed spawn
- Rename NODE_PATH to NODE_BIN in dev script to avoid Node.js env var conflict
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
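
For the path traversal item, one common way to confine a static file server to its root looks like the sketch below. `resolveInsideRoot` is a hypothetical helper; the embedded web server's actual sanitization code is not shown in this PR and may differ.

```typescript
import { resolve, sep } from "node:path";

// Returns an absolute path inside webRoot, or null if the request escapes it.
function resolveInsideRoot(webRoot: string, urlPathname: string): string | null {
  let decoded: string;
  try {
    decoded = decodeURIComponent(urlPathname);
  } catch {
    return null; // malformed percent-encoding
  }
  if (decoded.indexOf("\0") !== -1) return null; // reject NUL bytes
  const root = resolve(webRoot);
  const relative = decoded.startsWith("/") ? "." + decoded : "./" + decoded;
  const candidate = resolve(root, relative);
  // Accept only the root itself or paths strictly below it.
  if (candidate !== root && !candidate.startsWith(root + sep)) return null;
  return candidate;
}
```

Comparing against `root + sep` rather than `root` alone avoids accepting sibling directories such as `/srv/web-other`.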

- openclaw-config-writer: additional config writing logic
- plist-generator: updated plist generation + tests
- package.json: script updates for launchd dev workflow
- openclaw-weixin accounts.ts: prior session changes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

On startup, detect already-running launchd services and reuse them instead of cold-starting. This enables instant resume after "Run in Background" (packaged) or dev restart.
Attach flow:
1. Read runtime-ports.json for port metadata from previous session
2. Validate isDev mode and NEXU_HOME match
3. Extract env vars from running services via launchctl print
4. Probe controller /health and openclaw port
5. If all healthy, start embedded web server and attach
Fallback: if attach fails (stale services, env mismatch, unhealthy), tear down and cold start as before.
Also:
- LaunchdManager.getServiceStatus() now parses environment variables
- runtime-ports.json written on cold start, deleted on quit-completely
- Port occupier detection kills rogue processes on openclaw port
- index.ts overrides runtimeConfig with attached ports
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

esbuild doesn't support typeof import() expressions. Use static import { createConnection } from "node:net" instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When Electron allocates a non-default port (e.g. 18790 because 18789 is occupied), the controller needs to know this port for both the config compiler and WS/health connections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Node.js rmSync with recursive+force can fail with ENOTEMPTY on macOS. Fall back to execFileSync rm -rf which handles this reliably. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Node.js rmSync can silently fail on macOS (ENOTEMPTY race). Now uses rm -rf exclusively with existence check, and retries up to 3 times with 1s pause between attempts. This ensures first-launch sidecar extraction succeeds reliably. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace binary attach-or-cold-start with unified per-service flow:
- Each service independently checked: running+healthy → keep, else restart
- Ports recovered from runtime-ports.json when any service is still running
- NEXU_HOME validated to prevent cross-environment attach
- Missing services started with correct recovered ports
- Unhealthy running services torn down and restarted
This enables partial attach: if only OpenClaw survived a crash, the next launch reuses its port and only cold-starts the controller.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
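
The per-service decision described in this commit can be pictured as a small pure function. The types and names below (`ServiceProbe`, `planService`, `planAll`) are assumptions for illustration and do not come from the codebase.

```typescript
type ServiceAction = "keep" | "restart" | "start";

interface ServiceProbe {
  running: boolean; // launchd reports the service as running
  healthy: boolean; // health probe succeeded
}

// running + healthy → keep; running but unhealthy → restart; missing → start.
function planService(probe: ServiceProbe): ServiceAction {
  if (!probe.running) return "start";
  return probe.healthy ? "keep" : "restart";
}

// Apply the rule independently to every service.
function planAll(probes: Record<string, ServiceProbe>): Record<string, ServiceAction> {
  const plan: Record<string, ServiceAction> = {};
  for (const name of Object.keys(probes)) {
    plan[name] = planService(probes[name]);
  }
  return plan;
}
```

For the partial-attach case in the commit message, a surviving OpenClaw and a dead controller would produce `{ openclaw: "keep", controller: "start" }`.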

getLogDir() now accepts nexuHome param so dev mode writes launchd service logs to .tmp/desktop/nexu-home/logs/ instead of ~/.nexu/logs/. Also updates AGENTS.md with correct directory layout for dev vs packaged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rewrites desktop-startup-flow.md with full implementation details:
- Directory layout for dev vs packaged (with tree diagrams)
- Label isolation between dev (.dev) and packaged modes
- Unified bootstrap flow (attach + cold start per-service)
- Port architecture and auto-allocation
- Attach mechanism (full, partial, fallback)
- Status display timeline
- File watch hot reload
- Exit behavior
- OpenClaw sidecar extraction
- Complete key files reference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolve conflicts keeping our attach/isolation/hot-reload improvements while incorporating main's changes:
- Brand name: "Nexu" → "nexu" (lowercase, from bb1ddcc)
- UI polish from 2fb384c
- surface-frame: white background from main's design update
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix web port fallback: increment port instead of port 0, record actual port in effectivePorts for runtime-ports.json
- Fix quit handler: catch deleteRuntimePorts errors to prevent blocking quit
- Fix dev script: initialize watcher PIDs before trap to avoid set -u errors
- Fix docs: probe endpoint is /api/auth/get-session not /health
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ess-architecture

Two critical fixes for v0.1.7 hotfix:
1. State directory regression: v0.1.6 changed OPENCLAW_STATE_DIR from Electron userData (~/.../Application Support/@nexu/desktop/runtime/openclaw/state) to NEXU_HOME (~/.nexu/runtime/openclaw/state), silently losing all historical conversations. This restores userData as the canonical path for packaged builds and adds a one-time migration to merge any data created under ~/.nexu during the v0.1.6 window.
2. Quit race condition: bootoutService() sends an async unregister to launchd but doesn't wait for the process to exit. Relaunching immediately hits port conflicts / "open failed". Now calls stopServiceGracefully() (SIGTERM + wait + SIGKILL) before bootout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-25T03:15:30Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

Mode-dependent OpenClaw state path selection was introduced (dev vs packaged). Packaged mode may migrate legacy state from nexuHome into Electron userData. The computed state dir is exposed to launchd via OPENCLAW_STATE_DIR. Launchd shutdown now stops services per-label and waits for exit.

Changes

Cohort / File(s)	Summary
State Dir & Migration `apps/desktop/main/index.ts`, `apps/desktop/main/services/state-migration.ts`	Compute `openclawStateDir` differently for dev vs packaged; add `getLegacyNexuHomeStateDir()` and `migrateOpenclawState()` to perform an idempotent one-time merge from legacy `nexuHome` into Electron `userData` when needed.
Launchd plist & Env `apps/desktop/main/services/plist-generator.ts`	Add `OPENCLAW_STATE_DIR` to generated plist `<EnvironmentVariables>` (XML-escaped) and adjust template boundaries to insert it correctly.
Launchd lifecycle & Quit `apps/desktop/main/services/quit-handler.ts`, `apps/desktop/main/services/launchd-manager.ts`	Stop services in per-label sequence, add `LaunchdManager.waitForExit(label, timeoutMs)` that polls status and attempts SIGKILL after timeout; update quit flow to use `app.exit(0)` for forced exits and adjust background/run behavior.
Docs & Version `AGENTS.md`, `apps/desktop/package.json`	Document packaged-mode split between `NEXU_HOME` and Electron `userData` for OpenClaw state and bump package version `0.1.6` → `0.1.7`.
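
To illustrate the plist change summarized in the table, here is a minimal sketch of rendering an `EnvironmentVariables` dictionary with XML escaping. `escapeXml` and `renderEnvironmentVariables` are hypothetical names; the real `plist-generator.ts` template is not reproduced here.

```typescript
// Escape the five XML special characters so paths cannot break the plist.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render the launchd EnvironmentVariables key and its <dict> of string values.
function renderEnvironmentVariables(env: Record<string, string>): string {
  const entries = Object.keys(env)
    .map((key) => `    <key>${escapeXml(key)}</key>\n    <string>${escapeXml(env[key])}</string>`)
    .join("\n");
  return `  <key>EnvironmentVariables</key>\n  <dict>\n${entries}\n  </dict>`;
}
```

Passing `{ OPENCLAW_STATE_DIR: stateDir }` through a function like this is what keeps the controller from falling back to its default path.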

Sequence Diagram

sequenceDiagram
    participant App as Desktop App (Cold Start)
    participant Mode as Mode Detector
    participant Path as Path Resolver
    participant Migration as State Migration
    participant Plist as Plist Generator
    participant Launchd as Launchd Bootstrap

    App->>Mode: determine mode (dev / packaged)
    Mode-->>Path: provide mode
    Path->>Path: compute openclawStateDir (dev: nexuHome/... | packaged: userData/...)
    alt packaged && legacy differs
        Path->>Migration: migrate legacy state (source: nexuHome -> target: userData)
        Migration-->>Path: migration complete
    end
    Path-->>App: final openclawStateDir
    App->>Plist: include OPENCLAW_STATE_DIR in env
    Plist-->>Launchd: generate plist with env
    App->>Launchd: bootstrap services with state dir
    Launchd-->>App: services started

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat(desktop): launchd attach, partial attach, dev isolation & hot reload #519 — Overlapping changes to OPENCLAW_STATE_DIR resolution, plist env, and launchd lifecycle handling.
feat(desktop): launchd-based process architecture #405 — Prior launchd/bootstrap and plist/env work that this change builds upon.

Suggested reviewers

PerishCode
mrcfps
anthhub


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the PR's main changes: fixing v0.1.7 session data loss and quit race issues, matching the core problems and solutions described in the PR.
Description check	✅ Passed	The description comprehensively covers all template sections: clear problem statement with specific v0.1.5/v0.1.6 comparison, detailed fixes addressing all three issues, documentation updates, test results, and implementation notes for reviewers.

cloudflare-workers-and-pages · 2026-03-25T03:15:35Z

Deploying nexu-docs with Cloudflare Pages

Latest commit:	`0457102`
Status:	✅ Deploy successful!
Preview URL:	https://432b4216.nexu-docs.pages.dev
Branch Preview URL:	https://refactor-launchd-process-arc.nexu-docs.pages.dev

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 04571023d1

chatgpt-codex-connector · 2026-03-25T03:18:15Z

+  if (!existsSync(sourceStateDir)) {
+    log(`source not found: ${sourceStateDir}, nothing to migrate`);
+    writeStamp(stampPath);


Create target state directory before writing migration stamp

When sourceStateDir is missing, this branch writes .v016-migration-done immediately, but targetStateDir is only created later. On fresh packaged installs (no legacy ~/.nexu/runtime/openclaw/state yet), writeFileSync throws ENOENT because .../runtime/openclaw/state does not exist, causing cold-start failure before launchd bootstrap completes. Ensure mkdirSync(targetStateDir, { recursive: true }) runs before any stamp write path.

chatgpt-codex-connector · 2026-03-25T03:18:15Z

+    if (!existsSync(targetFile)) {
+      cpSync(sourceFile, targetFile);


Merge conflicting session files instead of skipping them

This migration only copies session files when the target file does not exist, so if the same session key exists in both locations (for example stable channel/session keys that map to the same *.jsonl filename), all conversation updates produced during v0.1.6 in sourceStateDir are silently dropped. The hotfix goal is to restore sessions, but this logic preserves the older target file and discards newer source history on filename collisions.
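
One possible way to address this comment, sketched under strong assumptions: session files are append-only JSONL, identical lines are duplicates, and ordering target-first is acceptable. `mergeJsonlLines` is a hypothetical helper and is not what the PR currently does.

```typescript
// Union of JSONL lines: keep target lines first, then append source lines
// that are not already present. Blank lines are dropped.
function mergeJsonlLines(targetText: string, sourceText: string): string {
  const seen = new Set<string>();
  const merged: string[] = [];
  const lines = targetText.split("\n").concat(sourceText.split("\n"));
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed === "" || seen.has(trimmed)) continue;
    seen.add(trimmed);
    merged.push(trimmed);
  }
  return merged.length > 0 ? merged.join("\n") + "\n" : "";
}
```

If sessions carry timestamps, sorting by timestamp after the union would likely be a better fit than plain concatenation; that depends on the file format, which this thread does not specify.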

coderabbitai

🧹 Nitpick comments (1)

apps/desktop/main/services/quit-handler.ts (1)

209-220: Consider extracting the shared shutdown logic.

The stop-and-bootout loop is duplicated from installLaunchdQuitHandler. Extracting to a helper function (e.g., stopAndBootoutServices) would reduce duplication and simplify future changes.

♻️ Proposed refactor

async function stopAndBootoutServices(
  launchd: LaunchdManager,
  labels: string[],
): Promise<void> {
  for (const label of labels) {
    try {
      await launchd.stopServiceGracefully(label, 5000);
    } catch {
      // May already be stopped
    }
    try {
      await launchd.bootoutService(label);
    } catch (err) {
      console.error(`Error booting out ${label}:`, err);
    }
  }
}

Then use it in both locations:

await stopAndBootoutServices(opts.launchd, [opts.labels.openclaw, opts.labels.controller]);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/services/quit-handler.ts` around lines 209 - 220, Extract
the duplicated stop-and-bootout loop into a helper async function (e.g.,
stopAndBootoutServices) that accepts a LaunchdManager and an array of labels,
then call that helper from both installLaunchdQuitHandler and the current
quit-handler code; the helper should iterate labels and for each call
launchd.stopServiceGracefully(label, 5000) with a try/catch (silent on failure)
and then call launchd.bootoutService(label) with a try/catch that logs the error
(retain the existing console.error message). Update the two call sites to invoke
stopAndBootoutServices(opts.launchd, [opts.labels.openclaw,
opts.labels.controller]) to remove duplication.

📥 Commits

Reviewing files that changed from the base of the PR and between 262fc9b and 0457102.

📒 Files selected for processing (5)

apps/desktop/main/index.ts
apps/desktop/main/services/plist-generator.ts
apps/desktop/main/services/quit-handler.ts
apps/desktop/main/services/state-migration.ts
apps/desktop/package.json

- Fix quit flow: bootout first (unregister from launchd), then wait for process exit. Previous order (SIGTERM → bootout) caused launchd KeepAlive to respawn the process before bootout could unregister it.
- Add LaunchdManager.waitForExit(): polls status after bootout, falls back to SIGKILL by PID if process persists beyond timeout.
- Document packaged app directory layout in AGENTS.md: NEXU_HOME (~/.nexu) for user config, Electron userData for OpenClaw runtime state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/desktop/main/services/launchd-manager.ts`:
- Around line 251-256: The loop in waitForExit incorrectly treats a transient
getServiceStatus failure mapped to status "unknown" as a definite exit; update
waitForExit (and mirror the same fix in stopServiceGracefully) to distinguish
definite states ("stopped") from transient "unknown" by retrying on "unknown"
instead of immediately returning: keep polling until timeoutMs elapses, count
consecutive "unknown" responses (or allow retries for a short backoff window,
e.g., a few attempts/poll intervals) and only give up/return early if you
observe a definitive non-"running" state such as "stopped" or if the
unknown-count exceeds a small threshold, ensuring the existing SIGKILL/timeout
fallback still fires when appropriate; reference getServiceStatus, waitForExit,
stopServiceGracefully and the status values "running"/"stopped"/"unknown" when
making the change.
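
A sketch of the polling behaviour this comment asks for, with injected `getStatus` and `sleep` so it can run without launchd. The function name, option names, and the three-way result are assumptions for illustration, not the existing `waitForExit` signature.

```typescript
type ServiceStatus = "running" | "stopped" | "unknown";
type WaitResult = "exited" | "gave-up" | "timeout";

async function waitForDefiniteExit(
  getStatus: () => Promise<ServiceStatus>,
  opts: { timeoutMs: number; pollMs: number; maxUnknown: number },
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((done) => setTimeout(done, ms)),
): Promise<WaitResult> {
  let elapsed = 0;
  let unknownStreak = 0;
  while (elapsed < opts.timeoutMs) {
    const status = await getStatus();
    if (status === "stopped") return "exited"; // definite state
    // Count consecutive "unknown" results; any definite result resets the streak.
    unknownStreak = status === "unknown" ? unknownStreak + 1 : 0;
    if (unknownStreak > opts.maxUnknown) return "gave-up";
    await sleep(opts.pollMs);
    elapsed += opts.pollMs;
  }
  return "timeout"; // caller would apply the SIGKILL fallback here
}
```

Returning a distinct `"gave-up"` value lets the caller decide whether repeated status failures should still trigger the kill fallback.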

📥 Commits

Reviewing files that changed from the base of the PR and between 0457102 and 365c535.

📒 Files selected for processing (3)

AGENTS.md
apps/desktop/main/services/launchd-manager.ts
apps/desktop/main/services/quit-handler.ts

🚧 Files skipped from review as they are similar to previous changes (1)

apps/desktop/main/services/quit-handler.ts

After app.quit(), dangling handles (timers, sockets) can keep the Electron event loop alive indefinitely. Add a 3s safety net that calls process.exit(0) if the process hasn't exited by then. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

apps/desktop/main/services/quit-handler.ts (1)

138-151: Extract shared shutdown flow to a helper to prevent drift.

The same bootout/wait/force-exit sequence now exists in two places; centralizing it will reduce divergence risk in future hotfixes.

Also applies to: 217-228, 234-239

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/services/quit-handler.ts` around lines 138 - 151, Extract
the bootout/wait/ignore-failure sequence into a single helper (e.g.,
shutdownLaunchdService or bootoutAndAwaitExit) that accepts a launchd client and
a service label, calls opts.launchd.bootoutService(label), awaits
opts.launchd.waitForExit(label, 5000), catches and logs bootout errors and
ignores wait errors (best-effort), and use that helper in place of the
duplicated for-loops that currently call opts.launchd.bootoutService and
opts.launchd.waitForExit (the blocks referencing opts.labels.openclaw,
opts.labels.controller and the duplicated ranges noted) so both sites call the
new helper instead of repeating the sequence.
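
One possible shape for such a helper, as a sketch only. The `LaunchdClient` interface is inferred from the `bootoutService(label)` and `waitForExit(label, 5000)` calls quoted in this review; the `log` parameter and the `Promise<void>` return types are assumptions. Unlike the strict "ignore wait errors" wording, this version records wait failures as a warning entry while still continuing.

```typescript
// Assumed minimal surface of the launchd client, based on the calls in the review.
interface LaunchdClient {
  bootoutService(label: string): Promise<void>;
  waitForExit(label: string, timeoutMs: number): Promise<void>;
}

type LogFn = (entry: Record<string, unknown>) => void;

const errorMessage = (err: unknown): string =>
  err instanceof Error ? err.message : String(err);

// Hypothetical helper name taken from the review suggestion.
async function bootoutAndAwaitExit(
  launchd: LaunchdClient,
  label: string,
  timeoutMs = 5000,
  log: LogFn = (entry) => console.warn(JSON.stringify(entry)),
): Promise<void> {
  try {
    await launchd.bootoutService(label);
  } catch (err) {
    log({ msg: "bootoutService failed", label, error: errorMessage(err) });
  }
  try {
    // Best effort: a failed wait is recorded but never blocks the quit path.
    await launchd.waitForExit(label, timeoutMs);
  } catch (err) {
    log({ msg: "waitForExit failed", label, timeoutMs, error: errorMessage(err) });
  }
}
```

Both call sites could then loop over `[opts.labels.openclaw, opts.labels.controller]` and call the helper once per label.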

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/desktop/main/services/quit-handler.ts`:
- Around line 146-150: The empty catch around opts.launchd.waitForExit(label,
5000) swallows real errors; update both occurrences (the try/catch blocks around
opts.launchd.waitForExit at the two spots) to catch the error and log it with
context (include label and timeout) using the existing logger (e.g.,
processLogger or opts.logger) or rethrow if it indicates a real failure, rather
than leaving the catch empty. Ensure the log message clearly states the
operation ("waitForExit" for label) and includes the error details to aid
debugging.
- Around line 216-217: The code currently calls app.quit() and schedules
process.exit(0) even when quitWithDecision(decision) is called with
"run-in-background"; update the control flow in quitWithDecision so only the
"quit-completely" branch performs app.quit() and schedules process.exit(0) (and
sends shutdown messages to opts.labels.openclaw / opts.labels.controller), while
the "run-in-background" branch must avoid calling app.quit() or process.exit and
instead only hides/minimizes or releases resources as intended; locate and
remove or relocate any app.quit() and process.exit(0) calls outside the strict
decision === "quit-completely" check (including the similar block around the
code handling lines ~231-239) so the decision contract for "run-in-background"
is preserved.
- Around line 143-145: The catch blocks currently call console.error with
free-form strings and the raw err object (e.g., the block that logs `Error
booting out ${label}:`, err) — change these to structured JSON logs and avoid
passing raw error objects: replace the free-form console.error calls with a JSON
object that includes a descriptive message field, the label variable, and a
sanitized error object (only message, code, and truncated stack or sanitized
fields), ensuring no credentials are included; apply the same change to the
other catch sites referenced (the console.error uses around the label handling
and the ones at the other noted ranges).

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 365c535 and defc427.

📒 Files selected for processing (1)

apps/desktop/main/services/quit-handler.ts

coderabbitai · 2026-03-25T03:55:53Z

          } catch (err) {
            console.error(`Error booting out ${label}:`, err);
          }


⚠️ Potential issue | 🟠 Major

Use structured/sanitized logs in the new quit path.

New logging uses free-form strings and raw err objects; switch to structured JSON fields and avoid logging raw errors directly.

As per coding guidelines, "Logging: structured (pino or console JSON), never log credentials."

Also applies to: 166-168, 220-222, 235-237

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/services/quit-handler.ts` around lines 143 - 145, The catch blocks currently call console.error with free-form strings and the raw err object (e.g., the block that logs `Error booting out ${label}:`, err) — change these to structured JSON logs and avoid passing raw error objects: replace the free-form console.error calls with a JSON object that includes a descriptive message field, the label variable, and a sanitized error object (only message, code, and truncated stack or sanitized fields), ensuring no credentials are included; apply the same change to the other catch sites referenced (the console.error uses around the label handling and the ones at the other noted ranges).
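
A sketch of what a structured, sanitized log entry for these catch sites might look like, assuming console JSON rather than pino. `sanitizeError`, `logBootoutFailure`, and the three-line stack limit are hypothetical names and values chosen for illustration.

```typescript
interface SanitizedError {
  message: string;
  code?: string;
  stack?: string;
}

// Keep only message, code, and a truncated stack; drop every other field
// so arbitrary properties attached to the error never reach the log.
function sanitizeError(err: unknown, maxStackLines = 3): SanitizedError {
  if (!(err instanceof Error)) return { message: String(err) };
  const code = (err as Error & { code?: unknown }).code;
  return {
    message: err.message,
    ...(typeof code === "string" ? { code } : {}),
    ...(err.stack
      ? { stack: err.stack.split("\n").slice(0, maxStackLines).join("\n") }
      : {}),
  };
}

// Emits one JSON line and returns it, which makes the format easy to check.
function logBootoutFailure(label: string, err: unknown): string {
  const line = JSON.stringify({
    level: "error",
    msg: "launchd bootout failed",
    label,
    error: sanitizeError(err),
  });
  console.error(line);
  return line;
}
```

A catch block would then call `logBootoutFailure(label, err)` instead of passing the raw `err` object to `console.error`.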

app.quit() only requests a quit — event handlers can delay or prevent it, and dangling handles keep the process alive. This left stale SingletonLock files that blocked relaunch ("app can no longer be opened"). Switch to app.exit(0) which terminates immediately after services have been bootout'd. All cleanup (flush logs, stop services, delete ports file) happens before this call, so nothing is lost. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

apps/desktop/main/services/quit-handler.ts (2)

143-150: ⚠️ Potential issue | 🟠 Major

Use structured, sanitized logs for launchd shutdown failures.

These catch blocks still log raw err values for bootoutService() and suppress waitForExit() failures entirely. That violates the repo logging rules and leaves the quit-race path blind when launchctl or status polling fails.

As per coding guidelines, "Logging: structured (pino or console JSON), never log credentials."

Also applies to: 213-220

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/services/quit-handler.ts` around lines 143 - 150, Replace
raw/unsanitized logging in the bootoutService() catch and the
opts.launchd.waitForExit() catch with structured JSON logs (using the repo
logger, e.g., processLogger or pino) that include a short context string, the
service label, and only sanitized error fields (e.g., err.message and err.code)
— do not log full stacks or raw err objects or any sensitive fields; for the
waitForExit() catch, emit a best-effort level log (warn/info) noting that exit
polling failed with the sanitized error and label so the quit-race path is
observable; apply the same change to the similar block around lines 213-220.

197-225: ⚠️ Potential issue | 🟠 Major

quitWithDecision("run-in-background") still tears down and exits.

This path now runs quit-only cleanup before branching, then unconditionally calls app.exit(0). It no longer matches the interactive "run-in-background" behavior, which just hides the window and keeps the app alive.

Proposed fix

 export async function quitWithDecision(
   decision: "quit-completely" | "run-in-background",
   opts: QuitHandlerOptions,
 ): Promise<void> {
+  if (decision === "run-in-background") {
+    BrowserWindow.getAllWindows()[0]?.hide();
+    return;
+  }
+
   try {
     await opts.onBeforeQuit?.();
   } catch (err) {
     console.error("Error in onBeforeQuit:", err);
   }
@@
-  if (decision === "quit-completely") {
-    for (const label of [opts.labels.openclaw, opts.labels.controller]) {
-      try {
-        await opts.launchd.bootoutService(label);
-      } catch (err) {
-        console.error(`Error booting out ${label}:`, err);
-      }
-      try {
-        await opts.launchd.waitForExit(label, 5000);
-      } catch {
-        // Best effort
-      }
-    }
+  for (const label of [opts.labels.openclaw, opts.labels.controller]) {
+    try {
+      await opts.launchd.bootoutService(label);
+    } catch (err) {
+      console.error(`Error booting out ${label}:`, err);
+    }
+    try {
+      await opts.launchd.waitForExit(label, 5000);
+    } catch {
+      // Best effort
+    }
   }
 
   (app as unknown as Record<string, unknown>).__nexuForceQuit = true;
   app.exit(0);
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/services/quit-handler.ts` around lines 197 - 225, The
current flow always performs teardown and calls app.exit(0) regardless of
decision; change it so only the quit-completely path performs teardown and exit
while run-in-background just hides/keeps the app alive. Concretely: move the
webServer closing, launchd bootout/waitForExit loop, the __nexuForceQuit
assignment, and app.exit(0) into the branch that checks decision ===
"quit-completely"; for decision === "run-in-background" call the existing
window-hide logic (or return early) and do not call app.exit; keep the existing
try/catch blocks for opts.onBeforeQuit, opts.webServer, and launchd methods
(bootoutService, waitForExit) but only invoke them when handling quit-completely
so run-in-background does not tear down the process.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/desktop/main/services/quit-handler.ts`:
- Around line 138-139: waitForExit() can return immediately after sending
SIGKILL on timeout, allowing deleteRuntimePorts() and app.exit(0) to run while
the service is still tearing down and causing a relaunch race; update
waitForExit() (called from launchd-manager.ts paths around lines referenced) to,
on timeout->SIGKILL, re-poll the target process (with a short loop and backoff)
until it is actually gone or a second hard timeout elapses, and only then return
success, or alternatively treat the initial timeout as a hard failure and
propagate an error to prevent calling deleteRuntimePorts()/app.exit(0); modify
callers (where waitForExit() is used) to respect the propagated failure if you
choose the hard-fail approach.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between defc427 and dab4ed9.

📒 Files selected for processing (1)

apps/desktop/main/services/quit-handler.ts

- state-migration: mkdirSync targetStateDir before writing stamp on fresh installs where source doesn't exist (P0, prevents ENOENT)
- waitForExit: distinguish "stopped" from "unknown" status, retry on transient "unknown" reads (3 consecutive = treat as exited), re-poll after SIGKILL to confirm process is actually gone
- quitWithDecision: "run-in-background" no longer calls app.exit(), just hides window and keeps services running
- Replace empty catches with console.warn for waitForExit failures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/desktop/main/services/launchd-manager.ts`:
- Around line 277-285: The current waitForExit flow swallows failures: the empty
catch after attempting to kill the process and the silent success when the
recheck loop still finds the service running can hide failed shutdowns; update
the waitForExit implementation (the try/catch around the kill logic and the
re-poll loop that calls getServiceStatus(label)) to surface failures—either
throw a descriptive error or call the appropriate logger when the kill attempt
throws and when, after the 5 rechecks, recheck.status === "running" still
holds—include the label and any caught error details in the message so callers
can detect and handle failed force-kills.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between dab4ed9 and 0c40f53.

📒 Files selected for processing (3)

apps/desktop/main/services/launchd-manager.ts
apps/desktop/main/services/quit-handler.ts
apps/desktop/main/services/state-migration.ts

✅ Files skipped from review due to trivial changes (1)

apps/desktop/main/services/quit-handler.ts

🚧 Files skipped from review as they are similar to previous changes (1)

apps/desktop/main/services/state-migration.ts

coderabbitai · 2026-03-25T04:24:27Z

+      } catch {
+        // Process may have exited between check and kill
+      }
+      // Re-poll briefly to confirm kill took effect
+      for (let i = 0; i < 5; i++) {
+        await new Promise((r) => setTimeout(r, 200));
+        const recheck = await this.getServiceStatus(label);
+        if (recheck.status !== "running") return;
+      }


⚠️ Potential issue | 🟠 Major

Don't silently succeed when force-kill fails.

On the paths at lines 277 and 284, waitForExit can fail to terminate the process and still return without any warning. That makes shutdown failures invisible and can mask stale-lock or respawn regressions.

Proposed fix

 if (status.pid) {
   try {
     process.kill(status.pid, "SIGKILL");
-  } catch {
-    // Process may have exited between check and kill
+  } catch (err) {
+    console.warn(
+      `Failed to SIGKILL ${label} (pid ${status.pid}):`,
+      err instanceof Error ? err.message : err,
+    );
   }
   // Re-poll briefly to confirm kill took effect
   for (let i = 0; i < 5; i++) {
     await new Promise((r) => setTimeout(r, 200));
     const recheck = await this.getServiceStatus(label);
     if (recheck.status !== "running") return;
   }
+  console.warn(
+    `Service ${label} still appears running after SIGKILL fallback`,
+  );
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/services/launchd-manager.ts` around lines 277 - 285, The current waitForExit flow swallows failures: the empty catch after attempting to kill the process and the silent success when the recheck loop still finds the service running can hide failed shutdowns; update the waitForExit implementation (the try/catch around the kill logic and the re-poll loop that calls getServiceStatus(label)) to surface failures—either throw a descriptive error or call the appropriate logger when the kill attempt throws and when, after the 5 rechecks, recheck.status === "running" still holds—include the label and any caught error details in the message so callers can detect and handle failed force-kills.
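
The "throw a descriptive error" alternative mentioned in the prompt could look roughly like the sketch below. It assumes the kill function, status getter, and sleep are injected (hypothetical `deps` object) so the logic can be exercised without a real process; `forceKillAndConfirm` is an illustrative name, and the five rechecks at 200 ms mirror the snippet quoted above.

```typescript
type Status = "running" | "stopped" | "unknown";

interface ForceKillDeps {
  kill: (pid: number, signal: string) => void; // e.g. process.kill in Node
  getStatus: (label: string) => Promise<Status>;
  sleep: (ms: number) => Promise<void>;
}

async function forceKillAndConfirm(
  label: string,
  pid: number,
  deps: ForceKillDeps,
  rechecks = 5,
  intervalMs = 200,
): Promise<void> {
  let killError: string | undefined;
  try {
    deps.kill(pid, "SIGKILL");
  } catch (err) {
    // The process may already be gone; remember the reason in case it is not.
    killError = err instanceof Error ? err.message : String(err);
  }

  // Re-poll briefly to confirm the kill took effect.
  for (let i = 0; i < rechecks; i++) {
    await deps.sleep(intervalMs);
    if ((await deps.getStatus(label)) !== "running") return;
  }

  // Surface the failure instead of returning as if shutdown succeeded.
  const detail = killError ? `; kill failed: ${killError}` : "";
  throw new Error(`Service ${label} (pid ${pid}) still running after SIGKILL${detail}`);
}
```

Whether to throw or only warn is a design choice: throwing lets callers skip cleanup steps that assume the process is gone, while warning keeps the quit path strictly best-effort.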

… in launchd mode

Controller plist was missing 13 critical environment variables compared to the daemon-supervisor manifest path (OPENCLAW_CONFIG_PATH, OPENCLAW_SKILLS_DIR, NODE_PATH, WEB_URL, HOST, etc.), causing the launchd-managed controller to fail skill loading, config compilation, and module resolution.

Also skip module-level port probing when launchd mode is active: the bootstrap has its own port recovery via runtime-ports.json and handles leftover processes gracefully. This prevents startup crashes when residual services occupy the preferred ports.

Additionally includes orchestrator launchd integration: enableLaunchdMode(), refreshLaunchdUnits(), launchd log tailing, and start/stop delegation to LaunchdManager.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

lefarcen · 2026-03-25T11:22:40Z

Superseded by #544 — consolidated hotfix targeting release/v0.1.6 with all fixes squashed into a single commit.

…, not path.resolve

Replace 70 trivial path.resolve assertions with 27 behavior-focused tests that call real functions and check real output:

generatePlist output (25 tests): Call the real function with production-realistic inputs, parse the XML, and verify every env var value matches expected paths. Key checks:
- NEXU_HOME → ~/.nexu (not under userData)
- OPENCLAW_STATE_DIR → under userData (not under NEXU_HOME)
- OPENCLAW_CONFIG_PATH consistent with OPENCLAW_STATE_DIR
- OPENCLAW_SKILLS_DIR consistent with OPENCLAW_STATE_DIR
- OpenClaw plist has NO NEXU_HOME (it doesn't use it)

These would have caught the #526 bug.

runtime-config resolution (2 tests): Call getDesktopRuntimeConfig with different env objects, verify the NEXU_HOME priority chain: env > buildConfig > ~/.nexu default.

Real launchd env verification (macOS, 2 tests): Start a real launchd service, read launchctl print output, parse the environment block, verify NEXU_HOME and OPENCLAW_STATE_DIR are what we set and are different directories.
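
The directory-separation checks listed above can be illustrated with a small containment test. This is a sketch under assumptions: `isInside`, `checkStateDirLayout`, and the `LayoutInput` shape are hypothetical and are not the project's test code; only the three invariants (state dir under userData, state dir not under NEXU_HOME, NEXU_HOME not under userData) come from the commit message.

```typescript
import * as path from "node:path";

// True when `child` is strictly below `parent` after both are resolved.
function isInside(child: string, parent: string): boolean {
  const rel = path.relative(path.resolve(parent), path.resolve(child));
  if (rel === "" || path.isAbsolute(rel)) return false;
  return rel !== ".." && !rel.startsWith(`..${path.sep}`);
}

// Hypothetical input shape for illustration.
interface LayoutInput {
  userData: string;
  nexuHome: string;
  openclawStateDir: string;
}

// Returns a list of violated invariants; an empty list means the layout is consistent.
function checkStateDirLayout(input: LayoutInput): string[] {
  const problems: string[] = [];
  if (!isInside(input.openclawStateDir, input.userData)) {
    problems.push("OPENCLAW_STATE_DIR is not under userData");
  }
  if (isInside(input.openclawStateDir, input.nexuHome)) {
    problems.push("OPENCLAW_STATE_DIR is under NEXU_HOME");
  }
  if (isInside(input.nexuHome, input.userData)) {
    problems.push("NEXU_HOME is under userData");
  }
  return problems;
}
```

Fed the v0.1.6 fallback layout (state directory under `~/.nexu`), a check of this kind reports two problems, which is the kind of regression the commit says these tests are meant to catch.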

@lefarcen

…563) * fix(desktop): robust lifecycle teardown for quit and update-install The update-install path previously called orchestrator.dispose() without properly booting out launchd services, causing macOS to report "app is still running" when the installer tried to replace the .app bundle. The quit-completely path had a similar issue: after bootout, waitForExit could not SIGKILL processes whose launchd label was already unregistered. Changes: - LaunchdManager.bootoutAndWaitForExit: captures PID before bootout so the SIGKILL fallback works even after the label is unregistered. - LaunchdManager.waitForExit: accepts optional knownPid parameter; uses process.kill(pid, 0) to verify death when launchctl print returns "unknown". - teardownLaunchdServices: new shared function used by both quit-handler and update-manager. Bootouts each service with PID-aware waiting, deletes runtime-ports.json, and kills orphan processes via pgrep. - ensureNexuProcessesDead: polling verification gate that loops pgrep + SIGKILL until all Nexu sidecar processes are confirmed dead (max 15s). - quitAndInstall: now three-phase — (1) teardown + dispose wrapped in try/catch so failures never block the install, (2) ensureNexuProcessesDead as the hard verification gate, (3) autoUpdater.quitAndInstall. - Bootstrap adds killOrphanNexuProcesses on cold start to clean up residual processes from a previously failed update. Tests: 27 new tests across 3 files covering teardown, PID-aware shutdown, verification gate, and the full update-install sequence. * chore(ci): add pnpm test to CI and launchd lifecycle e2e test - ci.yml: add a `test` job that runs `pnpm test` on ubuntu-latest, covering all 247+ vitest unit tests that were previously not run in CI. - desktop-ci-dist.yml: add real launchd lifecycle e2e test that runs on macOS CI runners before the packaged app build. The test exercises: 1. Bootstrap: register plist → kickstart → verify running + port 2. 
Teardown: bootout → verify label unregistered → verify process dead 3. SIGKILL fallback: bootout → saved-PID SIGKILL → verify dead 4. Orphan detection: spawn fake orphan → detect via lsof → SIGKILL 5. Re-bootstrap: fresh cold start after full cleanup - scripts/launchd-lifecycle-e2e.sh: standalone e2e test script (15 checks) that validates the launchd process management primitives used by the desktop app's quit and update-install paths. * fix(desktop): prevent Dock icon proliferation + add comprehensive tests Fixes: - dev-env.sh: add Launch Services cache flush (lsregister) after patching LSUIElement=true on the dev Electron binary. Without this, macOS uses cached plist data and still shows Dock icons for child processes. Add verification logging on success/failure. - daemon-supervisor: force ELECTRON_RUN_AS_NODE=1 on all spawn() calls that use process.execPath (Electron binary) as a safety net, even if the manifest env omits it. Tests: - daemon-supervisor.test.ts (10 tests): constructor, startAutoStart, stopUnit SIGTERM/SIGKILL escalation, 5s deadline, stopAll parallel, dispose, skip non-managed, ELECTRON_RUN_AS_NODE safety net, stoppedByUser suppresses restart, dependent stop order. - quit-handler.test.ts (8 tests): quit-completely full sequence, __nexuForceQuit flag, app.exit(0), error resilience for onBeforeQuit and webServer.close, no-plistDir skip, run-in-background hide. - desktop-stop-smoke.sh: post-stop verification script — checks no residual processes, free ports, no launchd labels, no stale state. Integrated into desktop-check-dev.sh for CI. * fix(desktop): route pnpm start through dev-env.sh for LSUIElement patch dev-launchd.sh (pnpm start) was launching Electron directly without going through dev-env.sh, bypassing the LSUIElement=true plist patch and Launch Services cache flush. This is the direct cause of the Dock icon proliferation reported by users running pnpm start. 
Now pnpm start → dev-launchd.sh → dev-env.sh → electron, matching the same path that pnpm dev uses via dev.sh → dev-run.sh → dev-env.sh. * test(desktop): add dev toolchain invariant tests Static analysis tests that guard critical invariants across the launch, environment, and shutdown scripts. These catch regressions like the dev-launchd.sh bypass of dev-env.sh that caused Dock icon proliferation. 19 invariant checks covering: - Launch paths: all Electron launch commands go through dev-env.sh - LSUIElement: plist patch + LS cache flush present - ELECTRON_RUN_AS_NODE: set in plists, manifests, daemon-supervisor, openclaw-process, and catalog-manager - Shutdown: bootout, orphan kill, port wait, teardown via shared function, try/catch wrapping, verification gate ordering * test(desktop): comprehensive coverage for lifecycle, quit, and update paths Bring test coverage for the 6 core lifecycle files from 63.5% to 83.8% overall statements, with function coverage at 93.8%. Per-file improvements: - update-manager.ts: 63.1% → 98.5% (bindEvents, checkNow, downloadUpdate, periodicCheck, setChannel/setSource, send to webviews) - quit-handler.ts: 36.1% → 97.0% (installLaunchdQuitHandler dialog flow, force-quit bypass, dev-mode bypass, Cmd+Q interception, dialogOpen guard) - launchd-manager.ts: 72.7% → 94.9% (uninstallService, stopServiceGracefully SIGKILL escalation, restartService, rebootstrapFromPlist, hasPlistFile) - daemon-supervisor.ts: 46.5% → 69.5% (startUnit port probe, auto-restart backoff, refreshDelegatedUnits pgrep, stdout/stderr capture, queryEvents) - launchd-bootstrap.ts: 92.6% → 93.1% (isLaunchdBootstrapEnabled packaged heuristic, ensureNexuProcessesDead edge cases) - plist-generator.ts: 100% (unchanged) New test files: daemon-supervisor.test.ts (27), quit-handler-full.test.ts (13), update-manager-full.test.ts (41), launchd-manager-ops-extended.test.ts (11), launchd-bootstrap-edge.test.ts (12). Total: 391 tests across 40 files, all passing. 
* fix(desktop): stopPeriodicCheck race + real launchd integration tests Fixes: - update-manager: save initial setTimeout ID so stopPeriodicCheck can cancel it during the initial delay window. Previously, calling stopPeriodicCheck before the initial delay expired was a no-op, allowing the interval to start during teardown. - update-manager: call stopPeriodicCheck() at the start of quitAndInstall() to prevent periodic checks from firing mid-teardown. Tests: - launchd-integration.test.ts: 8 tests that run REAL launchd on macOS (skipped on other platforms). Covers: 1. installService + startService → real service running on real port 2. bootoutAndWaitForExit → real process confirmed dead 3. teardownLaunchdServices → full sequence against real launchd 4. ensureNexuProcessesDead → real orphan process spawned and killed 5. getServiceStatus → real PID from launchctl print 6. getServiceStatus → unknown for non-existent label 7. installService → detects plist content change and re-bootstraps 8. stopServiceGracefully → real SIGTERM stops service CI: - desktop-ci-dist.yml: add `pnpm test` step on macOS runners so real launchd integration tests run in CI alongside the shell e2e tests. * test(desktop): comprehensive real launchd integration tests + macOS CI Expand launchd-integration.test.ts from 8 to 16 real launchd tests: 9. Full cycle: start → bootout → verify clean → cold re-start 10. Attach: detect already-running service from previous session 11. KeepAlive: service auto-restarts after SIGKILL (crash simulation) 12. Rapid start/stop cycles leave no orphan processes 13. Port conflict: occupied port → bootout still cleans up 14. bootout on non-registered label is idempotent (no throw) 15. teardownLaunchdServices with non-existent labels is safe 16. waitForExit handles process dying during bootout (race condition) CI: add pnpm test to desktop-ci-dev.yml (macOS-14) so real launchd integration tests also run in the dev CI path, not just dist CI. 
* test(desktop): 33 real launchd integration tests covering all LaunchdManager methods Expand from 16 to 33 real launchd tests, covering every LaunchdManager public method and critical lifecycle scenario against actual launchctl: Methods: - installService (fresh, idempotent, content-change re-bootstrap) - startService, stopService (SIGTERM), restartService (kickstart -k) - bootoutService, bootoutAndWaitForExit (with PID-aware fallback) - uninstallService (bootout + delete plist, idempotent) - stopServiceGracefully (SIGTERM → poll → SIGKILL escalation) - rebootstrapFromPlist (re-register after bootout) - getServiceStatus (running/unknown, PID parsing, env parsing) - isServiceRegistered, hasPlistFile, isServiceInstalled - waitForExit (with knownPid after bootout) - getDomain, getPlistDir Scenarios: - Full start→stop→restart cycle - Attach to already-running service from previous session - KeepAlive auto-restart after SIGKILL (crash simulation) - Rapid start/stop cycles with no orphans - Port conflict: bootout cleans up even when port is blocked - Double bootout is idempotent - Process dying during bootout (race condition) - Multiple services: start two, teardown both - ensureNexuProcessesDead: no-op when clean, kills orphans - teardownLaunchdServices: non-existent labels are safe * test(desktop): update server integration tests with real HTTP server Spin up a local HTTP server that mimics the desktop release CDN and verify the update feed resolution + YAML serving end-to-end: 1. Feed URL resolves to valid fetchable URL (stable/arm64) 2. Server serves latest-mac.yml at correct path 3. YAML contains all electron-updater required fields 4. All 3 channels × 2 architectures serve valid responses 5. 404 for invalid paths 6. Explicit feedUrl overrides default 7. Custom feed URL pointed at local server is fetchable 8. Version comparison: server version > current = update available 9. Download artifact URL serves content 10. Server request logging works 11. 
GitHub source returns github:// URL 12. NEXU_UPDATE_FEED_URL env takes highest priority These catch real HTTP/YAML issues that mocked autoUpdater tests miss. * fix(test): stabilize stopService test against KeepAlive race * fix(desktop): respect externally-set NEXU_HOME in dev mode bootstrap.ts configureLocalDevPaths() unconditionally overwrote process.env.NEXU_HOME with userData/.nexu, clobbering the value passed by dev-launchd.sh (pnpm start). This caused controller to read config from a fresh empty directory instead of the intended .tmp/desktop/nexu-home/, making every pnpm start feel like a first-time setup. Fix: only set NEXU_HOME as fallback when not already provided. Packaged users are unaffected — configureLocalDevPaths() returns early when app.isPackaged is true (line 47). Also adds 27 data-directory-invariants tests that guard: - bootstrap.ts path configuration guards (isPackaged, env respect) - runtime-config.ts NEXU_HOME resolution order - controller env.ts data file locations under NEXU_HOME - plist NEXU_HOME and OPENCLAW_STATE_DIR presence - dev-launchd.sh path consistency - AGENTS.md directory layout contract - Packaged mode backward compatibility with 0.1.7 - OpenClaw state directory separation from NEXU_HOME * fix(desktop): respect NEXU_HOME + 70 data directory path tests Fix: bootstrap.ts configureLocalDevPaths() now respects externally-set NEXU_HOME instead of unconditionally overwriting it. This fixes pnpm start creating a fresh config directory on every launch. 
70 runtime tests verify every data path by calling real functions and checking real output:

- Controller plist env vars (26 tests): NEXU_HOME, OPENCLAW_STATE_DIR, OPENCLAW_CONFIG_PATH, OPENCLAW_SKILLS_DIR, OPENCLAW_EXTENSIONS_DIR, SKILLHUB_STATIC_SKILLS_DIR, PLATFORM_TEMPLATES_DIR, OPENCLAW_BIN, OPENCLAW_ELECTRON_EXECUTABLE, NODE_PATH, TMPDIR, PORT, HOST, WEB_URL, OPENCLAW_GATEWAY_PORT, OPENCLAW_GATEWAY_TOKEN, ELECTRON_RUN_AS_NODE, RUNTIME_MANAGE_OPENCLAW_PROCESS, RUNTIME_GATEWAY_PROBE_ENABLED, OPENCLAW_DISABLE_BONJOUR, NODE_ENV (dev+prod), HOME, PATH, NEXU_HOME omission
- OpenClaw plist env vars (12 tests): ELECTRON_RUN_AS_NODE, OPENCLAW_CONFIG, OPENCLAW_CONFIG_PATH, OPENCLAW_STATE_DIR, OPENCLAW_LAUNCHD_LABEL (dev+prod), OPENCLAW_SERVICE_MARKER, HOME, PATH, NODE_PATH, no NEXU_HOME, no PORT
- Plist structure (11 tests): ProgramArguments, WorkingDirectory, StandardOutPath, StandardErrorPath, KeepAlive, RunAtLoad, Label (dev+prod), gateway run args, --auth none (dev only), OtherJobEnabled
- Path resolution (12 tests): desktop-paths.ts helpers (6), resolveLaunchdPaths dev+packaged (4), getDefaultPlistDir dev+prod, getLogDir dev+prod
- Directory separation (5 tests): packaged NEXU_HOME ≠ userData, dev NEXU_HOME ≠ userData, NEXU_HOME under home, userData under Application Support, dev state repo-scoped
- Config resolution (4 tests): NEXU_HOME default, env override, runtime-config priority chain

* test(desktop): rewrite data directory tests — verify program behavior, not path.resolve

Replace 70 trivial path.resolve assertions with 27 behavior-focused tests that call real functions and check real output:

generatePlist output (25 tests): Call the real function with production-realistic inputs, parse the XML, and verify every env var value matches expected paths. Key checks:
- NEXU_HOME → ~/.nexu (not under userData)
- OPENCLAW_STATE_DIR → under userData (not under NEXU_HOME)
- OPENCLAW_CONFIG_PATH consistent with OPENCLAW_STATE_DIR
- OPENCLAW_SKILLS_DIR consistent with OPENCLAW_STATE_DIR
- OpenClaw plist has NO NEXU_HOME (it doesn't use it)

These would have caught the #526 bug.

runtime-config resolution (2 tests): Call getDesktopRuntimeConfig with different env objects, verify the NEXU_HOME priority chain: env > buildConfig > ~/.nexu default.

Real launchd env verification (macOS, 2 tests): Start a real launchd service, read launchctl print output, parse the environment block, verify NEXU_HOME and OPENCLAW_STATE_DIR are what we set and are different directories.

* test(desktop): quality overhaul — delete garbage, add real edge cases

Deleted:
- quit-handler.test.ts (8 tests superseded by quit-handler-full.test.ts)
- data-directory-invariants.test.ts (27 grep-source-code tests, worthless)
- launchd-manager-ops-extended.test.ts (11 tests duplicated in ops.test.ts)

Added edge cases:
- daemon-supervisor: partial failure (one unit hangs), double dispose, child.error event handling
- lifecycle-teardown: process.kill EPERM handling, PID deduplication across multiple pgrep patterns
- update-install: stopPeriodicCheck before teardown verification, ensureNexuProcessesDead throw propagation
- launchd-integration (real launchd): NEXU_HOME with spaces, NEXU_HOME with Chinese unicode, OtherJobEnabled cascading behavior
- launchd-lifecycle-e2e.sh: Phase 6 (spaces) + Phase 7 (unicode) using .cjs scripts for ESM-safe execution

Net: -878 lines of garbage, +537 lines of behavior-focused tests. 17/17 real launchd e2e checks passing.

* chore: add @vitest/coverage-v8 dev dependency

Required for running test coverage reports locally and in CI.

* chore(ci): comprehensive path filters for macOS CI

Add missing paths that affect launchd/lifecycle behavior:
- apps/controller/** — controller process management, env parsing, openclaw process spawning. Changes here can break launchd services.
- scripts/dev-launchd.sh — pnpm start/stop/restart entry point.
- scripts/kill-all.sh — global process cleanup.
- scripts/desktop-stop-smoke.sh — added to dist CI (was only in dev).
- scripts/launchd-lifecycle-e2e.sh — added to dev CI (was only in dist).
- tests/desktop/** — test changes should trigger CI to verify they pass.
- vitest.config.ts — test framework config changes could break all tests.

This ensures any change that could affect launchd, process lifecycle, or the test suite triggers the macOS CI runners.

* fix(desktop): P0-2 unified gracefulShutdown + P1-2 removeListener fix

P0-2: Extract single authoritative gracefulShutdown(reason) function.
- Idempotent: shutdownInProgress guard prevents double teardown.
- 8-second hard timeout: if teardown hangs, process.exit(1) fires.
- Handles both launchd mode (teardownLaunchdServices) and orchestrator mode (orchestrator.dispose) in one function.
- SIGTERM + SIGINT handlers registered on process, route to gracefulShutdown then app.exit(0). This covers:
  - External kill (Activity Monitor, scripts, systemd)
  - Ctrl+C in terminal
  - System shutdown sending SIGTERM
- dev-launchd.sh stop now sends SIGTERM first (triggers graceful shutdown in Electron), waits up to 10s, then SIGKILL as fallback. Previously it used SIGKILL immediately, bypassing all cleanup.

P1-2: Replace removeAllListeners("before-quit") with removeListener.
- Store the specific handler reference (beforeQuitHandler).
- Only remove that handler, not all before-quit listeners.
- Prevents future listeners (telemetry, flush, etc.) from being accidentally removed.

Also fixes: dev.sh kill_residual_processes patterns were outdated (referenced .tmp/sidecars/ paths that no longer exist), causing CI stop smoke test failures. Updated to match current process patterns.

* fix(desktop): address PR review comments + CODEOWNERS

PR review fixes:
- launchd-bootstrap: orphan kill now only runs when neither service is registered with launchd, preventing SIGKILL of healthy managed services during relaunch-after-crash scenarios.
- quit-handler: teardownLaunchdServices always runs on quit-completely, even if plistDir is absent (plistDir only affects runtime-ports cleanup).
- desktop-stop-smoke.sh: add web sidecar pattern to process checks.
- desktop-check-dev.sh: replace fixed 2s sleep with bounded polling (max 10s) to avoid teardown race flakes.
- dev-env.sh: lsregister success/failure logged accurately.
- dev.sh: kill_residual_processes patterns updated to match current process paths (was using stale .tmp/sidecars/ paths).

CODEOWNERS: require @lefarcen review for test changes only. Tests define the quality gate — source code anyone can change, but acceptance criteria changes need review.

* fix(desktop): kill tsc/web watchers on pnpm stop + smoke test

dev-launchd.sh stop_services now kills the tsc --watch and web watcher background processes. These were only cleaned by the EXIT trap (which fires when the start_services function's shell exits), but `pnpm stop` calls stop_services directly without triggering the trap — leaving watchers printing to the terminal after stop.

Also adds tsc watcher residual check to desktop-stop-smoke.sh.

* chore: revert CODEOWNERS + remove unused OPENCLAW_ENTRY gate in e2e

- Revert CODEOWNERS to original (only api/migrations).
- Remove OPENCLAW_ENTRY prerequisite from launchd-lifecycle-e2e.sh — the script only tests controller, never launches openclaw.
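The P0-2 commit above describes an idempotent gracefulShutdown with an 8-second hard timeout. A minimal sketch of that shape follows, assuming the teardown work and the forced exit are injected as parameters; `teardown`, `forceExit`, and `timeoutMs` are hypothetical names for illustration, not the PR's actual signature.

```typescript
// Sketch only: teardown and forceExit are hypothetical injection points.
type Teardown = () => Promise<void>;

const HARD_TIMEOUT_MS = 8_000; // the "8-second hard timeout" from the commit message

let shutdownInProgress = false;

async function gracefulShutdown(
  reason: string,
  teardown: Teardown,
  forceExit: (code: number) => void = (code) => process.exit(code),
  timeoutMs: number = HARD_TIMEOUT_MS,
): Promise<boolean> {
  // Idempotency guard: a second signal while teardown is running is ignored.
  if (shutdownInProgress) return false;
  shutdownInProgress = true;

  // If teardown hangs, the timer forces the process down with exit code 1.
  const timer = setTimeout(() => forceExit(1), timeoutMs);
  try {
    await teardown();
  } catch (err) {
    console.warn(`[shutdown:${reason}] teardown failed:`, err);
  } finally {
    clearTimeout(timer);
  }
  return true;
}
```

In the main process this would presumably be wired to `process.on("SIGTERM", ...)` and `process.on("SIGINT", ...)`, with `app.exit(0)` called once the returned promise settles, as the commit message describes.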

@lefarcen

…reen (#597)

* fix(desktop): robust lifecycle teardown for quit and update-install

The update-install path previously called orchestrator.dispose() without properly booting out launchd services, causing macOS to report "app is still running" when the installer tried to replace the .app bundle. The quit-completely path had a similar issue: after bootout, waitForExit could not SIGKILL processes whose launchd label was already unregistered.

Changes:
- LaunchdManager.bootoutAndWaitForExit: captures PID before bootout so the SIGKILL fallback works even after the label is unregistered.
- LaunchdManager.waitForExit: accepts optional knownPid parameter; uses process.kill(pid, 0) to verify death when launchctl print returns "unknown".
- teardownLaunchdServices: new shared function used by both quit-handler and update-manager. Bootouts each service with PID-aware waiting, deletes runtime-ports.json, and kills orphan processes via pgrep.
- ensureNexuProcessesDead: polling verification gate that loops pgrep + SIGKILL until all Nexu sidecar processes are confirmed dead (max 15s).
- quitAndInstall: now three-phase — (1) teardown + dispose wrapped in try/catch so failures never block the install, (2) ensureNexuProcessesDead as the hard verification gate, (3) autoUpdater.quitAndInstall.
- Bootstrap adds killOrphanNexuProcesses on cold start to clean up residual processes from a previously failed update.

Tests: 27 new tests across 3 files covering teardown, PID-aware shutdown, verification gate, and the full update-install sequence.

* chore(ci): add pnpm test to CI and launchd lifecycle e2e test

- ci.yml: add a `test` job that runs `pnpm test` on ubuntu-latest, covering all 247+ vitest unit tests that were previously not run in CI.
- desktop-ci-dist.yml: add real launchd lifecycle e2e test that runs on macOS CI runners before the packaged app build. The test exercises:
  1. Bootstrap: register plist → kickstart → verify running + port
  2. Teardown: bootout → verify label unregistered → verify process dead
  3. SIGKILL fallback: bootout → saved-PID SIGKILL → verify dead
  4. Orphan detection: spawn fake orphan → detect via lsof → SIGKILL
  5. Re-bootstrap: fresh cold start after full cleanup
- scripts/launchd-lifecycle-e2e.sh: standalone e2e test script (15 checks) that validates the launchd process management primitives used by the desktop app's quit and update-install paths.

* fix(desktop): prevent Dock icon proliferation + add comprehensive tests

Fixes:
- dev-env.sh: add Launch Services cache flush (lsregister) after patching LSUIElement=true on the dev Electron binary. Without this, macOS uses cached plist data and still shows Dock icons for child processes. Add verification logging on success/failure.
- daemon-supervisor: force ELECTRON_RUN_AS_NODE=1 on all spawn() calls that use process.execPath (Electron binary) as a safety net, even if the manifest env omits it.

Tests:
- daemon-supervisor.test.ts (10 tests): constructor, startAutoStart, stopUnit SIGTERM/SIGKILL escalation, 5s deadline, stopAll parallel, dispose, skip non-managed, ELECTRON_RUN_AS_NODE safety net, stoppedByUser suppresses restart, dependent stop order.
- quit-handler.test.ts (8 tests): quit-completely full sequence, __nexuForceQuit flag, app.exit(0), error resilience for onBeforeQuit and webServer.close, no-plistDir skip, run-in-background hide.
- desktop-stop-smoke.sh: post-stop verification script — checks no residual processes, free ports, no launchd labels, no stale state. Integrated into desktop-check-dev.sh for CI.

* fix(desktop): route pnpm start through dev-env.sh for LSUIElement patch

dev-launchd.sh (pnpm start) was launching Electron directly without going through dev-env.sh, bypassing the LSUIElement=true plist patch and Launch Services cache flush. This is the direct cause of the Dock icon proliferation reported by users running pnpm start.

Now pnpm start → dev-launchd.sh → dev-env.sh → electron, matching the same path that pnpm dev uses via dev.sh → dev-run.sh → dev-env.sh.

* test(desktop): add dev toolchain invariant tests

Static analysis tests that guard critical invariants across the launch, environment, and shutdown scripts. These catch regressions like the dev-launchd.sh bypass of dev-env.sh that caused Dock icon proliferation.

19 invariant checks covering:
- Launch paths: all Electron launch commands go through dev-env.sh
- LSUIElement: plist patch + LS cache flush present
- ELECTRON_RUN_AS_NODE: set in plists, manifests, daemon-supervisor, openclaw-process, and catalog-manager
- Shutdown: bootout, orphan kill, port wait, teardown via shared function, try/catch wrapping, verification gate ordering

* test(desktop): comprehensive coverage for lifecycle, quit, and update paths

Bring test coverage for the 6 core lifecycle files from 63.5% to 83.8% overall statements, with function coverage at 93.8%.

Per-file improvements:
- update-manager.ts: 63.1% → 98.5% (bindEvents, checkNow, downloadUpdate, periodicCheck, setChannel/setSource, send to webviews)
- quit-handler.ts: 36.1% → 97.0% (installLaunchdQuitHandler dialog flow, force-quit bypass, dev-mode bypass, Cmd+Q interception, dialogOpen guard)
- launchd-manager.ts: 72.7% → 94.9% (uninstallService, stopServiceGracefully SIGKILL escalation, restartService, rebootstrapFromPlist, hasPlistFile)
- daemon-supervisor.ts: 46.5% → 69.5% (startUnit port probe, auto-restart backoff, refreshDelegatedUnits pgrep, stdout/stderr capture, queryEvents)
- launchd-bootstrap.ts: 92.6% → 93.1% (isLaunchdBootstrapEnabled packaged heuristic, ensureNexuProcessesDead edge cases)
- plist-generator.ts: 100% (unchanged)

New test files: daemon-supervisor.test.ts (27), quit-handler-full.test.ts (13), update-manager-full.test.ts (41), launchd-manager-ops-extended.test.ts (11), launchd-bootstrap-edge.test.ts (12).

Total: 391 tests across 40 files, all passing.
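The teardown commit above relies on capturing the PID before bootout and then probing it with process.kill(pid, 0). The sketch below illustrates that PID-aware wait under stated assumptions: `isProcessAlive`, `waitForExitByPid`, the 100 ms poll interval, and the return values are hypothetical, and the 5-second default mirrors the timeout in the PR description rather than the real implementation.

```typescript
// Sketch only: function names, poll interval, and return shape are hypothetical.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 delivers nothing; it only checks the PID
    return true;
  } catch (err) {
    // ESRCH: no such process. EPERM: it exists but we may not signal it.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Poll a PID captured before bootout; launchctl can no longer resolve the
// label at this point, so the saved PID is the only handle left.
async function waitForExitByPid(
  knownPid: number,
  timeoutMs = 5_000,
  pollMs = 100,
): Promise<"exited" | "sigkill-sent"> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (!isProcessAlive(knownPid)) return "exited";
    await sleep(pollMs);
  }
  try {
    process.kill(knownPid, "SIGKILL");
  } catch {
    // The process exited between the last poll and the kill attempt.
  }
  return "sigkill-sent";
}
```

Treating EPERM as "alive" is a design choice for this sketch: the process exists even though the caller cannot signal it, so reporting it dead would be wrong.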
* fix(desktop): stopPeriodicCheck race + real launchd integration tests

Fixes:
- update-manager: save initial setTimeout ID so stopPeriodicCheck can cancel it during the initial delay window. Previously, calling stopPeriodicCheck before the initial delay expired was a no-op, allowing the interval to start during teardown.
- update-manager: call stopPeriodicCheck() at the start of quitAndInstall() to prevent periodic checks from firing mid-teardown.

Tests:
- launchd-integration.test.ts: 8 tests that run REAL launchd on macOS (skipped on other platforms). Covers:
  1. installService + startService → real service running on real port
  2. bootoutAndWaitForExit → real process confirmed dead
  3. teardownLaunchdServices → full sequence against real launchd
  4. ensureNexuProcessesDead → real orphan process spawned and killed
  5. getServiceStatus → real PID from launchctl print
  6. getServiceStatus → unknown for non-existent label
  7. installService → detects plist content change and re-bootstraps
  8. stopServiceGracefully → real SIGTERM stops service

CI:
- desktop-ci-dist.yml: add `pnpm test` step on macOS runners so real launchd integration tests run in CI alongside the shell e2e tests.

* test(desktop): comprehensive real launchd integration tests + macOS CI

Expand launchd-integration.test.ts from 8 to 16 real launchd tests:
  9. Full cycle: start → bootout → verify clean → cold re-start
  10. Attach: detect already-running service from previous session
  11. KeepAlive: service auto-restarts after SIGKILL (crash simulation)
  12. Rapid start/stop cycles leave no orphan processes
  13. Port conflict: occupied port → bootout still cleans up
  14. bootout on non-registered label is idempotent (no throw)
  15. teardownLaunchdServices with non-existent labels is safe
  16. waitForExit handles process dying during bootout (race condition)

CI: add pnpm test to desktop-ci-dev.yml (macOS-14) so real launchd integration tests also run in the dev CI path, not just dist CI.
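The stopPeriodicCheck race described above comes down to keeping a handle on the initial setTimeout as well as the interval. A minimal sketch, assuming a small class that owns both timers; `PeriodicChecker` and its field names are hypothetical and only the method name stopPeriodicCheck comes from the commit message.

```typescript
// Sketch only: PeriodicChecker and its fields are hypothetical names.
class PeriodicChecker {
  private initialTimer: ReturnType<typeof setTimeout> | null = null;
  private interval: ReturnType<typeof setInterval> | null = null;
  private readonly check: () => void;
  private readonly initialDelayMs: number;
  private readonly intervalMs: number;

  constructor(check: () => void, initialDelayMs: number, intervalMs: number) {
    this.check = check;
    this.initialDelayMs = initialDelayMs;
    this.intervalMs = intervalMs;
  }

  startPeriodicCheck(): void {
    this.stopPeriodicCheck();
    // Saving this ID is the fix: without it, a stop during the initial
    // delay cannot prevent the interval from starting later.
    this.initialTimer = setTimeout(() => {
      this.initialTimer = null;
      this.check();
      this.interval = setInterval(this.check, this.intervalMs);
    }, this.initialDelayMs);
  }

  stopPeriodicCheck(): void {
    if (this.initialTimer !== null) {
      clearTimeout(this.initialTimer);
      this.initialTimer = null;
    }
    if (this.interval !== null) {
      clearInterval(this.interval);
      this.interval = null;
    }
  }

  get isScheduled(): boolean {
    return this.initialTimer !== null || this.interval !== null;
  }
}
```

Calling `stopPeriodicCheck()` first in a quit-and-install path, as the commit does, then guarantees that no check fires while teardown is in flight.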
* test(desktop): 33 real launchd integration tests covering all LaunchdManager methods

Expand from 16 to 33 real launchd tests, covering every LaunchdManager public method and critical lifecycle scenario against actual launchctl:

Methods:
- installService (fresh, idempotent, content-change re-bootstrap)
- startService, stopService (SIGTERM), restartService (kickstart -k)
- bootoutService, bootoutAndWaitForExit (with PID-aware fallback)
- uninstallService (bootout + delete plist, idempotent)
- stopServiceGracefully (SIGTERM → poll → SIGKILL escalation)
- rebootstrapFromPlist (re-register after bootout)
- getServiceStatus (running/unknown, PID parsing, env parsing)
- isServiceRegistered, hasPlistFile, isServiceInstalled
- waitForExit (with knownPid after bootout)
- getDomain, getPlistDir

Scenarios:
- Full start→stop→restart cycle
- Attach to already-running service from previous session
- KeepAlive auto-restart after SIGKILL (crash simulation)
- Rapid start/stop cycles with no orphans
- Port conflict: bootout cleans up even when port is blocked
- Double bootout is idempotent
- Process dying during bootout (race condition)
- Multiple services: start two, teardown both
- ensureNexuProcessesDead: no-op when clean, kills orphans
- teardownLaunchdServices: non-existent labels are safe

* test(desktop): update server integration tests with real HTTP server

Spin up a local HTTP server that mimics the desktop release CDN and verify the update feed resolution + YAML serving end-to-end:
  1. Feed URL resolves to valid fetchable URL (stable/arm64)
  2. Server serves latest-mac.yml at correct path
  3. YAML contains all electron-updater required fields
  4. All 3 channels × 2 architectures serve valid responses
  5. 404 for invalid paths
  6. Explicit feedUrl overrides default
  7. Custom feed URL pointed at local server is fetchable
  8. Version comparison: server version > current = update available
  9. Download artifact URL serves content
  10. Server request logging works
  11. GitHub source returns github:// URL
  12. NEXU_UPDATE_FEED_URL env takes highest priority

These catch real HTTP/YAML issues that mocked autoUpdater tests miss.

* fix(test): stabilize stopService test against KeepAlive race

* fix(desktop): respect externally-set NEXU_HOME in dev mode

bootstrap.ts configureLocalDevPaths() unconditionally overwrote process.env.NEXU_HOME with userData/.nexu, clobbering the value passed by dev-launchd.sh (pnpm start). This caused controller to read config from a fresh empty directory instead of the intended .tmp/desktop/nexu-home/, making every pnpm start feel like a first-time setup.

Fix: only set NEXU_HOME as fallback when not already provided.

Packaged users are unaffected — configureLocalDevPaths() returns early when app.isPackaged is true (line 47).

Also adds 27 data-directory-invariants tests that guard:
- bootstrap.ts path configuration guards (isPackaged, env respect)
- runtime-config.ts NEXU_HOME resolution order
- controller env.ts data file locations under NEXU_HOME
- plist NEXU_HOME and OPENCLAW_STATE_DIR presence
- dev-launchd.sh path consistency
- AGENTS.md directory layout contract
- Packaged mode backward compatibility with 0.1.7
- OpenClaw state directory separation from NEXU_HOME

* fix(desktop): respect NEXU_HOME + 70 data directory path tests

Fix: bootstrap.ts configureLocalDevPaths() now respects externally-set NEXU_HOME instead of unconditionally overwriting it. This fixes pnpm start creating a fresh config directory on every launch.
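The NEXU_HOME fix above amounts to "set only when absent", and the related tests describe a priority chain of env > buildConfig > ~/.nexu. A minimal sketch of both rules follows; `resolveNexuHome`, `applyDevNexuHomeFallback`, and the `nexuHome` field on the build config are hypothetical names chosen for illustration.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Sketch only: the BuildConfig shape and both function names are hypothetical.
interface BuildConfig {
  nexuHome?: string;
}

type Env = Record<string, string | undefined>;

// Priority chain from the test description: env > buildConfig > ~/.nexu default.
function resolveNexuHome(env: Env, buildConfig: BuildConfig = {}): string {
  return env.NEXU_HOME || buildConfig.nexuHome || join(homedir(), ".nexu");
}

// Dev-mode rule from the fix: only fill NEXU_HOME in when the launcher
// (for example dev-launchd.sh) has not already provided one.
function applyDevNexuHomeFallback(env: Env, fallback: string): void {
  if (!env.NEXU_HOME) {
    env.NEXU_HOME = fallback;
  }
}
```

Passing the environment in as an argument, rather than reading process.env directly, is what makes it practical to call the resolver "with different env objects" in tests.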
* fix(desktop): harden lifecycle robustness — external runner, evidence-based updates, unified teardown

- Extract Electron binary + frameworks to ~/.nexu/runtime/nexu-runner.app/ via APFS clone so launchd services never reference the .app bundle, unblocking Finder drag-and-drop reinstalls
- Extract controller sidecar to ~/.nexu/runtime/controller-sidecar/ for the same reason; openclaw sidecar already extracted by existing logic
- All three extractions use staging dir + atomic rename to prevent half-copies
- Version-aware attach: refuse to attach to services from a different app version, build source, userDataPath, or openclawStateDir
- Evidence-based update install: after process sweeps, lsof-check critical paths; only abort if .app bundle or sidecar dirs are actually locked
- Unified dev/packaged teardown: Cmd+Q, window close, and no-window exit all go through teardownLaunchdServices in both modes
- daemon-supervisor circuit breaker: MAX_CONSECUTIVE_RESTARTS=10 with 120s window, emits max_restarts_exceeded reason code
- bootoutService tolerates "already gone" errors
- runtime-ports.json atomic write (tmp + rename)
- Tighter orphan cleanup: prefer launchd label + runtime-ports metadata, fall back to pgrep with node.* prefix and process tree exclusion

* test(desktop): add evidence-based update, attach mismatch, and authoritative PID tests

- ensureNexuProcessesDead: test launchctl-label-only discovery path
- update-install: test critical path lock check (abort vs proceed)
- attach identity: test buildSource/userDataPath/appVersion mismatch teardown
- quit-handler: test no-window packaged teardown, dev before-quit teardown

* test(desktop): expand lifecycle regression coverage

* fix(desktop): restore allow-jit in inherit entitlements + add regression guard tests

Root cause of nightly white-screen: entitlements.mac.inherit.plist dropped allow-jit, causing V8 to fail mmap with MAP_JIT on macOS 14.7.4+ → renderer OOM → white screen.

Fix: add back allow-jit and allow-unsigned-executable-memory alongside inherit.

Tests: 8 static analysis tests guard against future entitlement regressions, verifying both parent and inherit plists have the required V8 JIT keys.

* test(desktop): strengthen plist tests — value assertions, XML escaping, ProgramArguments order

- Entitlements: verify <true/> values (not just key presence), no dangerous entitlements (disable-executable-page-protection, get-task-allow), no duplicate keys, hardenedRuntime enabled in electron-builder
- ProgramArguments: exact ordering for controller [node, entry] and openclaw [node, openclaw.mjs, gateway, run], dev --auth none insertion
- Openclaw completeness: WorkingDirectory, StandardErrorPath, KeepAlive SuccessfulExit, ThrottleInterval, RunAtLoad=false
- XML escaping: &, <, >, ", ' in path fields correctly escaped

* fix(desktop): unify teardown paths, fix lsof PID parsing, only retry EADDRINUSE

- Extract runTeardownAndExit() helper: dev-close, dev-before-quit, packaged-quit, packaged-no-window all share one code path with try/finally to guarantee app.exit(0) even if teardown throws
- Restore onForceQuit semantic: only fires on explicit "Quit Completely" user choice, not on dev-mode or no-window exits
- Add step-level console.warn for onBeforeQuit/webServer.close failures
- lsof: parse PID column by field position instead of substring matching
- Web port retry: only retry on EADDRINUSE, re-throw other errors immediately
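The last commit above switches lsof handling to parsing the PID column by field position. A minimal sketch of that idea, assuming the default lsof table layout (COMMAND, PID, USER, ...) and a COMMAND value without embedded whitespace; `parseLsofPids` is a hypothetical name, not the function in the PR.

```typescript
// Sketch only: parseLsofPids is a hypothetical name; it assumes the default
// lsof table layout where PID is the second whitespace-separated field.
function parseLsofPids(output: string): number[] {
  const pids = new Set<number>();
  for (const line of output.split("\n")) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 2) continue;
    // The header row has the literal "PID" here, which Number() turns into NaN.
    const pid = Number(fields[1]);
    if (Number.isInteger(pid) && pid > 0) pids.add(pid);
  }
  return Array.from(pids);
}

// Example: digits inside the NAME column must not be mistaken for PIDs.
const sampleLsofOutput = [
  "COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME",
  "node     4242 me    cwd    DIR    1,4      640  123 /tmp/app-9999",
  "node     4242 me    txt    REG    1,4    12345  456 /usr/local/bin/node",
  "Electron 5150 me    txt    REG    1,4       99  789 /Applications/Example.app",
].join("\n");

const lockingPids = parseLsofPids(sampleLsofOutput);
```

Field-position parsing avoids the false positives that substring matching can produce when a path or size happens to contain the same digits as a PID.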

lefarcen and others added 30 commits March 23, 2026 16:10

fix(scripts): build web alongside controller in dev-launchd start

4abac72

The embedded web server serves static files from apps/web/dist, so code changes to the web app require a build step before starting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style(desktop): white background + 96px logo matching design prototype

febd4c1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(desktop): localize quit dialog (zh-CN / en)

f335440

Quit dialog now shows Chinese or English based on app.getLocale(). Chinese users see "完全退出 / 后台运行 / 取消". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
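A locale switch like the one this commit describes can be sketched as a pure function over the string returned by app.getLocale(). This is an illustration only: `quitDialogLabels` is a hypothetical name, the English strings are assumed (only the Chinese labels appear in the commit message), and matching every `zh` locale rather than only `zh-CN` is also an assumption.

```typescript
// Sketch only: function name, English strings, and the zh-prefix rule are assumptions.
interface QuitDialogLabels {
  quitCompletely: string;
  runInBackground: string;
  cancel: string;
}

function quitDialogLabels(locale: string): QuitDialogLabels {
  if (locale.toLowerCase().startsWith("zh")) {
    // Labels quoted in the commit message.
    return { quitCompletely: "完全退出", runInBackground: "后台运行", cancel: "取消" };
  }
  return { quitCompletely: "Quit Completely", runInBackground: "Run in Background", cancel: "Cancel" };
}
```

In the Electron main process the argument would come from `app.getLocale()`, and the three labels would feed the dialog's button list.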

lefarcen and others added 12 commits March 24, 2026 19:58

fix(desktop): use static import for node:net instead of require()

aa7473c

esbuild doesn't support typeof import() expressions. Use static import { createConnection } from "node:net" instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(desktop): fallback to rm -rf when rmSync fails on sidecar cleanup

0ea926b

Node.js rmSync with recursive+force can fail with ENOTEMPTY on macOS. Fall back to execFileSync rm -rf which handles this reliably. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
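The fallback described in this commit can be sketched as a try/catch around rmSync. A minimal version, assuming any rmSync failure (not only ENOTEMPTY) triggers the fallback; `removeDirRobust` is a hypothetical name, and the `rm` binary is assumed to be on PATH, as it is on macOS.

```typescript
import { execFileSync } from "node:child_process";
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Sketch only: removeDirRobust is a hypothetical name.
function removeDirRobust(dir: string): void {
  try {
    rmSync(dir, { recursive: true, force: true });
  } catch (err) {
    // Assumption: fall back on any failure, e.g. the ENOTEMPTY case on macOS.
    console.warn(`rmSync failed for ${dir}, falling back to rm -rf:`, err);
    execFileSync("rm", ["-rf", dir]);
  }
}

// Usage: remove a scratch directory that contains a nested file.
const scratchDir = mkdtempSync(join(tmpdir(), "sidecar-cleanup-"));
mkdirSync(join(scratchDir, "nested"));
writeFileSync(join(scratchDir, "nested", "file.txt"), "stale sidecar data");
removeDirRobust(scratchDir);
```

Using execFileSync with an argument array, rather than a shell string, keeps paths with spaces or special characters from being reinterpreted by a shell.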

Merge remote-tracking branch 'origin/main' into refactor/launchd-proc…

dc4ac30

…ess-architecture

chatgpt-codex-connector Bot reviewed Mar 25, 2026

coderabbitai Bot reviewed Mar 25, 2026

Comment thread apps/desktop/main/services/launchd-manager.ts

coderabbitai Bot reviewed Mar 25, 2026

Comment thread apps/desktop/main/services/quit-handler.ts

coderabbitai Bot reviewed Mar 25, 2026

lefarcen closed this Mar 25, 2026

		if (!existsSync(targetFile)) {
			// Copy only when the target is absent, so existing data is never overwritten.
			cpSync(sourceFile, targetFile);
		}
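The fragment above looks like the no-overwrite guard from the session migration ("per-agent session merge, no overwrites"). Under that assumption, a recursive merge can be sketched as follows; `mergeWithoutOverwrite` and the temp-directory usage are hypothetical, and the stamp file the PR mentions for idempotency is left out.

```typescript
import { cpSync, existsSync, mkdirSync, mkdtempSync, readFileSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Sketch only: mergeWithoutOverwrite is a hypothetical name.
// Copies files from sourceDir into targetDir, skipping any file that already exists.
function mergeWithoutOverwrite(sourceDir: string, targetDir: string): string[] {
  const copied: string[] = [];
  mkdirSync(targetDir, { recursive: true });
  for (const entry of readdirSync(sourceDir, { withFileTypes: true })) {
    const sourceFile = join(sourceDir, entry.name);
    const targetFile = join(targetDir, entry.name);
    if (entry.isDirectory()) {
      copied.push(...mergeWithoutOverwrite(sourceFile, targetFile));
    } else if (!existsSync(targetFile)) {
      cpSync(sourceFile, targetFile);
      copied.push(targetFile);
    }
  }
  return copied;
}

// Usage: a.json exists on both sides and must keep its canonical content.
const legacyDir = mkdtempSync(join(tmpdir(), "legacy-state-"));
const canonicalDir = mkdtempSync(join(tmpdir(), "canonical-state-"));
writeFileSync(join(legacyDir, "a.json"), "legacy-a");
writeFileSync(join(legacyDir, "b.json"), "legacy-b");
writeFileSync(join(canonicalDir, "a.json"), "canonical-a");
const copiedFiles = mergeWithoutOverwrite(legacyDir, canonicalDir);
```

Returning the list of copied paths makes the merge easy to log and to assert on, which matters for a one-time migration that users cannot easily redo.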

lefarcen commented Mar 25, 2026 • edited by coderabbitai Bot

coderabbitai Bot commented Mar 25, 2026 • edited

Reviews paused

❌ Failed checks (1 warning)

cloudflare-workers-and-pages Bot commented Mar 25, 2026

Deploying nexu-docs with Cloudflare Pages


lefarcen commented Mar 25, 2026
