feat(desktop): launchd attach, partial attach, dev isolation & hot reload#519
feat(desktop): launchd attach, partial attach, dev isolation & hot reload#519
Conversation
Replace reserved close code 1008 (Policy Violation) with private code 4008 when closing WebSocket connections on error. Code 1008 is reserved for server use and Node.js 22's native WebSocket throws DOMException [InvalidAccessError] when clients attempt to use it, causing the controller process to crash on authentication failures. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Cherry-pick WebSocket close code fix from PR #365 - Change launchd namespace from com.nexu.* to io.nexu.* - Add progress tracking directory with STATUS, DECISIONS, ISSUES Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 1-3 of launchd architecture refactor: - LaunchdManager: wrapper for launchctl commands (install, start, stop, status, graceful shutdown) - PlistGenerator: generates launchd plist XML for Controller and OpenClaw services with proper env vars and dependencies - EmbeddedWebServer: serves static files and proxies API requests to Controller, replacing the web sidecar process Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- launchd-bootstrap.ts: complete bootstrap flow for launchd-based startup (install services, start controller, start openclaw, start embedded web server) - Feature flag NEXU_USE_LAUNCHD=1 for gradual rollout - Unified log directory at ~/.nexu/logs/ - Path resolution for packaged vs dev environments - Index file exporting all services Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- quit-handler.ts: handles before-quit event with dialog - Options: Quit Completely (stop services), Run in Background, Cancel - Graceful shutdown of launchd services - Exported via services/index.ts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
scripts/dev-launchd.sh provides: - start: generate plists, bootstrap and start services - stop: gracefully stop services - restart: stop then start - status: show launchd service status - logs: tail all log files Uses io.nexu.*.dev labels and ~/.nexu/logs/ for logging. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add launchd service imports - Add runLaunchdColdStart function that uses bootstrapWithLaunchd - Check NEXU_USE_LAUNCHD=1 flag to choose bootstrap mode - Install launchd quit handler after successful launchd bootstrap - Modify before-quit handler to skip orchestrator cleanup in launchd mode - Derive openclaw paths from nexuHome config Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- launchd-manager.test.ts: tests for LaunchdManager class and SERVICE_LABELS - plist-generator.test.ts: tests for generatePlist function Tests cover: - Platform check (darwin only) - Default and custom plist directories - UID-based domain construction - Dev vs prod label generation - Plist XML generation with correct structure - XML character escaping - Log path configuration - Service dependencies Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix OpenClaw config paths to match controller defaults in env.ts (OPENCLAW_STATE_DIR=~/.nexu/runtime/openclaw/state) - Add `gateway` subcommand to OpenClaw plist generation - Use OPENCLAW_CONFIG_PATH env var instead of --config argument - Add --auth none for dev mode to simplify local development - Update tests to verify OPENCLAW_CONFIG_PATH env var presence Tested with ./scripts/dev-launchd.sh - Controller and OpenClaw WebSocket connection verified working. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… workflow - Add "starting" RuntimeStatus: when OpenClaw gateway is unreachable but process is alive, show "启动中" instead of "已离线" - Parallelize launchd service install/start + web server (Promise.all) - Use adaptive readiness polling (50ms→250ms) instead of fixed 250ms - Fix dev-launchd.sh stop: use bootout directly instead of SIGTERM+bootout race with KeepAlive; use SIGKILL for Electron to bypass quit handler - Dev quit handler keeps services running (run-in-background) so vite HMR restarts don't kill launchd services - Add tool progress prompt to nexu-platform-bootstrap plugin - Disable humanDelay in config compiler - Cold start time reduced from ~5s to ~2s Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Stop: wait for ports to free after bootout, SIGKILL orphans including chrome_crashpad_handler - Fix resolveLaunchdPaths for packaged mode: OpenClaw is at runtime/openclaw/node_modules/openclaw/openclaw.mjs, not runtime/openclaw-runtime/openclaw.mjs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tartup When the WebSocket to OpenClaw gateway isn't connected yet (during startup), channels were shown as "disconnected" (red). Now they show as "connecting" (yellow pulse) when the runtime is still starting, giving users a much less alarming startup experience. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add explicit "booting" → "ready" lifecycle to ControllerRuntimeState. During boot, gateway-unreachable is always treated as "starting" (not "unhealthy"), regardless of whether the process manager owns the OpenClaw process (fixes launchd mode where processManager.isAlive() returns false). Channel live status also uses bootPhase to show "connecting" during startup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The embedded web server serves static files from apps/web/dist, so code changes to the web app require a build step before starting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bootPhase was set to "ready" immediately after wsClient.connect(), but the WS handshake hadn't completed yet. Health loop then saw gateway-unreachable + bootPhase=ready → "unhealthy" → UI showed "已离线" during startup. Now bootPhase transitions to "ready" inside the onConnected callback, so the entire startup shows "starting" → "active" cleanly. Also adds temporary debug logs to home.tsx for startup diagnostics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Channel error status now shows translated lastError (e.g. "会话已过期" instead of generic "错误") - Controller maps WeChat "not configured" + not running to "session expired" for better UX - Add i18n keys for common channel errors (session expired, not configured, disabled) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use warning color (orange) instead of danger (red) for known recoverable errors like session expired, with actionable label "请重新连接" / "Reconnect required". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The exponential backoff for OpenClaw WebSocket reconnection could reach 16s+ during startup, causing the UI to stay in "starting" state for 20+ seconds. Cap at 4s so retry sequence is 1→2→4→4→4s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…way up When the health loop detects the gateway HTTP endpoint becomes reachable, it calls wsClient.retryNow() to cancel the backoff timer and connect immediately. This eliminates the 4-16s gap between gateway ready and WS connected during startup. Also replaces the ugly "Starting local services..." loading screen with a minimal Nexu logo pulse animation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- main.tsx had a duplicate SurfaceFrame with old loading text; replaced with Nexu logo pulse animation matching surface-frame.tsx - dev-launchd.sh now checks for dist/index.html and rebuilds desktop if missing, preventing blank screen after accidental dist deletion Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace plain loading screen with animated Nexu logo matching the design system prototype (NexuLoader.tsx). Four quadrants light up sequentially in brand colors: orange, green, pink, gold. Pure CSS animation, no framer-motion dependency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
main.tsx had its own SurfaceFrame copy with the old loading screen. Now imports from components/surface-frame.tsx so both Runtime Console and Desktop Shell views use the same 4-color Nexu loader. Background updated to dark radial gradient matching desktop theme. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Loader now overlays on top of the webview instead of replacing it. The webview loads silently in the background while the Nexu logo animation plays. When the webview fires dom-ready, the loader disappears — no blank frames, no intermediate Loader2 spinner. Background uses warm radial gradient for polished appearance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The spinning circle was briefly visible between the Nexu splash loader and the actual UI. Replace with an empty div since the desktop splash overlay already covers the loading period. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Run openclawProcess.prepare(), ensureRuntimeModelPlugin(), and prepareDesktopCloudModelsForBootstrap() in parallel (were sequential) - Remove redundant compileCurrentConfig() call for preSeedConfigHash — doSync() already seeds the hash via noteConfigWritten() - Reduce WS initial backoff from 1000ms to 500ms (sequence: 500→1000→2000→4000→4000...) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Quit dialog now shows Chinese or English based on app.getLocale(). Chinese users see "完全退出 / 后台运行 / 取消". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Packaged macOS builds now use launchd mode by default (no env var needed). This enables the quit dialog, crash recovery, and background service support in production. Can be explicitly disabled with NEXU_USE_LAUNCHD=0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix path traversal vulnerability in embedded-web-server (sanitize URL pathname, reject paths outside webRoot) - Install launchd quit handler after try/catch so it works even if auth bootstrap fails - Add error handling to quitWithDecision (was missing try/catch) - Fix isAlive() handling undefined pid from failed spawn - Rename NODE_PATH to NODE_BIN in dev script to avoid Node.js env var conflict Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- openclaw-config-writer: additional config writing logic - plist-generator: updated plist generation + tests - package.json: script updates for launchd dev workflow - openclaw-weixin accounts.ts: prior session changes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When Electron allocates a non-default port (e.g. 18790 because 18789 is occupied), the controller needs to know this port for both the config compiler and WS/health connections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Node.js rmSync with recursive+force can fail with ENOTEMPTY on macOS. Fall back to execFileSync rm -rf which handles this reliably. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Node.js rmSync can silently fail on macOS (ENOTEMPTY race). Now uses rm -rf exclusively with existence check, and retries up to 3 times with 1s pause between attempts. This ensures first-launch sidecar extraction succeeds reliably. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace binary attach-or-cold-start with unified per-service flow: - Each service independently checked: running+healthy → keep, else restart - Ports recovered from runtime-ports.json when any service is still running - NEXU_HOME validated to prevent cross-environment attach - Missing services started with correct recovered ports - Unhealthy running services torn down and restarted This enables partial attach: if only OpenClaw survived a crash, the next launch reuses its port and only cold-starts the controller. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
getLogDir() now accepts nexuHome param so dev mode writes launchd service logs to .tmp/desktop/nexu-home/logs/ instead of ~/.nexu/logs/. Also updates AGENTS.md with correct directory layout for dev vs packaged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrites desktop-startup-flow.md with full implementation details: - Directory layout for dev vs packaged (with tree diagrams) - Label isolation between dev (.dev) and packaged modes - Unified bootstrap flow (attach + cold start per-service) - Port architecture and auto-allocation - Attach mechanism (full, partial, fallback) - Status display timeline - File watch hot reload - Exit behavior - OpenClaw sidecar extraction - Complete key files reference Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploying nexu-docs with
|
| Latest commit: |
cdda66f
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://b1950477.nexu-docs.pages.dev |
| Branch Preview URL: | https://refactor-launchd-process-arc.nexu-docs.pages.dev |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughThis PR enhances desktop startup flow by implementing launchd service attach/port-recovery, dev/packaged mode isolation with separate NEXU_HOME directories, runtime-ports persistence, and improved service health checking. It includes sidecar extraction retries, dev-mode watchers for controller/web rebuilds, and comprehensive startup flow documentation. Changes
Sequence Diagram(s)sequenceDiagram
participant Electron as Electron App
participant Runtime as runtime-ports.json
participant LC as launchctl
participant Health as Health Checks
participant Web as Web Server
Electron->>Runtime: Read persisted ports (if exist)
alt Ports found & dev mode match
Runtime-->>Electron: Return recovered ports
Electron->>LC: Check controller/openclaw status
LC-->>Electron: Service running status
alt Services still running
Electron->>Health: Probe controller /health
Health-->>Electron: Port validity + env check
alt Health OK & NEXU_HOME match
Electron->>Web: Start with recovered ports
Web-->>Electron: attachedPorts result
else Health check failed
Electron->>Electron: Fall through to cold-start
end
else Services stopped
Electron->>Electron: Fall through to cold-start
end
else No ports or mismatch
Electron->>Electron: Proceed to cold-start
end
alt Cold-start flow
Electron->>LC: Install & start controller/openclaw
LC-->>Electron: Services initializing
Electron->>Health: Poll controller readiness
Health-->>Electron: Controller ready
Electron->>Web: Start with fresh ports
Web-->>Electron: Persist ports to runtime-ports.json
end
Electron->>Electron: Render UI (gated by controller ready)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c5da48e829
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Resolve conflicts keeping our attach/isolation/hot-reload improvements while incorporating main's changes: - Brand name: "Nexu" → "nexu" (lowercase, from bb1ddcc) - UI polish from 2fb384c - surface-frame: white background from main's design update Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c5da48e to
cdda66f
Compare
There was a problem hiding this comment.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/desktop/main/services/launchd-bootstrap.ts (1)
473-481:⚠️ Potential issue | 🟠 MajorUpdate plist when environment variables change, not just on first installation.
The
ensureServicefunction only generates and installs the plist whenisServiceInstalled(label)returns false. Once the service is installed and registered, the condition is never met again, so the plist is never regenerated even ifeffectivePorts,nexuHome, orgatewayTokenchange. This causes the running service to use stale values forPORT,OPENCLAW_GATEWAY_PORT,NEXU_HOME, andOPENCLAW_GATEWAY_TOKENfrom the original plist, while the bootstrap logic operates with new values, leading to mismatches on cold starts.Additionally,
installServiceonly callslaunchctl bootstrapif the service is not already registered. If it were called again with an updated plist, the service would still not pick up the changes becauselaunchctl kickstart(used inensureRunningandstartService) starts the already-loaded config in memory rather than reloading from disk.Fix by regenerating the plist when environment values change, or by using
bootoutfollowed bybootstrapto reload the configuration from disk.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/services/launchd-bootstrap.ts` around lines 473 - 481, ensureService currently only generates and installs a plist when launchd.isServiceInstalled(label) is false, causing stale env vars; update ensureService to detect when generatePlist(type, plistEnv) produces different content than the installed plist and on change either (a) call launchd.bootout(label) then launchd.installService(label, newPlist) to bootstrap the updated plist, or (b) overwrite the on-disk plist and call launchctl bootstrap of the service so the running instance picks up new PORT/OPENCLAW_GATEWAY_PORT/NEXU_HOME/OPENCLAW_GATEWAY_TOKEN values; use generatePlist, installService, and the launchd bootout/bootstrap/ensureRunning or startService flows to perform the replace-and-restart sequence when plist content differs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/desktop/main/index.ts`:
- Around line 499-507: RuntimeOrchestrator was built earlier from
createRuntimeUnitManifests(..., runtimeConfig) but this branch only mutates
runtimeConfig when launchdResult.attachedPorts is present, so the existing
RuntimeOrchestrator still has old port endpoints; update/resync the orchestrator
after updating runtimeConfig by regenerating the manifests and rebuilding or
calling the appropriate refresh method on RuntimeOrchestrator (e.g., recreate or
call RuntimeOrchestrator.sync/update with createRuntimeUnitManifests(...,
runtimeConfig)) so all components use the attachedPorts values from
launchdResult.
In `@apps/desktop/main/runtime/manifests.ts`:
- Around line 191-206: The retry loop in ensurePackagedOpenclawSidecar() shells
out to PATH-dependent binaries ("rm", "tar", "sleep") which can fail in packaged
apps; replace those execFileSync calls so the extraction is self-contained: use
fs.rmSync or fs.rmdirSync with { recursive: true, force: true } to remove
extractedSidecarRoot, use a Node-based tar extractor (e.g.
require('tar').extract or another bundled tar library resolved via
require.resolve) to unpack archivePath into extractedSidecarRoot, and replace
the execFileSync("sleep", ["1"]) pause with a programmatic delay (convert the
function to async and use await new Promise(r => setTimeout(r, 1000)) or a
synchronous Atomics.wait alternative). Update MAX_RETRIES loop and
writeFileSync(stampPath, archiveStamp) usage accordingly so no PATH-dependent
binaries are invoked.
In `@apps/desktop/main/services/launchd-bootstrap.ts`:
- Around line 516-529: The fallback start for startEmbeddedWebServer uses port:
0 but never updates effectivePorts.webPort afterwards, so capture the actual
bound port from webServer (e.g. via the server object's listening address/port)
immediately after each successful start (both the try/catch block around
webServer creation and the similar block at the later occurrence) and assign it
to effectivePorts.webPort before any logging, attachedPorts updates, or writing
runtime-ports.json so persistence reflects the real listening port.
- Around line 497-505: Currently the code unconditionally SIGKILLs the PID
returned by detectPortOccupier for effectivePorts.openclawPort; instead, call a
verification step (e.g. inspect the occupier's command/args or process name via
the same detectPortOccupier result or a helper like getProcessInfo) and only
call process.kill(occupier.pid, "SIGKILL") when the command string or binary
matches known Nexu/OpenClaw signatures; if it does not match, do not kill—either
pick a new free port (update effectivePorts.openclawPort) or throw/return a
clear error indicating the port is occupied by a non-OpenClaw process so the
caller can decide. Ensure you reference detectPortOccupier, occupier.pid,
effectivePorts.openclawPort and process.kill in the change so reviewers can find
and update the kill path.
- Around line 323-337: detectPortOccupier currently calls execFileAsync("lsof",
...) which depends on PATH and swallows errors; change the call in
detectPortOccupier to use the absolute binary path "/usr/sbin/lsof" and update
the catch to surface or log the error (rather than silently returning null) so
missing-binary failures are visible; also scan related launchd service methods
that call launchctl or other system binaries and switch to absolute paths (e.g.,
"/bin/launchctl" or "/usr/bin/launchctl") and ensure their error handlers log or
propagate execution errors instead of swallowing them.
In `@apps/desktop/main/services/plist-generator.ts`:
- Around line 30-31: The OpenClaw plist generation does not include the nexuHome
-> NEXU_HOME environment variable, so when launchd-bootstrap.ts falls back to
openclawStatus.env?.NEXU_HOME the value is missing; update the OpenClaw plist
generation code to thread the nexuHome property into the plist's environment
(set NEXU_HOME to the provided nexuHome) the same way the controller plist does,
and make the same change in the second OpenClaw plist-generation block (the
other spot referenced around 155-183) so both generated OpenClaw plists include
NEXU_HOME for runtime partial-attach checks.
- Around line 81-89: The plist is missing a WEB_URL entry so launchd falls back
to http://localhost:5173; update the plist generation in plist-generator.ts (the
block that writes OPENCLAW_GATEWAY_PORT) to also emit a
<key>WEB_URL</key><string>...</string> when env.webUrl is present (use the same
escapeXml(env.webUrl) helper), placing it alongside OPENCLAW_GATEWAY_PORT so
compileOpenClawConfig() and the launchd controller receive the same origin
value.
In `@apps/desktop/main/services/quit-handler.ts`:
- Around line 146-149: Make the runtime-ports cleanup best-effort so it cannot
abort the quit sequence: call deleteRuntimePorts(opts.plistDir) in a try/catch
or fire-and-forget (Promise.catch) so any thrown error is swallowed/logged but
does not prevent setting __nexuForceQuit and calling app.quit(); ensure the
branch that currently checks opts.plistDir still runs the quit logic
unconditionally after scheduling or catching the deleteRuntimePorts call so
__nexuForceQuit and app.quit() always run even if deleteRuntimePorts fails.
In `@scripts/dev-launchd.sh`:
- Around line 156-157: The EXIT/INT/TERM trap currently references
CONTROLLER_WATCH_PID and WEB_WATCH_PID while they may be unset under set -euo
pipefail; update the trap to guard those variables (e.g. use parameter expansion
with defaults like ${CONTROLLER_WATCH_PID:-} and ${WEB_WATCH_PID:-} or build a
local pid string only if set) so kill is never given an unbound parameter, and
ensure stop_services is still invoked; locate the trap line that installs the
handler and replace the direct $CONTROLLER_WATCH_PID/$WEB_WATCH_PID references
with safe expansions or a conditional check before calling kill.
- Around line 175-183: The web watcher background subshell is inheriting set
-euo pipefail so a failed build command (the pipeline containing "pnpm --filter
`@nexu/web` build 2>&1 | tail -1 | sed 's/^/[web] /'") will terminate the loop; to
fix it, ensure the subshell does not exit on a non-zero build status by either
disabling errexit at the start of that subshell (e.g., run "set +e" inside the
subshell that contains the while loop) or by making the build pipeline tolerant
to failure (e.g., append "|| true" to the pnpm...|tail...|sed pipeline), and
preserve the existing echo messages so the loop continues watching for future
changes.
In `@specs/guides/desktop-startup-flow.md`:
- Around line 161-163: The startup docs are outdated: update the description of
waitForControllerReadiness() to state it polls /api/auth/get-session (not
/health) and note that the embedded web server port is allocated directly in
Electron via startEmbeddedWebServer() rather than via the controller config
chain; edit the text that references waitForControllerReadiness(), the polling
cadence, and the bootstrap flow (including lines referenced around 239–241) to
point readers to apps/desktop/main/services/launchd-bootstrap.ts for readiness
behavior and to the Electron startEmbeddedWebServer() implementation for port
allocation and debugging.
---
Outside diff comments:
In `@apps/desktop/main/services/launchd-bootstrap.ts`:
- Around line 473-481: ensureService currently only generates and installs a
plist when launchd.isServiceInstalled(label) is false, causing stale env vars;
update ensureService to detect when generatePlist(type, plistEnv) produces
different content than the installed plist and on change either (a) call
launchd.bootout(label) then launchd.installService(label, newPlist) to bootstrap
the updated plist, or (b) overwrite the on-disk plist and call launchctl
bootstrap of the service so the running instance picks up new
PORT/OPENCLAW_GATEWAY_PORT/NEXU_HOME/OPENCLAW_GATEWAY_TOKEN values; use
generatePlist, installService, and the launchd bootout/bootstrap/ensureRunning
or startService flows to perform the replace-and-restart sequence when plist
content differs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a1159815-06ea-4eca-9350-cbd4ee32f343
📒 Files selected for processing (13)
AGENTS.mdapps/controller/src/app/env.tsapps/controller/src/lib/openclaw-config-compiler.tsapps/desktop/main/index.tsapps/desktop/main/runtime/manifests.tsapps/desktop/main/services/launchd-bootstrap.tsapps/desktop/main/services/launchd-manager.tsapps/desktop/main/services/plist-generator.tsapps/desktop/main/services/quit-handler.tsapps/desktop/src/components/surface-frame.tsxpackage.jsonscripts/dev-launchd.shspecs/guides/desktop-startup-flow.md
💤 Files with no reviewable changes (1)
- package.json
- Fix web port fallback: increment port instead of port 0, record actual port in effectivePorts for runtime-ports.json - Fix quit handler: catch deleteRuntimePorts errors to prevent blocking quit - Fix dev script: initialize watcher PIDs before trap to avoid set -u errors - Fix docs: probe endpoint is /api/auth/get-session not /health Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: af8cb5e616
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| console.warn( | ||
| `Port ${effectivePorts.openclawPort} occupied by PID ${occupier.pid}, killing...`, | ||
| ); | ||
| try { | ||
| process.kill(occupier.pid, "SIGKILL"); |
There was a problem hiding this comment.
Avoid killing arbitrary process on OpenClaw port
When OpenClaw is not healthy, startup now force-kills whatever PID is listening on effectivePorts.openclawPort without verifying ownership. In packaged or dev runs, this can terminate an unrelated local service that happens to use that port, causing unexpected outages/data loss outside Nexu. This should only kill known stale Nexu/OpenClaw processes (or fail with a clear conflict) instead of unconditional SIGKILL on the first lsof result.
Useful? React with 👍 / 👎.
| !expectedNexuHome || | ||
| !runningNexuHome || | ||
| runningNexuHome === expectedNexuHome |
There was a problem hiding this comment.
Require validated NEXU_HOME before reusing recovered ports
The attach gate treats a missing runningNexuHome as acceptable (!runningNexuHome), so recovered ports are trusted even when environment identity cannot be verified. In partial-survival cases where controller is down, runningNexuHome is typically unavailable, which bypasses the cross-environment guard and can attach to stale launchd state from another dev workspace sharing the same labels. The check should fail closed when expectedNexuHome is set but no matching runtime value is available.
Useful? React with 👍 / 👎.
Summary
Incremental improvements to the launchd-based process architecture (#405):
Service Attach Mechanism
runtime-ports.jsonpersisted between sessionsDev/Packaged State Isolation
.tmp/desktop/nexu-home/(config, OpenClaw state, logs)~/.nexu/OPENCLAW_STATE_DIRnow derives fromNEXU_HOMEinstead of hardcoded~/.nexuNEXU_HOME/logs/not~/.nexu/logs/Hot Reload
pnpm startauto-watches controller (tsc --watch + launchctl kickstart) and web (polling + rebuild)Port Architecture
OPENCLAW_GATEWAY_PORTpassed to controller plist for consistent port allocationenv.openclawGatewayPort(not config store) to avoid stale port persistenceReliability
rm -rffallback (macOS ENOTEMPTY fix)auth=none(no gateway token) to avoid token mismatch on hot reloadDocumentation
specs/guides/desktop-startup-flow.mdAGENTS.mdwith correct directory layoutTest plan
pnpm typecheckpassespnpm testpassespnpm start→ cold start → stop → start (attach).tmp/desktop/nexu-home/🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation