
fix(controller): use valid WebSocket close codes #365

Closed
lefarcen wants to merge 1 commit into main from fix/ws-close-code

Conversation

@lefarcen
Collaborator

@lefarcen lefarcen commented Mar 23, 2026

Summary

  • Replace WebSocket close code 1008, which the client-side close() API rejects, with private-use code 4008
  • Fix crash when controller encounters OpenClaw authentication errors

Problem

When the controller fails to authenticate with the OpenClaw gateway (e.g., "device signature invalid"), it attempts to close the WebSocket with code 1008 (Policy Violation). Although RFC 6455 defines 1008, the WebSocket API's close() method accepts only 1000 or a code in the range 3000-4999. Node.js 22's native WebSocket implementation enforces this rule and throws DOMException [InvalidAccessError]: invalid code when the client passes 1008. The uncaught exception crashes the entire controller process, which then enters a restart loop.

Solution

Use close code 4008 instead. It falls in the private-use range (4000-4999) defined by RFC 6455, which close() accepts.
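
For reviewers, the rule at play can be sketched as a small check. This is only an illustration under the assumption that the runtime follows the WebSocket API's close() rule (1000, or 3000-4999); `isClientCloseCode` and `toClientCloseCode` are hypothetical helper names and are not part of this PR, which simply hard-codes 4008.

```typescript
// Sketch only: hypothetical helpers, not code from openclaw-ws-client.ts.

/** True when `code` can be passed to WebSocket.close() without an InvalidAccessError. */
function isClientCloseCode(code: number): boolean {
  return Number.isInteger(code) && (code === 1000 || (code >= 3000 && code <= 4999));
}

/** Maps a protocol-level code such as 1008 to a code that close() accepts. */
function toClientCloseCode(code: number): number {
  if (isClientCloseCode(code)) return code;
  // Assumption: shift 1xxx codes into the 4xxx private-use range (1008 -> 4008).
  if (Number.isInteger(code) && code >= 1001 && code <= 1999) return code + 3000;
  // Anything else falls back to a generic private-use code.
  return 4000;
}
```

With a helper like this, a call such as `ws.close(toClientCloseCode(1008), reason)` would send 4008 rather than throwing.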

Test plan

  • Build and package desktop app
  • Verify controller no longer crashes when OpenClaw returns authentication errors
  • Verify normal WebSocket connection/disconnection still works

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved WebSocket connection error handling for failed connection scenarios, including missing credentials and timeout conditions.

Replace close code 1008 (Policy Violation) with private-use code 4008
when closing WebSocket connections on error. The WebSocket API's
close() accepts only 1000 or 3000-4999, so Node.js 22's native
WebSocket throws DOMException [InvalidAccessError] when the client
passes 1008, crashing the controller process on authentication failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@coderabbitai

coderabbitai Bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f8f0fea-9cb4-48b8-bd4f-88349b8c3d1c

📥 Commits

Reviewing files that changed from the base of the PR and between 64215ab and d26a9f6.

📒 Files selected for processing (1)
  • apps/controller/src/runtime/openclaw-ws-client.ts

📝 Walkthrough

WebSocket close codes are updated from 1008 to 4008 in three error handling scenarios within the OpenClaw WebSocket client: missing connect challenge nonce, connect timeout, and connect failure. Close reason strings remain unchanged.

Changes

Cohort: WebSocket Close Codes
File(s): apps/controller/src/runtime/openclaw-ws-client.ts
Summary: Updated three WebSocket close code calls from 1008 to 4008 in error handling paths (missing challenge nonce, connect timeout, connect failure).

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 Hop, hop! The codes now jump to four-zero-oh-eight,
When sockets fail and connections can't wait,
Old 1008 hops away with a bound,
Custom codes make us a rabbit's delight,
Where protocols speak in custom light! 🌙

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name: Description check
Status: ❓ Inconclusive
Explanation: The description includes a comprehensive summary with the problem statement, solution, and test plan, but omits several required template sections like 'What', 'Why', 'How', and 'Affected areas'.
Resolution: Restructure the description to follow the template format with explicit 'What', 'Why', 'How' sections and check the 'Affected areas' checkbox for 'Controller' and 'OpenClaw runtime'.
✅ Passed checks (2 passed)
Check name: Title check
Status: ✅ Passed
Explanation: The title accurately and specifically describes the main change: fixing WebSocket close codes in the controller from invalid to valid codes.

Check name: Docstring Coverage
Status: ✅ Passed
Explanation: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

lefarcen added a commit that referenced this pull request Mar 23, 2026
- Cherry-pick WebSocket close code fix from PR #365
- Change launchd namespace from com.nexu.* to io.nexu.*
- Add progress tracking directory with STATUS, DECISIONS, ISSUES

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
lefarcen added a commit that referenced this pull request Mar 24, 2026
* fix(controller): use valid WebSocket close codes

Replace close code 1008 (Policy Violation) with private-use code 4008
when closing WebSocket connections on error. The WebSocket API's
close() accepts only 1000 or 3000-4999, so Node.js 22's native
WebSocket throws DOMException [InvalidAccessError] when the client
passes 1008, crashing the controller process on authentication failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: setup launchd refactor branch with progress tracking

- Cherry-pick WebSocket close code fix from PR #365
- Change launchd namespace from com.nexu.* to io.nexu.*
- Add progress tracking directory with STATUS, DECISIONS, ISSUES

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(desktop): implement launchd service management core

Phase 1-3 of launchd architecture refactor:

- LaunchdManager: wrapper for launchctl commands (install, start,
  stop, status, graceful shutdown)
- PlistGenerator: generates launchd plist XML for Controller and
  OpenClaw services with proper env vars and dependencies
- EmbeddedWebServer: serves static files and proxies API requests
  to Controller, replacing the web sidecar process

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(desktop): add launchd bootstrap module with feature flag

- launchd-bootstrap.ts: complete bootstrap flow for launchd-based
  startup (install services, start controller, start openclaw,
  start embedded web server)
- Feature flag NEXU_USE_LAUNCHD=1 for gradual rollout
- Unified log directory at ~/.nexu/logs/
- Path resolution for packaged vs dev environments
- Index file exporting all services

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(desktop): add quit handler with exit dialog

- quit-handler.ts: handles before-quit event with dialog
- Options: Quit Completely (stop services), Run in Background, Cancel
- Graceful shutdown of launchd services
- Exported via services/index.ts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(scripts): add launchd-based development script

scripts/dev-launchd.sh provides:
- start: generate plists, bootstrap and start services
- stop: gracefully stop services
- restart: stop then start
- status: show launchd service status
- logs: tail all log files

Uses io.nexu.*.dev labels and ~/.nexu/logs/ for logging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(desktop): integrate launchd bootstrap into main index.ts

- Add launchd service imports
- Add runLaunchdColdStart function that uses bootstrapWithLaunchd
- Check NEXU_USE_LAUNCHD=1 flag to choose bootstrap mode
- Install launchd quit handler after successful launchd bootstrap
- Modify before-quit handler to skip orchestrator cleanup in launchd mode
- Derive openclaw paths from nexuHome config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test(desktop): add unit tests for launchd services

- launchd-manager.test.ts: tests for LaunchdManager class and SERVICE_LABELS
- plist-generator.test.ts: tests for generatePlist function

Tests cover:
- Platform check (darwin only)
- Default and custom plist directories
- UID-based domain construction
- Dev vs prod label generation
- Plist XML generation with correct structure
- XML character escaping
- Log path configuration
- Service dependencies

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(desktop): correct launchd OpenClaw startup and config paths

- Fix OpenClaw config paths to match controller defaults in env.ts
  (OPENCLAW_STATE_DIR=~/.nexu/runtime/openclaw/state)
- Add `gateway` subcommand to OpenClaw plist generation
- Use OPENCLAW_CONFIG_PATH env var instead of --config argument
- Add --auth none for dev mode to simplify local development
- Update tests to verify OPENCLAW_CONFIG_PATH env var presence

Tested with ./scripts/dev-launchd.sh - Controller and OpenClaw
WebSocket connection verified working.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(desktop): add starting status, optimize launchd startup, fix dev workflow

- Add "starting" RuntimeStatus: when OpenClaw gateway is unreachable but
  process is alive, show "启动中" ("Starting") instead of "已离线" ("Offline")
- Parallelize launchd service install/start + web server (Promise.all)
- Use adaptive readiness polling (50ms→250ms) instead of fixed 250ms
- Fix dev-launchd.sh stop: use bootout directly instead of SIGTERM+bootout
  race with KeepAlive; use SIGKILL for Electron to bypass quit handler
- Dev quit handler keeps services running (run-in-background) so vite HMR
  restarts don't kill launchd services
- Add tool progress prompt to nexu-platform-bootstrap plugin
- Disable humanDelay in config compiler
- Cold start time reduced from ~5s to ~2s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): fix stop cleanup and packaged OpenClaw path

- Stop: wait for ports to free after bootout, SIGKILL orphans including
  chrome_crashpad_handler
- Fix resolveLaunchdPaths for packaged mode: OpenClaw is at
  runtime/openclaw/node_modules/openclaw/openclaw.mjs, not
  runtime/openclaw-runtime/openclaw.mjs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(controller): show "connecting" instead of "disconnected" during startup

When the WebSocket to OpenClaw gateway isn't connected yet (during
startup), channels were shown as "disconnected" (red). Now they show
as "connecting" (yellow pulse) when the runtime is still starting,
giving users a much less alarming startup experience.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(controller): add bootPhase to RuntimeState for startup awareness

Add explicit "booting" → "ready" lifecycle to ControllerRuntimeState.
During boot, gateway-unreachable is always treated as "starting" (not
"unhealthy"), regardless of whether the process manager owns the
OpenClaw process (fixes launchd mode where processManager.isAlive()
returns false). Channel live status also uses bootPhase to show
"connecting" during startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
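
The status rule this commit describes can be sketched roughly as follows. The type and function names here (`BootPhase`, `RuntimeStatus`, `deriveRuntimeStatus`) are illustrative assumptions, not the controller's actual identifiers; only the string values come from the commit messages.

```typescript
// Sketch only: hypothetical names; the real ControllerRuntimeState may differ.
type BootPhase = "booting" | "ready";
type RuntimeStatus = "starting" | "active" | "unhealthy";

/**
 * While booting, an unreachable gateway counts as "starting" rather than
 * "unhealthy", whether or not the process manager owns the OpenClaw process.
 */
function deriveRuntimeStatus(bootPhase: BootPhase, gatewayReachable: boolean): RuntimeStatus {
  if (gatewayReachable) return "active";
  return bootPhase === "booting" ? "starting" : "unhealthy";
}
```

Keying the decision on bootPhase instead of processManager.isAlive() is what lets the same rule work in launchd mode, where the controller does not own the process.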

* fix(scripts): build web alongside controller in dev-launchd start

The embedded web server serves static files from apps/web/dist, so
code changes to the web app require a build step before starting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(controller): defer bootPhase=ready until gateway WS connected

bootPhase was set to "ready" immediately after wsClient.connect(),
but the WS handshake hadn't completed yet. Health loop then saw
gateway-unreachable + bootPhase=ready → "unhealthy" → UI showed
"已离线" ("Offline") during startup.

Now bootPhase transitions to "ready" inside the onConnected callback,
so the entire startup shows "starting" → "active" cleanly.

Also adds temporary debug logs to home.tsx for startup diagnostics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(web): show friendly error labels for channel errors

- Channel error status now shows translated lastError (e.g. "会话已过期"
  ("Session expired") instead of generic "错误" ("Error"))
- Controller maps WeChat "not configured" + not running to
  "session expired" for better UX
- Add i18n keys for common channel errors (session expired, not
  configured, disabled)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(web): show "请重新连接" for recoverable channel errors

Use warning color (orange) instead of danger (red) for known
recoverable errors like session expired, with actionable label
"请重新连接" / "Reconnect required".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf(controller): reduce WS reconnect max backoff from 30s to 4s

The exponential backoff for OpenClaw WebSocket reconnection could
reach 16s+ during startup, causing the UI to stay in "starting"
state for 20+ seconds. Cap at 4s so retry sequence is 1→2→4→4→4s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
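
A capped exponential backoff of this shape could look like the sketch below. `reconnectDelayMs` and its parameter names are hypothetical; the defaults only mirror the 1s start and 4s cap stated in the commit message.

```typescript
// Sketch only: hypothetical helper, not the actual WS client code.

/** Delay before reconnect attempt `attempt` (0-based), doubling up to `maxMs`. */
function reconnectDelayMs(attempt: number, initialMs = 1000, maxMs = 4000): number {
  const n = Math.max(0, Math.floor(attempt));
  return Math.min(initialMs * 2 ** n, maxMs);
}

// First five delays with the defaults: 1000, 2000, 4000, 4000, 4000 (ms).
const firstFive = [0, 1, 2, 3, 4].map((attempt) => reconnectDelayMs(attempt));
```

Passing `initialMs = 500` yields 500, 1000, 2000, 4000, 4000 for the same five attempts.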

* perf(controller): health loop triggers immediate WS reconnect on gateway up

When the health loop detects the gateway HTTP endpoint becomes
reachable, it calls wsClient.retryNow() to cancel the backoff timer
and connect immediately. This eliminates the 4-16s gap between
gateway ready and WS connected during startup.

Also replaces the ugly "Starting local services..." loading screen
with a minimal Nexu logo pulse animation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): update loading screen in main.tsx + guard missing dist

- main.tsx had a duplicate SurfaceFrame with old loading text; replaced
  with Nexu logo pulse animation matching surface-frame.tsx
- dev-launchd.sh now checks for dist/index.html and rebuilds desktop
  if missing, preventing blank screen after accidental dist deletion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(desktop): colorful Nexu logo loader with 4-color stagger animation

Replace plain loading screen with animated Nexu logo matching the
design system prototype (NexuLoader.tsx). Four quadrants light up
sequentially in brand colors: orange, green, pink, gold.
Pure CSS animation, no framer-motion dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): deduplicate SurfaceFrame, use shared component in main.tsx

main.tsx had its own SurfaceFrame copy with the old loading screen.
Now imports from components/surface-frame.tsx so both Runtime Console
and Desktop Shell views use the same 4-color Nexu loader.

Background updated to dark radial gradient matching desktop theme.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(desktop): white background + 96px logo matching design prototype

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(desktop): seamless splash → UI transition with overlay loader

Loader now overlays on top of the webview instead of replacing it.
The webview loads silently in the background while the Nexu logo
animation plays. When the webview fires dom-ready, the loader
disappears — no blank frames, no intermediate Loader2 spinner.

Background uses warm radial gradient for polished appearance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(web): remove Loader2 spinner from auth pending state

The spinning circle was briefly visible between the Nexu splash
loader and the actual UI. Replace with an empty div since the
desktop splash overlay already covers the loading period.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf(controller): parallelize bootstrap prep + faster WS retry

- Run openclawProcess.prepare(), ensureRuntimeModelPlugin(), and
  prepareDesktopCloudModelsForBootstrap() in parallel (were sequential)
- Remove redundant compileCurrentConfig() call for preSeedConfigHash —
  doSync() already seeds the hash via noteConfigWritten()
- Reduce WS initial backoff from 1000ms to 500ms (sequence: 500→1000→2000→4000→4000...)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(desktop): localize quit dialog (zh-CN / en)

Quit dialog now shows Chinese or English based on app.getLocale().
Chinese users see "完全退出 / 后台运行 / 取消" ("Quit Completely / Run in
Background / Cancel").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(desktop): enable launchd by default on macOS

Packaged macOS builds now use launchd mode by default (no env var
needed). This enables the quit dialog, crash recovery, and background
service support in production. Can be explicitly disabled with
NEXU_USE_LAUNCHD=0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit PR review findings

- Fix path traversal vulnerability in embedded-web-server (sanitize
  URL pathname, reject paths outside webRoot)
- Install launchd quit handler after try/catch so it works even if
  auth bootstrap fails
- Add error handling to quitWithDecision (was missing try/catch)
- Fix isAlive() handling undefined pid from failed spawn
- Rename NODE_PATH to NODE_BIN in dev script to avoid Node.js
  env var conflict

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
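
One common way to implement the path-traversal guard mentioned in the first bullet is sketched below. It is an assumption about the approach, not the code in embedded-web-server; `resolveInsideWebRoot` is a hypothetical name.

```typescript
// Sketch only: hypothetical helper showing a resolve-then-check guard.
import * as path from "node:path";

/** Returns the absolute file path for `urlPathname`, or null if it escapes `webRoot`. */
function resolveInsideWebRoot(webRoot: string, urlPathname: string): string | null {
  let decoded: string;
  try {
    decoded = decodeURIComponent(urlPathname);
  } catch {
    return null; // malformed percent-encoding
  }
  if (decoded.includes("\0")) return null; // reject embedded NUL bytes

  const root = path.resolve(webRoot);
  // Strip leading slashes so the request path is always resolved relative to root.
  const candidate = path.resolve(root, "./" + decoded.replace(/^\/+/, ""));

  // Accept only the root itself or paths strictly below it.
  if (candidate !== root && !candidate.startsWith(root + path.sep)) return null;
  return candidate;
}
```

Checking the resolved path rather than searching the raw string for ".." also catches percent-encoded variants such as "%2e%2e", because decoding happens before resolution.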

* chore: commit remaining working tree changes from launchd branch

- openclaw-config-writer: additional config writing logic
- plist-generator: updated plist generation + tests
- package.json: script updates for launchd dev workflow
- openclaw-weixin accounts.ts: prior session changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): pass OpenClaw gateway port to controller plist

Controller plist was missing OPENCLAW_GATEWAY_PORT env var, causing
it to use the default 18789 while the actual OpenClaw port was
dynamically allocated. Also adds RUNTIME_MANAGE_OPENCLAW_PROCESS=false
since launchd manages the OpenClaw process, not the controller.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): remove OPENCLAW_GATEWAY_PORT from controller plist

The Electron port allocator and the config store can disagree on the
OpenClaw gateway port. Passing the allocator's port via plist env var
caused a mismatch — controller connected to 18789 while OpenClaw
listened on 18790. Let the controller use its own default (18789)
which matches the config store's default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add openclaw-weixin test and tsconfig

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): correct packaged webRoot path for launchd mode

Web files are at resources/runtime/web/dist/ not resources/web/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): extract openclaw sidecar tar before launchd bootstrap

In packaged mode, the OpenClaw sidecar is distributed as payload.tar.gz.
The orchestrator mode already had extraction logic, but launchd mode
pointed directly at the unextracted archive path. Now reuses
ensurePackagedOpenclawSidecar() to extract on first run and cache
with stamp file validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): pass gateway auth token to controller plist

Packaged OpenClaw requires token auth (unlike dev mode which uses
--auth none). Controller WS client was rejected with token_missing.
Now passes runtimeConfig.tokens.gateway via OPENCLAW_GATEWAY_TOKEN
env var in the controller plist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): "Run in Background" hides window instead of quitting

Previously both quit options called app.quit(). Now "Run in Background"
hides all windows and keeps the process alive so launchd services
continue running. Clicking the Dock icon re-shows the window.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): use bootout instead of SIGTERM for quit-completely

SIGTERM + KeepAlive causes launchd to restart the service immediately.
bootout atomically stops and unregisters the service, preventing
KeepAlive from respawning it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
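
As a rough illustration of the bootout call, the arguments could be built like this. The sketch assumes the services live in the per-user `gui/<uid>` domain; `bootoutArgs` is a hypothetical helper and `io.nexu.controller` is an assumed example label (the commits only state the `io.nexu.*` namespace).

```typescript
// Sketch only: hypothetical helper; domain and label are assumptions.

/** Builds the argument list for `launchctl bootout gui/<uid>/<label>`. */
function bootoutArgs(uid: number, label: string): string[] {
  // Service target format: <domain>/<label>, here the per-user GUI domain.
  return ["bootout", `gui/${uid}/${label}`];
}

// Example: arguments for an assumed controller service owned by uid 501.
const exampleArgs = bootoutArgs(501, "io.nexu.controller");
```

These arguments would typically be handed to something like `execFile("launchctl", exampleArgs)` from node:child_process. Because bootout also unregisters the service, KeepAlive has nothing left to respawn.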

* fix(desktop): hide window on close instead of destroying (macOS)

Standard Electron macOS pattern: intercept window close to hide
instead of destroy, preserving webview state. Clicking Dock icon
re-shows the same window without re-authentication.

Uses forceQuit flag set in before-quit to allow actual quit when
user chooses "Quit Completely" or Cmd+Q.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): fix forceQuit flag management for quit dialog

forceQuit was set in global before-quit handler, preventing "Run in
Background" from working (windows couldn't hide because forceQuit
was already true). Now forceQuit is only set via onForceQuit callback
when "Quit Completely" is actually chosen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): rewrite quit handler — background mode skips cleanup

"Run in Background" was running onBeforeQuit + closing web server
before hiding windows, which broke the web UI. Now it only hides
windows without any cleanup. Full cleanup only runs for
"Quit Completely".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): rewrite quit handler using window close event

Electron's before-quit doesn't reliably support async operations.
Moved dialog logic to the window close event handler instead:
- Window close shows quit dialog (packaged) or hides (dev)
- Cmd+Q / Dock quit redirects to window close handler
- Force-quit flag on app object for clean shutdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): prevent duplicate close handlers on main window

Remove browser-window-created listener that could add a second close
handler to the same window, causing the quit dialog to fire twice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): add src to webview ref callback deps

useCallback dependency was missing src, so when src changed from null
to a URL, the ref callback didn't re-run and webview src was never set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): bind did-finish-load in ref callback + default to web surface

- Move webview load event listener into the ref callback to avoid race
  where dom-ready fires before useEffect binds. Use did-finish-load
  (fires after navigation complete) instead of dom-ready.
- Default activeSurface to "web" in both dev and packaged mode so the
  Nexu loader and web UI are visible immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): mock better-auth session in embedded web server

Launchd mode uses an embedded web server (no web sidecar process),
but better-auth's /api/auth/get-session endpoint was missing, causing
AuthLayout to block rendering. Add a mock desktop session response
so the web app proceeds past auth in desktop local mode.

Also removes temporary DevTools debug code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): use vite dev server URL for HMR in dev mode

In dev mode, load the renderer from vite dev server (VITE_DEV_SERVER_URL)
instead of pre-built dist/index.html. This fixes blank pages after
vite HMR restarts Electron, since the dev server always serves the
latest code. Production still uses loadFile for the static dist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): add CORS headers to embedded web server for dev mode

When renderer loads from vite dev server (localhost:5180), fetch to
embedded web server (127.0.0.1:50810) is cross-origin. Add CORS
headers to allow dev mode requests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(web): remove debug console.log from home page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): dev mode lets window close normally on quit

Dev mode was hiding the window on close (for vite HMR), but this
made Dock right-click quit feel broken. Now dev mode lets the
window close normally — services are managed by pnpm stop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(scripts): auto-stop launchd services when dev script exits

Add trap to dev-launchd.sh so launchd services are cleaned up when
Electron quits, Ctrl+C is pressed, or the script exits for any reason.
Previously services stayed running after Dock quit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): dev mode before-quit should not block app quit

The before-quit handler was calling preventDefault() even in dev mode,
preventing Dock right-click quit from working.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: CI failures — SVG a11y lint + shared build order

- Add aria-label to loader SVG to fix biome a11y/noSvgWithoutTitle
- Use pnpm build (all packages) instead of individual filter builds
  in dev-launchd.sh to ensure @nexu/shared builds before web

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove launchd progress tracking docs

Development progress files no longer needed after implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): disable launchd in CI and non-packaged dev mode

CI runners may not support launchd properly. Now launchd is only
enabled via explicit NEXU_USE_LAUNCHD=1 (dev scripts) or in
packaged macOS apps. CI and plain `pnpm dev` use orchestrator mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(scripts): restore dev.sh as default start, launchd via start:launchd

pnpm start must be non-blocking for CI compatibility (desktop-check-dev.sh).
Restore dev.sh (tmux orchestrator) as default. Launchd mode available via:
- pnpm start:launchd / stop:launchd / restart:launchd

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable launchd mode in desktop CI

Set NEXU_USE_LAUNCHD=0 so desktop-ci uses the tmux orchestrator
mode which is non-blocking and CI-compatible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): use loadFile + direct electron launch for launchd mode

- Remove vite dev server loadURL (causes CORS + preload issues)
- dev-launchd.sh launches electron directly after build (no vite watch)
- Restore loadFile for all modes (file:// protocol, no CORS)
- CI desktop-check uses dev.sh directly (non-blocking tmux mode)
- package.json: pnpm start uses dev-launchd.sh (launchd mode)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
@lefarcen
Collaborator Author

Superseded by #405 which includes this fix (commit 3157dc4).

@lefarcen lefarcen closed this Mar 24, 2026
lefarcen added a commit that referenced this pull request Mar 24, 2026
…load (#519)

* fix(controller): use valid WebSocket close codes

Replace close code 1008 (Policy Violation) with private-use code 4008
when closing WebSocket connections on error. The WebSocket API's
close() accepts only 1000 or 3000-4999, so Node.js 22's native
WebSocket throws DOMException [InvalidAccessError] when the client
passes 1008, crashing the controller process on authentication failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: setup launchd refactor branch with progress tracking

- Cherry-pick WebSocket close code fix from PR #365
- Change launchd namespace from com.nexu.* to io.nexu.*
- Add progress tracking directory with STATUS, DECISIONS, ISSUES

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(desktop): implement launchd service management core

Phase 1-3 of launchd architecture refactor:

- LaunchdManager: wrapper for launchctl commands (install, start,
  stop, status, graceful shutdown)
- PlistGenerator: generates launchd plist XML for Controller and
  OpenClaw services with proper env vars and dependencies
- EmbeddedWebServer: serves static files and proxies API requests
  to Controller, replacing the web sidecar process

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(desktop): add launchd bootstrap module with feature flag

- launchd-bootstrap.ts: complete bootstrap flow for launchd-based
  startup (install services, start controller, start openclaw,
  start embedded web server)
- Feature flag NEXU_USE_LAUNCHD=1 for gradual rollout
- Unified log directory at ~/.nexu/logs/
- Path resolution for packaged vs dev environments
- Index file exporting all services

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(desktop): add quit handler with exit dialog

- quit-handler.ts: handles before-quit event with dialog
- Options: Quit Completely (stop services), Run in Background, Cancel
- Graceful shutdown of launchd services
- Exported via services/index.ts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(scripts): add launchd-based development script

scripts/dev-launchd.sh provides:
- start: generate plists, bootstrap and start services
- stop: gracefully stop services
- restart: stop then start
- status: show launchd service status
- logs: tail all log files

Uses io.nexu.*.dev labels and ~/.nexu/logs/ for logging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(desktop): integrate launchd bootstrap into main index.ts

- Add launchd service imports
- Add runLaunchdColdStart function that uses bootstrapWithLaunchd
- Check NEXU_USE_LAUNCHD=1 flag to choose bootstrap mode
- Install launchd quit handler after successful launchd bootstrap
- Modify before-quit handler to skip orchestrator cleanup in launchd mode
- Derive openclaw paths from nexuHome config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test(desktop): add unit tests for launchd services

- launchd-manager.test.ts: tests for LaunchdManager class and SERVICE_LABELS
- plist-generator.test.ts: tests for generatePlist function

Tests cover:
- Platform check (darwin only)
- Default and custom plist directories
- UID-based domain construction
- Dev vs prod label generation
- Plist XML generation with correct structure
- XML character escaping
- Log path configuration
- Service dependencies

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(desktop): correct launchd OpenClaw startup and config paths

- Fix OpenClaw config paths to match controller defaults in env.ts
  (OPENCLAW_STATE_DIR=~/.nexu/runtime/openclaw/state)
- Add `gateway` subcommand to OpenClaw plist generation
- Use OPENCLAW_CONFIG_PATH env var instead of --config argument
- Add --auth none for dev mode to simplify local development
- Update tests to verify OPENCLAW_CONFIG_PATH env var presence

Tested with ./scripts/dev-launchd.sh - Controller and OpenClaw
WebSocket connection verified working.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(desktop): add starting status, optimize launchd startup, fix dev workflow

- Add "starting" RuntimeStatus: when OpenClaw gateway is unreachable but
  process is alive, show "启动中" ("Starting") instead of "已离线" ("Offline")
- Parallelize launchd service install/start + web server (Promise.all)
- Use adaptive readiness polling (50ms→250ms) instead of fixed 250ms
- Fix dev-launchd.sh stop: use bootout directly instead of SIGTERM+bootout
  race with KeepAlive; use SIGKILL for Electron to bypass quit handler
- Dev quit handler keeps services running (run-in-background) so vite HMR
  restarts don't kill launchd services
- Add tool progress prompt to nexu-platform-bootstrap plugin
- Disable humanDelay in config compiler
- Cold start time reduced from ~5s to ~2s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): fix stop cleanup and packaged OpenClaw path

- Stop: wait for ports to free after bootout, SIGKILL orphans including
  chrome_crashpad_handler
- Fix resolveLaunchdPaths for packaged mode: OpenClaw is at
  runtime/openclaw/node_modules/openclaw/openclaw.mjs, not
  runtime/openclaw-runtime/openclaw.mjs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(controller): show "connecting" instead of "disconnected" during startup

When the WebSocket to OpenClaw gateway isn't connected yet (during
startup), channels were shown as "disconnected" (red). Now they show
as "connecting" (yellow pulse) when the runtime is still starting,
giving users a much less alarming startup experience.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(controller): add bootPhase to RuntimeState for startup awareness

Add explicit "booting" → "ready" lifecycle to ControllerRuntimeState.
During boot, gateway-unreachable is always treated as "starting" (not
"unhealthy"), regardless of whether the process manager owns the
OpenClaw process (fixes launchd mode where processManager.isAlive()
returns false). Channel live status also uses bootPhase to show
"connecting" during startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(scripts): build web alongside controller in dev-launchd start

The embedded web server serves static files from apps/web/dist, so
code changes to the web app require a build step before starting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(controller): defer bootPhase=ready until gateway WS connected

bootPhase was set to "ready" immediately after wsClient.connect(),
but the WS handshake hadn't completed yet. Health loop then saw
gateway-unreachable + bootPhase=ready → "unhealthy" → UI showed
"已离线" ("Offline") during startup.

Now bootPhase transitions to "ready" inside the onConnected callback,
so the entire startup shows "starting" → "active" cleanly.
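
Why the timing of the "ready" transition matters can be seen in a minimal sketch of the status derivation. The type and function names here are assumptions; only the "booting"/"ready" phases and the "starting", "active", and "unhealthy" statuses appear in the commit messages.

```typescript
type BootPhase = "booting" | "ready";
type RuntimeStatus = "starting" | "active" | "unhealthy";

// Hypothetical mapping from boot phase and gateway probe result to a status.
function deriveRuntimeStatus(bootPhase: BootPhase, gatewayReachable: boolean): RuntimeStatus {
  if (gatewayReachable) return "active";
  // While booting, an unreachable gateway is expected, so report "starting".
  // Once the phase is "ready", the same probe result means "unhealthy".
  return bootPhase === "booting" ? "starting" : "unhealthy";
}
```

If the phase flips to "ready" before the WebSocket handshake completes, the second branch fires too early, which matches the symptom described above.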

Also adds temporary debug logs to home.tsx for startup diagnostics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(web): show friendly error labels for channel errors

- Channel error status now shows translated lastError (e.g. "会话已过期"
  ("Session expired") instead of generic "错误" ("Error"))
- Controller maps WeChat "not configured" + not running to
  "session expired" for better UX
- Add i18n keys for common channel errors (session expired, not
  configured, disabled)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(web): show "请重新连接" for recoverable channel errors

Use warning color (orange) instead of danger (red) for known
recoverable errors like session expired, with actionable label
"请重新连接" / "Reconnect required".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf(controller): reduce WS reconnect max backoff from 30s to 4s

The exponential backoff for OpenClaw WebSocket reconnection could
reach 16s+ during startup, causing the UI to stay in "starting"
state for 20+ seconds. Cap at 4s so retry sequence is 1→2→4→4→4s.
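
The capped schedule can be reproduced with a small sketch. The function name is hypothetical, and the 1000 ms initial delay is inferred from the 1→2→4 sequence quoted above.

```typescript
// Delay before reconnect attempt `attempt` (0-based): doubles each time, then caps.
function reconnectDelayMs(attempt: number, initialMs = 1000, maxMs = 4000): number {
  return Math.min(initialMs * 2 ** attempt, maxMs);
}

const schedule = [0, 1, 2, 3, 4].map((n) => reconnectDelayMs(n));
console.log(schedule); // [1000, 2000, 4000, 4000, 4000]
```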

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf(controller): health loop triggers immediate WS reconnect on gateway up

When the health loop detects the gateway HTTP endpoint becomes
reachable, it calls wsClient.retryNow() to cancel the backoff timer
and connect immediately. This eliminates the 4-16s gap between
gateway ready and WS connected during startup.
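
A minimal sketch of the cancel-and-connect behavior, assuming the backoff is driven by a single timer. Only `retryNow` is named in the commit message; the class and its other members are hypothetical.

```typescript
// Hypothetical reconnect scheduler built around one pending backoff timer.
class ReconnectScheduler {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly connect: () => void;

  constructor(connect: () => void) {
    this.connect = connect;
  }

  // Schedule a reconnect attempt after the current backoff delay.
  scheduleRetry(delayMs: number): void {
    this.cancel();
    this.timer = setTimeout(() => {
      this.timer = null;
      this.connect();
    }, delayMs);
  }

  // Cancel any pending backoff timer and connect immediately.
  retryNow(): void {
    this.cancel();
    this.connect();
  }

  // True while a backoff timer is waiting to fire.
  get pending(): boolean {
    return this.timer !== null;
  }

  private cancel(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}
```

Clearing the timer before connecting avoids a second, redundant attempt when the original backoff would have expired shortly afterwards.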

Also replaces the ugly "Starting local services..." loading screen
with a minimal Nexu logo pulse animation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): update loading screen in main.tsx + guard missing dist

- main.tsx had a duplicate SurfaceFrame with old loading text; replaced
  with Nexu logo pulse animation matching surface-frame.tsx
- dev-launchd.sh now checks for dist/index.html and rebuilds desktop
  if missing, preventing blank screen after accidental dist deletion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(desktop): colorful Nexu logo loader with 4-color stagger animation

Replace plain loading screen with animated Nexu logo matching the
design system prototype (NexuLoader.tsx). Four quadrants light up
sequentially in brand colors: orange, green, pink, gold.
Pure CSS animation, no framer-motion dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): deduplicate SurfaceFrame, use shared component in main.tsx

main.tsx had its own SurfaceFrame copy with the old loading screen.
Now imports from components/surface-frame.tsx so both Runtime Console
and Desktop Shell views use the same 4-color Nexu loader.

Background updated to dark radial gradient matching desktop theme.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(desktop): white background + 96px logo matching design prototype

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(desktop): seamless splash → UI transition with overlay loader

Loader now overlays on top of the webview instead of replacing it.
The webview loads silently in the background while the Nexu logo
animation plays. When the webview fires dom-ready, the loader
disappears — no blank frames, no intermediate Loader2 spinner.

Background uses warm radial gradient for polished appearance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(web): remove Loader2 spinner from auth pending state

The spinning circle was briefly visible between the Nexu splash
loader and the actual UI. Replace with an empty div since the
desktop splash overlay already covers the loading period.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf(controller): parallelize bootstrap prep + faster WS retry

- Run openclawProcess.prepare(), ensureRuntimeModelPlugin(), and
  prepareDesktopCloudModelsForBootstrap() in parallel (were sequential)
- Remove redundant compileCurrentConfig() call for preSeedConfigHash —
  doSync() already seeds the hash via noteConfigWritten()
- Reduce WS initial backoff from 1000ms to 500ms (sequence: 500→1000→2000→4000→4000...)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(desktop): localize quit dialog (zh-CN / en)

Quit dialog now shows Chinese or English based on app.getLocale().
Chinese users see "完全退出 / 后台运行 / 取消" ("Quit Completely / Run in Background / Cancel").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(desktop): enable launchd by default on macOS

Packaged macOS builds now use launchd mode by default (no env var
needed). This enables the quit dialog, crash recovery, and background
service support in production. Can be explicitly disabled with
NEXU_USE_LAUNCHD=0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit PR review findings

- Fix path traversal vulnerability in embedded-web-server (sanitize
  URL pathname, reject paths outside webRoot)
- Install launchd quit handler after try/catch so it works even if
  auth bootstrap fails
- Add error handling to quitWithDecision (was missing try/catch)
- Fix isAlive() handling undefined pid from failed spawn
- Rename NODE_PATH to NODE_BIN in dev script to avoid Node.js
  env var conflict
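
For the path traversal item above, one common way to reject paths outside `webRoot` is sketched below. The helper name and exact checks are assumptions, not the code in embedded-web-server.

```typescript
import * as path from "node:path";

// Hypothetical helper: maps a request pathname to a file under webRoot,
// or returns null when the resolved path would escape webRoot.
function resolveSafePath(webRoot: string, urlPathname: string): string | null {
  let decoded: string;
  try {
    decoded = decodeURIComponent(urlPathname);
  } catch {
    return null; // malformed percent-encoding
  }
  if (!decoded.startsWith("/") || decoded.includes("\0")) return null;

  const root = path.resolve(webRoot);
  // Prefix with "." so the absolute-looking pathname resolves relative to root.
  const candidate = path.resolve(root, "." + decoded);

  const insideRoot = candidate === root || candidate.startsWith(root + path.sep);
  return insideRoot ? candidate : null;
}
```

Comparing against `root + path.sep` rather than `root` alone keeps a sibling directory such as `webRoot-backup` from passing the prefix check.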

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: commit remaining working tree changes from launchd branch

- openclaw-config-writer: additional config writing logic
- plist-generator: updated plist generation + tests
- package.json: script updates for launchd dev workflow
- openclaw-weixin accounts.ts: prior session changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): pass OpenClaw gateway port to controller plist

Controller plist was missing OPENCLAW_GATEWAY_PORT env var, causing
it to use the default 18789 while the actual OpenClaw port was
dynamically allocated. Also adds RUNTIME_MANAGE_OPENCLAW_PROCESS=false
since launchd manages the OpenClaw process, not the controller.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): remove OPENCLAW_GATEWAY_PORT from controller plist

The Electron port allocator and the config store can disagree on the
OpenClaw gateway port. Passing the allocator's port via plist env var
caused a mismatch — controller connected to 18789 while OpenClaw
listened on 18790. Let the controller use its own default (18789)
which matches the config store's default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add openclaw-weixin test and tsconfig

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): correct packaged webRoot path for launchd mode

Web files are at resources/runtime/web/dist/ not resources/web/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): extract openclaw sidecar tar before launchd bootstrap

In packaged mode, the OpenClaw sidecar is distributed as payload.tar.gz.
The orchestrator mode already had extraction logic, but launchd mode
pointed directly at the unextracted archive path. Now reuses
ensurePackagedOpenclawSidecar() to extract on first run and cache
with stamp file validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): pass gateway auth token to controller plist

Packaged OpenClaw requires token auth (unlike dev mode which uses
--auth none). Controller WS client was rejected with token_missing.
Now passes runtimeConfig.tokens.gateway via OPENCLAW_GATEWAY_TOKEN
env var in the controller plist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): "Run in Background" hides window instead of quitting

Previously both quit options called app.quit(). Now "Run in Background"
hides all windows and keeps the process alive so launchd services
continue running. Clicking the Dock icon re-shows the window.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): use bootout instead of SIGTERM for quit-completely

SIGTERM + KeepAlive causes launchd to restart the service immediately.
bootout atomically stops and unregisters the service, preventing
KeepAlive from respawning it.
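
A sketch of how the bootout invocation might be assembled, assuming the services are registered in the per-user `gui/<uid>` domain. The helper names and the example label are hypothetical.

```typescript
// Hypothetical helpers that build launchctl arguments for a per-user service.
// `gui/<uid>/<label>` is launchctl's service-target form for the GUI domain.
function serviceTarget(uid: number, label: string): string {
  return `gui/${uid}/${label}`;
}

// bootout stops the job and removes it from the domain in one step,
// so KeepAlive has nothing left to respawn.
function bootoutArgs(uid: number, label: string): string[] {
  return ["bootout", serviceTarget(uid, label)];
}

console.log(bootoutArgs(501, "com.example.controller").join(" "));
// bootout gui/501/com.example.controller
```

The resulting array can be passed to `execFile("launchctl", args)` from `node:child_process`.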

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): hide window on close instead of destroying (macOS)

Standard Electron macOS pattern: intercept window close to hide
instead of destroy, preserving webview state. Clicking Dock icon
re-shows the same window without re-authentication.

Uses forceQuit flag set in before-quit to allow actual quit when
user chooses "Quit Completely" or Cmd+Q.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): fix forceQuit flag management for quit dialog

forceQuit was set in global before-quit handler, preventing "Run in
Background" from working (windows couldn't hide because forceQuit
was already true). Now forceQuit is only set via onForceQuit callback
when "Quit Completely" is actually chosen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): rewrite quit handler — background mode skips cleanup

"Run in Background" was running onBeforeQuit + closing web server
before hiding windows, which broke the web UI. Now it only hides
windows without any cleanup. Full cleanup only runs for
"Quit Completely".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): rewrite quit handler using window close event

Electron's before-quit doesn't reliably support async operations.
Moved dialog logic to the window close event handler instead:
- Window close shows quit dialog (packaged) or hides (dev)
- Cmd+Q / Dock quit redirects to window close handler
- Force-quit flag on app object for clean shutdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): prevent duplicate close handlers on main window

Remove browser-window-created listener that could add a second close
handler to the same window, causing the quit dialog to fire twice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): add src to webview ref callback deps

useCallback dependency was missing src, so when src changed from null
to a URL, the ref callback didn't re-run and webview src was never set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): bind did-finish-load in ref callback + default to web surface

- Move webview load event listener into the ref callback to avoid race
  where dom-ready fires before useEffect binds. Use did-finish-load
  (fires after navigation complete) instead of dom-ready.
- Default activeSurface to "web" in both dev and packaged mode so the
  Nexu loader and web UI are visible immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): mock better-auth session in embedded web server

Launchd mode uses an embedded web server (no web sidecar process),
but better-auth's /api/auth/get-session endpoint was missing, causing
AuthLayout to block rendering. Add a mock desktop session response
so the web app proceeds past auth in desktop local mode.

Also removes temporary DevTools debug code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): use vite dev server URL for HMR in dev mode

In dev mode, load the renderer from vite dev server (VITE_DEV_SERVER_URL)
instead of pre-built dist/index.html. This fixes blank pages after
vite HMR restarts Electron, since the dev server always serves the
latest code. Production still uses loadFile for the static dist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): add CORS headers to embedded web server for dev mode

When renderer loads from vite dev server (localhost:5180), fetch to
embedded web server (127.0.0.1:50810) is cross-origin. Add CORS
headers to allow dev mode requests.
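
A minimal sketch of dev-only CORS headers. The `http://` scheme, the header values, and the helper name are assumptions; only the `localhost:5180` origin is taken from the message above.

```typescript
// Assumed dev origin: the vite dev server mentioned above, served over http.
const DEV_ORIGINS: ReadonlySet<string> = new Set(["http://localhost:5180"]);

// Hypothetical helper: returns CORS headers only for an allowed dev origin.
function corsHeadersFor(origin: string | undefined, isDev: boolean): Record<string, string> {
  if (!isDev || origin === undefined || !DEV_ORIGINS.has(origin)) {
    return {}; // no CORS headers outside dev mode or for unknown origins
  }
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "GET, POST, PUT, PATCH, DELETE, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization",
    Vary: "Origin",
  };
}
```

Echoing a specific allowed origin instead of `*` keeps the relaxation limited to the dev renderer.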

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(web): remove debug console.log from home page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): dev mode lets window close normally on quit

Dev mode was hiding the window on close (for vite HMR), but this
made Dock right-click quit feel broken. Now dev mode lets the
window close normally — services are managed by pnpm stop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(scripts): auto-stop launchd services when dev script exits

Add trap to dev-launchd.sh so launchd services are cleaned up when
Electron quits, Ctrl+C is pressed, or the script exits for any reason.
Previously services stayed running after Dock quit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): dev mode before-quit should not block app quit

The before-quit handler was calling preventDefault() even in dev mode,
preventing Dock right-click quit from working.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: CI failures — SVG a11y lint + shared build order

- Add aria-label to loader SVG to fix biome a11y/noSvgWithoutTitle
- Use pnpm build (all packages) instead of individual filter builds
  in dev-launchd.sh to ensure @nexu/shared builds before web

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove launchd progress tracking docs

Development progress files no longer needed after implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): disable launchd in CI and non-packaged dev mode

CI runners may not support launchd properly. Now launchd is only
enabled via explicit NEXU_USE_LAUNCHD=1 (dev scripts) or in
packaged macOS apps. CI and plain `pnpm dev` use orchestrator mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(scripts): restore dev.sh as default start, launchd via start:launchd

pnpm start must be non-blocking for CI compatibility (desktop-check-dev.sh).
Restore dev.sh (tmux orchestrator) as default. Launchd mode available via:
- pnpm start:launchd / stop:launchd / restart:launchd

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable launchd mode in desktop CI

Set NEXU_USE_LAUNCHD=0 so desktop-ci uses the tmux orchestrator
mode which is non-blocking and CI-compatible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): use loadFile + direct electron launch for launchd mode

- Remove vite dev server loadURL (causes CORS + preload issues)
- dev-launchd.sh launches electron directly after build (no vite watch)
- Restore loadFile for all modes (file:// protocol, no CORS)
- CI desktop-check uses dev.sh directly (non-blocking tmux mode)
- package.json: pnpm start uses dev-launchd.sh (launchd mode)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(scripts): add reload commands for hot reload without full restart

New commands for faster development iteration:
- pnpm reload          — rebuild controller + web, restart controller service
- pnpm reload:controller — rebuild + restart controller only (~3s)
- pnpm reload:web      — rebuild web only (~5s, reload page to see changes)

Uses launchctl kickstart -k to restart individual services without
stopping Electron or OpenClaw. Much faster than full pnpm restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(scripts): auto-watch controller + web on pnpm start

pnpm start now automatically watches for file changes:
- Controller: tsc --watch → auto-restart launchd service on compile
- Web: vite build --watch → auto-rebuild static files

No extra commands needed. Save a file → changes auto-apply.
Controller restarts in ~2-3s, web rebuilds in ~5s (refresh to see).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(scripts): replace vite build --watch with safe polling watcher

vite build --watch conflicted with the Electron process (vite-plugin-electron
interference). Replace with:
- Controller: tsc --watch + launchctl kickstart -k (unchanged)
- Web: polling-based watcher using find+stat, triggers pnpm build on change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(controller): use env port for gateway config instead of config store

gateway.port in openclaw.json was read from config store which could
persist a stale port from a previous session. Now uses env.openclawGatewayPort
(default 18789) which is the same source used by health probe and WS client.
Eliminates port mismatch between controller and OpenClaw.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add desktop startup flow guide

Comprehensive documentation of the launchd-based startup sequence,
architecture overview, port management, file watch, and exit behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify startup flow applies to both dev and packaged

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(scripts): isolate dev state to repo-local .tmp/desktop/nexu-home

Dev mode now uses repo-scoped NEXU_HOME (.tmp/desktop/nexu-home)
instead of ~/.nexu/ to avoid conflicts with packaged app state.

- dev-launchd.sh: DEV_NEXU_HOME, logs under .tmp/desktop/logs
- plist-generator: pass NEXU_HOME env to controller service
- Packaged app unchanged (still uses ~/.nexu/)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(controller): derive OPENCLAW_STATE_DIR from NEXU_HOME

OPENCLAW_STATE_DIR had a hardcoded default of ~/.nexu/... which
ignored the NEXU_HOME override. Now defaults to NEXU_HOME/runtime/
openclaw/state, so dev mode (.tmp/desktop/nexu-home) and packaged
mode (~/.nexu) both get correct paths.
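
The resulting precedence can be sketched as follows. The function name and `Env` type are hypothetical, and the real logic in env.ts may differ in detail.

```typescript
import * as os from "node:os";
import * as path from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical resolver: an explicit OPENCLAW_STATE_DIR wins, otherwise the
// default is derived from NEXU_HOME, which itself falls back to ~/.nexu.
function resolveOpenclawStateDir(env: Env): string {
  const explicit = env["OPENCLAW_STATE_DIR"];
  if (explicit) return explicit;
  const nexuHome = env["NEXU_HOME"] || path.join(os.homedir(), ".nexu");
  return path.join(nexuHome, "runtime", "openclaw", "state");
}
```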

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(controller): dev mode uses auth=none, OPENCLAW_STATE_DIR from NEXU_HOME

- Dev mode: don't pass gateway token to controller plist (auth=none)
  Avoids token mismatch when services are restarted via hot reload
- OPENCLAW_STATE_DIR derives from NEXU_HOME instead of hardcoded ~/.nexu

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(scripts): kill processes occupying openclaw port during cleanup

full_cleanup now force-kills any process on the openclaw port (18789),
including global openclaw instances that would prevent launchd
services from starting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(desktop): implement launchd service attach mechanism

On startup, detect already-running launchd services and reuse them
instead of cold-starting. This enables instant resume after "Run in
Background" (packaged) or dev restart.

Attach flow:
1. Read runtime-ports.json for port metadata from previous session
2. Validate isDev mode and NEXU_HOME match
3. Extract env vars from running services via launchctl print
4. Probe controller /health and openclaw port
5. If all healthy, start embedded web server and attach

Fallback: if attach fails (stale services, env mismatch, unhealthy),
tear down and cold start as before.
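
The attach-or-fallback decision can be sketched as a pure function over the inputs gathered in steps 1 to 4. All field names below are illustrative and not the real runtime-ports.json schema; the env var extraction via launchctl print is omitted.

```typescript
// Hypothetical shapes for the metadata file, current launch, and probe results.
interface PortsMetadata {
  isDev: boolean;
  nexuHome: string;
  controllerPort: number;
  openclawPort: number;
}
interface LaunchContext {
  isDev: boolean;
  nexuHome: string;
}
interface ProbeResult {
  controllerHealthy: boolean;
  openclawPortOpen: boolean;
}

type AttachDecision = { attach: true } | { attach: false; reason: string };

function decideAttach(
  prev: PortsMetadata | null,
  ctx: LaunchContext,
  probe: ProbeResult,
): AttachDecision {
  if (prev === null) return { attach: false, reason: "no runtime-ports.json" };
  if (prev.isDev !== ctx.isDev) return { attach: false, reason: "isDev mismatch" };
  if (prev.nexuHome !== ctx.nexuHome) return { attach: false, reason: "NEXU_HOME mismatch" };
  if (!probe.controllerHealthy) return { attach: false, reason: "controller unhealthy" };
  if (!probe.openclawPortOpen) return { attach: false, reason: "openclaw port closed" };
  return { attach: true };
}
```

Any `attach: false` result corresponds to the fallback path: tear down and cold start.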

Also:
- LaunchdManager.getServiceStatus() now parses environment variables
- runtime-ports.json written on cold start, deleted on quit-completely
- Port occupier detection kills rogue processes on openclaw port
- index.ts overrides runtimeConfig with attached ports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): use static import for node:net instead of require()

esbuild doesn't support typeof import() expressions. Use static
import { createConnection } from "node:net" instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): pass OPENCLAW_GATEWAY_PORT to controller plist

When Electron allocates a non-default port (e.g. 18790 because 18789
is occupied), the controller needs to know this port for both the
config compiler and WS/health connections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): fallback to rm -rf when rmSync fails on sidecar cleanup

Node.js rmSync with recursive+force can fail with ENOTEMPTY on macOS.
Fall back to execFileSync rm -rf which handles this reliably.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): retry sidecar extraction with rm -rf + verify

Node.js rmSync can silently fail on macOS (ENOTEMPTY race). Now uses
rm -rf exclusively with existence check, and retries up to 3 times
with 1s pause between attempts. This ensures first-launch sidecar
extraction succeeds reliably.
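
A sketch of the remove, verify, retry loop, with the removal and existence check injected so the logic stays testable. The helper names are hypothetical; the 3 attempts and 1 s pause are from the message above.

```typescript
// Blocks the current thread for `ms` milliseconds (permitted on the Node.js main thread).
function sleepSync(ms: number): void {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Hypothetical helper: returns true once `exists` reports the target is gone.
function removeWithRetry(
  remove: () => void,
  exists: () => boolean,
  attempts = 3,
  pause: (ms: number) => void = sleepSync,
  pauseMs = 1000,
): boolean {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      remove();
    } catch {
      // Ignore the error here; the existence check below decides the outcome.
    }
    if (!exists()) return true;
    if (attempt < attempts) pause(pauseMs);
  }
  return false;
}
```

In use, `remove` would wrap `execFileSync("rm", ["-rf", dir])` and `exists` would wrap `existsSync(dir)`, both from Node's standard library.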

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(desktop): unified bootstrap with partial attach support

Replace binary attach-or-cold-start with unified per-service flow:
- Each service independently checked: running+healthy → keep, else restart
- Ports recovered from runtime-ports.json when any service is still running
- NEXU_HOME validated to prevent cross-environment attach
- Missing services started with correct recovered ports
- Unhealthy running services torn down and restarted

This enables partial attach: if only OpenClaw survived a crash, the
next launch reuses its port and only cold-starts the controller.
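
The per-service decision above can be sketched as a small planning function. Type and function names are hypothetical.

```typescript
type ServiceAction = "keep" | "restart" | "start";

interface ServiceState {
  running: boolean;
  healthy: boolean;
}

// Hypothetical per-service decision matching the bullets above.
function planService(state: ServiceState): ServiceAction {
  if (!state.running) return "start"; // missing service: start it with recovered ports
  return state.healthy ? "keep" : "restart"; // unhealthy: tear down and restart
}

function planBootstrap(services: Record<string, ServiceState>): Record<string, ServiceAction> {
  const plan: Record<string, ServiceAction> = {};
  for (const [name, state] of Object.entries(services)) {
    plan[name] = planService(state);
  }
  return plan;
}
```

For the partial attach case described above, a healthy OpenClaw and a missing controller would plan as `{ openclaw: "keep", controller: "start" }`.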

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): dev logs go to NEXU_HOME/logs instead of ~/.nexu/logs

getLogDir() now accepts nexuHome param so dev mode writes launchd
service logs to .tmp/desktop/nexu-home/logs/ instead of ~/.nexu/logs/.
Also updates AGENTS.md with correct directory layout for dev vs packaged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: comprehensive English startup flow guide

Rewrites desktop-startup-flow.md with full implementation details:
- Directory layout for dev vs packaged (with tree diagrams)
- Label isolation between dev (.dev) and packaged modes
- Unified bootstrap flow (attach + cold start per-service)
- Port architecture and auto-allocation
- Attach mechanism (full, partial, fallback)
- Status display timeline
- File watch hot reload
- Exit behavior
- OpenClaw sidecar extraction
- Complete key files reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR #519 review findings

- Fix web port fallback: increment port instead of port 0, record
  actual port in effectivePorts for runtime-ports.json
- Fix quit handler: catch deleteRuntimePorts errors to prevent
  blocking quit
- Fix dev script: initialize watcher PIDs before trap to avoid
  set -u errors
- Fix docs: probe endpoint is /api/auth/get-session not /health

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>