
fix(desktop): launchd startup reliability hotfix#544

Merged
Siri-Ray merged 4 commits into main from hotfix/launchd-startup-reliability
Mar 25, 2026

Conversation

Collaborator

@lefarcen lefarcen commented Mar 25, 2026

Relates to #510, #558

Summary

Fixes multiple launchd integration issues causing "app won't open" and "channels stuck on connecting" reports:

  • Plist config drift: after app upgrade, installService now compares plist content and re-bootstraps when changed (was silently using stale paths/env vars)
  • White screen on startup: renderer fetched runtimeConfig before cold-start resolved ports; IPC handler now gates on coldStartReady promise
  • Stop button ignored by KeepAlive: Stop now uses bootout (unregisters service) instead of SIGTERM; Start re-bootstraps from plist
  • Stale plist cleanup on startup: compares existing plists against freshly generated ones, cleans up mismatches from old installations
  • Force-quit recovery: detects dead Electron PID → uses fresh web port instead of stale recovered port
  • Orphaned services: services running without runtime-ports.json are torn down cleanly
  • NEXU_HOME validation: packaged mode now passes actual ~/.nexu for validation (was undefined → skipped)
  • Web port fallback: uses port:0 (OS-assigned) instead of fragile +1 increment
  • effectivePorts always returned: not just on attach, so runtimeConfig is always synced
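The plist-drift fix in the first bullet boils down to a content comparison before deciding whether to re-bootstrap. A minimal sketch of that idea follows; `plistChanged`, `installService`, and the injected `rebootstrap` callback are illustrative names, not the actual desktop-app API:

```typescript
import { readFileSync, writeFileSync, existsSync } from "node:fs";

// Hypothetical sketch of the content-diff install. Re-bootstrapping only
// happens when the freshly generated plist differs from what is on disk,
// so upgrades stop silently running against stale paths/env vars.
function plistChanged(plistPath: string, freshContent: string): boolean {
  if (!existsSync(plistPath)) return true; // nothing installed yet
  return readFileSync(plistPath, "utf8") !== freshContent;
}

function installService(
  plistPath: string,
  freshContent: string,
  rebootstrap: (path: string) => void, // bootout + bootstrap, injected
): boolean {
  if (!plistChanged(plistPath, freshContent)) return false; // up to date
  writeFileSync(plistPath, freshContent);
  rebootstrap(plistPath);
  return true; // caller knows the service was re-registered
}
```

Returning a boolean lets the caller log whether an upgrade actually touched launchd, which keeps the no-op startup path cheap.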

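The white-screen fix hinges on gating the IPC handler behind a single promise that cold start resolves once final ports are known. A minimal sketch, with illustrative names that mirror the description above rather than the actual identifiers in the desktop app:

```typescript
// Sketch of the coldStartReady gate. Names are illustrative only.
type RuntimeConfig = { webPort: number; controllerPort: number };

let resolveColdStart!: () => void;
const coldStartReady = new Promise<void>((res) => (resolveColdStart = res));

const runtimeConfig: RuntimeConfig = { webPort: 0, controllerPort: 0 };

// IPC handler: callers block until cold start has settled the ports, so
// the renderer never renders against a half-initialized config.
async function getRuntimeConfig(): Promise<RuntimeConfig> {
  await coldStartReady;
  return runtimeConfig;
}

// Cold-start path: mutate the shared config object, then release waiters.
function finishColdStart(webPort: number, controllerPort: number): void {
  runtimeConfig.webPort = webPort;
  runtimeConfig.controllerPort = controllerPort;
  resolveColdStart();
}
```

Because the same config object is mutated and then the gate opens, even requests issued before cold start completes observe the final values.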
Test plan

  • pnpm typecheck — passes
  • pnpm test — 176 tests pass (27 files), including 14 new startup scenario smoke tests
  • pnpm start — dev mode launches and displays UI correctly (even with packaged app occupying default port)
  • Manual: install packaged app → upgrade → verify no white screen
  • Manual: Force Quit → reopen → verify recovery
  • Manual: Stop/Start buttons in control panel

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • macOS launchd service support for automated runtime lifecycle and Start/Stop controls for launchd-managed units.
    • One-time state migration on packaged launch to preserve user runtime state during upgrade.
  • Documentation

    • Clarified packaged-mode storage: persistent user state vs transient Electron Application Support state; updated full-reset instructions.
  • Tests

    • Added extensive tests for launchd bootstrap, manager ops, startup scenarios, plist generation, and state migration.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef360fe851


Comment thread apps/desktop/main/runtime/daemon-supervisor.ts
Comment thread apps/desktop/main/index.ts
Comment thread apps/desktop/main/runtime/daemon-supervisor.ts

coderabbitai Bot commented Mar 25, 2026

📝 Walkthrough

Integrates launchd mode for the desktop app, splits persistent vs packaged runtime state between ~/.nexu/ and Electron userData, adds cold-start synchronization and state migration (v0.1.5 → v0.1.6), and extends the runtime orchestrator to manage launchd services, their lifecycle, and incremental log tailing.

Changes

Cohort / File(s) Summary
Documentation
AGENTS.md
Reworked desktop storage semantics: two-directory model (NEXU_HOME vs Electron userData); clarified OPENCLAW_STATE_DIR assignment and updated reset instructions.
Cold-start & IPC
apps/desktop/main/index.ts, apps/desktop/main/ipc.ts
Added coldStartReady sync gate to IPC; conditional port allocation for launchd mode; legacy state migration during cold-start; wired launchd mode into orchestrator.
Runtime orchestration
apps/desktop/main/runtime/daemon-supervisor.ts, .../types.ts
Added launchd launch strategy, launchdManager, per-log offsets; implemented launchd lifecycle methods (start/stop/refresh/tail); added launchdLabel/launchdLogDir manifest fields.
Launchd service flow
apps/desktop/main/services/launchd-bootstrap.ts, apps/desktop/main/services/launchd-manager.ts
Refactored bootstrap: deterministic plist cleanup, free-port probing, attach vs cold-start logic, effectivePorts contract; installService now diffs plist content and reboots/bootouts; added waitForExit and rebootstrapFromPlist.
Plist & embedded server
apps/desktop/main/services/plist-generator.ts, apps/desktop/main/services/embedded-web-server.ts
Expanded PlistEnv with OpenClaw/controller env vars and conditional PATH injection; EmbeddedWebServer now reports actual port on start.
State migration & quit handling
apps/desktop/main/services/state-migration.ts, apps/desktop/main/services/quit-handler.ts
Added idempotent migrateOpenclawState (v0.1.5→v0.1.6) with agent/session merging; quit paths now bootout launchd units and use app.exit(0).
Types & UI
apps/desktop/shared/host.ts, apps/desktop/src/.../*.tsx, apps/desktop/src/hooks/*.ts
Added "launchd" to RuntimeUnitLaunchStrategy and launchd_* reason codes; UI treats launchd units as managed for start/stop and summary counts.
Tests
tests/desktop/...
Added comprehensive Vitest suites for launchd bootstrap scenarios, manager ops, startup edge cases, plist generation, and state migration.
Meta
apps/desktop/package.json
Version bumped 0.1.6 → 0.1.7.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Renderer as Renderer
    participant IPC as IPC Handlers
    participant Main as Main Process
    participant Bootstrap as Launchd Bootstrap
    participant Manager as Launchd Manager
    participant Orchestrator as Runtime Orchestrator
    participant Services as Launchd Services

    User->>Main: launch app
    Main->>Bootstrap: runLaunchdColdStart()
    Bootstrap->>Bootstrap: check legacy state
    alt legacy exists
        Bootstrap->>Bootstrap: migrateOpenclawState()
    end
    Bootstrap->>Manager: ensure/install services (generatePlist/findFreePort)
    Manager->>Services: launchctl bootstrap / bootout
    Bootstrap->>Bootstrap: startEmbeddedWebServer() -> returns effectivePorts
    Bootstrap-->>Main: return effectivePorts, isAttach
    Main->>Orchestrator: enableLaunchdMode(manager, labels, logDir)
    Main->>IPC: registerIpcHandlers(coldStartReady)
    IPC-->>Renderer: coldStartReady resolves
    Renderer->>IPC: host:invoke "env:get-runtime-config"
    IPC->>IPC: await coldStartReady
    IPC-->>Renderer: runtimeConfig (final ports)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • PerishCode
  • mrcfps
  • nettee

"I hopped through logs and plist delight,
Migrated sessions by moonlit night,
Ports found homes, services sing,
Cold-starts ready — what joy they bring!
A tiny rabbit cheers the orchestrator's flight." 🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 51.72%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: the title clearly identifies the main fix as a launchd startup reliability hotfix, accurately summarizing the primary change across multiple files addressing launchd integration issues.
  • Description check — ✅ Passed: the PR description covers all required template sections: What (summary of fixes), Why (specific problems solved), How (design decisions and approach), Affected areas (Desktop app checked), Checklist (all items addressed), and Notes for reviewers with test plan details.



Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



@lefarcen lefarcen force-pushed the hotfix/launchd-startup-reliability branch 7 times, most recently from 2c8f101 to 3962b61 on March 25, 2026 12:27
…ce, stale cleanup

Root causes fixed:
- Plist config drift: after app upgrade, launchd kept using old plist with
  stale paths/env vars. installService now compares content and re-bootstraps
  when changed.
- White screen on startup: renderer fetched runtimeConfig before cold-start
  updated ports (web port fallback to OS-assigned). IPC handler now gates on
  coldStartReady promise.
- Stop button ineffective: SIGTERM + KeepAlive = instant respawn. Stop now
  uses bootout (unregisters from launchd), Start re-bootstraps from plist.
- Stale plist cleanup: on startup, compares existing plists against freshly
  generated ones and cleans up mismatches from old installations/versions.

Other fixes:
- Packaged mode now passes actual nexuHome for NEXU_HOME validation
- Services running without runtime-ports.json are torn down cleanly
- Dead Electron PID detected via kill(pid,0) → fresh web port used
- Web port fallback uses port:0 (OS-assigned) instead of fragile +1
- EmbeddedWebServer exposes actual bound port
- effectivePorts always returned (not just on attach)
- Removed dead tryAttachToRunningServices code

Tests: 14 new startup scenario smoke tests covering all edge cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lefarcen lefarcen force-pushed the hotfix/launchd-startup-reliability branch from 3962b61 to ac311a6 on March 25, 2026 12:34
@lefarcen lefarcen changed the base branch from release/v0.1.6 to main March 25, 2026 12:38
…p-reliability

# Conflicts:
#	apps/desktop/main/services/launchd-bootstrap.ts
#	apps/desktop/package.json

cloudflare-workers-and-pages Bot commented Mar 25, 2026

Deploying nexu-docs with Cloudflare Pages

Latest commit: 1a682e1
Status: ✅  Deploy successful!
Preview URL: https://0939c975.nexu-docs.pages.dev
Branch Preview URL: https://hotfix-launchd-startup-relia.nexu-docs.pages.dev



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (15)
apps/desktop/main/runtime/types.ts (1)

23-26: Consider making launchd metadata required when launchStrategy is "launchd".

launchdLabel and launchdLogDir are optional for all strategies, so type checks won’t prevent incomplete launchd manifests. A discriminated union would catch this earlier.

♻️ Suggested typing refinement
 export type RuntimeUnitManifest = {
   id: RuntimeUnitId;
   label: string;
   kind: RuntimeUnitKind;
   launchStrategy: RuntimeUnitLaunchStrategy;
@@
-  launchdLabel?: string;
-  launchdLogDir?: string;
+  launchdLabel?: string;
+  launchdLogDir?: string;
@@
 };
+
+export type RuntimeUnitManifestTyped =
+  | (RuntimeUnitManifest & {
+      launchStrategy: "launchd";
+      launchdLabel: string;
+      launchdLogDir: string;
+    })
+  | (RuntimeUnitManifest & {
+      launchStrategy: Exclude<RuntimeUnitLaunchStrategy, "launchd">;
+    });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/runtime/types.ts` around lines 23 - 26, Refine the runtime
config types so that when launchStrategy is "launchd" the launchd-specific
fields are required: replace the single interface with a discriminated union
keyed on launchStrategy (e.g. type RuntimeConfig = LaunchdRuntimeConfig |
OtherRuntimeConfig), where LaunchdRuntimeConfig has launchStrategy: "launchd"
and required launchdLabel and launchdLogDir, and OtherRuntimeConfig has the
other strategy literal(s) and keeps those fields optional; update any references
expecting the old type to use the new RuntimeConfig/discriminant
(launchStrategy) for exhaustive checks.
tests/desktop/plist-generator.test.ts (1)

20-27: Add assertions for the newly introduced plist environment keys.

mockEnv now includes new controller/openclaw path fields, but the tests don’t verify those keys are present in generated plist output yet.

🧪 Suggested assertion additions
   const plist = generatePlist("controller", mockEnv);
@@
   expect(plist).toContain("<key>PORT</key>");
   expect(plist).toContain("<string>50800</string>");
+  expect(plist).toContain("<key>WEB_URL</key>");
+  expect(plist).toContain("<string>http://127.0.0.1:50801</string>");
+  expect(plist).toContain("<key>OPENCLAW_SKILLS_DIR</key>");
+  expect(plist).toContain("<key>SKILLHUB_STATIC_SKILLS_DIR</key>");
+  expect(plist).toContain("<key>PLATFORM_TEMPLATES_DIR</key>");
+  expect(plist).toContain("<key>OPENCLAW_BIN</key>");
+  expect(plist).toContain("<key>OPENCLAW_EXTENSIONS_DIR</key>");
+  expect(plist).toContain("<key>SKILL_NODE_PATH</key>");
+  expect(plist).toContain("<key>OPENCLAW_TMP_DIR</key>");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/desktop/plist-generator.test.ts` around lines 20 - 27, The test's
mockEnv now includes new controller/openclaw keys (webUrl, openclawSkillsDir,
skillhubStaticSkillsDir, platformTemplatesDir, openclawBinPath,
openclawExtensionsDir, skillNodePath, openclawTmpDir) but the plist-generator
test doesn't assert they appear in the generated plist; update the test that
consumes mockEnv (the plist generator test) to add assertions that the generated
plist output contains each of those keys and that their values match mockEnv
(e.g., assert presence of "webUrl" and its value, "openclawSkillsDir", etc.),
using the same generator invocation currently in the test to produce the plist
string/object.
tests/desktop/state-migration.test.ts (1)

25-41: Minor: Parent temp directory not cleaned up.

The base directory created by mkdtempSync (e.g., /tmp/nexu-migration-XXXXXX/) isn't tracked for cleanup in afterEach. Only sourceDir and targetDir (subdirectories) are deleted, leaving the empty parent behind. Over many test runs this accumulates garbage in /tmp.

♻️ Suggested fix to clean up parent directory
+let testBaseDir: string;
 let sourceDir: string;
 let targetDir: string;
 const logMessages: string[] = [];
 const log = (msg: string) => logMessages.push(msg);

 beforeEach(() => {
-  const base = mkdtempSync(join(tmpdir(), "nexu-migration-"));
-  sourceDir = join(base, "source");
-  targetDir = join(base, "target");
+  testBaseDir = mkdtempSync(join(tmpdir(), "nexu-migration-"));
+  sourceDir = join(testBaseDir, "source");
+  targetDir = join(testBaseDir, "target");
   mkdirSync(sourceDir, { recursive: true });
   mkdirSync(targetDir, { recursive: true });
   logMessages.length = 0;
 });

 afterEach(() => {
   try {
-    rmSync(sourceDir, { recursive: true, force: true });
-    rmSync(targetDir, { recursive: true, force: true });
+    rmSync(testBaseDir, { recursive: true, force: true });
   } catch {
     // cleanup best effort
   }
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/desktop/state-migration.test.ts` around lines 25 - 41, The temp parent
created by mkdtempSync in beforeEach isn't removed; change the setup to store
that parent path in a scoped variable (e.g., base or baseDir) accessible to
afterEach, and in afterEach call rmSync on that parent (with { recursive: true,
force: true }) in addition to removing sourceDir and targetDir; update
beforeEach/afterEach to reference the same variable names (created in beforeEach
and cleaned in afterEach) so the mkdtempSync parent folder is removed after each
test.
tests/desktop/launchd-bootstrap.test.ts (1)

57-61: Mock missing port property that production code expects.

The EmbeddedWebServer interface (in embedded-web-server.ts) now includes port: number, but this mock only returns { close: ... }. While current tests may pass because effectivePorts comes from env defaults, this could mask bugs if tests later check the actual web server port resolution.

♻️ Suggested fix to include port in mock
 vi.mock("../../apps/desktop/main/services/embedded-web-server", () => ({
   startEmbeddedWebServer: vi.fn().mockResolvedValue({
+    port: 50810,
     close: vi.fn().mockResolvedValue(undefined),
   }),
 }));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/desktop/launchd-bootstrap.test.ts` around lines 57 - 61, The test mock
for startEmbeddedWebServer is missing the required port property from the
EmbeddedWebServer interface; update the vi.mock for
"../../apps/desktop/main/services/embedded-web-server" so the mocked
startEmbeddedWebServer (used in launchd-bootstrap.test.ts) resolves to an object
that includes both port: <number> (use the same default/effective port your
tests expect) and close: fn(), e.g. return { port: <expectedPort>, close:
fn().mockResolvedValue(undefined) }; ensure the mocked shape matches the
EmbeddedWebServer interface so port resolution logic in code under test behaves
the same as production.
apps/desktop/main/services/quit-handler.ts (1)

211-231: Minor inconsistency in error logging severity.

In the window close handler (line 144), bootoutService errors use console.error, but here both bootout and waitForExit errors use console.warn. Consider aligning the severity for consistency.

♻️ Suggested alignment
       try {
         await opts.launchd.bootoutService(label);
       } catch (err) {
-        console.warn(
+        console.error(
           `bootout ${label} failed: ${err instanceof Error ? err.message : String(err)}`,
         );
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/services/quit-handler.ts` around lines 211 - 231, The catch
for opts.launchd.bootoutService is using console.warn here but elsewhere (window
close handler) bootoutService errors use console.error; update the catch for
bootoutService to use console.error (and optionally change the waitForExit catch
to console.error as well) inside the decision === "quit-completely" block so
error severity is consistent; target the catch blocks around
opts.launchd.bootoutService and opts.launchd.waitForExit to make this change.
tests/desktop/launchd-manager-ops.test.ts (2)

179-179: Remove unused variable _callCount.

The variable _callCount is declared and incremented but never used. This appears to be debug remnant.

♻️ Suggested fix
-      let _callCount = 0;
       mockExecFile.mockImplementation(
         (
           _cmd: string,
           args: string[],
           callback: (
             error: Error | null,
             result: { stdout: string; stderr: string },
           ) => void,
         ) => {
-          _callCount++;
           if (args.includes("print")) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/desktop/launchd-manager-ops.test.ts` at line 179, Remove the unused
debug variable _callCount: delete its declaration (let _callCount = 0;) and
remove any increments or references to _callCount within the test (e.g., places
that do _callCount++), leaving the rest of the test logic intact; this cleans up
the test file and eliminates an unused variable warning.

73-77: Consider restoring process.platform after tests.

Modifying process.platform via Object.defineProperty in beforeEach without restoring the original value in afterEach could leak state to other test files if running in the same process. While Vitest typically isolates test files, it's safer to capture and restore:

♻️ Suggested fix
 describe("LaunchdManager", () => {
+  const originalPlatform = process.platform;
+
   beforeEach(() => {
     vi.clearAllMocks();
     Object.defineProperty(process, "platform", { value: "darwin" });
   });
+
+  afterEach(() => {
+    Object.defineProperty(process, "platform", { value: originalPlatform });
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/desktop/launchd-manager-ops.test.ts` around lines 73 - 77, Save the
original process.platform to a variable and restore it after each test: declare
a variable (e.g., let originalPlatform: NodeJS.Platform) in the describe scope,
set originalPlatform = process.platform in beforeEach before you call
Object.defineProperty, and add an afterEach that restores process.platform using
Object.defineProperty(process, "platform", { value: originalPlatform }); keep
the existing vi.clearAllMocks() behavior.
apps/desktop/main/index.ts (2)

585-594: Direct mutation of runtimeConfig works but could be fragile.

The runtime config is mutated directly after launchd bootstrap to reflect effective ports. This works because the same object reference is passed to registerIpcHandlers. Consider adding a comment explaining this dependency, or refactoring to make the flow more explicit.

📝 Suggested documentation
   // Always sync runtimeConfig with actual effective ports — these may differ
   // from the initial config if ports were recovered from a previous session or
   // OS-assigned due to conflicts.
+  // NOTE: This mutates the same object passed to registerIpcHandlers(), which
+  // is gated by coldStartReady, so IPC callers always see the final values.
   const { controllerPort, openclawPort, webPort } =
     launchdResult.effectivePorts;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/index.ts` around lines 585 - 594, The code directly mutates
runtimeConfig after reading launchdResult.effectivePorts (controllerPort,
openclawPort, webPort) and relies on the fact that the same object reference was
previously passed to registerIpcHandlers; make this explicit by either (A)
adding a short comment above the mutation referencing registerIpcHandlers and
that it relies on object identity, or (B) refactoring to avoid implicit mutation
by creating a new config object (copying existing runtimeConfig, updating
ports/urls) and passing that updated object into registerIpcHandlers (or calling
registerIpcHandlers only after updating runtimeConfig) so the flow is explicit
and robust. Ensure references to runtimeConfig, launchdResult.effectivePorts,
and registerIpcHandlers are updated accordingly.

509-521: Migration code is synchronous but marked for removal—optional refactor only if proven problematic.

The migration correctly runs before services start and uses synchronous operations (cpSync) to migrate critical agent sessions and identity files. However, this migration is explicitly temporary—the code contains a TODO: "Remove this migration after v0.3+ when all users have upgraded past v0.1.6."

Since this is a one-time upgrade path scheduled for removal, converting to async is not urgent. Only refactor if startup responsiveness during v0.1.6→current upgrades becomes a reported issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/index.ts` around lines 509 - 521, The migration block runs
synchronously (using migrateOpenclawState) during startup which is intentional
and temporary; if you want to avoid blocking startup, make migrateOpenclawState
asynchronous and await it here instead of calling it synchronously: change
migrateOpenclawState to return a Promise, call await migrateOpenclawState({
targetStateDir: openclawStateDir, sourceStateDir: legacyStateDir, log: (msg) =>
logColdStart(`state-migration: ${msg}`) }) inside this non-dev branch (ensure
the surrounding function is async), otherwise leave the current synchronous
behavior as-is until the TODO removal.
apps/desktop/main/services/launchd-bootstrap.ts (1)

273-281: Port probing could be slow if many consecutive ports are occupied.

findFreePort probes up to 10 ports sequentially using detectPortOccupier (which spawns lsof). If ports 50800-50809 are all occupied, this spawns 10 lsof processes before falling back to port 0. Consider probing fewer ports or using a faster check:

♻️ Reduce probing attempts or use socket-based check
-async function findFreePort(preferred: number): Promise<number> {
-  for (let offset = 0; offset < 10; offset++) {
+async function findFreePort(preferred: number): Promise<number> {
+  // Try fewer ports — if preferred is taken, likely so are adjacent ones
+  for (let offset = 0; offset < 3; offset++) {
     const port = preferred + offset;
     const occupier = await detectPortOccupier(port);
     if (!occupier) return port;
   }
-  // All 10 ports occupied — let OS assign
+  // Ports occupied — let OS assign
   return 0;
 }

The current approach is acceptable for correctness; this is just an optimization suggestion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/services/launchd-bootstrap.ts` around lines 273 - 281, The
findFreePort function currently probes up to 10 ports sequentially using
detectPortOccupier (which shells out to lsof); to avoid slowdowns when many
ports are occupied, either reduce the max attempts (e.g., try 2–3 offsets
instead of 10) or replace detectPortOccupier with a socket-bind check that
attempts to listen on the candidate port (using Node's net.createServer/listen
and immediately close on success) and returns free/busy as a Promise; update
findFreePort to call the new/changed helper and fall back to 0 if none succeed.
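The socket-bind alternative suggested above could be sketched as follows; this is an illustration of the technique, not the repo's actual `detectPortOccupier` or `findFreePort` implementation:

```typescript
import net from "node:net";

// Socket-bind probe: attempting to listen is cheaper than shelling out
// to lsof, and EADDRINUSE directly answers the "is it free?" question.
function isPortFree(port: number): Promise<boolean> {
  return new Promise((resolve) => {
    const srv = net.createServer();
    srv.once("error", () => resolve(false)); // EADDRINUSE etc. => busy
    srv.listen(port, "127.0.0.1", () => {
      srv.close(() => resolve(true)); // bound successfully => free
    });
  });
}

async function findFreePort(preferred: number, attempts = 3): Promise<number> {
  for (let offset = 0; offset < attempts; offset++) {
    if (await isPortFree(preferred + offset)) return preferred + offset;
  }
  return 0; // all probes busy — let the OS assign a port
}
```

One caveat with any probe-then-use approach: the port can be taken between the check and the actual bind, so the port-0 fallback remains the only race-free path.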
apps/desktop/main/runtime/daemon-supervisor.ts (3)

670-695: execFileSync in refreshLaunchdUnit blocks the main thread.

Using execFileSync in the refresh path means getRuntimeState() blocks the main process while waiting for launchctl. With a 3s timeout, this could cause UI jank if launchctl is slow (e.g., system under load).

Consider whether this refresh needs to be synchronous, or if the orchestrator could use cached state with async updates:

However, I understand the sync requirement comes from getRuntimeState() being called synchronously. The 3s timeout is a reasonable safeguard. If this causes issues in practice, consider debouncing refreshes or making the IPC handler async.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/runtime/daemon-supervisor.ts` around lines 670 - 695,
refreshLaunchdUnit currently uses execFileSync which blocks the main thread;
change it to a non-blocking implementation: replace execFileSync with the async
child_process.execFile (or util.promisify(execFile)) inside refreshLaunchdUnit,
return a Promise (make refreshLaunchdUnit async) and propagate async behavior to
callers like getRuntimeState or the IPC handler that invokes it; if you cannot
change callers to async, implement a cached-state approach where
refreshLaunchdUnit performs an async refresh and updates an in-memory cache
(e.g., runtimeStateCache) that synchronous getRuntimeState reads, and ensure
callers of the IPC handler are updated to await the async refresh or read the
cached value as appropriate. Ensure you modify function names refreshLaunchdUnit
and getRuntimeState and the IPC handler that calls them when applying the fix.

851-863: Fixed 1-second delay after startService could be replaced with active polling.

The 1-second setTimeout delay before refreshing status is a workaround. A more responsive approach would poll getServiceStatus in a tight loop with backoff until running or timeout. However, this is a minor optimization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/runtime/daemon-supervisor.ts` around lines 851 - 863,
Replace the fixed 1-second setTimeout after launchdManager.startService with an
active polling loop: repeatedly call refreshLaunchdUnit(record) (or directly
query the manager via a getServiceStatus/getService method if available) with a
short interval and exponential backoff until record.phase === ("running" as
RuntimeUnitRecord["phase"]) or a configurable timeout elapses; then call
logStateChange with reasonCode "start_succeeded" or "start_failed" based on the
final state. Ensure the loop aborts on errors, respects a max timeout, and does
not block the event loop (use await with sleep intervals). Keep the existing
symbols: launchdManager.startService, refreshLaunchdUnit, logStateChange, and
the RuntimeUnitRecord["phase"] check.
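The polling-with-backoff replacement described above could look roughly like this; `getStatus` and the `"running"` phase string are assumptions for illustration, not the supervisor's real API:

```typescript
// Sketch of replacing the fixed 1-second delay with active polling:
// poll the unit's status with exponential backoff until it reports
// "running" or a timeout elapses, without blocking the event loop.
async function waitForRunning(
  getStatus: () => Promise<string>,
  timeoutMs = 5000,
): Promise<boolean> {
  let delay = 50;
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if ((await getStatus()) === "running") return true;
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, 1000); // exponential backoff, capped at 1s
  }
  return false; // caller logs start_failed
}
```

Fast-starting services resolve in tens of milliseconds instead of a fixed second, while slow ones still get the full timeout budget.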

740-805: Log tailing implementation is solid but has a subtle edge case.

The byte-offset tracking and line-buffering logic is well-implemented. However, if a log file is truncated/rotated (size becomes smaller than prevOffset), fileSize <= prevOffset will skip reading entirely. Consider detecting truncation:

♻️ Handle log rotation/truncation
         const prevOffset = this.launchdLogOffsets.get(logFile.path) ?? 0;
         const fileSize = stat.size;
 
-        if (fileSize <= prevOffset) continue;
+        if (fileSize < prevOffset) {
+          // File was truncated (rotated) — reset to start
+          this.launchdLogOffsets.set(logFile.path, 0);
+        } else if (fileSize === prevOffset) {
+          continue; // No new content
+        }
+        const currentOffset = this.launchdLogOffsets.get(logFile.path) ?? 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/runtime/daemon-supervisor.ts` around lines 740 - 805,
tailLaunchdLogs currently skips reading when fileSize <= prevOffset which misses
the truncation/rotation case; change the logic in tailLaunchdLogs (where
prevOffset is read from this.launchdLogOffsets) so that if fileSize < prevOffset
you treat the file as truncated: reset prevOffset to 0 (and update
this.launchdLogOffsets.set(logFile.path, 0) if desired) and continue processing
from the start of the file, whereas only when fileSize === prevOffset should you
continue (no new data). Adjust the readStart computation to use the updated
prevOffset so truncation will be handled correctly and you still apply the
existing maxRead cap.
apps/desktop/main/services/state-migration.ts (2)

158-166: Potential issue: cpSync without error handling for individual session files.

If copying a single session file fails (e.g., permission denied), the entire migration won't be marked complete. This is probably desired behavior, but consider whether logging the specific failed file would help debugging.

♻️ Suggested improvement for debuggability
     for (const file of sessionFiles) {
       const sourceFile = join(sourceSessionsDir, file);
       const targetFile = join(targetSessionsDir, file);
 
       if (!existsSync(targetFile)) {
-        cpSync(sourceFile, targetFile);
-        count++;
+        try {
+          cpSync(sourceFile, targetFile);
+          count++;
+        } catch (err) {
+          log(`failed to copy session file ${file}: ${err instanceof Error ? err.message : String(err)}`);
+          throw err; // Re-throw to prevent stamp from being written
+        }
       }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/services/state-migration.ts` around lines 158 - 166, wrap
the per-file copy in a try/catch so a failing session file is identified before
the migration aborts; specifically, in the loop that iterates sessionFiles and
calls cpSync(sourceFile, targetFile) (using sourceSessionsDir, targetSessionsDir,
and updating count), catch errors from cpSync, do not increment count for the
failed file, call the module's logger (or console.error) with the file path and
the caught error to aid debugging, and re-throw so the migration stamp is not
written.

191-196: safeReaddir silently swallows all errors.

While filtering out dotfiles is good, catching all errors without logging could hide issues like permission denied. Consider logging unexpected errors:

♻️ Suggested improvement
-function safeReaddir(dir: string): string[] {
-  try {
-    return readdirSync(dir).filter((name) => !name.startsWith("."));
-  } catch {
-    return [];
-  }
-}
+function safeReaddir(dir: string, log?: (msg: string) => void): string[] {
+  try {
+    return readdirSync(dir).filter((name) => !name.startsWith("."));
+  } catch (err) {
+    // ENOENT is expected if dir doesn't exist; log other errors
+    if (err instanceof Error && "code" in err && err.code !== "ENOENT" && log) {
+      log(`warning: could not read directory ${dir}: ${err.message}`);
+    }
+    return [];
+  }
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/desktop/main/services/state-migration.ts` around lines 191 - 196,
safeReaddir currently swallows all exceptions which can hide issues (e.g.,
permission errors); modify the function (safeReaddir) to catch the error into a
variable (catch (err)) and only silently return [] for expected cases like
ENOENT/ENOENT-like "not found" while logging unexpected errors (use
console.warn/console.error or the app logger) with context including the dir and
the error before returning []; keep the existing readdirSync usage and dotfile
filter but ensure unexpected errors are surfaced in logs instead of being
silently ignored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Line 86: Update the full-reset instruction in AGENTS.md so it uses the
app-scoped userData path instead of the broader Application Support folder:
replace the mention of `~/Library/Application Support/@nexu/` with
`~/Library/Application Support/@nexu/desktop/` in the sentence that currently
reads “To fully reset local desktop + controller state, stop the stack, remove
`.tmp/desktop/`, then remove `~/.nexu/` and `~/Library/Application
Support/@nexu/`.” Ensure the new wording keeps the same ordering and clarity
about stopping the stack and removing `.tmp/desktop/` and `~/.nexu/`.

---

Nitpick comments:
In `@apps/desktop/main/index.ts`:
- Around line 585-594: The code directly mutates runtimeConfig after reading
launchdResult.effectivePorts (controllerPort, openclawPort, webPort) and relies
on the fact that the same object reference was previously passed to
registerIpcHandlers; make this explicit by either (A) adding a short comment
above the mutation referencing registerIpcHandlers and that it relies on object
identity, or (B) refactoring to avoid implicit mutation by creating a new config
object (copying existing runtimeConfig, updating ports/urls) and passing that
updated object into registerIpcHandlers (or calling registerIpcHandlers only
after updating runtimeConfig) so the flow is explicit and robust. Ensure
references to runtimeConfig, launchdResult.effectivePorts, and
registerIpcHandlers are updated accordingly.
- Around line 509-521: The migration block runs synchronously (using
migrateOpenclawState) during startup which is intentional and temporary; if you
want to avoid blocking startup, make migrateOpenclawState asynchronous and await
it here instead of calling it synchronously: change migrateOpenclawState to
return a Promise, call await migrateOpenclawState({ targetStateDir:
openclawStateDir, sourceStateDir: legacyStateDir, log: (msg) =>
logColdStart(`state-migration: ${msg}`) }) inside this non-dev branch (ensure
the surrounding function is async), otherwise leave the current synchronous
behavior as-is until the TODO removal.

In `@apps/desktop/main/runtime/daemon-supervisor.ts`:
- Around line 670-695: refreshLaunchdUnit currently uses execFileSync which
blocks the main thread; change it to a non-blocking implementation: replace
execFileSync with the async child_process.execFile (or util.promisify(execFile))
inside refreshLaunchdUnit, return a Promise (make refreshLaunchdUnit async) and
propagate async behavior to callers like getRuntimeState or the IPC handler that
invokes it; if you cannot change callers to async, implement a cached-state
approach where refreshLaunchdUnit performs an async refresh and updates an
in-memory cache (e.g., runtimeStateCache) that synchronous getRuntimeState
reads, and ensure callers of the IPC handler are updated to await the async
refresh or read the cached value as appropriate. Ensure you modify function
names refreshLaunchdUnit and getRuntimeState and the IPC handler that calls them
when applying the fix.
- Around line 851-863: Replace the fixed 1-second setTimeout after
launchdManager.startService with an active polling loop: repeatedly call
refreshLaunchdUnit(record) (or directly query the manager via a
getServiceStatus/getService method if available) with a short interval and
exponential backoff until record.phase === ("running" as
RuntimeUnitRecord["phase"]) or a configurable timeout elapses; then call
logStateChange with reasonCode "start_succeeded" or "start_failed" based on the
final state. Ensure the loop aborts on errors, respects a max timeout, and does
not block the event loop (use await with sleep intervals). Keep the existing
symbols: launchdManager.startService, refreshLaunchdUnit, logStateChange, and
the RuntimeUnitRecord["phase"] check.
- Around line 740-805: tailLaunchdLogs currently skips reading when fileSize <=
prevOffset which misses the truncation/rotation case; change the logic in
tailLaunchdLogs (where prevOffset is read from this.launchdLogOffsets) so that
if fileSize < prevOffset you treat the file as truncated: reset prevOffset to 0
(and update this.launchdLogOffsets.set(logFile.path, 0) if desired) and continue
processing from the start of the file, whereas only when fileSize === prevOffset
should you continue (no new data). Adjust the readStart computation to use the
updated prevOffset so truncation will be handled correctly and you still apply
the existing maxRead cap.
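The poll-with-backoff suggestion for `startService` above can be sketched generically; `pollUntil` and its option names are hypothetical helpers, not symbols from this codebase:

```typescript
// Generic poll helper: retries `check` with exponential backoff until it yields
// a value, or gives up once the timeout would be exceeded.
async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  opts: { timeoutMs?: number; initialDelayMs?: number; maxDelayMs?: number } = {},
): Promise<T | undefined> {
  const { timeoutMs = 10_000, initialDelayMs = 100, maxDelayMs = 1_000 } = opts;
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  while (true) {
    const result = await check();
    if (result !== undefined) return result;
    if (Date.now() + delay > deadline) return undefined; // timed out
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, maxDelayMs); // exponential backoff, capped
  }
}
```

In the supervisor, the `check` callback would call `refreshLaunchdUnit(record)` and return a value once `record.phase === "running"`, with the `undefined` timeout case mapped to the `"start_failed"` reason code in `logStateChange`.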

In `@apps/desktop/main/runtime/types.ts`:
- Around line 23-26: Refine the runtime config types so that when launchStrategy
is "launchd" the launchd-specific fields are required: replace the single
interface with a discriminated union keyed on launchStrategy (e.g. type
RuntimeConfig = LaunchdRuntimeConfig | OtherRuntimeConfig), where
LaunchdRuntimeConfig has launchStrategy: "launchd" and required launchdLabel and
launchdLogDir, and OtherRuntimeConfig has the other strategy literal(s) and
keeps those fields optional; update any references expecting the old type to use
the new RuntimeConfig/discriminant (launchStrategy) for exhaustive checks.
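The discriminated union suggested above could take roughly this shape; `launchdLabel` and `launchdLogDir` come from the review comment, while the `"spawn"` literal and the other fields are assumed for illustration:

```typescript
// When launchStrategy is "launchd", the launchd-specific fields are required...
interface LaunchdRuntimeConfig {
  launchStrategy: "launchd";
  launchdLabel: string;
  launchdLogDir: string;
}

// ...for any other strategy they stay optional (or are simply absent).
interface SpawnRuntimeConfig {
  launchStrategy: "spawn";
  launchdLabel?: string;
  launchdLogDir?: string;
}

type RuntimeConfig = LaunchdRuntimeConfig | SpawnRuntimeConfig;

// Switching on the discriminant narrows the type, so launchdLabel is
// guaranteed present in the "launchd" branch and the switch is exhaustive.
function describeStrategy(config: RuntimeConfig): string {
  switch (config.launchStrategy) {
    case "launchd":
      return `launchd unit ${config.launchdLabel}`;
    case "spawn":
      return "direct child process";
  }
}
```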

In `@apps/desktop/main/services/launchd-bootstrap.ts`:
- Around line 273-281: The findFreePort function currently probes up to 10 ports
sequentially using detectPortOccupier (which shells out to lsof); to avoid
slowdowns when many ports are occupied, either reduce the max attempts (e.g.,
try 2–3 offsets instead of 10) or replace detectPortOccupier with a socket-bind
check that attempts to listen on the candidate port (using Node's
net.createServer/listen and immediately close on success) and returns free/busy
as a Promise; update findFreePort to call the new/changed helper and fall back
to 0 if none succeed.
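The socket-bind alternative to shelling out to lsof can be sketched as follows; this loosely mirrors the reviewed `findFreePort` shape but is a hypothetical standalone version, with `isPortFree` and the attempt count chosen for illustration:

```typescript
import net from "node:net";

// Returns true if the candidate port can be bound: listen briefly, then close.
// Binding is both faster and more reliable than parsing lsof output.
function isPortFree(port: number): Promise<boolean> {
  return new Promise((resolve) => {
    const server = net.createServer();
    server.once("error", () => resolve(false)); // EADDRINUSE, EACCES, etc.
    server.listen(port, "127.0.0.1", () => {
      server.close(() => resolve(true));
    });
  });
}

// Probe a few sequential candidates, then fall back to 0 (OS-assigned).
async function findFreePort(base: number, attempts = 3): Promise<number> {
  for (let offset = 0; offset < attempts; offset++) {
    if (await isPortFree(base + offset)) return base + offset;
  }
  return 0;
}
```

Note the residual race: a port that binds successfully here can still be taken by another process before the real server starts, which is why the `port: 0` fallback elsewhere in this PR is the sturdier option.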

In `@apps/desktop/main/services/quit-handler.ts`:
- Around line 211-231: The catch for opts.launchd.bootoutService is using
console.warn here but elsewhere (window close handler) bootoutService errors use
console.error; update the catch for bootoutService to use console.error (and
optionally change the waitForExit catch to console.error as well) inside the
decision === "quit-completely" block so error severity is consistent; target the
catch blocks around opts.launchd.bootoutService and opts.launchd.waitForExit to
make this change.

In `@apps/desktop/main/services/state-migration.ts`:
- Around line 158-166: Wrap the per-file copy inside a try/catch so a failing
session file is identified before the migration aborts; specifically, in the
loop that iterates sessionFiles and calls cpSync(sourceFile, targetFile) (using
sourceSessionsDir, targetSessionsDir, and updating count), catch errors from
cpSync, do not increment count for the failed file, and call the module's
logger (or console.error) with the file path and the caught error to aid
debugging, re-throwing so the migration stamp is not written.
- Around line 191-196: safeReaddir currently swallows all exceptions which can
hide issues (e.g., permission errors); modify the function (safeReaddir) to
catch the error into a variable (catch (err)) and only silently return [] for
expected cases like ENOENT/ENOENT-like "not found" while logging unexpected
errors (use console.warn/console.error or the app logger) with context including
the dir and the error before returning []; keep the existing readdirSync usage
and dotfile filter but ensure unexpected errors are surfaced in logs instead of
being silently ignored.

In `@tests/desktop/launchd-bootstrap.test.ts`:
- Around line 57-61: The test mock for startEmbeddedWebServer is missing the
required port property from the EmbeddedWebServer interface; update the vi.mock
for "../../apps/desktop/main/services/embedded-web-server" so the mocked
startEmbeddedWebServer (used in launchd-bootstrap.test.ts) resolves to an object
that includes both port: <number> (use the same default/effective port your
tests expect) and close: fn(), e.g. return { port: <expectedPort>, close:
fn().mockResolvedValue(undefined) }; ensure the mocked shape matches the
EmbeddedWebServer interface so port resolution logic in code under test behaves
the same as production.

In `@tests/desktop/launchd-manager-ops.test.ts`:
- Line 179: Remove the unused debug variable _callCount: delete its declaration
(let _callCount = 0;) and remove any increments or references to _callCount
within the test (e.g., places that do _callCount++), leaving the rest of the
test logic intact; this cleans up the test file and eliminates an unused
variable warning.
- Around line 73-77: Save the original process.platform to a variable and
restore it after each test: declare a variable (e.g., let originalPlatform:
NodeJS.Platform) in the describe scope, set originalPlatform = process.platform
in beforeEach before you call Object.defineProperty, and add an afterEach that
restores process.platform using Object.defineProperty(process, "platform", {
value: originalPlatform }); keep the existing vi.clearAllMocks() behavior.
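The save/restore pattern for process.platform described above looks like this in plain Node (in the actual test file the override would sit in beforeEach and the restore in afterEach):

```typescript
// Save the real platform before overriding it for the test...
const originalPlatform = process.platform;

// ...override it (process.platform is a configurable own property, so
// Object.defineProperty can replace its value for the duration of a test)...
Object.defineProperty(process, "platform", { value: "darwin" });
const seenDuringTest = process.platform; // test body would run here

// ...and restore the original value afterwards so later tests are unaffected.
Object.defineProperty(process, "platform", { value: originalPlatform });
```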

In `@tests/desktop/plist-generator.test.ts`:
- Around line 20-27: The test's mockEnv now includes new controller/openclaw
keys (webUrl, openclawSkillsDir, skillhubStaticSkillsDir, platformTemplatesDir,
openclawBinPath, openclawExtensionsDir, skillNodePath, openclawTmpDir) but the
plist-generator test doesn't assert they appear in the generated plist; update
the test that consumes mockEnv (the plist generator test) to add assertions that
the generated plist output contains each of those keys and that their values
match mockEnv (e.g., assert presence of "webUrl" and its value,
"openclawSkillsDir", etc.), using the same generator invocation currently in the
test to produce the plist string/object.

In `@tests/desktop/state-migration.test.ts`:
- Around line 25-41: The temp parent created by mkdtempSync in beforeEach isn't
removed; change the setup to store that parent path in a scoped variable (e.g.,
base or baseDir) accessible to afterEach, and in afterEach call rmSync on that
parent (with { recursive: true, force: true }) in addition to removing sourceDir
and targetDir; update beforeEach/afterEach to reference the same variable names
(created in beforeEach and cleaned in afterEach) so the mkdtempSync parent
folder is removed after each test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0356839c-242b-400f-9d17-c56e2d0aa708

📥 Commits

Reviewing files that changed from the base of the PR and between 8129ff5 and 1a682e1.

📒 Files selected for processing (21)
  • AGENTS.md
  • apps/desktop/main/index.ts
  • apps/desktop/main/ipc.ts
  • apps/desktop/main/runtime/daemon-supervisor.ts
  • apps/desktop/main/runtime/types.ts
  • apps/desktop/main/services/embedded-web-server.ts
  • apps/desktop/main/services/launchd-bootstrap.ts
  • apps/desktop/main/services/launchd-manager.ts
  • apps/desktop/main/services/plist-generator.ts
  • apps/desktop/main/services/quit-handler.ts
  • apps/desktop/main/services/state-migration.ts
  • apps/desktop/package.json
  • apps/desktop/shared/host.ts
  • apps/desktop/src/components/runtime-unit-card.tsx
  • apps/desktop/src/hooks/use-runtime-state.ts
  • apps/desktop/src/main.tsx
  • tests/desktop/launchd-bootstrap.test.ts
  • tests/desktop/launchd-manager-ops.test.ts
  • tests/desktop/launchd-startup-scenarios.test.ts
  • tests/desktop/plist-generator.test.ts
  • tests/desktop/state-migration.test.ts

Comment thread AGENTS.md Outdated
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Around line 72-73: The reset instructions in AGENTS.md currently remove only
`.tmp/desktop/` but omit the new dev launchd state directory `.tmp/launchd/`,
which can leave stale launchd plists/ports; update the reset guidance to also
remove `.tmp/launchd/` for dev mode and clarify packaged-mode paths by listing
`~/.nexu/` for state, `~/Library/LaunchAgents/` for plists, and `~/.nexu/logs/`
for logs (also update the corresponding lines referenced around the earlier
packaged-mode notes on lines ~85-86 to match this wording).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 639a5e97-bf96-48c9-b951-9503b00a713b

📥 Commits

Reviewing files that changed from the base of the PR and between 1a682e1 and 7753224.

📒 Files selected for processing (1)
  • AGENTS.md

Comment thread AGENTS.md