fix(desktop): launchd startup reliability hotfix#544
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ef360fe851
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
ef360fe to
a13b2c9
Compare
📝 WalkthroughWalkthroughIntegrates launchd-mode for the desktop app, splits persistent vs packaged runtime state between Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant Renderer as Renderer
participant IPC as IPC Handlers
participant Main as Main Process
participant Bootstrap as Launchd Bootstrap
participant Manager as Launchd Manager
participant Orchestrator as Runtime Orchestrator
participant Services as Launchd Services
User->>Main: launch app
Main->>Bootstrap: runLaunchdColdStart()
Bootstrap->>Bootstrap: check legacy state
alt legacy exists
Bootstrap->>Bootstrap: migrateOpenclawState()
end
Bootstrap->>Manager: ensure/install services (generatePlist/findFreePort)
Manager->>Services: launchctl bootstrap / bootout
Bootstrap->>Bootstrap: startEmbeddedWebServer() -> returns effectivePorts
Bootstrap-->>Main: return effectivePorts, isAttach
Main->>Orchestrator: enableLaunchdMode(manager, labels, logDir)
Main->>IPC: registerIpcHandlers(coldStartReady)
IPC-->>Renderer: coldStartReady resolves
Renderer->>IPC: host:invoke "env:get-runtime-config"
IPC->>IPC: await coldStartReady
IPC-->>Renderer: runtimeConfig (final ports)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
2c8f101 to
3962b61
Compare
…ce, stale cleanup Root causes fixed: - Plist config drift: after app upgrade, launchd kept using old plist with stale paths/env vars. installService now compares content and re-bootstraps when changed. - White screen on startup: renderer fetched runtimeConfig before cold-start updated ports (web port fallback to OS-assigned). IPC handler now gates on coldStartReady promise. - Stop button ineffective: SIGTERM + KeepAlive = instant respawn. Stop now uses bootout (unregisters from launchd), Start re-bootstraps from plist. - Stale plist cleanup: on startup, compares existing plists against freshly generated ones and cleans up mismatches from old installations/versions. Other fixes: - Packaged mode now passes actual nexuHome for NEXU_HOME validation - Services running without runtime-ports.json are torn down cleanly - Dead Electron PID detected via kill(pid,0) → fresh web port used - Web port fallback uses port:0 (OS-assigned) instead of fragile +1 - EmbeddedWebServer exposes actual bound port - effectivePorts always returned (not just on attach) - Removed dead tryAttachToRunningServices code Tests: 14 new startup scenario smoke tests covering all edge cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3962b61 to
ac311a6
Compare
…p-reliability # Conflicts: # apps/desktop/main/services/launchd-bootstrap.ts # apps/desktop/package.json
Deploying nexu-docs with
|
| Latest commit: |
1a682e1
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://0939c975.nexu-docs.pages.dev |
| Branch Preview URL: | https://hotfix-launchd-startup-relia.nexu-docs.pages.dev |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (15)
apps/desktop/main/runtime/types.ts (1)
23-26: Consider making launchd metadata required whenlaunchStrategyis"launchd".
launchdLabelandlaunchdLogDirare optional for all strategies, so type checks won’t prevent incomplete launchd manifests. A discriminated union would catch this earlier.♻️ Suggested typing refinement
export type RuntimeUnitManifest = { id: RuntimeUnitId; label: string; kind: RuntimeUnitKind; launchStrategy: RuntimeUnitLaunchStrategy; @@ - launchdLabel?: string; - launchdLogDir?: string; + launchdLabel?: string; + launchdLogDir?: string; @@ }; + +export type RuntimeUnitManifestTyped = + | (RuntimeUnitManifest & { + launchStrategy: "launchd"; + launchdLabel: string; + launchdLogDir: string; + }) + | (RuntimeUnitManifest & { + launchStrategy: Exclude<RuntimeUnitLaunchStrategy, "launchd">; + });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/runtime/types.ts` around lines 23 - 26, Refine the runtime config types so that when launchStrategy is "launchd" the launchd-specific fields are required: replace the single interface with a discriminated union keyed on launchStrategy (e.g. type RuntimeConfig = LaunchdRuntimeConfig | OtherRuntimeConfig), where LaunchdRuntimeConfig has launchStrategy: "launchd" and required launchdLabel and launchdLogDir, and OtherRuntimeConfig has the other strategy literal(s) and keeps those fields optional; update any references expecting the old type to use the new RuntimeConfig/discriminant (launchStrategy) for exhaustive checks.tests/desktop/plist-generator.test.ts (1)
20-27: Add assertions for the newly introduced plist environment keys.
mockEnvnow includes new controller/openclaw path fields, but the tests don’t verify those keys are present in generated plist output yet.🧪 Suggested assertion additions
const plist = generatePlist("controller", mockEnv); @@ expect(plist).toContain("<key>PORT</key>"); expect(plist).toContain("<string>50800</string>"); + expect(plist).toContain("<key>WEB_URL</key>"); + expect(plist).toContain("<string>http://127.0.0.1:50801</string>"); + expect(plist).toContain("<key>OPENCLAW_SKILLS_DIR</key>"); + expect(plist).toContain("<key>SKILLHUB_STATIC_SKILLS_DIR</key>"); + expect(plist).toContain("<key>PLATFORM_TEMPLATES_DIR</key>"); + expect(plist).toContain("<key>OPENCLAW_BIN</key>"); + expect(plist).toContain("<key>OPENCLAW_EXTENSIONS_DIR</key>"); + expect(plist).toContain("<key>SKILL_NODE_PATH</key>"); + expect(plist).toContain("<key>OPENCLAW_TMP_DIR</key>");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/desktop/plist-generator.test.ts` around lines 20 - 27, The test's mockEnv now includes new controller/openclaw keys (webUrl, openclawSkillsDir, skillhubStaticSkillsDir, platformTemplatesDir, openclawBinPath, openclawExtensionsDir, skillNodePath, openclawTmpDir) but the plist-generator test doesn't assert they appear in the generated plist; update the test that consumes mockEnv (the plist generator test) to add assertions that the generated plist output contains each of those keys and that their values match mockEnv (e.g., assert presence of "webUrl" and its value, "openclawSkillsDir", etc.), using the same generator invocation currently in the test to produce the plist string/object.tests/desktop/state-migration.test.ts (1)
25-41: Minor: Parent temp directory not cleaned up.The
basedirectory created bymkdtempSync(e.g.,/tmp/nexu-migration-XXXXXX/) isn't tracked for cleanup inafterEach. OnlysourceDirandtargetDir(subdirectories) are deleted, leaving the empty parent behind. Over many test runs this accumulates garbage in/tmp.♻️ Suggested fix to clean up parent directory
+let testBaseDir: string; let sourceDir: string; let targetDir: string; const logMessages: string[] = []; const log = (msg: string) => logMessages.push(msg); beforeEach(() => { - const base = mkdtempSync(join(tmpdir(), "nexu-migration-")); - sourceDir = join(base, "source"); - targetDir = join(base, "target"); + testBaseDir = mkdtempSync(join(tmpdir(), "nexu-migration-")); + sourceDir = join(testBaseDir, "source"); + targetDir = join(testBaseDir, "target"); mkdirSync(sourceDir, { recursive: true }); mkdirSync(targetDir, { recursive: true }); logMessages.length = 0; }); afterEach(() => { try { - rmSync(sourceDir, { recursive: true, force: true }); - rmSync(targetDir, { recursive: true, force: true }); + rmSync(testBaseDir, { recursive: true, force: true }); } catch { // cleanup best effort } });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/desktop/state-migration.test.ts` around lines 25 - 41, The temp parent created by mkdtempSync in beforeEach isn't removed; change the setup to store that parent path in a scoped variable (e.g., base or baseDir) accessible to afterEach, and in afterEach call rmSync on that parent (with { recursive: true, force: true }) in addition to removing sourceDir and targetDir; update beforeEach/afterEach to reference the same variable names (created in beforeEach and cleaned in afterEach) so the mkdtempSync parent folder is removed after each test.tests/desktop/launchd-bootstrap.test.ts (1)
57-61: Mock missingportproperty that production code expects.The
EmbeddedWebServerinterface (in embedded-web-server.ts) now includesport: number, but this mock only returns{ close: ... }. While current tests may pass becauseeffectivePortscomes from env defaults, this could mask bugs if tests later check the actual web server port resolution.♻️ Suggested fix to include port in mock
vi.mock("../../apps/desktop/main/services/embedded-web-server", () => ({ startEmbeddedWebServer: vi.fn().mockResolvedValue({ + port: 50810, close: vi.fn().mockResolvedValue(undefined), }), }));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/desktop/launchd-bootstrap.test.ts` around lines 57 - 61, The test mock for startEmbeddedWebServer is missing the required port property from the EmbeddedWebServer interface; update the vi.mock for "../../apps/desktop/main/services/embedded-web-server" so the mocked startEmbeddedWebServer (used in launchd-bootstrap.test.ts) resolves to an object that includes both port: <number> (use the same default/effective port your tests expect) and close: fn(), e.g. return { port: <expectedPort>, close: fn().mockResolvedValue(undefined) }; ensure the mocked shape matches the EmbeddedWebServer interface so port resolution logic in code under test behaves the same as production.apps/desktop/main/services/quit-handler.ts (1)
211-231: Minor inconsistency in error logging severity.In the window close handler (line 144),
bootoutServiceerrors useconsole.error, but here bothbootoutandwaitForExiterrors useconsole.warn. Consider aligning the severity for consistency.♻️ Suggested alignment
try { await opts.launchd.bootoutService(label); } catch (err) { - console.warn( + console.error( `bootout ${label} failed: ${err instanceof Error ? err.message : String(err)}`, ); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/services/quit-handler.ts` around lines 211 - 231, The catch for opts.launchd.bootoutService is using console.warn here but elsewhere (window close handler) bootoutService errors use console.error; update the catch for bootoutService to use console.error (and optionally change the waitForExit catch to console.error as well) inside the decision === "quit-completely" block so error severity is consistent; target the catch blocks around opts.launchd.bootoutService and opts.launchd.waitForExit to make this change.tests/desktop/launchd-manager-ops.test.ts (2)
179-179: Remove unused variable_callCount.The variable
_callCountis declared and incremented but never used. This appears to be debug remnant.♻️ Suggested fix
- let _callCount = 0; mockExecFile.mockImplementation( ( _cmd: string, args: string[], callback: ( error: Error | null, result: { stdout: string; stderr: string }, ) => void, ) => { - _callCount++; if (args.includes("print")) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/desktop/launchd-manager-ops.test.ts` at line 179, Remove the unused debug variable _callCount: delete its declaration (let _callCount = 0;) and remove any increments or references to _callCount within the test (e.g., places that do _callCount++), leaving the rest of the test logic intact; this cleans up the test file and eliminates an unused variable warning.
73-77: Consider restoringprocess.platformafter tests.Modifying
process.platformviaObject.definePropertyinbeforeEachwithout restoring the original value inafterEachcould leak state to other test files if running in the same process. While Vitest typically isolates test files, it's safer to capture and restore:♻️ Suggested fix
describe("LaunchdManager", () => { + const originalPlatform = process.platform; + beforeEach(() => { vi.clearAllMocks(); Object.defineProperty(process, "platform", { value: "darwin" }); }); + + afterEach(() => { + Object.defineProperty(process, "platform", { value: originalPlatform }); + });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/desktop/launchd-manager-ops.test.ts` around lines 73 - 77, Save the original process.platform to a variable and restore it after each test: declare a variable (e.g., let originalPlatform: NodeJS.Platform) in the describe scope, set originalPlatform = process.platform in beforeEach before you call Object.defineProperty, and add an afterEach that restores process.platform using Object.defineProperty(process, "platform", { value: originalPlatform }); keep the existing vi.clearAllMocks() behavior.apps/desktop/main/index.ts (2)
585-594: Direct mutation ofruntimeConfigworks but could be fragile.The runtime config is mutated directly after launchd bootstrap to reflect effective ports. This works because the same object reference is passed to
registerIpcHandlers. Consider adding a comment explaining this dependency, or refactoring to make the flow more explicit.📝 Suggested documentation
// Always sync runtimeConfig with actual effective ports — these may differ // from the initial config if ports were recovered from a previous session or // OS-assigned due to conflicts. + // NOTE: This mutates the same object passed to registerIpcHandlers(), which + // is gated by coldStartReady, so IPC callers always see the final values. const { controllerPort, openclawPort, webPort } = launchdResult.effectivePorts;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/index.ts` around lines 585 - 594, The code directly mutates runtimeConfig after reading launchdResult.effectivePorts (controllerPort, openclawPort, webPort) and relies on the fact that the same object reference was previously passed to registerIpcHandlers; make this explicit by either (A) adding a short comment above the mutation referencing registerIpcHandlers and that it relies on object identity, or (B) refactoring to avoid implicit mutation by creating a new config object (copying existing runtimeConfig, updating ports/urls) and passing that updated object into registerIpcHandlers (or calling registerIpcHandlers only after updating runtimeConfig) so the flow is explicit and robust. Ensure references to runtimeConfig, launchdResult.effectivePorts, and registerIpcHandlers are updated accordingly.
509-521: Migration code is synchronous but marked for removal—optional refactor only if proven problematic.The migration correctly runs before services start and uses synchronous operations (
cpSync) to migrate critical agent sessions and identity files. However, this migration is explicitly temporary—the code contains a TODO: "Remove this migration after v0.3+ when all users have upgraded past v0.1.6."Since this is a one-time upgrade path scheduled for removal, converting to async is not urgent. Only refactor if startup responsiveness during v0.1.6→current upgrades becomes a reported issue.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/index.ts` around lines 509 - 521, The migration block runs synchronously (using migrateOpenclawState) during startup which is intentional and temporary; if you want to avoid blocking startup, make migrateOpenclawState asynchronous and await it here instead of calling it synchronously: change migrateOpenclawState to return a Promise, call await migrateOpenclawState({ targetStateDir: openclawStateDir, sourceStateDir: legacyStateDir, log: (msg) => logColdStart(`state-migration: ${msg}`) }) inside this non-dev branch (ensure the surrounding function is async), otherwise leave the current synchronous behavior as-is until the TODO removal.apps/desktop/main/services/launchd-bootstrap.ts (1)
273-281: Port probing could be slow if many consecutive ports are occupied.
findFreePortprobes up to 10 ports sequentially usingdetectPortOccupier(which spawnslsof). If ports 50800-50809 are all occupied, this spawns 10 lsof processes before falling back to port 0. Consider probing fewer ports or using a faster check:♻️ Reduce probing attempts or use socket-based check
-async function findFreePort(preferred: number): Promise<number> { - for (let offset = 0; offset < 10; offset++) { +async function findFreePort(preferred: number): Promise<number> { + // Try fewer ports — if preferred is taken, likely so are adjacent ones + for (let offset = 0; offset < 3; offset++) { const port = preferred + offset; const occupier = await detectPortOccupier(port); if (!occupier) return port; } - // All 10 ports occupied — let OS assign + // Ports occupied — let OS assign return 0; }The current approach is acceptable for correctness; this is just an optimization suggestion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/services/launchd-bootstrap.ts` around lines 273 - 281, The findFreePort function currently probes up to 10 ports sequentially using detectPortOccupier (which shells out to lsof); to avoid slowdowns when many ports are occupied, either reduce the max attempts (e.g., try 2–3 offsets instead of 10) or replace detectPortOccupier with a socket-bind check that attempts to listen on the candidate port (using Node's net.createServer/listen and immediately close on success) and returns free/busy as a Promise; update findFreePort to call the new/changed helper and fall back to 0 if none succeed.apps/desktop/main/runtime/daemon-supervisor.ts (3)
670-695:execFileSyncinrefreshLaunchdUnitblocks the main thread.Using
execFileSyncin the refresh path meansgetRuntimeState()blocks the main process while waiting for launchctl. With a 3s timeout, this could cause UI jank if launchctl is slow (e.g., system under load).Consider whether this refresh needs to be synchronous, or if the orchestrator could use cached state with async updates:
However, I understand the sync requirement comes from
getRuntimeState()being called synchronously. The 3s timeout is a reasonable safeguard. If this causes issues in practice, consider debouncing refreshes or making the IPC handler async.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/runtime/daemon-supervisor.ts` around lines 670 - 695, refreshLaunchdUnit currently uses execFileSync which blocks the main thread; change it to a non-blocking implementation: replace execFileSync with the async child_process.execFile (or util.promisify(execFile)) inside refreshLaunchdUnit, return a Promise (make refreshLaunchdUnit async) and propagate async behavior to callers like getRuntimeState or the IPC handler that invokes it; if you cannot change callers to async, implement a cached-state approach where refreshLaunchdUnit performs an async refresh and updates an in-memory cache (e.g., runtimeStateCache) that synchronous getRuntimeState reads, and ensure callers of the IPC handler are updated to await the async refresh or read the cached value as appropriate. Ensure you modify function names refreshLaunchdUnit and getRuntimeState and the IPC handler that calls them when applying the fix.
851-863: Fixed 1-second delay afterstartServicecould be replaced with active polling.The 1-second
setTimeoutdelay before refreshing status is a workaround. A more responsive approach would pollgetServiceStatusin a tight loop with backoff until running or timeout. However, this is a minor optimization.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/runtime/daemon-supervisor.ts` around lines 851 - 863, Replace the fixed 1-second setTimeout after launchdManager.startService with an active polling loop: repeatedly call refreshLaunchdUnit(record) (or directly query the manager via a getServiceStatus/getService method if available) with a short interval and exponential backoff until record.phase === ("running" as RuntimeUnitRecord["phase"]) or a configurable timeout elapses; then call logStateChange with reasonCode "start_succeeded" or "start_failed" based on the final state. Ensure the loop aborts on errors, respects a max timeout, and does not block the event loop (use await with sleep intervals). Keep the existing symbols: launchdManager.startService, refreshLaunchdUnit, logStateChange, and the RuntimeUnitRecord["phase"] check.
740-805: Log tailing implementation is solid but has a subtle edge case.The byte-offset tracking and line-buffering logic is well-implemented. However, if a log file is truncated/rotated (size becomes smaller than
prevOffset),fileSize <= prevOffsetwill skip reading entirely. Consider detecting truncation:♻️ Handle log rotation/truncation
const prevOffset = this.launchdLogOffsets.get(logFile.path) ?? 0; const fileSize = stat.size; - if (fileSize <= prevOffset) continue; + if (fileSize < prevOffset) { + // File was truncated (rotated) — reset to start + this.launchdLogOffsets.set(logFile.path, 0); + } else if (fileSize === prevOffset) { + continue; // No new content + } + const currentOffset = this.launchdLogOffsets.get(logFile.path) ?? 0;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/runtime/daemon-supervisor.ts` around lines 740 - 805, tailLaunchdLogs currently skips reading when fileSize <= prevOffset which misses the truncation/rotation case; change the logic in tailLaunchdLogs (where prevOffset is read from this.launchdLogOffsets) so that if fileSize < prevOffset you treat the file as truncated: reset prevOffset to 0 (and update this.launchdLogOffsets.set(logFile.path, 0) if desired) and continue processing from the start of the file, whereas only when fileSize === prevOffset should you continue (no new data). Adjust the readStart computation to use the updated prevOffset so truncation will be handled correctly and you still apply the existing maxRead cap.apps/desktop/main/services/state-migration.ts (2)
158-166: Potential issue:cpSyncwithout error handling for individual session files.If copying a single session file fails (e.g., permission denied), the entire migration won't be marked complete. This is probably desired behavior, but consider whether logging the specific failed file would help debugging.
♻️ Suggested improvement for debuggability
for (const file of sessionFiles) { const sourceFile = join(sourceSessionsDir, file); const targetFile = join(targetSessionsDir, file); if (!existsSync(targetFile)) { - cpSync(sourceFile, targetFile); - count++; + try { + cpSync(sourceFile, targetFile); + count++; + } catch (err) { + log(`failed to copy session file ${file}: ${err instanceof Error ? err.message : String(err)}`); + throw err; // Re-throw to prevent stamp from being written + } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/services/state-migration.ts` around lines 158 - 166, Wrap the per-file copy inside a try/catch so a single failing session file doesn't abort the whole migration and so you can log which file failed; specifically, in the loop that iterates sessionFiles and calls cpSync(sourceFile, targetFile) (using sourceSessionsDir, targetSessionsDir, and updating count), catch errors from cpSync, increment no count for failed files, and call the module's logger (or console.error) with the file path and the caught error to aid debugging while continuing to the next file.
191-196:safeReaddirsilently swallows all errors.While filtering out dotfiles is good, catching all errors without logging could hide issues like permission denied. Consider logging unexpected errors:
♻️ Suggested improvement
-function safeReaddir(dir: string): string[] { - try { - return readdirSync(dir).filter((name) => !name.startsWith(".")); - } catch { - return []; - } -} +function safeReaddir(dir: string, log?: (msg: string) => void): string[] { + try { + return readdirSync(dir).filter((name) => !name.startsWith(".")); + } catch (err) { + // ENOENT is expected if dir doesn't exist; log other errors + if (err instanceof Error && "code" in err && err.code !== "ENOENT" && log) { + log(`warning: could not read directory ${dir}: ${err.message}`); + } + return []; + } +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/desktop/main/services/state-migration.ts` around lines 191 - 196, safeReaddir currently swallows all exceptions which can hide issues (e.g., permission errors); modify the function (safeReaddir) to catch the error into a variable (catch (err)) and only silently return [] for expected cases like ENOENT/ENOENT-like "not found" while logging unexpected errors (use console.warn/console.error or the app logger) with context including the dir and the error before returning []; keep the existing readdirSync usage and dotfile filter but ensure unexpected errors are surfaced in logs instead of being silently ignored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Line 86: Update the full-reset instruction in AGENTS.md so it uses the
app-scoped userData path instead of the broader Application Support folder:
replace the mention of `~/Library/Application Support/@nexu/` with
`~/Library/Application Support/@nexu/desktop/` in the sentence that currently
reads “To fully reset local desktop + controller state, stop the stack, remove
`.tmp/desktop/`, then remove `~/.nexu/` and `~/Library/Application
Support/@nexu/`.” Ensure the new wording keeps the same ordering and clarity
about stopping the stack and removing `.tmp/desktop/` and `~/.nexu/`.
---
Nitpick comments:
In `@apps/desktop/main/index.ts`:
- Around line 585-594: The code directly mutates runtimeConfig after reading
launchdResult.effectivePorts (controllerPort, openclawPort, webPort) and relies
on the fact that the same object reference was previously passed to
registerIpcHandlers; make this explicit by either (A) adding a short comment
above the mutation referencing registerIpcHandlers and that it relies on object
identity, or (B) refactoring to avoid implicit mutation by creating a new config
object (copying existing runtimeConfig, updating ports/urls) and passing that
updated object into registerIpcHandlers (or calling registerIpcHandlers only
after updating runtimeConfig) so the flow is explicit and robust. Ensure
references to runtimeConfig, launchdResult.effectivePorts, and
registerIpcHandlers are updated accordingly.
- Around line 509-521: The migration block runs synchronously (using
migrateOpenclawState) during startup which is intentional and temporary; if you
want to avoid blocking startup, make migrateOpenclawState asynchronous and await
it here instead of calling it synchronously: change migrateOpenclawState to
return a Promise, call await migrateOpenclawState({ targetStateDir:
openclawStateDir, sourceStateDir: legacyStateDir, log: (msg) =>
logColdStart(`state-migration: ${msg}`) }) inside this non-dev branch (ensure
the surrounding function is async), otherwise leave the current synchronous
behavior as-is until the TODO removal.
In `@apps/desktop/main/runtime/daemon-supervisor.ts`:
- Around line 670-695: refreshLaunchdUnit currently uses execFileSync which
blocks the main thread; change it to a non-blocking implementation: replace
execFileSync with the async child_process.execFile (or util.promisify(execFile))
inside refreshLaunchdUnit, return a Promise (make refreshLaunchdUnit async) and
propagate async behavior to callers like getRuntimeState or the IPC handler that
invokes it; if you cannot change callers to async, implement a cached-state
approach where refreshLaunchdUnit performs an async refresh and updates an
in-memory cache (e.g., runtimeStateCache) that synchronous getRuntimeState
reads, and ensure callers of the IPC handler are updated to await the async
refresh or read the cached value as appropriate. Ensure you modify function
names refreshLaunchdUnit and getRuntimeState and the IPC handler that calls them
when applying the fix.
- Around line 851-863: Replace the fixed 1-second setTimeout after
launchdManager.startService with an active polling loop: repeatedly call
refreshLaunchdUnit(record) (or directly query the manager via a
getServiceStatus/getService method if available) with a short interval and
exponential backoff until record.phase === ("running" as
RuntimeUnitRecord["phase"]) or a configurable timeout elapses; then call
logStateChange with reasonCode "start_succeeded" or "start_failed" based on the
final state. Ensure the loop aborts on errors, respects a max timeout, and does
not block the event loop (use await with sleep intervals). Keep the existing
symbols: launchdManager.startService, refreshLaunchdUnit, logStateChange, and
the RuntimeUnitRecord["phase"] check.
- Around line 740-805: tailLaunchdLogs currently skips reading when fileSize <=
prevOffset which misses the truncation/rotation case; change the logic in
tailLaunchdLogs (where prevOffset is read from this.launchdLogOffsets) so that
if fileSize < prevOffset you treat the file as truncated: reset prevOffset to 0
(and update this.launchdLogOffsets.set(logFile.path, 0) if desired) and continue
processing from the start of the file, whereas only when fileSize === prevOffset
should you continue (no new data). Adjust the readStart computation to use the
updated prevOffset so truncation will be handled correctly and you still apply
the existing maxRead cap.
In `@apps/desktop/main/runtime/types.ts`:
- Around line 23-26: Refine the runtime config types so that when launchStrategy
is "launchd" the launchd-specific fields are required: replace the single
interface with a discriminated union keyed on launchStrategy (e.g. type
RuntimeConfig = LaunchdRuntimeConfig | OtherRuntimeConfig), where
LaunchdRuntimeConfig has launchStrategy: "launchd" and required launchdLabel and
launchdLogDir, and OtherRuntimeConfig has the other strategy literal(s) and
keeps those fields optional; update any references expecting the old type to use
the new RuntimeConfig/discriminant (launchStrategy) for exhaustive checks.
In `@apps/desktop/main/services/launchd-bootstrap.ts`:
- Around line 273-281: The findFreePort function currently probes up to 10 ports
sequentially using detectPortOccupier (which shells out to lsof); to avoid
slowdowns when many ports are occupied, either reduce the max attempts (e.g.,
try 2–3 offsets instead of 10) or replace detectPortOccupier with a socket-bind
check that attempts to listen on the candidate port (using Node's
net.createServer/listen and immediately close on success) and returns free/busy
as a Promise; update findFreePort to call the new/changed helper and fall back
to 0 if none succeed.
In `@apps/desktop/main/services/quit-handler.ts`:
- Around line 211-231: The catch for opts.launchd.bootoutService is using
console.warn here but elsewhere (window close handler) bootoutService errors use
console.error; update the catch for bootoutService to use console.error (and
optionally change the waitForExit catch to console.error as well) inside the
decision === "quit-completely" block so error severity is consistent; target the
catch blocks around opts.launchd.bootoutService and opts.launchd.waitForExit to
make this change.
In `@apps/desktop/main/services/state-migration.ts`:
- Around line 158-166: Wrap the per-file copy inside a try/catch so a single
failing session file doesn't abort the whole migration and so you can log which
file failed; specifically, in the loop that iterates sessionFiles and calls
cpSync(sourceFile, targetFile) (using sourceSessionsDir, targetSessionsDir, and
updating count), catch errors from cpSync, increment no count for failed files,
and call the module's logger (or console.error) with the file path and the
caught error to aid debugging while continuing to the next file.
- Around line 191-196: safeReaddir currently swallows all exceptions which can
hide issues (e.g., permission errors); modify the function (safeReaddir) to
catch the error into a variable (catch (err)) and only silently return [] for
expected cases like ENOENT/ENOENT-like "not found" while logging unexpected
errors (use console.warn/console.error or the app logger) with context including
the dir and the error before returning []; keep the existing readdirSync usage
and dotfile filter but ensure unexpected errors are surfaced in logs instead of
being silently ignored.
In `@tests/desktop/launchd-bootstrap.test.ts`:
- Around line 57-61: The test mock for startEmbeddedWebServer is missing the
required port property from the EmbeddedWebServer interface; update the vi.mock
for "../../apps/desktop/main/services/embedded-web-server" so the mocked
startEmbeddedWebServer (used in launchd-bootstrap.test.ts) resolves to an object
that includes both port: <number> (use the same default/effective port your
tests expect) and close: fn(), e.g. return { port: <expectedPort>, close:
fn().mockResolvedValue(undefined) }; ensure the mocked shape matches the
EmbeddedWebServer interface so port resolution logic in code under test behaves
the same as production.
In `@tests/desktop/launchd-manager-ops.test.ts`:
- Line 179: Remove the unused debug variable _callCount: delete its declaration
(let _callCount = 0;) and remove any increments or references to _callCount
within the test (e.g., places that do _callCount++), leaving the rest of the
test logic intact; this cleans up the test file and eliminates an unused
variable warning.
- Around line 73-77: Save the original process.platform to a variable and
restore it after each test: declare a variable (e.g., let originalPlatform:
NodeJS.Platform) in the describe scope, set originalPlatform = process.platform
in beforeEach before you call Object.defineProperty, and add an afterEach that
restores process.platform using Object.defineProperty(process, "platform", {
value: originalPlatform }); keep the existing vi.clearAllMocks() behavior.
In `@tests/desktop/plist-generator.test.ts`:
- Around line 20-27: The test's mockEnv now includes new controller/openclaw
keys (webUrl, openclawSkillsDir, skillhubStaticSkillsDir, platformTemplatesDir,
openclawBinPath, openclawExtensionsDir, skillNodePath, openclawTmpDir) but the
plist-generator test doesn't assert they appear in the generated plist; update
the test that consumes mockEnv (the plist generator test) to add assertions that
the generated plist output contains each of those keys and that their values
match mockEnv (e.g., assert presence of "webUrl" and its value,
"openclawSkillsDir", etc.), using the same generator invocation currently in the
test to produce the plist string/object.
In `@tests/desktop/state-migration.test.ts`:
- Around line 25-41: The temp parent created by mkdtempSync in beforeEach isn't
removed; change the setup to store that parent path in a scoped variable (e.g.,
base or baseDir) accessible to afterEach, and in afterEach call rmSync on that
parent (with { recursive: true, force: true }) in addition to removing sourceDir
and targetDir; update beforeEach/afterEach to reference the same variable names
(created in beforeEach and cleaned in afterEach) so the mkdtempSync parent
folder is removed after each test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0356839c-242b-400f-9d17-c56e2d0aa708
📒 Files selected for processing (21)
AGENTS.mdapps/desktop/main/index.tsapps/desktop/main/ipc.tsapps/desktop/main/runtime/daemon-supervisor.tsapps/desktop/main/runtime/types.tsapps/desktop/main/services/embedded-web-server.tsapps/desktop/main/services/launchd-bootstrap.tsapps/desktop/main/services/launchd-manager.tsapps/desktop/main/services/plist-generator.tsapps/desktop/main/services/quit-handler.tsapps/desktop/main/services/state-migration.tsapps/desktop/package.jsonapps/desktop/shared/host.tsapps/desktop/src/components/runtime-unit-card.tsxapps/desktop/src/hooks/use-runtime-state.tsapps/desktop/src/main.tsxtests/desktop/launchd-bootstrap.test.tstests/desktop/launchd-manager-ops.test.tstests/desktop/launchd-startup-scenarios.test.tstests/desktop/plist-generator.test.tstests/desktop/state-migration.test.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Around line 72-73: The reset instructions in AGENTS.md currently remove only
`.tmp/desktop/` but omit the new dev launchd state directory `.tmp/launchd/`,
which can leave stale launchd plists/ports; update the reset guidance to also
remove `.tmp/launchd/` for dev mode and clarify packaged-mode paths by listing
`~/.nexu/` for state, `~/Library/LaunchAgents/` for plists, and `~/.nexu/logs/`
for logs (also update the corresponding lines referenced around the earlier
packaged-mode notes on lines ~85-86 to match this wording).
Relates to #510, #558
Summary
Fixes multiple launchd integration issues causing "app won't open" and "channels stuck on connecting" reports:
installServicenow compares plist content and re-bootstraps when changed (was silently using stale paths/env vars)runtimeConfigbefore cold-start resolved ports; IPC handler now gates oncoldStartReadypromisebootout(unregisters service) instead of SIGTERM; Start re-bootstraps from plistruntime-ports.jsonare torn down cleanly~/.nexufor validation (wasundefined→ skipped)port:0(OS-assigned) instead of fragile+1incrementruntimeConfigis always syncedTest plan
pnpm typecheck— passespnpm test— 176 tests pass (27 files), including 14 new startup scenario smoke testspnpm start— dev mode launches and displays UI correctly (even with packaged app occupying default port)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests