fix(desktop): pass --port to openclaw gateway and detect port conflicts #700

Merged
lefarcen merged 19 commits into release/v0.1.8 from hotfix/v0.1.8-port-conflict
Mar 31, 2026

Conversation

@lefarcen lefarcen (Collaborator) commented Mar 31, 2026

What

Fix openclaw port allocation so Nexu can coexist with other OpenClaw services (ClawX, global openclaw install, etc).

Why

The openclaw launchd plist never included --port in ProgramArguments (missing since initial implementation in #405). This meant:

  1. openclaw always bound to hardcoded default 18789 — even when findFreePort detected a conflict and allocated a different port, the openclaw process ignored it
  2. If another service occupied 18789 (ClawX, global openclaw install), our openclaw crashed on bind (EADDRINUSE) and entered a crash loop
  3. Controller connected to the wrong openclaw — the one on 18789 had a different auth token → gateway token mismatch → all channels stuck in "connecting"

This affected any user running ClawX or a global openclaw install alongside Nexu.

How

Three fixes:

  1. Pass --port to openclaw (plist-generator.ts): Add --port ${env.openclawPort} to ProgramArguments so openclaw actually uses the allocated port.

  2. Use net.connect for port detection (launchd-bootstrap.ts): Replace lsof-based detection with a TCP probe. The macOS hardened runtime blocks packaged Electron apps from seeing other processes' file descriptors via lsof, so detection failed silently.

  3. Post-launch port theft recovery (launchd-bootstrap.ts): After starting openclaw, verify that a competing service has not stolen the port. If it has, run bootout, find a new port, regenerate the plists for both openclaw and the controller, and restart.
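
Fix 1 can be illustrated with a minimal sketch of how ProgramArguments might be assembled. Only `env.openclawPort` and the `--port` flag come from this PR; the `OpenclawPlistEnv` shape, the `nodePath` and `openclawEntry` fields, the `gateway` subcommand position, and both function names are assumptions made for illustration, not the contents of plist-generator.ts.

```typescript
// Hypothetical env shape; only openclawPort is named in the PR description.
interface OpenclawPlistEnv {
  nodePath: string; // assumed: absolute path to the runtime binary
  openclawEntry: string; // assumed: path to the openclaw entry script
  openclawPort: number; // port allocated by bootstrap
}

// Build the argv that launchd will exec, always with an explicit --port.
function buildProgramArguments(env: OpenclawPlistEnv): string[] {
  const port = env.openclawPort;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid openclaw port: ${port}`);
  }
  // "gateway" is taken from the PR title; the real argv layout may differ.
  return [env.nodePath, env.openclawEntry, "gateway", "--port", String(port)];
}

// Escape the XML special characters that can appear inside a plist <string>.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render the ProgramArguments key/array fragment of a launchd plist.
function renderProgramArguments(args: string[]): string {
  const items = args
    .map((arg) => `    <string>${escapeXml(arg)}</string>`)
    .join("\n");
  return `  <key>ProgramArguments</key>\n  <array>\n${items}\n  </array>`;
}
```

Validating the port before rendering means a bad allocation fails at plist generation time rather than as a crash loop under launchd.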

Also added structured logging via env.log callback so bootstrap diagnostics appear in cold-start.log (packaged mode loses console.log).
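
A rough sketch of the logging callback idea follows. The PR only says that an `env.log` callback exists; its signature, the `BootstrapEnv` shape, the JSON line format, and the helper names below are all assumptions.

```typescript
// Assumed callback signature; the real env.log may take different arguments.
type LogFn = (message: string, fields?: Record<string, unknown>) => void;

// Hypothetical slice of the bootstrap env that carries the callback.
interface BootstrapEnv {
  log?: LogFn;
}

// Format one structured log line (JSON per line is an illustrative choice).
function formatLogLine(
  message: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({ ts: now.toISOString(), msg: message, ...fields });
}

// Return a logger that prefers env.log and falls back to console.log,
// which is only visible in dev mode according to the PR description.
function makeBootstrapLogger(env: BootstrapEnv): LogFn {
  return (message, fields = {}) => {
    if (env.log) {
      env.log(message, fields);
    } else {
      console.log(formatLogLine(message, fields));
    }
  };
}
```

In packaged mode the caller would pass a `log` implementation that appends to cold-start.log; how that file is opened is outside this sketch.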

Verified scenarios

  • Nexu + ClawX on 18789 → Nexu auto-assigns 18790, both coexist
  • Nexu + global openclaw install (KeepAlive=true) → same auto-assignment
  • Nexu alone → uses default 18789 as before
  • No changes to dev mode behavior

Affected areas

  • Desktop app (Electron shell)

Checklist

  • pnpm typecheck passes
  • Startup scenario tests pass (29/29)
  • Manual packaged app testing with port conflict
  • No credentials or tokens in code or logs

Users with 'openclaw install' have a global ai.openclaw.gateway
launchd service on port 18789 with KeepAlive=true. When Nexu also
used 18789, launchd race conditions caused token mismatch or crash
loops. Changed default to 50789 (alongside controller:50800 and
web:50810). findFreePort() still handles further conflicts.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 31319d09d6

Comment thread apps/desktop/shared/runtime-config.ts Outdated
Comment thread apps/desktop/main/index.ts Outdated

After starting the openclaw launchd service, verify the port listener
PID matches our service PID. If a competing service (e.g. global
ai.openclaw.gateway with KeepAlive=true) grabbed the port, bootout
our openclaw, find a new free port, regenerate both openclaw and
controller plists with the new port, and restart both services.
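
The ownership check described in this commit message can be sketched as a pure decision function. The type and function names, the `maxCandidates` limit, and the treatment of a missing listener are assumptions; how the listener PID and service PID are obtained is deliberately left to the caller.

```typescript
// Outcome of the post-launch ownership check (hypothetical type).
type OwnershipCheck =
  | { kind: "owned"; port: number }
  | { kind: "no-listener"; port: number }
  | { kind: "stolen"; oldPort: number; newPort: number };

// Compare the PID listening on the port with our service PID and, on a
// mismatch, pick the next free port for the caller to reassign to.
function checkPortOwnership(
  port: number,
  listenerPid: number | null,
  servicePid: number | null,
  isPortFree: (candidate: number) => boolean,
  maxCandidates = 20,
): OwnershipCheck {
  // Nobody is listening yet: not a theft, the service may still be starting.
  if (listenerPid === null) return { kind: "no-listener", port };
  if (servicePid !== null && listenerPid === servicePid) {
    return { kind: "owned", port };
  }
  const last = Math.min(port + maxCandidates, 65535);
  for (let candidate = port + 1; candidate <= last; candidate++) {
    if (isPortFree(candidate)) {
      return { kind: "stolen", oldPort: port, newPort: candidate };
    }
  }
  throw new Error(`no free port found in ${port + 1}..${last}`);
}
```

On a `stolen` result the caller would follow the sequence from the commit message: bootout our openclaw, regenerate the openclaw and controller plists with `newPort`, and restart both services.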

Scenario 27: competing service steals port after launch → bootstrap
detects PID mismatch and reassigns to next free port.
Scenario 28: our openclaw owns the port → no reassignment needed.

- desktop-ci-check.mjs: update health probe URL from 18789 to 50789
- dev-launchd.sh: update OPENCLAW_PORT default to 50789 so cleanup
  no longer kills a user's unrelated global openclaw on 18789

cloudflare-workers-and-pages Bot commented Mar 31, 2026

Deploying nexu-docs with Cloudflare Pages

Latest commit: 799396b
Status: ✅  Deploy successful!
Preview URL: https://cf1991bc.nexu-docs.pages.dev
Branch Preview URL: https://hotfix-v0-1-8-port-conflict.nexu-docs.pages.dev

- Replace lsof-based port detection with net.connect — lsof is blocked
  by macOS hardened runtime in packaged Electron apps, silently returning
  empty results even when ports are occupied.
- Add structured logging via env.log callback so bootstrap diagnostics
  appear in cold-start.log (console.log is lost in packaged mode).
- Fix recovery: add waitForExit after bootout before re-bootstrapping
  to prevent launchd race conditions.

Verified: packaged app correctly detects port 50789 occupied by global
openclaw, auto-assigns 50790, both services coexist.
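
A TCP probe of the kind this commit describes might look like the following sketch. It uses Node's `net.connect`; the function name, the loopback host, and the 500 ms timeout are assumptions rather than values from launchd-bootstrap.ts.

```typescript
import * as net from "node:net";

// Resolve true if something accepts a TCP connection on host:port.
// Unlike lsof, this needs no visibility into other processes.
function isPortOccupied(
  port: number,
  host = "127.0.0.1",
  timeoutMs = 500,
): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    let settled = false;
    const finish = (occupied: boolean): void => {
      if (settled) return;
      settled = true;
      socket.destroy();
      resolve(occupied);
    };
    socket.setTimeout(timeoutMs);
    socket.on("connect", () => finish(true)); // a listener answered
    socket.on("timeout", () => finish(false)); // no answer in time
    socket.on("error", () => finish(false)); // e.g. ECONNREFUSED
  });
}
```

A connect probe only reports that some process is listening; it cannot say which one, which is why the attach path needs a separate identity check.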
…8789

The openclaw plist ProgramArguments never included --port, so openclaw
always bound to its hardcoded default 18789 regardless of what
findFreePort allocated. When another service (ClawX, global openclaw)
occupied 18789, our openclaw crashed on bind (EADDRINUSE) even though
bootstrap had correctly detected the conflict and assigned a new port.

Now passing --port explicitly in ProgramArguments. Also reverted the
default port back to 18789 since dynamic port allocation handles
conflicts correctly.

@lefarcen lefarcen changed the title from "fix(desktop): use port 50789 to avoid global openclaw port conflict" to "fix(desktop): pass --port to openclaw gateway and detect port conflicts" Mar 31, 2026

…ection

- Reverted all 50789 port changes back to 18789 (scripts, CI, tests)
- Updated detectPortOccupier to use net.createServer().listen() instead
  of net.connect (avoids conflict with probePort mock)
- Updated plist-generator tests for --port in ProgramArguments
- Updated port conflict scenario tests (15/16/27/28) to mock
  createServer instead of lsof/createConnection
- All 624 tests pass
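
The bind-based detection mentioned above can be sketched with `net.createServer().listen()`. The helper names `canBind` and `findFreePortFrom`, the loopback host, and the 20-port scan limit are assumptions; the PR's own `detectPortOccupier` and `findFreePort` may be structured differently.

```typescript
import * as net from "node:net";

// Try to bind a throwaway server; a bind error (typically EADDRINUSE)
// means the port is not usable by us.
function canBind(port: number, host = "127.0.0.1"): Promise<boolean> {
  return new Promise((resolve) => {
    const server = net.createServer();
    server.once("error", () => resolve(false));
    server.listen(port, host, () => {
      // Bind succeeded: release the port again before reporting it free.
      server.close(() => resolve(true));
    });
  });
}

// Scan upward from the preferred port until one can be bound.
async function findFreePortFrom(
  start: number,
  maxAttempts = 20,
): Promise<number> {
  const last = Math.min(start + maxAttempts - 1, 65535);
  for (let port = start; port <= last; port++) {
    if (await canBind(port)) return port;
  }
  throw new Error(`no free port found in ${start}..${last}`);
}
```

Because the probe releases the port before the real service starts, another process can still take it in between, which is the gap the post-launch recovery path is meant to cover.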

Simulates a global openclaw (or ClawX) occupying port 18789 before
Nexu launches. Verifies:
- Nexu detects the conflict and auto-assigns an alternative port
- Controller comes up healthy
- OpenClaw runs on 18790+ instead of crashing
- The blocker on 18789 is NOT killed (coexistence)

Green card on pass, red card on failure. Shows test mode, source,
channel, and links to the CI run for details.

Only sends on failure (not success). Card includes:
- Trigger source: PR number, branch, commit SHA
- Who triggered it
- Test mode/source/channel
- Link to CI logs

When launchd has stale state for a service label (e.g. after repeated
bootout/bootstrap during port conflict recovery), bootstrap fails with
'Input/output error (code 5)'. Bootstrap now detects this error, runs
bootout to clear the stale registration, waits 1s, and retries once.
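
The retry described here can be sketched with the launchctl calls injected as functions, so the control flow is testable without launchd. The `LaunchctlResult` and `LaunchctlOps` shapes and the error-matching rule are assumptions; only the sequence (detect the I/O error, bootout, wait 1s, retry once) comes from the commit message.

```typescript
// Hypothetical result of one launchctl invocation.
interface LaunchctlResult {
  code: number; // process exit code
  stderr: string; // captured error output
}

// Injected operations, bound by the caller to one service label.
interface LaunchctlOps {
  bootstrap(): Promise<LaunchctlResult>;
  bootout(): Promise<LaunchctlResult>;
}

// Assumed matching rule: exit code 5 or the I/O error text in stderr.
function isStaleStateError(result: LaunchctlResult): boolean {
  return result.code === 5 || /Input\/output error/i.test(result.stderr);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Bootstrap once; on a stale-state failure, bootout, wait, and retry once.
async function bootstrapWithRetry(
  ops: LaunchctlOps,
  waitMs = 1000,
): Promise<LaunchctlResult> {
  const first = await ops.bootstrap();
  if (first.code === 0 || !isStaleStateError(first)) return first;
  await ops.bootout(); // clear the stale registration; result is ignored
  await sleep(waitMs);
  return ops.bootstrap(); // single retry, returned as-is
}
```

Capping the retry at one attempt keeps a persistent launchd failure from turning into an unbounded loop during cold start.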

1. Recovery uses bootoutAndWaitForExit (which captures the PID before
   bootout) instead of a separate bootout + waitForExit without knownPid.

2. Attach path adds token validation: checks launchd service env
   OPENCLAW_GATEWAY_TOKEN matches expected token before attaching.
   Prevents attaching to a global openclaw or ClawX on the same port.

3. E2E openclaw port conflict: controller readiness is now a hard
   fail. Port discovery uses runtime-ports.json instead of hardcoded
   range, so any valid auto-assigned port is accepted.
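
The token check in item 2 can be sketched as follows. `OPENCLAW_GATEWAY_TOKEN` is the variable named in the commit message; the function names, the env-as-record input, and the constant-time comparison are assumptions added for illustration. How the service environment is read from launchd is not shown.

```typescript
import { timingSafeEqual } from "node:crypto";

// Compare two tokens without leaking the position of the first mismatch.
// The constant-time comparison is an illustrative choice, not from the PR.
function tokensMatch(actual: string | undefined, expected: string): boolean {
  if (!actual) return false;
  const a = Buffer.from(actual, "utf8");
  const b = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on length mismatch, so check length first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Decide whether a running service is ours and therefore safe to attach to.
function shouldAttach(
  serviceEnv: Record<string, string | undefined>,
  expectedToken: string,
): boolean {
  return tokensMatch(serviceEnv["OPENCLAW_GATEWAY_TOKEN"], expectedToken);
}
```

A service with a missing or different token, such as a global openclaw or ClawX on the same port, is treated as foreign and the attach path is skipped.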

Scenario: start app normally on 18789, kill openclaw, occupy 18789
with a blocker, force-quit and re-launch. Verifies the app detects
the stolen port on cold start, auto-assigns a new port, and recovers.

This covers the post-launch recovery path that unit tests can't
easily simulate (requires real launchd timing).

The openclaw port may be auto-assigned to 18790+ when 18789 is
occupied. Updated:
- packaged-e2e.mjs: read openclawPort from controller readiness
- run-e2e.sh: read from runtime-ports.json, scan port range for
  diagnostics and port-free checks

@lefarcen lefarcen merged commit 92105f1 into release/v0.1.8 Mar 31, 2026
11 of 12 checks passed
@lefarcen lefarcen mentioned this pull request Apr 1, 2026

2 participants