fix: exit(0) when port owned by another OpenClaw instance#32897
fix: exit(0) when port owned by another OpenClaw instance#32897velamints2 wants to merge 2 commits into
Conversation
When handlePortError detects that the port is already held by another healthy OpenClaw process (via the existing regex), exit with code 0 instead of 1. On macOS with launchd KeepAlive=true + ThrottleInterval=1, exit(1) causes launchd to immediately respawn the gateway. The new instance finds the port still busy and exits(1) again, creating an infinite crash-loop (observed 4 963 restart cycles in a single gateway.log). exit(0) signals a clean shutdown so the supervisor leaves the process stopped while the healthy instance continues to serve. Relates-to: openclaw#29883
There was a problem hiding this comment.
Pull request overview
Updates port-conflict handling to avoid supervisor restart loops when the port is already owned by another healthy OpenClaw instance, primarily affecting LaunchAgent/systemd-managed gateway processes.
Changes:
- Exit with code
0whenEADDRINUSEis attributed (via regex) to another OpenClaw instance. - Update the existing test to assert the clean-exit behavior for the OpenClaw-owner case.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/infra/ports.ts |
Adds runtime.exit(0) in the OpenClaw-owner branch of handlePortError to prevent restart loops. |
src/infra/ports.test.ts |
Adjusts the OpenClaw-owner test case to expect a clean exit code. |
| // Exit cleanly so process supervisors (launchd KeepAlive, | ||
| // systemd Restart=always) do NOT respawn us into an | ||
| // infinite crash-loop. The healthy instance owns the port. | ||
| runtime.exit(0); | ||
| } |
There was a problem hiding this comment.
runtime.exit(0) doesn’t stop control flow here, so if runtime.exit is mocked or implemented as non-terminating, execution will continue to the generic EADDRINUSE footer and call runtime.exit(1) as well (and may log extra messages). Add an explicit early return/throw after runtime.exit(0) (and ideally after runtime.exit(1) in this block) to guarantee only the intended exit path runs regardless of runtime implementation.
| // systemd Restart=always) do NOT respawn us into an | ||
| // infinite crash-loop. The healthy instance owns the port. |
There was a problem hiding this comment.
The comment mentions systemd Restart=always, but Restart=always will restart even on exit code 0. If the intent is to avoid restarts on clean exit, the relevant systemd setting is typically Restart=on-failure (or similar). Update the comment to avoid misleading operators about when exit(0) prevents respawns.
| // systemd Restart=always) do NOT respawn us into an | |
| // infinite crash-loop. The healthy instance owns the port. | |
| // systemd with Restart=on-failure or similar) do NOT respawn | |
| // us into an infinite crash-loop when a healthy instance owns | |
| // the port. Note: with Restart=always, systemd will respawn | |
| // even on exit(0). |
| // When another OpenClaw owns the port, exit cleanly so supervised | ||
| // environments (launchd, systemd) do NOT trigger a restart loop. | ||
| expect(runtime.exit).toHaveBeenCalledWith(0); |
There was a problem hiding this comment.
This test only asserts exit(0) was called, but it will still pass if handlePortError also calls exit(1) afterward (which can happen when runtime.exit is a mock/non-terminating implementation). Strengthen the assertion to ensure the clean-exit path is the only/last exit call (and fix the extra indentation on these lines so they align with the other expects in the test).
| // When another OpenClaw owns the port, exit cleanly so supervised | |
| // environments (launchd, systemd) do NOT trigger a restart loop. | |
| expect(runtime.exit).toHaveBeenCalledWith(0); | |
| // When another OpenClaw owns the port, exit cleanly so supervised | |
| // environments (launchd, systemd) do NOT trigger a restart loop. | |
| expect(runtime.exit).toHaveBeenCalledTimes(1); | |
| expect(runtime.exit).toHaveBeenLastCalledWith(0); |
Greptile SummaryThis PR fixes an infinite restart loop on macOS (launchd) and systemd setups where Key observations:
Confidence Score: 3/5
Last reviewed commit: 5046391 |
| expect(messages.join("\n")).toContain("another OpenClaw instance is already running"); | ||
| // When another OpenClaw owns the port, exit cleanly so supervised | ||
| // environments (launchd, systemd) do NOT trigger a restart loop. | ||
| expect(runtime.exit).toHaveBeenCalledWith(0); |
There was a problem hiding this comment.
The test assertion is incomplete. Because runtime.exit is mocked with vi.fn(), it does not actually terminate execution. After runtime.exit(0) is called on line 68 of ports.ts, control continues and reaches runtime.exit(1) on line 74.
toHaveBeenCalledWith(0) only asserts that exit was called with 0 at some point — it passes even if exit(1) is also called. This allows a regression to slip through: if someone mistakenly removed the exit(0) call, the test would still pass.
Add a complementary assertion to make the test guard the intended behavior:
| expect(runtime.exit).toHaveBeenCalledWith(0); | |
| expect(runtime.exit).toHaveBeenCalledWith(0); | |
| expect(runtime.exit).not.toHaveBeenCalledWith(1); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/ports.test.ts
Line: 65
Comment:
The test assertion is incomplete. Because `runtime.exit` is mocked with `vi.fn()`, it does not actually terminate execution. After `runtime.exit(0)` is called on line 68 of `ports.ts`, control continues and reaches `runtime.exit(1)` on line 74.
`toHaveBeenCalledWith(0)` only asserts that exit was called with `0` at some point — it passes even if `exit(1)` is also called. This allows a regression to slip through: if someone mistakenly removed the `exit(0)` call, the test would still pass.
Add a complementary assertion to make the test guard the intended behavior:
```suggestion
expect(runtime.exit).toHaveBeenCalledWith(0);
expect(runtime.exit).not.toHaveBeenCalledWith(1);
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 504639112a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Exit cleanly so process supervisors (launchd KeepAlive, | ||
| // systemd Restart=always) do NOT respawn us into an | ||
| // infinite crash-loop. The healthy instance owns the port. | ||
| runtime.exit(0); |
There was a problem hiding this comment.
Return immediately after runtime.exit(0)
RuntimeEnv.exit is typed as void, so callers can provide non-terminating implementations (for example test or embedded runtimes); in that case this new runtime.exit(0) call falls through to the unconditional runtime.exit(1) later in handlePortError, which reclassifies the same condition as failure and defeats the clean-exit behavior this commit is introducing. Add an immediate return/throw after the success exit path so it cannot continue into the failure path.
Useful? React with 👍 / 👎.
| // Exit cleanly so process supervisors (launchd KeepAlive, | ||
| // systemd Restart=always) do NOT respawn us into an | ||
| // infinite crash-loop. The healthy instance owns the port. | ||
| runtime.exit(0); |
There was a problem hiding this comment.
Tighten owner check before exiting with success
This branch now returns a success exit code based on a broad substring match (openclaw|src/index.ts|dist/index.js) over the full diagnostics text, so unrelated listeners whose command/path happens to contain these tokens can be misclassified as a healthy OpenClaw instance; that causes exit(0) on a real external port conflict, which can suppress restart-on-failure behavior and leave the gateway down. The success-exit path should rely on a stricter owner signal than this regex.
Useful? React with 👍 / 👎.
…mment - Add `return undefined as never` after `runtime.exit(0)` so non-terminating runtime implementations (mocks, embedded) do not fall through to the generic `exit(1)` path. - Replace `toHaveBeenCalledWith(0)` with `toHaveBeenCalledTimes(1)` + `toHaveBeenLastCalledWith(0)` to ensure the test fails if the early exit(0) is accidentally removed. - Fix comment: systemd `Restart=on-failure` (not `Restart=always`) is the setting where exit(0) prevents respawns. - Fix test indentation to match surrounding code.
|
Thanks for the reviews! I've pushed a follow-up commit (9bedc85) that addresses all feedback: Fixed:
Not addressed in this PR (follow-up):
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bedc85db9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // systemd with Restart=on-failure) do NOT respawn us into | ||
| // an infinite crash-loop. The healthy instance owns the | ||
| // port. Note: systemd Restart=always will still respawn. | ||
| runtime.exit(0); |
There was a problem hiding this comment.
Require a precise OpenClaw owner before exiting successfully
Fresh evidence in this commit is the new runtime.exit(0) success branch, which is currently gated by /openclaw|src\/index\.ts|dist\/index\.js/ against the full diagnostics text; that broad match can classify unrelated listeners (for example, any Node app launched from dist/index.js) as OpenClaw and then return exit code 0 even though the gateway failed to bind. In supervised environments that use restart-on-failure, this suppresses recovery and can leave the gateway down until manual intervention, so the zero-exit path should require a stricter owner signal.
Useful? React with 👍 / 👎.
|
Thanks for the reviews @copilot, @greptile-apps, @chatgpt-codex-connector — all three reviewers flagged the same two issues. Here are the fixes: Fix 1 —
|
|
To use Codex here, create a Codex account and connect to github. |
|
Perfect fixes - both changes directly address the review concerns: Regex tightening: Test assertion: Adding These two lines resolve all flagged issues. A maintainer with push access can apply them directly to your branch. |
|
Follow-up ping on mergeability:
If this looks good, could a maintainer take a final pass for merge? Happy to adjust anything else quickly. |
|
This pull request has been automatically marked as stale due to inactivity. |
|
Closing due to inactivity. |
Problem
When
handlePortErrordetectsEADDRINUSEand identifies the port owner as another OpenClaw instance (via the existing regex on line 59), it still callsruntime.exit(1).On macOS with a LaunchAgent configured as:
exit(1)is treated as an abnormal termination, solaunchdrespawns the gateway within 1 second. The new instance discovers the port is still busy and immediately exits(1) again — creating an infinite crash-loop.Observed impact: 4,963 restart cycles recorded in a single
gateway.logsession (~8 minutes of continuous tool re-registration every ~10 seconds).Fix
When the regex confirms the port owner is another healthy OpenClaw instance, call
runtime.exit(0)instead ofruntime.exit(1). A zero exit code tells launchd (and systemdRestart=on-failure) that the process shut down intentionally, so the supervisor leaves it stopped.Non-OpenClaw port owners still trigger
exit(1)so the user gets prompted to resolve the conflict.Changes
src/infra/ports.ts: Addedruntime.exit(0)inside the OpenClaw regex branch, before the genericexit(1)fallthrough.src/infra/ports.test.ts: Updated existing test to assertexit(0)when port details match another OpenClaw instance.Related