Summary
openclaw doctor --fix aborts with SIGTERM when the gateway service is running, instead of either completing safe-live fixes or cleanly skipping them.
Observed multiple times in the same session:
- Gateway active under systemd user supervision (systemctl --user status openclaw-gateway.service)
- /healthz returning {"ok":true,"status":"live"}
- openclaw doctor --fix --non-interactive emits some output, then exits via SIGTERM before completing all fix actions
- Gateway-port section of doctor output includes:
    Health check failed: GatewayTransportError: gateway timeout after 3000ms
    Gateway target: ws://127.0.0.1:18789
    ...
    Port 18789 is already in use.
    - pid <X> shadeform: openclaw (127.0.0.1:18789)
    - Gateway already running locally. Stop it (openclaw gateway stop) or use a different port.
The doctor appears to confuse itself: it cannot reach the gateway over WS (3-second timeout), concludes that the port is "already in use" (by the same gateway that just answered /healthz), and then terminates with SIGTERM.
Environment
- OpenClaw 2026.5.4 (commit 325df3e)
- Linux (Ubuntu 24.04)
- Gateway running under systemctl --user supervision
Reproduction
- Have the gateway running and healthy: curl http://127.0.0.1:18789/healthz → {"ok":true,"status":"live"}
- Run openclaw doctor --fix --non-interactive
- Observe that the run aborts via SIGTERM before completing all fixes
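To make the abort easier to confirm, the steps above can be wrapped in a small script that reports how the doctor run ended. This is only a sketch: the helper name describe_exit and the RUN_REPRO guard are hypothetical, and it relies on the usual shell convention that a child killed by signal N is reported as status 128+N (so SIGTERM shows up as 143).

```shell
#!/usr/bin/env bash
# Repro sketch: confirm the gateway is live, run doctor, report how it ended.

describe_exit() {
  # Map a shell exit status to a readable description.
  # Statuses above 128 conventionally mean "killed by signal (status - 128)".
  local status=$1
  if [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
  else
    echo "exited with code $status"
  fi
}

# Hypothetical guard: only touch the real gateway when RUN_REPRO=1 is set.
if [ "${RUN_REPRO:-0}" = "1" ]; then
  # Step 1: the gateway must answer /healthz before the test is meaningful.
  curl -fsS http://127.0.0.1:18789/healthz || exit 1
  echo
  # Step 2: run the command from this report and capture its status.
  openclaw doctor --fix --non-interactive
  describe_exit "$?"
fi
```

Running it as RUN_REPRO=1 ./repro.sh on an affected host should end with a "killed by signal 15" line if the SIGTERM described here occurs.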
Expected behavior
doctor --fix should either:
- (a) preflight-classify each fix as safe-live vs requires-restart, apply the safe-live ones, and skip-with-explanation the others, OR
- (b) refuse to run with --fix against a live gateway and tell the user to stop it first
What it should NOT do is partially run, then SIGTERM itself mid-stream, leaving the user uncertain about what got applied.
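Until doctor behaves this way itself, option (b) can be approximated from the outside with a wrapper. The sketch below is not how doctor is implemented: the function name guarded_doctor_fix is hypothetical, a successful /healthz response is assumed to be a good enough signal of a live gateway, and the return code 78 simply mirrors the value proposed under "Suggested fix".

```shell
# Wrapper sketch: refuse --fix while the gateway answers /healthz.
guarded_doctor_fix() {
  # -f makes curl fail on HTTP errors; --max-time bounds the probe.
  if curl -fsS --max-time 3 http://127.0.0.1:18789/healthz >/dev/null 2>&1; then
    echo "live gateway detected; stop it first (openclaw gateway stop)" >&2
    return 78
  fi
  # No live gateway detected: run the fixes as usual.
  openclaw doctor --fix --non-interactive
}
```

The point of the wrapper is that the caller always gets a definite outcome: either nothing was attempted (78) or doctor ran against a stopped gateway.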
Impact
- Operators can't use --fix for routine maintenance against a running gateway
- Manual cleanup is required (renaming orphan transcripts, archiving stale agent dirs, etc.)
- Self-termination during incident response increases risk and confusion
Workarounds
- Do the cleanup steps manually: archive orphan transcripts via mv *.jsonl *.deleted.<ts>, move stale agent dirs to a .archived/ sibling, etc.
- Use openclaw doctor --non-interactive (no --fix) to validate state; that one runs cleanly against a live gateway
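The manual cleanup can be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the mv shorthand above is read as "append .deleted.<ts> to each file name", and the caller is expected to pass only the transcripts and directories already identified as orphaned or stale.

```shell
# Rename each given transcript to <name>.deleted.<ts>.
archive_transcripts() {
  local ts=$1
  shift
  local f
  for f in "$@"; do
    # Skip arguments that are not regular files (e.g. an unmatched glob).
    [ -f "$f" ] || continue
    mv -- "$f" "$f.deleted.$ts"
  done
}

# Move a stale agent directory into a .archived/ sibling directory.
archive_agent_dir() {
  local dir=${1%/}          # drop a trailing slash, if any
  local parent
  parent=$(dirname -- "$dir")
  mkdir -p -- "$parent/.archived"
  mv -- "$dir" "$parent/.archived/"
}
```

For example, archive_transcripts "$(date +%s)" old-session.jsonl renames a single transcript with the current Unix time as the suffix.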
Suggested fix
- The gateway-port check should not emit the "Port already in use" warning when the PID listening on the port is the PID of the running gateway service (it's the same process answering /healthz and listening on 18789, so there is no conflict).
- The 3s WS timeout on a loopback gateway is too short under load; bump to 10s or read from config.
- The SIGTERM appears to come from doctor's own port-bind probe trying to bind 18789 itself to "verify" it's free. That probe should be skipped when a live gateway is detected.
- Consider whether --fix should ever be allowed against a running gateway. If not, exit cleanly with code 78 ("config valid, but live gateway detected; stop it first") rather than SIGTERM.
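The PID comparison in the first suggestion can be illustrated from the command line. This is a sketch of the idea only, not of doctor's internals: the function names are hypothetical, it assumes the listening process is the unit's MainPID (a gateway launched through a wrapper process would need a child-process check instead), and it assumes ss from iproute2 is available.

```shell
# Return 0 (conflict) only when the listener is NOT the supervised gateway.
is_port_conflict() {
  local listener_pid=$1 service_pid=$2
  # No listener at all: the port is free, so there is no conflict.
  [ -n "$listener_pid" ] || return 1
  # systemd reports MainPID=0 for a unit that is not running; any listener
  # is then foreign. Otherwise a matching PID means "same process".
  if [ "$service_pid" != "0" ] && [ "$listener_pid" = "$service_pid" ]; then
    return 1
  fi
  return 0
}

# PID systemd tracks for the gateway unit (0 if the unit is not running).
gateway_service_pid() {
  systemctl --user show -p MainPID --value openclaw-gateway.service
}

# PID of the first process listening on the given TCP port, if any.
port_listener_pid() {
  ss -ltnpH "sport = :$1" | sed -n 's/.*pid=\([0-9]*\).*/\1/p' | head -n 1
}
```

A check such as is_port_conflict "$(port_listener_pid 18789)" "$(gateway_service_pid)" would then succeed only when some other process holds the port, which is the only case where the warning is useful.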