fix(daemon): detect system-scope systemd gateway units on Linux (#87577)#87618
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 31, 2026, 1:51 PM ET / 17:51 UTC. Summary PR surface: Source +144, Tests +332. Total +476 across 5 files. Reproducibility: yes. source-reproducible: the linked issue describes a system-level systemd unit, and current main only checks the user unit path and uses systemctl --user before falling back to SIGTERM/SIGUSR1. I did not run a live systemd repro in this read-only review. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land this PR after a maintainer explicitly accepts the fail-closed non-root system-scope policy, then close the linked bug and supersede narrower open attempts that solve only part of the same systemd-scope problem. Do we have a high-confidence way to reproduce the issue? Yes, source-reproducible: the linked issue describes a system-level systemd unit, and current main only checks the user unit path and uses systemctl --user before falling back to SIGTERM/SIGUSR1. I did not run a live systemd repro in this read-only review. Is this the best way to solve the issue? Yes, this looks like the best bounded fix location: systemd ownership belongs in the daemon service adapter and lifecycle fallback, and the PR reuses the existing marker scanner instead of adding a second detector. The unresolved part is the maintainer policy choice for non-root system-scope control. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 058152cf6990. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +144, Tests +332. Total +476 across 5 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg: ✨ hatched 🥚 common Neon Review Wisp. Rarity: 🥚 common. Trait: sparkles near resolved comments. DetailsShare on X: post this hatch
About:
|
…nclaw#87577) Extends findInstalledSystemdGatewayScope to fall back to marker-based discovery via findSystemGatewayServices when the canonical unit file is absent, so operators who install the gateway under a non-canonical name (e.g. /etc/systemd/system/openclaw.service in the linked reproducer) get the same system-scope routing. InstalledSystemdGatewayScope now carries the discovered unitName so is-enabled / show / restart / stop all target the actual installed unit instead of the canonical name. Addresses clawsweeper review on openclaw#87618.
|
@clawsweeper re-review Addressed:
|
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Note: |
|
Rank-up actions: [P2] Non-root sudo-guidance behavior — maintainer acceptance request: this PR intentionally changes Linux system-scope |
…nclaw#87577) Extends findInstalledSystemdGatewayScope to fall back to marker-based discovery via findSystemGatewayServices when the canonical unit file is absent, so operators who install the gateway under a non-canonical name (e.g. /etc/systemd/system/openclaw.service in the linked reproducer) get the same system-scope routing. InstalledSystemdGatewayScope now carries the discovered unitName so is-enabled / show / restart / stop all target the actual installed unit instead of the canonical name. Addresses clawsweeper review on openclaw#87618.
57901d3 to
8d43ff3
Compare
8d43ff3 to
7bd4c57
Compare
db286a6 to
5555f08
Compare
5555f08 to
e2ec15c
Compare
|
@clawsweeper re-review Addressed the P1 (gate the non-root system-scope fail-closed policy at
Local proof (Linux, Node 22.19): Maintainer-owned items unchanged: explicit acceptance of the non-root fail-closed policy, plus the packaged-CLI proof gap and the |
|
🦞🧹 I asked ClawSweeper to review this item again. |
|
@clawsweeper re-review Added the real-source terminal proof the last review asked for. The previous driver used copied helpers; this run imports the actual patched source. On a real systemd host (Linux
Full redacted terminal output is in the PR body under "Live re-review proof — lifecycle delegation at PR head Still maintainer-owned (no author repair): explicit acceptance of the non-root fail-closed policy, and the failed outbound-actions check rerun + packaged-CLI integration gap (I can't |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
@clawsweeper re-review The last review's remaining proof note was: "exercises actual patched systemd source functions and a copied non-exported lifecycle helper, not a packaged openclaw CLI binary" (option 3: a source-checkout CLI run that reaches the changed lifecycle path on a real systemd host). That's now in the PR body under "Actual patched CLI proof" — I ran the real
So the changed lifecycle path is proven via the actual source-checkout CLI, not just imported functions. Please re-evaluate / clear the lingering Still maintainer-owned: explicit acceptance of the non-root fail-closed policy, and the failed outbound-actions check rerun (no contributor |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
…claw#87577) Teach the Linux gateway daemon to recognize system-scope systemd units in addition to user-scope: the canonical user path, the canonical system path, and any non-canonically named system unit carrying the OpenClaw gateway marker (the reporter's /etc/systemd/system/openclaw.service shape). status, is-enabled, restart, and stop now route through the detected scope and unit name, querying the system manager (no --user) for system units. For a detected system-scope unit, root callers run systemctl <action> directly via the canonical service action; non-root callers fail closed with sudo systemctl guidance naming the real unit instead of signaling a supervisor-owned process. The unmanaged lifecycle fallback now delegates system-scope units to that same canonical path (root -> systemctl, non-root -> sudo guidance) rather than throwing unconditionally, so both code paths share one policy and one hint string and a root operator is never told to sudo a command it can already run. Adds regression coverage for detection, routing, and both root/non-root operator paths in the lifecycle fallback.
|
Land-ready verification for
|
Summary
Fixes #87577 by teaching the Linux gateway service code to recognize system-scope systemd units in three forms — the canonical user-scope path, the canonical system-scope path (
/etc/systemd/system/openclaw-gateway.serviceand siblings), and any non-canonically named system unit that carries the OpenClaw gateway marker (the reporter's/etc/systemd/system/openclaw.serviceshape).openclaw statusnow reports system-managed gateways correctly through the system manager, andopenclaw gateway restartno longer drops to an unmanaged SIGUSR1 against a process the system supervisor owns.Reuses two existing patterns already shipping in main:
findSystemGatewayServicesinsrc/daemon/inspect.ts:357(the marker scanner doctor and inspect already use to find arbitrary OpenClaw-marked system units).src/cli/update-cli/restart-helper.ts:98.Maintainer decision requested (clawsweeper P1)
The non-root stop/restart behavior for a detected system-scope unit is an intentional operator-policy change that clawsweeper flagged as a compatibility risk (root
AGENTS.md: removed fallbacks / fail-closed changes / new operator action are merge-sensitive). Requesting explicit maintainer acceptance of this policy:openclaw gateway stop/restartfell through to unmanaged SIGTERM/SIGUSR1 against whatever PID held the gateway port — including a process systemd owns. That signal-an-unmanaged-pid path is the Bug:openclaw gateway restartandopenclaw statusdo not detect system-level systemd service #87577 bug.systemctl <action> <unit>directly; non-root gets actionablesudo systemctl <action> <unit>guidance that names the real installed unit, and no signal is sent.This matches the operator guidance already shipping for the update-restart path (
src/cli/update-cli/restart-helper.ts:98) and AGENTS.md's "one canonical path, no buggy-fallback compat" rule. The old fallback is the bug, not a shipped contract, so it is removed rather than preserved. If a maintainer would instead prefer (a) silent sudo escalation, (b) polkit integration, or (c) keeping the SIGUSR1 fallback for system-scope units behind an explicit flag, say which and I'll revise.Behavior change
src/daemon/systemd.ts— addedfindInstalledSystemdGatewayScope(env)returning{ scope: "user" | "system", unitName, unitPath }. Three call sites now use it:isSystemdServiceEnabled— falls through to system-scopesystemctl is-enabled(no--user) for system units, usinginstalled.unitNameso non-canonical names (likeopenclaw.service) are queried correctly. Before: returnedfalsewhenever the canonical user unit file was absent, regardless of system units.readSystemdServiceRuntime— queries the system manager (no--user) when the unit is system-scope, usinginstalled.unitName. Before: always went through the user manager and surfacedFailed to connect to buson headless servers.runSystemdServiceAction(stop/restart) — when the unit is system-scope and the caller is not root, throwssudo systemctl <action> <installed.unitName>guidance (no silent privilege escalation). When root, runssystemctl <action>directly. Custom-name units get the real unit name in the hint, not a hardcoded canonical one.src/cli/daemon-cli/lifecycle.ts—stopGatewayWithoutServiceManagerandrestartGatewayWithoutServiceManagernow route a detected system-scope unit through the canonicalstopSystemdService/restartSystemdServicehelpers instead of the unmanaged SIGUSR1/SIGTERM fallback. This makes the lifecycle fallback and the systemd action path consistent: root runssystemctl <action> <installed.unitName>directly, non-root gets the same sudo-guidance messagerunSystemdServiceActionalready emits (one canonical hint string, no duplicate). Net effect: the unmanaged signal fallback can never reach a systemd-managed gateway process, and a root operator is never told tosudoa command it can already run. (The earlier revision threw unconditionally here, including for root — that inconsistency is removed.)findInstalledSystemdGatewayScoperesolution order:$HOME/.config/systemd/user/openclaw-gateway[-<profile>].service)./etc/systemd/system/,/usr/lib/systemd/system/,/lib/systemd/system/).findSystemGatewayServices(src/daemon/inspect.ts) which scans the same three system dirs and recognizes any.servicefile carrying theOPENCLAW_SERVICE_MARKER/OPENCLAW_SERVICE_KINDenv markers (or a marker-bearing ExecStart). The returned unit name is used verbatim by every downstreamsystemctlcall.User-scope behavior is unchanged.
Diff size
(
response.tschange: export the existingcreateNullWriterhelper so the lifecycle fallback can discard the delegatedsystemctlstatus line and surface its own message.)Test plan
findInstalledSystemdGatewayScope: user-scope preferred when both exist; canonical/etc/systemd/system/openclaw-gateway.service; canonical/usr/lib/systemd/system/openclaw-gateway.service; returns null when nothing is installed; marker fallback to custom-nameopenclaw.service; legacyclawdbot.servicemarker units ignored.isSystemdServiceEnabledagainst custom-name marker unit.restartSystemdServiceraises sudo guidance with the marker-owned unit's real name; root path runssystemctl restart <unit>directly.readSystemdServiceRuntimequeries the system manager for custom-name marker units.systemctlwith no unmanaged signal; non-root surfacessudo systemctl <action> <installed.unitName>guidance and never signals.src/daemon/systemd.test.ts(84),src/cli/daemon-cli/lifecycle.test.ts(24),src/daemon/inspect.test.ts,src/cli/daemon-cli/lifecycle-core.test.ts,src/cli/daemon-cli/status.gather.test.ts,src/cli/daemon-cli/response.test.ts,src/commands/doctor-gateway-daemon-flow.test.ts,src/cli/update-cli/restart-helper.test.ts— all green.tsgo -p tsconfig.core.json+ core test-types clean; oxlint clean on changed files.Real behavior proof
Behavior addressed: #87577 — on Linux the gateway service code only checked the user-scope systemd unit, so a system-scope unit (including the reporter's custom-named
/etc/systemd/system/openclaw.service) was reported wrong byopenclaw status, andopenclaw gateway restart/stopdropped to an unmanaged SIGUSR1/SIGTERM against the systemd-owned process. After the patch the code detects the system-scope unit (canonical or marker-named), reports it through the system manager, and routes stop/restart through systemd: root runssystemctl <action>, non-root fails closed withsudo systemctl <action> <unit>guidance and never signals.Real environment tested: real systemd host — Linux 6.8.0-71-generic, systemd 255 (255.4-1ubuntu8.14), Node v22.19 — against a transient custom-name unit
/etc/systemd/system/openclaw.servicecarrying the reporter's marker env (OPENCLAW_SERVICE_MARKER=openclaw,OPENCLAW_SERVICE_KIND=gateway) and a dedicated non-rootUser=(uid 1000). The unit wassystemctl started (not enabled) and fully removed after the run.Exact steps or command run after this patch: ran the real
openclawCLI straight from the source checkout at PR heade2ec15c—node scripts/run-node.mjs gateway stopandnode scripts/run-node.mjs gateway restart(this builds and runs the patched TypeScript; not a copy, not a stub) — as root (uid 0) and as the non-root unit user (uid 1000), against the live custom-name unit. Separately exercised the real exported functions (findInstalledSystemdGatewayScope,isSystemdServiceEnabled,readSystemdServiceRuntime,stopSystemdService,restartSystemdService) from an esbuild bundle ofsrc/daemon/systemd.tsat the same head, wrapped by a verbatim copy ofsrc/cli/daemon-cli/lifecycle.ts:handleSystemScopeSystemdGateway.Evidence after fix:
Actual patched CLI from the source checkout (
node scripts/run-node.mjs gateway <action>):Real exported functions from
src/daemon/systemd.ts(esbuild bundle of the actual source, same host):{ result: "restarted" }; MainPID 178463 -> 178808… sudo systemctl restart openclaw.service; MainPID unchanged, no signal{ result: "stopped" }; active -> inactive… sudo systemctl stop openclaw.service; stayed active, no signalObserved result after fix: the system-scope custom-name unit is detected via the marker fallback (
{ scope: "system", unitName: "openclaw.service" }), status is read through the system manager (no--user), and stop/restart route through systemd — root drives a realsystemctlaction (stop → inactive, restart → new MainPID), while non-root fails closed with the real-unitsudo systemctl <action> openclaw.servicehint and never signals the systemd-owned process. The only non-patch noise is the post-restart health check timing out because the stand-in listener is a plain HTTP server rather than a real gateway. User-scope and canonical-name deployments are unchanged (existing tests stay green).What was not tested: a fully packaged
openclawrelease binary (as opposed to the source checkout used above). The published 2026.5.26 tarball ships a brokennpm-shrinkwrap.json(omitsjson5,tslog, …) so it cannot materialize workingnode_modules/; the source-checkout CLI run above reaches the identical changedsrc/cli/daemon-cli/lifecycle.tspath. Happy to rerun against a maintainer-built tarball.Refs
openclaw gateway restartandopenclaw statusdo not detect system-level systemd service #87577openclaw.serviceshape; non-root system-scope fail-closed policy gated for maintainer acceptance; real source-checkout CLI proof on a live systemd host).src/daemon/inspect.ts:357(findSystemGatewayServices) and the sudo-guidance shape fromsrc/cli/update-cli/restart-helper.ts:98.