Skip to content

fix(status): detect system-level systemd units via INVOCATION_ID#85151

Open
yiyouguisu wants to merge 1 commit into
openclaw:mainfrom
yiyouguisu:fix/status-deep-systemd-false-positive
Open

fix(status): detect system-level systemd units via INVOCATION_ID#85151
yiyouguisu wants to merge 1 commit into
openclaw:mainfrom
yiyouguisu:fix/status-deep-systemd-false-positive

Conversation

@yiyouguisu

Copy link
Copy Markdown

Summary

When the gateway runs under a system-level systemd unit (not user-level), systemctl --user is unavailable and openclaw status --deep reports "systemd user services unavailable" as a false positive, even though the gateway is healthy.

Fix

Detect the INVOCATION_ID environment variable that systemd sets for managed processes. When present, fall back to system-level systemctl to check the actual service status instead of returning "unknown".

How it works

  1. readSystemdServiceRuntime catches the assertSystemdAvailable error
  2. Checks for INVOCATION_ID env var (set by systemd for managed processes)
  3. If present, tries systemctl show (system-level) instead of systemctl --user
  4. Returns the actual service status from the system-level unit

Before

Gateway service: systemd user services unavailable.
systemd user services are unavailable; install/enable systemd or run the gateway under your supervisor.

After

When running under a system-level unit, the actual service status is shown correctly.

Testing

  • Verified the fix reads system-level systemctl when INVOCATION_ID is set
  • Existing systemd behavior unchanged when not running under a system-level unit

Fixes #85094

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: XS triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds a system-level systemctl show fallback in readSystemdServiceRuntime when user-systemd availability fails.

Reproducibility: yes. at source level: the linked issue gives a concrete system-level systemd workflow, and current main returns unknown with user-systemd unavailable detail when assertSystemdAvailable fails. I did not run a real OpenClaw system unit in this read-only review.

PR rating
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🧂 unranked krab
Summary: The PR is not quality-ready because real behavior proof is missing and the patch has blocking implementation defects.

Rank-up moves:

  • Replace exitCode with the actual code result field and cover that path.
  • Request/check LoadState and preserve current unavailable output when no system unit exists.
  • Post redacted after-fix terminal output or logs from a real system-level OpenClaw unit.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs real behavior proof before merge: The PR body has a testing claim but no redacted terminal output, logs, screenshot, recording, or linked artifact showing after-fix behavior against a real system-level OpenClaw unit; after adding proof, update the PR body to trigger review or ask a maintainer to comment @clawsweeper re-review.

Risk before merge

  • The branch will not typecheck as written because execFileUtf8 returns code, not exitCode.
  • A host with no matching system unit can be reported as stopped instead of preserving the current user-systemd unavailable detail because systemctl show returns success for LoadState=not-found.
  • The linked report uses a system unit named openclaw.service; probing only the resolved default openclaw-gateway.service may miss the actual externally managed unit.
  • No redacted terminal output, logs, recording, or linked artifact shows the after-fix openclaw status --deep behavior on a real system-level systemd unit.

Maintainer options:

  1. Fix the systemd fallback before merge (recommended)
    Use the existing code result field, request/check LoadState, discover or probe the actual system-scope OpenClaw unit, and keep the old unavailable detail when no system unit exists.
  2. Pause if system-scope status is a product choice
    If maintainers do not want openclaw status --deep to report externally managed system units, pause this PR and settle the status contract on the linked issue first.

Next step before merge
The PR has concrete code defects and missing real-environment proof, but an external contributor needs to provide proof after rework before this should enter an automated repair lane.

Security
Cleared: The diff adds a systemctl show probe through execFile and does not introduce a concrete security or supply-chain concern.

Review findings

  • [P1] Use the exec result's code field — src/daemon/systemd.ts:1096
  • [P2] Check LoadState before accepting systemd show output — src/daemon/systemd.ts:1094
Review details

Best possible solution:

Make the status path probe the actual discovered system-scope OpenClaw unit, preserve the current unavailable result when no system unit exists, and back it with focused tests plus a real systemd transcript.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: the linked issue gives a concrete system-level systemd workflow, and current main returns unknown with user-systemd unavailable detail when assertSystemdAvailable fails. I did not run a real OpenClaw system unit in this read-only review.

Is this the best way to solve the issue?

No. The durable fix should use the actual exec result contract, check systemd LoadState, and discover/probe the real system-scope gateway unit instead of treating any successful systemctl show output as a valid unit.

Label changes:

  • add merge-risk: 🚨 compatibility: Merging this fallback as written can change existing user-systemd-unavailable diagnostics into a misleading stopped service result on hosts without a matching system unit.

Label justifications:

  • P2: This is a focused gateway diagnostic bug with limited blast radius, but the current PR has blocking correctness defects.
  • merge-risk: 🚨 compatibility: Merging this fallback as written can change existing user-systemd-unavailable diagnostics into a misleading stopped service result on hosts without a matching system unit.
  • rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🧂 unranked krab, patch quality is 🧂 unranked krab, and The PR is not quality-ready because real behavior proof is missing and the patch has blocking implementation defects.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body has a testing claim but no redacted terminal output, logs, screenshot, recording, or linked artifact showing after-fix behavior against a real system-level OpenClaw unit; after adding proof, update the PR body to trigger review or ask a maintainer to comment @clawsweeper re-review.

Full review comments:

  • [P1] Use the exec result's code field — src/daemon/systemd.ts:1096
    execFileUtf8 returns { stdout, stderr, code }, so systemRes.exitCode is not a valid property. This should fail the TS check before the fallback can be merged; use systemRes.code and add focused coverage for the fallback branch.
    Confidence: 0.98
  • [P2] Check LoadState before accepting systemd show output — src/daemon/systemd.ts:1094
    systemctl show missing.service can exit 0 with LoadState=not-found and ActiveState=inactive. Because this probe does not request or check LoadState, a host without the default system unit can be reported as stopped instead of preserving the current unavailable detail, and it can still miss the linked openclaw.service workflow.
    Confidence: 0.91

Overall correctness: patch is incorrect
Overall confidence: 0.95

Acceptance criteria:

  • node scripts/run-vitest.mjs src/daemon/systemd.test.ts src/commands/status.service-summary.test.ts
  • Redacted live openclaw status --deep or openclaw gateway status --deep output from a system-level systemd OpenClaw unit

What I checked:

  • PR diff adds the fallback under review: Commit 0db2a0a adds the system-level probe in readSystemdServiceRuntime; the success check is on systemRes.exitCode at the changed line. (src/daemon/systemd.ts:1096, 0db2a0a7fd16)
  • Local exec result contract uses code, not exitCode: execFileUtf8 returns ExecResult = { stdout, stderr, code } and resolves code for both success and failure, so the PR's exitCode access is not a valid typed property. (src/daemon/exec-file.ts:3, c8a35c4645dc)
  • Missing system units still produce a successful systemctl show result: On systemd 255, systemctl show definitely-openclaw-missing-85151.service exited 0 with LoadState=not-found and ActiveState=inactive; the PR does not request or check LoadState, so an absent system unit can be reported as stopped.
  • Current main preserves the unavailable detail: Current main returns status: "unknown" with the assertSystemdAvailable error detail when the user systemd manager is unavailable. (src/daemon/systemd.ts:1080, c8a35c4645dc)
  • Current main already has a system-level service discovery surface: findSystemGatewayServices scans /etc/systemd/system, /usr/lib/systemd/system, and /lib/systemd/system, and doctor uses that path to report system-level OpenClaw gateway services when the user service is missing. (src/daemon/inspect.ts:357, c8a35c4645dc)
  • History provenance for the touched area: Available blame in this shallow checkout traces the central systemd runtime and system-service discovery code to commit 01d95b9. (src/daemon/systemd.ts:1077, 01d95b9757a0)

Likely related people:

  • Super Zheng: The available blame/log history attributes the central systemd runtime, service state, and system-level gateway service discovery files to commit 01d95b9. (role: introduced behavior in available history; confidence: medium; commits: 01d95b9757a0; files: src/daemon/systemd.ts, src/daemon/inspect.ts, src/commands/doctor-gateway-services.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against c8a35c4645dc.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

When the gateway runs under a system-level systemd unit (not user-level),
systemctl --user is unavailable and the status check reports
"systemd user services unavailable" as a false positive.

Instead of relying on INVOCATION_ID detection, directly probe
system-level systemctl when user-level fails. If the unit exists
and responds at system level, return the actual status.

Fixes openclaw#85094
@yiyouguisu yiyouguisu force-pushed the fix/status-deep-systemd-false-positive branch from 1ee5ebb to 0db2a0a Compare May 22, 2026 02:07
@clawsweeper clawsweeper Bot added the merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. label May 22, 2026
@amknight

Copy link
Copy Markdown
Member

Reviewing this against src/daemon/systemd.ts at head 0db2a0a and src/daemon/exec-file.ts on main:

Blocking — the fallback is dead code. execFileUtf8 returns { stdout, stderr, code: number } (see src/daemon/exec-file.ts:3), but line 1096 reads systemRes.exitCode, which is always undefined. The if (systemRes && systemRes.exitCode === 0) { … } branch never runs, so the system-scope fallback never returns its parsed result — execution falls through to the original { status: "unknown", detail }. As-is, this PR does not change behavior.

Fix: systemRes.exitCode === 0systemRes.code === 0.

High — PR title/body advertise INVOCATION_ID gating that the diff doesn't implement. Either gate the fallback via detectRespawnSupervisor(env, "linux") === "systemd" from src/infra/supervisor-markers.ts (preferred — that helper already checks OPENCLAW_SYSTEMD_UNIT, INVOCATION_ID, SYSTEMD_EXEC_PID, JOURNAL_STREAM), or rewrite the title/body to "unconditionally fall back to system scope on user-scope failure".

High — no tests. src/daemon/systemd.test.ts already mocks execFileUtf8/assertSystemdAvailable; a single regression test there would have caught the .exitCode typo and would lock in INVOCATION_ID gating once added.

Smaller follow-ups in the same patch:

  • .catch(() => null) swallows the system-scope error — surface it in the unknown-fallback detail so the operator isn't shown the stale systemctl --user unavailable: … message.
  • Use the existing execSystemctl(args, env) helper so resolveSystemctlProcessEnv(env) normalization isn't bypassed.
  • The fallback drops Id, KillMode, TasksCurrent, MemoryCurrent and returns no systemd: { unit, … } block — at minimum return systemd: { unit: unitName } so consumers don't see silently degraded shape only on the fallback path.

Scope note. This patches readSystemdServiceRuntime only. isSystemdServiceEnabled, restartSystemdService, and stopSystemdService still hit execSystemctlUser and will throw/silently fail on system-scope installs — the restart half of #87577. Worth deciding whether this PR is scoped to #85094 only, or should also cover restart/stop/is-enabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: XS status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

openclaw status --deep reports 'systemd user not installed' false positive when gateway runs under systemctl system

2 participants