Daemon: handle degraded systemd status checks #39325

Merged
vincentkoc merged 3 commits into main from
vincentkoc-code/systemd-status-degraded-session
Mar 8, 2026

@vincentkoc
Member

Summary

  • Problem: degraded systemctl --user status results were still treated as unavailable on main, and status surfaces still reported externally managed running services as not installed.
  • Why it matters: Linux users with failed user units can lose daemon lifecycle/status behavior even though systemd is reachable, and headless/system-level users get misleading status output.
  • What changed: isSystemdUserServiceAvailable() and assertSystemdAvailable() now distinguish degraded sessions from truly unavailable user-bus/systemd cases; status summary logic now reports running unmanaged services as running (externally managed) instead of not installed; added regression coverage and changelog note.
  • What did NOT change: no automatic system-scope fallback, no change to install/onboarding probe scoping, and truly unavailable user-bus cases still error early.
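
The distinction described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/daemon/systemd.ts: the `ExecResult` shape, the `classifyUserSystemd` name, and the specific regular expressions are all assumptions chosen for the example.

```typescript
// Hypothetical shape of a finished `systemctl --user status` invocation.
interface ExecResult {
  code: number; // process exit code
  stdout: string;
  stderr: string;
}

type UserSystemdState = "available" | "degraded" | "unavailable";

// Illustrative patterns only (assumed, not copied from the PR).
const UNAVAILABLE_PATTERNS: RegExp[] = [
  /command not found|ENOENT/i, // systemctl binary missing
  /failed to connect to bus/i, // user bus unreachable
  /not been booted with systemd/i, // host is not running systemd
  /not supported/i, // user scope unsupported
];

function classifyUserSystemd(result: ExecResult): UserSystemdState {
  if (result.code === 0) return "available";
  const detail = `${result.stderr}\n${result.stdout}`;
  // Only known "cannot reach systemd" messages count as unavailable.
  if (UNAVAILABLE_PATTERNS.some((pattern) => pattern.test(detail))) {
    return "unavailable";
  }
  // Any other non-zero exit means systemd answered but reported a problem,
  // for example failed user units.
  return "degraded";
}
```

Under this sketch, a caller would treat both "available" and "degraded" as usable and raise an error only for "unavailable".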

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • openclaw status / openclaw status --all no longer claim systemd not installed when the gateway is running under an externally managed service.
  • Linux daemon flows now treat degraded-but-reachable systemctl --user status results as available, while still surfacing real user-bus unavailability early.

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS host, repository test environment
  • Runtime/container: Node 22 + pnpm
  • Model/provider: N/A
  • Integration/channel (if any): N/A
  • Relevant config (redacted): N/A

Steps

  1. Run pnpm vitest src/daemon/systemd.test.ts src/commands/status.service-summary.test.ts src/commands/configure.daemon.test.ts src/cli/daemon-cli/install.test.ts src/infra/wsl.test.ts src/daemon/systemd-hints.test.ts.
  2. Run pnpm build.
  3. Verify degraded systemctl --user status is treated as available and externally managed running services are not shown as not installed.

Expected

  • Degraded user-systemd sessions remain usable.
  • Truly unavailable user-bus cases still fail early.
  • Status surfaces distinguish unmanaged running services from missing services.

Actual

  • 69 targeted tests passed.
  • pnpm build completed successfully locally.
  • The new status helper reports externally managed running services as running (externally managed).

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: degraded systemctl --user status; unavailable user-bus errors; service stop guard behavior; externally managed running service summary detection.
  • Edge cases checked: no-service false path, OpenClaw-managed service path, install/onboarding non-fatal probe path remains unchanged.
  • What you did not verify: live EC2/system-level systemd deployment in this run.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert this PR.
  • Files/config to restore: src/daemon/systemd.ts, src/commands/status*.ts, and src/commands/status.service-summary.ts.
  • Known bad symptoms reviewers should watch for: degraded sessions being misreported as unavailable again, or unmanaged running services still showing not installed.

Risks and Mitigations

  • Risk: broadening degraded-session detection could hide a real systemd failure.
    • Mitigation: only known unavailable patterns still fail early; the new tests cover both degraded and truly unavailable cases.

@greptile-apps
Contributor

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR fixes two related daemon lifecycle bugs on Linux: (1) isSystemdUserServiceAvailable() and assertSystemdAvailable() previously treated any non-zero systemctl --user status exit code as "unavailable", which caused degraded-but-reachable sessions (exit 1 with output like "degraded") to block all daemon operations; (2) status surfaces reported externally managed, running services as not installed because installed was defined solely as command != null.

Key changes:

  • New isSystemdUserScopeUnavailable() helper centralises known unavailability patterns (missing binary, bus errors, not booted, not supported); non-matching non-zero exits are now treated as degraded-but-available.
  • New readServiceStatusSummary() shared helper consolidates service status logic across status, status --all, and daemon summary modules; introduces managedByOpenClaw and externallyManaged flags.
  • installed is broadened to managedByOpenClaw || loaded || externallyManaged; status surfaces use managedByOpenClaw (not installed) to decide whether to show the "installed ·" prefix.
  • Regression tests added for degraded sessions, user-bus unavailability, and externally-managed service detection.
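
As a rough illustration of how the flags in these bullets could fit together, here is a hedged sketch. The `ServiceProbe` input, the `summarizeService` name, and the way `managedByOpenClaw` and `externallyManaged` are derived are assumptions for the example; only the `installed` formula and the "running (externally managed)" wording come from the text above.

```typescript
// Hypothetical probe result; field names are assumed for this sketch.
interface ServiceProbe {
  command: string | null; // command from an OpenClaw-written unit, if any
  loaded: boolean; // systemd reports the unit as loaded
  running: boolean; // a gateway process is currently running
}

interface ServiceStatusSummary {
  managedByOpenClaw: boolean;
  externallyManaged: boolean;
  installed: boolean;
  label: string;
}

function summarizeService(probe: ServiceProbe): ServiceStatusSummary {
  // Assumption: a known command means OpenClaw wrote the unit.
  const managedByOpenClaw = probe.command != null;
  // Assumption: running without an OpenClaw unit means something else manages it.
  const externallyManaged = !managedByOpenClaw && probe.running;
  // Formula quoted in the review summary.
  const installed = managedByOpenClaw || probe.loaded || externallyManaged;

  let label: string;
  if (managedByOpenClaw) {
    // The "installed ·" prefix keys off managedByOpenClaw, not installed.
    label = `installed · ${probe.running ? "running" : "stopped"}`;
  } else if (externallyManaged) {
    label = "running (externally managed)";
  } else {
    // How the real helper labels loaded-but-stopped unmanaged units
    // is not shown in this PR text; this fallback is a placeholder.
    label = "not installed";
  }
  return { managedByOpenClaw, externallyManaged, installed, label };
}

console.log(summarizeService({ command: null, loaded: true, running: true }).label);
// prints "running (externally managed)"
```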

Key issue identified:

  • The new installed = managedByOpenClaw || loaded || externallyManaged formula marks a unit as installed whenever it is loaded (enabled) in systemd, even when it is stopped and not managed by OpenClaw. This edge case lacks test coverage and could surface confusing output for units enabled by third parties.

The core bug fixes are well-motivated and covered by targeted tests. However, the broadened installed definition introduces an untested edge case that could change user-visible output for stopped, enabled systemd units that are neither OpenClaw-managed nor externally running.
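
The untested case can be made concrete with a small check of the quoted formula. The `UnitFlags` type and `isInstalled` helper are hypothetical names for this sketch; a real regression test would exercise the helper in src/commands/status.service-summary.ts instead.

```typescript
// Hypothetical flag bundle mirroring the names used in the review.
interface UnitFlags {
  managedByOpenClaw: boolean;
  loaded: boolean;
  externallyManaged: boolean;
}

// The formula as quoted in the review summary.
const isInstalled = (flags: UnitFlags): boolean =>
  flags.managedByOpenClaw || flags.loaded || flags.externallyManaged;

// Third-party unit: enabled in systemd, stopped, not written by OpenClaw.
const enabledButStopped: UnitFlags = {
  managedByOpenClaw: false,
  loaded: true,
  externallyManaged: false,
};

// Nothing present at all.
const nothingPresent: UnitFlags = {
  managedByOpenClaw: false,
  loaded: false,
  externallyManaged: false,
};

console.log(isInstalled(enabledButStopped)); // true (the edge case flagged above)
console.log(isInstalled(nothingPresent)); // false
```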

Confidence Score: 4/5

  • Safe to merge with attention to the untested edge case for loaded-but-not-running, unmanaged systemd units.
  • The PR's core logic fixes are sound and well-tested. Degraded session detection works correctly, and externally managed services are properly identified. The only concern is an edge case in the installed formula where stopped, enabled systemd units that are neither OpenClaw-managed nor currently running could report as installed. This is a potential user experience issue rather than a runtime bug, and the behavior is deterministic and testable. No breaking changes or critical bugs identified.
  • src/commands/status.service-summary.ts — the installed formula and the missing test for the loaded-but-not-running, unmanaged-service case.

Last reviewed commit: af55ca1

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@vincentkoc merged commit 556a74d into main on Mar 8, 2026
13 checks passed
@vincentkoc deleted the vincentkoc-code/systemd-status-degraded-session branch on March 8, 2026 at 01:30
vincentkoc added a commit to BryanTegomoh/openclaw-fork that referenced this pull request Mar 8, 2026
* Daemon: handle degraded systemd status checks

* Changelog: note systemd status handling

* Update src/commands/status.service-summary.ts

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Labels

commands (Command implementations), gateway (Gateway runtime), maintainer (Maintainer-authored PR), size: M

Development

Successfully merging this pull request may close these issues.

[Bug]: openclaw status incorrectly reports "systemd not installed" when gateway is running via systemd