fix(daemon): correct systemd availability checks for degraded user sessions#34551
fix(daemon): correct systemd availability checks for degraded user sessions#34551jnMetaCode wants to merge 3 commits intoopenclaw:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9c5a735897
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/daemon/systemd.ts
Outdated
| // Non-zero exit without missing indicators means systemd is available | ||
| // but has degraded/failed units — acceptable for service operations. |
There was a problem hiding this comment.
Preserve unavailability errors in assertSystemdAvailable
assertSystemdAvailable() now returns on any non-zero systemctl --user status result unless isSystemctlMissing() matches, so errors like Failed to connect to bus are treated as available. That change is user-visible because callers proceed as if systemd is usable; for example, uninstallSystemdService does not check the disable --now result and can report success after deleting the unit file even though systemd never processed the disable/stop request. Please keep throwing for non-degraded unavailability cases instead of falling through for all non-missing failures.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed — assertSystemdAvailable() now checks for bus-unavailability patterns (failed to connect, not been booted, no such file or directory, not supported) and throws with a descriptive message. See latest commit.
Greptile SummaryThis PR fixes Verified finding: When the systemd user bus is unreachable, Impact: The Confidence Score: 2/5
Last reviewed commit: 9c5a735 |
src/daemon/systemd.ts
Outdated
| @@ -211,7 +213,8 @@ async function assertSystemdAvailable() { | |||
| if (isSystemctlMissing(detail)) { | |||
| throw new Error("systemctl not available; systemd user services are required on Linux."); | |||
| } | |||
| throw new Error(`systemctl --user unavailable: ${detail || "unknown error"}`.trim()); | |||
| // Non-zero exit without missing indicators means systemd is available | |||
| // but has degraded/failed units — acceptable for service operations. | |||
| } | |||
There was a problem hiding this comment.
assertSystemdAvailable() does not catch bus-unavailability errors after this fix.
Compare with isSystemdUserServiceAvailable() (lines 178–205): it explicitly checks for unavailability patterns like "failed to connect" (line 190), "not been booted" (line 193), and "not supported" (line 199), and returns false if any match.
The current implementation only checks isSystemctlMissing(), which catches binary-absence errors (lines 149–160) but NOT bus-unavailability. This means if a system's D-Bus is unreachable (e.g., remote SSH without lingering D-Bus socket), assertSystemdAvailable() silently returns, and subsequent systemctl --user stop/restart calls fail downstream with opaque errors instead of the early, helpful "systemctl not available" message.
Suggested fix: Apply the same unavailability checks:
async function assertSystemdAvailable() {
const res = await execSystemctl(["--user", "status"]);
if (res.code === 0) {
return;
}
const detail = readSystemctlDetail(res);
const detailLower = `${res.stderr} ${res.stdout}`.toLowerCase();
if (
isSystemctlMissing(detail) ||
detailLower.includes("failed to connect") ||
detailLower.includes("not been booted") ||
detailLower.includes("not supported")
) {
throw new Error(`systemctl --user unavailable: ${detail || "unknown error"}`.trim());
}
// Non-zero exit with none of the above patterns → degraded session, still usable.
}Also add a regression test (e.g., in the "systemd service control" suite) that verifies stopSystemdService throws when the bus is unavailable.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/daemon/systemd.ts
Line: 207-218
Comment:
`assertSystemdAvailable()` does not catch bus-unavailability errors after this fix.
Compare with `isSystemdUserServiceAvailable()` (lines 178–205): it explicitly checks for unavailability patterns like `"failed to connect"` (line 190), `"not been booted"` (line 193), and `"not supported"` (line 199), and returns `false` if any match.
The current implementation only checks `isSystemctlMissing()`, which catches binary-absence errors (lines 149–160) but NOT bus-unavailability. This means if a system's D-Bus is unreachable (e.g., remote SSH without lingering D-Bus socket), `assertSystemdAvailable()` silently returns, and subsequent `systemctl --user stop/restart` calls fail downstream with opaque errors instead of the early, helpful `"systemctl not available"` message.
**Suggested fix**: Apply the same unavailability checks:
```ts
async function assertSystemdAvailable() {
const res = await execSystemctl(["--user", "status"]);
if (res.code === 0) {
return;
}
const detail = readSystemctlDetail(res);
const detailLower = `${res.stderr} ${res.stdout}`.toLowerCase();
if (
isSystemctlMissing(detail) ||
detailLower.includes("failed to connect") ||
detailLower.includes("not been booted") ||
detailLower.includes("not supported")
) {
throw new Error(`systemctl --user unavailable: ${detail || "unknown error"}`.trim());
}
// Non-zero exit with none of the above patterns → degraded session, still usable.
}
```
Also add a regression test (e.g., in the "systemd service control" suite) that verifies `stopSystemdService` throws when the bus is unavailable.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Fixed — assertSystemdAvailable() now mirrors the unavailability checks from isSystemdUserServiceAvailable(). Both functions check the same set of patterns.
|
Thanks for the review! Good catch — I've added bus-unavailability checks ( |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2ec1023f60
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return false; | ||
| // Non-zero exit without any unavailability indicator means systemd is | ||
| // present but has degraded/failed units — still available for our purposes. | ||
| return true; |
There was a problem hiding this comment.
Treat ENOENT/EACCES as unavailable in user service check
isSystemdUserServiceAvailable() now returns true for any non-zero systemctl --user status result that does not match the hardcoded substrings, but execFileUtf8 reports missing/blocked binaries as messages like spawn systemctl ENOENT / spawn systemctl EACCES (see isSystemctlMissing), which do not match this list. In those environments, availability probes used by onboarding/doctor will incorrectly report systemd as available and proceed down the wrong flow. Reuse isSystemctlMissing (or equivalent checks) here before falling through to true.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed — added isSystemctlMissing(detail) check before the return true fallthrough in isSystemdUserServiceAvailable(), so ENOENT/EACCES now correctly returns false. Added a test covering the ENOENT case.
2ec1023 to
518ee79
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 518ee79cf7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Non-zero exit without any unavailability indicator means systemd is | ||
| // present but has degraded/failed units — still available for our purposes. | ||
| return true; |
There was a problem hiding this comment.
Treat spawn ENOENT/EACCES as unavailable
isSystemdUserServiceAvailable() now falls through to true for any non-zero systemctl --user status result that does not match the hardcoded substrings, but execFileUtf8 reports missing or blocked binaries as messages like spawn systemctl ENOENT/spawn systemctl EACCES (see isSystemctlMissing in the same file). On hosts where systemctl is absent or not executable, onboarding/doctor paths that rely on this probe will incorrectly report systemd as available and proceed down the wrong flow.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in the latest commit — same fix as above. isSystemctlMissing() catches both ENOENT and EACCES patterns.
| // Non-zero exit without missing/unavailability indicators means systemd is | ||
| // available but has degraded/failed units — acceptable for service operations. | ||
| } |
There was a problem hiding this comment.
Fail fast on unknown status errors before service actions
The new fall-through in assertSystemdAvailable() treats all non-matching systemctl --user status failures as acceptable, which means non-degraded errors (for example permission/policy failures that don't contain the four checked substrings) now proceed into service operations; uninstallSystemdService() then ignores the disable --now result and can print a successful removal after deleting the unit file even though systemd never accepted the disable/stop request. This regression is introduced by returning success on unknown status failures here.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed — assertSystemdAvailable() now checks isSystemctlMissing() (throws) and bus-unavailability patterns (throws) before silently accepting the degraded state. Only a genuine degraded-but-available systemd falls through without throwing.
6c94d77 to
3a77074
Compare
…vailable for degraded systemd
…s-unavailable test mock
3a77074 to
da90474
Compare
|
Thanks for the work here. Closing this as superseded by #39325, which rebases the degraded-session fix onto current The core degraded-session problem you identified is real and still present on |
Summary
isSystemdUserServiceAvailable()andassertSystemdAvailable()both incorrectly treat a degraded systemd user session as unavailable.systemctl --user statusreturns a non-zero exit code when any user unit has failed, which is common on real Linux systems. The current code checks for specific unavailability indicators ("not found", "failed to connect", etc.) and correctly returnsfalsefor those, but the fall-through also returnsfalse/ throws, even though none of the unavailability patterns matched — meaning systemd IS available.Compare with
isSystemctlAvailable()(line 410) in the same file, which handles this correctly:Root cause
isSystemdUserServiceAvailable()line 202: fall-through returnsfalseinstead oftrueassertSystemdAvailable()line 214: throws instead of returning normallyImpact
On Linux systems with any failed systemd user units:
openclaw gateway install/stop/restartthrows "systemctl --user unavailable"Related to #33633, #34237, #34464.
Note: This fixes different functions from the existing is-enabled PRs — those address
isSystemdServiceEnabled()/readSystemctlDetail(), while this PR fixesisSystemdUserServiceAvailable()andassertSystemdAvailable().Tests
truefromisSystemdUserServiceAvailablestopSystemdServicesucceeds when systemd is degraded