fix: handle missing systemctl in containers (#26089) #26699

Merged
vincentkoc merged 8 commits into openclaw:main from sahilsatralkar:fix/issue-26089-systemctl-unavailable-docker
Mar 2, 2026

Conversation

@sahilsatralkar (Contributor) commented Feb 25, 2026

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: When the gateway fails to start in a Docker container, the error handler calls maybeExplainGatewayServiceStop(), which attempts to run systemctl --user status. In containers where systemctl doesn't exist, this throws an unhandled error: spawn systemctl ENOENT.
  • Why it matters: The container crashes on startup in containerized environments (Kubernetes, Docker, Zeabur), preventing the application from running.
  • What changed: Added an isSystemctlPresent() function that checks whether the systemctl binary exists before executing it. Modified isSystemdServiceEnabled(), assertSystemdAvailable(), readSystemdServiceRuntime(), installSystemdService(), and uninstallSystemdService() to handle a missing systemctl gracefully.
  • What did NOT change (scope boundary): Did not add container environment detection; a simple binary check is sufficient. Did not modify the service resolution logic in service.ts.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Gateway no longer crashes when systemd is unavailable (e.g., in containers)
  • Clear error messages when attempting service operations in containers: "systemctl not found; cannot install systemd service. This operation is not supported in containerized environments."

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)

Repro + Verification

Environment

  • OS: Linux (containerized)
  • Runtime/container: Docker/Kubernetes
  • Relevant config: gateway.mode=local, --allow-unconfigured

Steps

  1. Deploy OpenClaw Docker image to Kubernetes/Docker
  2. Set environment: OPENCLAW_GATEWAY_PORT=18789, OPENCLAW_GATEWAY_BIND=0.0.0.0, NODE_ENV=production
  3. Start the container

Expected

Container starts successfully, application listens on port 18789

Actual (before fix)

Container crashes with error: systemctl --user unavailable: spawn systemctl ENOENT

Evidence

Attach at least one:

  • Failing test/log before + passing after

Before fix: The existing tests would fail in CI environments where systemctl is not available, or the code would throw unhandled errors in containers.
After fix: All tests pass:
✓ src/daemon/systemd.test.ts (17 tests)

  • returns true when systemctl --user succeeds
  • returns false when systemd user bus is unavailable
  • isSystemdServiceEnabled returns false when systemctl is not present ✓
  • isSystemdServiceEnabled calls systemctl is-enabled when systemctl is present ✓
  • stops the resolved user unit ✓
  • restarts a profile-specific user unit ✓
  • surfaces stop failures with systemctl detail ✓
    Test Files: 1 passed (1)
    Tests: 17 passed (17)

  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: Unit tests pass for systemd functions with mocked systemctl missing
  • Edge cases checked: install/uninstall when systemctl is present vs absent
  • What you did NOT verify: Actual Docker container deployment

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert commit ceb08fd
  • Files/config to restore: src/daemon/systemd.ts, src/daemon/systemd.test.ts

Risks and Mitigations

  • Risk: None identified - the change is defensive and only adds error handling where none existed
  • Mitigation: Tests verify behavior with and without systemctl present

PR generated with OpenCode and MiniMax M2.5

Build plan prompt:

Implementation Plan: Fix systemctl unavailable in Docker container deployment

Issue: #26089

  • Problem: When the gateway fails to start in a Docker container, the error handler calls maybeExplainGatewayServiceStop(), which attempts to check systemd status via systemctl --user status. In containers where systemctl doesn't exist, this throws an unhandled error.
  • Error: systemctl --user unavailable: spawn systemctl ENOENT
  • Root Cause:
    1. Gateway fails to start (port binding issue, etc.)
    2. Error handler calls maybeExplainGatewayServiceStop() in src/cli/gateway-cli/shared.ts
    3. This calls service.isLoaded() → isSystemdServiceEnabled() → assertSystemdAvailable()
    4. assertSystemdAvailable() tries systemctl --user status → ENOENT in containers where systemctl doesn't exist

NOTE: This plan file should NOT be included in git commits. It is needed later when creating the PR description.


Branch Name

fix/issue-26089-systemctl-unavailable-docker


Implementation Steps

Step 1: Create branch and establish baseline

  • 1.1 Create new branch: git checkout -b fix/issue-26089-systemctl-unavailable-docker
  • 1.2 Install dependencies: pnpm install
  • 1.3 Run baseline tests:
    • pnpm build
    • pnpm check
    • pnpm test

Step 2: Add isSystemctlPresent() function

  • 2.1 In src/daemon/systemd.ts, add new function:
    async function isSystemctlPresent(): Promise<boolean> {
      try {
        await execFileUtf8("command", ["-v", "systemctl"]);
        return true;
      } catch {
        return false;
      }
    }
  • 2.2 Add unit tests for isSystemctlPresent() in src/daemon/systemd.test.ts
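A caveat on step 2.1: on many Linux images `command` exists only as a shell builtin, so spawning it directly may itself fail with ENOENT whether or not systemctl is installed. The following is a minimal sketch of an alternative probe, assuming the check is done by spawning `systemctl --version` and inspecting the spawn error. The `SpawnError` shape and the injected `run` callback are hypothetical stand-ins for the repository's own exec helper, whose signature is not shown in this PR.

```typescript
// Hypothetical shape of a failed spawn. Node sets `code` to "ENOENT"
// when the executable cannot be found on PATH.
interface SpawnError {
  code?: string | number;
  message?: string;
}

// True when the error indicates that the binary itself is missing.
function isSpawnEnoent(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as SpawnError;
  return e.code === "ENOENT" || /\bENOENT\b/.test(e.message ?? "");
}

// `run` is an assumed async exec wrapper that rejects on spawn failure.
async function isSystemctlPresent(
  run: (file: string, args: string[]) => Promise<unknown>,
): Promise<boolean> {
  try {
    await run("systemctl", ["--version"]);
    return true;
  } catch (err) {
    // Any failure other than ENOENT means the binary exists but misbehaved.
    return !isSpawnEnoent(err);
  }
}
```

Injecting `run` keeps the probe easy to unit-test with a mocked spawn failure, which is the scenario the test list in this PR describes.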

Step 3: Modify isSystemdServiceEnabled()

  • 3.1 Update to check if systemctl exists first
  • 3.2 Return false gracefully if systemctl is not available (instead of throwing)
  • 3.3 Add/update tests for this behavior

Step 4: Modify assertSystemdAvailable()

  • 4.1 Update to check if systemctl binary exists first
  • 4.2 Provide clear error messages:
    • "systemctl binary not found" when systemctl doesn't exist
    • "systemd user bus unavailable" when systemd isn't running
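The two messages in step 4.2 can be kept apart by deciding the reason before throwing. This is a sketch under assumptions: the `SystemdProbe` type and both function names are hypothetical, and only the two message strings come from the plan.

```typescript
// Hypothetical probe result; the real helper's types are not shown in the PR.
interface SystemdProbe {
  systemctlPresent: boolean;
  userBusReachable: boolean;
  detail?: string;
}

// Returns null when systemd is usable, otherwise a message naming the cause.
function explainSystemdUnavailable(probe: SystemdProbe): string | null {
  if (!probe.systemctlPresent) return "systemctl binary not found";
  if (!probe.userBusReachable) {
    const suffix = probe.detail ? `: ${probe.detail}` : "";
    return `systemd user bus unavailable${suffix}`;
  }
  return null;
}

// Throwing wrapper in the style of an assert helper.
function assertSystemdUsable(probe: SystemdProbe): void {
  const reason = explainSystemdUnavailable(probe);
  if (reason !== null) throw new Error(reason);
}
```

Checking for the binary first matters because a missing binary would otherwise surface as a generic spawn failure and be indistinguishable from a bus problem.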

Step 5: Modify readSystemdServiceRuntime()

  • 5.1 Handle case where systemctl is not available
  • 5.2 Return { status: 'unknown', detail: '...' } instead of throwing
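As an illustration of step 5.2, the sketch below maps a probe outcome to a runtime object instead of throwing. Only `status: 'unknown'` and `detail` are named in the plan; the `ServiceRuntime` type, the other status values, and the function name are assumptions made for this example.

```typescript
// Assumed runtime shape; "running" and "stopped" are illustrative values.
type ServiceRuntime = {
  status: "running" | "stopped" | "unknown";
  detail?: string;
};

// `activeState` stands for the ActiveState value reported by systemd, if any.
function runtimeFromProbe(
  systemctlPresent: boolean,
  activeState?: string,
): ServiceRuntime {
  if (!systemctlPresent) {
    // No binary: report that the state cannot be determined rather than throw.
    return { status: "unknown", detail: "systemctl not found" };
  }
  if (activeState === "active") return { status: "running" };
  if (activeState === "inactive" || activeState === "failed") {
    return { status: "stopped", detail: activeState };
  }
  return { status: "unknown", detail: activeState ?? "no ActiveState reported" };
}
```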

Step 6: Modify install/uninstall functions

  • 6.1 Update installSystemdService() to check systemctl availability first
  • 6.2 Update uninstallSystemdService() to handle missing systemctl
  • 6.3 Provide clear error messages for container environments
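A guard for step 6 might look like the sketch below. The install message is the one quoted under "User-visible / Behavior Changes"; the uninstall wording and both function names are assumptions.

```typescript
type ServiceOperation = "install" | "uninstall";

// Builds the user-facing message for a missing systemctl binary.
function missingSystemctlMessage(op: ServiceOperation): string {
  return `systemctl not found; cannot ${op} systemd service. This operation is not supported in containerized environments.`;
}

// Call at the top of install/uninstall, before any systemctl invocation.
function guardServiceOperation(
  op: ServiceOperation,
  systemctlPresent: boolean,
): void {
  if (!systemctlPresent) throw new Error(missingSystemctlMessage(op));
}
```

Throwing here, rather than returning silently, keeps an explicit install or uninstall request from appearing to succeed when nothing was done.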

Step 7: Verify maybeExplainGatewayServiceStop() handles errors

  • 7.1 In src/cli/gateway-cli/shared.ts, ensure error handling around isLoaded() call
  • 7.2 Make it gracefully handle the case when systemctl is unavailable
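One way to approach step 7 is to capture the outcome of isLoaded() as a value so the explanation path never throws. Everything in this sketch is hypothetical, including the `LoadedResult` type, the helper names, and the message strings; the real maybeExplainGatewayServiceStop() is not shown in this PR.

```typescript
// Outcome of the service check, with failure kept as data.
type LoadedResult =
  | { ok: true; loaded: boolean }
  | { ok: false; error: unknown };

// Wraps an async isLoaded() so a spawn failure cannot escape the handler.
async function tryIsLoaded(
  isLoaded: () => Promise<boolean>,
): Promise<LoadedResult> {
  try {
    return { ok: true, loaded: await isLoaded() };
  } catch (error) {
    return { ok: false, error };
  }
}

// Returns a hint for the user, or null when there is nothing to say.
function explainServiceStop(result: LoadedResult): string | null {
  if (!result.ok) {
    // State is unknown: say so instead of crashing or staying silent.
    return "Could not query the service manager; a supervised gateway may still be running.";
  }
  return result.loaded
    ? "A gateway service appears to be loaded; stop it before starting another."
    : null;
}
```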

Step 8: Run tests and verify

  • 8.1 Run unit tests: pnpm test
  • 8.2 Run type check: pnpm tsgo
  • 8.3 Run lint/format: pnpm check

Step 9: Commit changes

  • 9.1 Build: pnpm build
  • 9.2 Test: pnpm test
  • 9.3 Format: pnpm format
  • 9.4 Commit: scripts/committer "Daemon: handle missing systemctl in containers" <files>

Step 10: Push and create PR


CI/CD Tests to Run Locally

Before pushing:

  1. pnpm build
  2. pnpm tsgo
  3. pnpm check
  4. pnpm format
  5. pnpm test
  6. pnpm protocol:check

Key Files to Modify

  1. src/daemon/systemd.ts - Add isSystemctlPresent() and graceful error handling
  2. src/daemon/systemd.test.ts - Add tests for new behavior
  3. src/cli/gateway-cli/shared.ts - Handle errors in maybeExplainGatewayServiceStop()

NOT Doing (Decision)

  • Container environment detection (not needed - simple binary check is sufficient)
  • Complex local container simulation for testing
  • Service.ts modifications

Expected Outcome

  • Gateway starts successfully in Docker containers
  • If systemd functions are called, they handle missing systemctl gracefully
  • Clear error messages guide users when they try to use service management in containers

Greptile Summary

Added isSystemctlPresent() function to check for systemctl binary availability before attempting systemd operations, preventing crashes in containerized environments where systemctl doesn't exist.

  • Modified isSystemdServiceEnabled() to return false gracefully when systemctl is absent
  • Updated readSystemdServiceRuntime() to return status unknown with descriptive detail when systemctl is missing
  • Enhanced installSystemdService() and uninstallSystemdService() to check for systemctl presence first
  • Modified assertSystemdAvailable() to provide clearer error messages distinguishing between missing binary and unavailable systemd user bus
  • Added comprehensive test coverage for the new behavior

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is defensive, well-tested, and follows established patterns. The changes only add error handling where none existed before, preventing crashes without altering core logic. Tests verify both paths (systemctl present and absent), and the error handling properly propagates through the service abstraction layer.
  • No files require special attention

Last reviewed commit: ceb08fd

@openclaw-barnacle openclaw-barnacle bot added gateway Gateway runtime size: S labels Feb 25, 2026
@vincentkoc vincentkoc force-pushed the fix/issue-26089-systemctl-unavailable-docker branch from 2025351 to d801792 Compare March 2, 2026 05:17
@vincentkoc (Member)

@aisle-research-bot review

aisle-research-bot bot commented Mar 2, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

# | Severity | Title
1 | 🟡 Medium | Systemd service status check masks systemctl errors (false “disabled” result)

1. 🟡 Systemd service status check masks systemctl errors (false “disabled” result)

Property | Value
Severity | Medium
CWE | CWE-703
Location | src/daemon/systemd.ts:331-336

Description

isSystemdServiceEnabled() no longer calls assertSystemdAvailable() and now returns res.code === 0 for systemctl --user is-enabled.

This introduces an error-masking / state-confusion bug:

  • Any failure to execute or talk to systemctl (e.g. ENOENT missing binary, EACCES, DBus/user-session errors like “Failed to connect to bus”, unexpected stderr) is treated as false (i.e., “service disabled”).
  • Callers commonly interpret false as “not installed / safe to proceed” and may skip stop/uninstall safety steps or omit warnings.

Concrete call paths affected (Linux):

  • resolveGatewayService() wires GatewayService.isLoaded to isSystemdServiceEnabled().
  • src/commands/uninstall.ts (stopAndUninstallService) returns early on !loaded (skips stop/uninstall entirely).
  • src/commands/reset.ts (stopGatewayIfRunning) returns early on !loaded (skips stop), then proceeds to delete/reset state.
  • src/cli/gateway-cli/shared.ts (maybeExplainGatewayServiceStop) returns early when loaded === false, skipping the user warning that a supervised service may still be running.

As a result, a user-session/DBus issue or PATH/systemctl execution failure can cause the CLI to silently treat a running/enabled systemd user service as “disabled”, potentially leaving the gateway running when the user believes it has been stopped/uninstalled/reset.

Recommendation

Treat systemctl execution/availability and DBus/user-session failures as unknown/error, not “disabled”. Options:

  1. Restore the availability assertion:
    export async function isSystemdServiceEnabled(args: GatewayServiceEnvArgs): Promise<boolean> {
      await assertSystemdAvailable();
      const serviceName = resolveSystemdServiceName(args.env ?? {});
      const unitName = `${serviceName}.service`;
      const res = await execSystemctl(["--user", "is-enabled", unitName]);
      return res.code === 0;
    }
  2. Or, if you want non-throwing behavior, return a tri-state (true | false | null) or {enabled, status, detail} and update callers to handle unknown by warning/failing rather than skipping stop/uninstall:
    export async function isSystemdServiceEnabled(args: GatewayServiceEnvArgs): Promise<boolean | null> {
      const serviceName = resolveSystemdServiceName(args.env ?? {});
      const unitName = `${serviceName}.service`;
      const res = await execSystemctl(["--user", "is-enabled", unitName]);
      if (res.code === 0) return true;

      const detail = readSystemctlDetail(res);
      // Known “not enabled” states -> false
      if (/\b(disabled|static|indirect|masked)\b/i.test(detail)) return false;
      // Execution/bus/session problems -> unknown
      throw new Error(`systemctl is-enabled failed: ${detail || "unknown error"}`);
    }

Also consider using isSystemctlMissing(detail) to specifically throw on missing/permission spawn errors so call sites can present actionable guidance.
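To make the caller side of this recommendation concrete, here is a sketch that classifies the is-enabled outcome into three states and lets an uninstall-style caller act on each. The regular expression for known "not enabled" states is taken from the snippet above; the `EnabledState` type, `classifyIsEnabled`, `planUninstall`, and the warning text are hypothetical and do not reflect the merged code.

```typescript
type EnabledState = "enabled" | "disabled" | "unknown";

// Maps an exit code plus stderr/stdout detail to a tri-state.
function classifyIsEnabled(code: number, detail: string): EnabledState {
  if (code === 0) return "enabled";
  // Known "not enabled" states reported by systemctl is-enabled.
  if (/\b(disabled|static|indirect|masked)\b/i.test(detail)) return "disabled";
  // Spawn, bus, or session problems: the state cannot be determined.
  return "unknown";
}

// Decides what an uninstall/reset caller should do for each state.
function planUninstall(state: EnabledState): { stop: boolean; warn: string | null } {
  if (state === "enabled") return { stop: true, warn: null };
  if (state === "disabled") return { stop: false, warn: null };
  return {
    stop: false,
    warn: "Could not determine service state; the gateway may still be running.",
  };
}
```

The point of the third state is that callers no longer read "could not ask" as "disabled", so a bus or PATH failure produces a warning instead of a silently skipped stop.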


Analyzed PR: #26699 at commit 07b54e3

Last updated on: 2026-03-02T05:41:11Z

@vincentkoc vincentkoc merged commit cda119b into openclaw:main Mar 2, 2026
21 checks passed
safzanpirani pushed a commit to safzanpirani/clawdbot that referenced this pull request Mar 2, 2026
…w#26699)

* Daemon: handle missing systemctl in containers

* Daemon: harden missing-systemctl detection

* Daemon tests: cover systemctl spawn failure path

* Changelog: note container systemctl service-check fix

* Update CHANGELOG.md

* Daemon: fail closed on unknown systemctl is-enabled errors

* Daemon tests: cover is-enabled unknown-error path

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
steipete pushed a commit to Sid-Qin/openclaw that referenced this pull request Mar 2, 2026
robertchang-ga pushed a commit to robertchang-ga/openclaw that referenced this pull request Mar 2, 2026
hanqizheng pushed a commit to hanqizheng/openclaw that referenced this pull request Mar 2, 2026
execute008 pushed a commit to execute008/openclaw that referenced this pull request Mar 2, 2026
dawi369 pushed a commit to dawi369/davis that referenced this pull request Mar 3, 2026
sachinkundu pushed a commit to sachinkundu/openclaw that referenced this pull request Mar 6, 2026
zooqueen pushed a commit to hanzoai/bot that referenced this pull request Mar 6, 2026
atlastacticalbot pushed a commit to tensakulabs/atlasbot that referenced this pull request Mar 6, 2026
(cherry picked from commit cda119b)

Labels

gateway Gateway runtime size: S

Development

Successfully merging this pull request may close these issues.

[Bug]: [Bug]: systemctl --user unavailable in Docker container deployment