Skip to content

fix(cli): block package updates from inside running gateway service#75729

Merged
hxy91819 merged 5 commits intomainfrom
fix/update-block-gateway-service
May 2, 2026
Merged

fix(cli): block package updates from inside running gateway service#75729
hxy91819 merged 5 commits intomainfrom
fix/update-block-gateway-service

Conversation

@hxy91819
Copy link
Copy Markdown
Member

@hxy91819 hxy91819 commented May 1, 2026

Summary

  • Problem: Running openclaw update from inside a managed gateway service process could replace the active OpenClaw dist tree while the live gateway lazy-loads old chunks, leading to runtime errors or inconsistent state.
  • Why it matters: Package updates from within the gateway service are unsafe because the update mutates files that the running process may still reference.
  • What changed:
    • Added detection for when the current process is running inside a gateway service (isRunningInsideGatewayService).
    • Added shouldBlockPackageUpdateFromGatewayServiceEnv to block package updates if the gateway is still running (with conservative fallback when service state cannot be inspected).
    • Extended PrePackageServiceStop to track inspected and running state for accurate guard decisions.
    • Added stripGatewayServiceMarkerEnv to prevent gateway service markers from leaking into the post-core update child process.
    • Added tests covering all three scenarios: blocking when running, allowing when not running, and stripping env markers.
  • What did NOT change: Git-based updates are unaffected. The block only applies to package manager updates (npm/pnpm/bun).

Change Type

  • Bug fix

Scope

  • Gateway / orchestration
  • UI / DX

Linked Issue/PR

  • N/A (discovered during internal review)

Root Cause

  • Root cause: The update command did not distinguish between running inside a gateway service process versus a normal shell. Package updates from within the service process mutate the dist tree in-place, which races with lazy chunk loading.
  • Missing detection / guardrail: No check for OPENCLAW_SERVICE_MARKER / OPENCLAW_SERVICE_KIND env vars before allowing package replacement.

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
  • Target test or file: src/cli/update-cli.test.ts
  • Scenario the test should lock in: Package update from inside gateway service with --no-restart must be blocked; package update when gateway is not running must be allowed.
  • Why this is the smallest reliable guardrail: The guard is a conditional early-exit in the update command flow; unit tests with mocked service state are the cheapest way to verify all branch combinations.

User-visible / Behavior Changes

  • openclaw update now exits with an error and a helpful message when run from inside a running gateway service process (unless the service is stopped first or --no-restart is used with the service already stopped).

Security Impact

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)

Repro + Verification

Environment

  • OS: Linux
  • Runtime: Node 22

Steps

  1. Start the gateway service (openclaw gateway start).
  2. From within the service context (or with OPENCLAW_SERVICE_MARKER=openclaw OPENCLAW_SERVICE_KIND=gateway), run openclaw update.

Expected

  • Update is blocked with a clear error message explaining the conflict.

Actual

  • Update proceeds, potentially corrupting the running gateway's loaded modules.

Human Verification

  • Verified scenarios:
    • All 70 existing update-cli.test.ts tests pass.
    • New tests cover blocking when running, allowing when stopped, and env stripping.
  • Edge cases checked:
    • Service state inspection failure → conservative block.
    • Service not installed → allow (not inside a managed service).
    • Post-core update process strips markers so child does not inherit gateway identity.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)

Risks and Mitigations

  • Risk: Overly aggressive blocking when service state cannot be read.
    • Mitigation: This is intentional conservative behavior; users can run the update from a normal shell or stop the service first.

@openclaw-barnacle openclaw-barnacle Bot added cli CLI command changes size: S maintainer Maintainer-authored PR labels May 1, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 1, 2026

Codex review: found issues before merge.

Summary
The PR replaces the blanket service-marker package-update block with runtime-state inspection, strips service markers from the post-core child process, and adds CLI tests plus a changelog entry.

Reproducibility: yes. A source-level reproduction is a package-manager openclaw update launched by a Gateway-owned exec child with service markers while the systemd-managed Gateway is running; the PR reaches the service stop path before the block.

Next step before merge
Maintainers should decide whether this PR absorbs the ancestry/fail-closed guard or is sequenced with #75819 before merge.

Security
Cleared: The diff changes CLI update control flow, tests, and changelog text without adding dependencies, workflows, package sources, permissions, downloads, or secret handling.

Review findings

  • [P1] Block service-owned updates before stopping the Gateway — src/cli/update-cli/update-command.ts:290-291
Review details

Best possible solution:

Keep the runtime-state guard and marker cleanup, but block service-owned callers before stopping the Gateway or hand off to a detached updater outside the service control group.

Do we have a high-confidence way to reproduce the issue?

Yes. A source-level reproduction is a package-manager openclaw update launched by a Gateway-owned exec child with service markers while the systemd-managed Gateway is running; the PR reaches the service stop path before the block.

Is this the best way to solve the issue?

No. The tri-state runtime inspection is useful, but allowing stopState.stopped from a service-marked caller is not the narrowest maintainable fix without an ancestry guard or detached handoff.

Full review comments:

  • [P1] Block service-owned updates before stopping the Gateway — src/cli/update-cli/update-command.ts:290-291
    Returning false when stopState.stopped is true lets a service-marked updater proceed after it stops the service that owns its process tree. Systemd units use KillMode=control-group, and OpenClaw service-mode exec children are attached, so systemctl stop can terminate the updater before package replacement/restart completes; block before stopping or hand off to a detached updater.
    Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.87

Acceptance criteria:

  • pnpm test src/cli/update-cli.test.ts -t "inherited gateway service env|post-core update process|stops a running managed gateway|runtime status is unknown|runtime probe fails|service definition is missing"
  • pnpm test src/cli/update-cli.test.ts
  • git diff --check

What I checked:

Likely related people:

  • steipete: Recent history shows repeated work on package update restart safety, service-mode systemd hardening, and the update command paths involved here. (role: recent maintainer and historical update/service owner; confidence: high; commits: 067375cee3e2, 6077941d0bd7, cc7e61612aa8; files: src/cli/update-cli/update-command.ts, src/daemon/systemd-unit.ts, src/process/supervisor/adapters/child.ts)
  • vincentkoc: Recent commits touched package update service environment handling, restart port verification, and post-update sync behavior near the affected update flow. (role: recent update-path maintainer; confidence: medium; commits: cad2cef0fb8c, debb8ac76c84, 43da08979035; files: src/cli/update-cli/update-command.ts)
  • hclsys: Recent process-supervisor work refined service-managed child detachment and kill-tree behavior, which is central to whether a Gateway-owned updater survives service stop. (role: adjacent process-supervisor maintainer; confidence: medium; commits: 4a72e1b99017; files: src/process/supervisor/adapters/child.ts)

Remaining risk / open question:

  • Merging this PR before an ancestry/fail-closed guard can turn a service-owned update into a stopped Gateway plus a killed updater instead of a completed package replacement and restart.
  • The sequencing between this PR and related PR fix(cli): block gateway-owned package updates #75819 is still a maintainer decision.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 81e1deade2b1.

@hxy91819
Copy link
Copy Markdown
Member Author

hxy91819 commented May 1, 2026

Fixes #75691

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed22327046

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/cli/update-cli/update-command.ts Outdated
@hxy91819
Copy link
Copy Markdown
Member Author

hxy91819 commented May 1, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4bdb1a634a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/cli/update-cli/update-command.ts Outdated
@hxy91819
Copy link
Copy Markdown
Member Author

hxy91819 commented May 1, 2026

Follow-up fixed in 6a4db93. I verified the concern was still present after the prior patch: installed: false returned before considering runtime state, so a missing/unreadable service definition could be treated as safe in an inherited service-marked environment.

The update guard now keeps the runtime result even when the service definition is missing:

  • if runtime inspection is unavailable, package replacement stays blocked;
  • if runtime reports the Gateway is still running, package replacement stays blocked;
  • only an explicit non-running runtime state or a successful stop permits the update to continue.

Added regression coverage for the missing-definition-but-live-runtime case.

Validation:

  • pnpm test src/cli/update-cli.test.ts -t "inherited gateway service env|post-core update process|stops a running managed gateway|runtime inspection is inconclusive|service definition is missing"
  • pnpm test src/cli/update-cli.test.ts
  • git diff --check

@hxy91819
Copy link
Copy Markdown
Member Author

hxy91819 commented May 2, 2026

Thanks everyone for the review and follow-up testing.

Plan from my side:

  • I think this PR should land first. It addresses the broader service-marker/runtime-state safety path for package updates.
  • The remaining PPID-ancestry case is real, but it is a separate guard: it covers callers that are in the active Gateway process tree even when the service-marker env is absent.
  • I am asking for that narrower guard to continue in fix(cli): block gateway-owned package updates #75819 after it is rebased on top of this PR, so we do not conflate the two paths or keep expanding this PR indefinitely.

I am also running/asking for black-box coverage separately. Given the tradeoff, I do not think full systemd self-update acceptance should block this focused guard fix; if the black-box run finds another edge case, we can follow up with a small PR.

@hxy91819
Copy link
Copy Markdown
Member Author

hxy91819 commented May 2, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a4db93d6b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/cli/update-cli/update-command.ts
@hxy91819
Copy link
Copy Markdown
Member Author

hxy91819 commented May 2, 2026

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Keep them coming!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@hxy91819 hxy91819 self-assigned this May 2, 2026
@hxy91819 hxy91819 force-pushed the fix/update-block-gateway-service branch from 459ca14 to 26b053b Compare May 2, 2026 13:45
@hxy91819 hxy91819 force-pushed the fix/update-block-gateway-service branch from 26b053b to 8f301c5 Compare May 2, 2026 13:47
@hxy91819 hxy91819 merged commit 0b09cfb into main May 2, 2026
79 checks passed
@hxy91819 hxy91819 deleted the fix/update-block-gateway-service branch May 2, 2026 13:48
@hxy91819
Copy link
Copy Markdown
Member Author

hxy91819 commented May 2, 2026

Merged via squash.

Thanks @hxy91819!

lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
…penclaw#75729)

Merged via squash.

Prepared head SHA: 8f301c5
Co-authored-by: hxy91819 <8814856+hxy91819@users.noreply.github.com>
Co-authored-by: hxy91819 <8814856+hxy91819@users.noreply.github.com>
Reviewed-by: @hxy91819
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
…penclaw#75729)

Merged via squash.

Prepared head SHA: 8f301c5
Co-authored-by: hxy91819 <8814856+hxy91819@users.noreply.github.com>
Co-authored-by: hxy91819 <8814856+hxy91819@users.noreply.github.com>
Reviewed-by: @hxy91819
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cli CLI command changes maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant