Commit e131eae

fix: force package update restart handoff
1 parent 6efb449 commit e131eae

11 files changed

Lines changed: 279 additions & 38 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ Docs: https://docs.openclaw.ai
 - Plugins/runtime-deps: recover interrupted bundled runtime-dependency installs whose package sentinels exist but generated materialization is incomplete, forcing npm/pnpm repair in Gateway startup, doctor, and lazy plugin loads instead of leaving channels crash-looping on missing packages. Fixes #75309; refs #75310, #75296, and #75304. Thanks @scottgl9.
 - Plugins/runtime-deps: treat no-main and export-map package sentinels without reachable entry files as incomplete, so Gateway startup, doctor, and lazy plugin loads repair interrupted bundled dependency installs instead of accepting package.json-only partial installs. Fixes #75309; refs #75183. Thanks @shakkernerd.
 - Plugins/runtime-deps: keep runtime inspection and channel maintenance commands from downloading bundled plugin dependencies, route explicit repairs through `openclaw plugins deps --repair`, and still allow Gateway/DO paths to repair missing deps before import. Refs #75069. Thanks @xiaohuaxi.
+- Updates: force non-deferred update restarts after package-manager updates requested through the live Gateway control plane and fail release validation on post-swap stale chunk import crashes, so Telegram/Discord imports do not stay pointed at removed dist files. Fixes #75206. Thanks @xonaman and @faux123.
 - Agents/tool-result guard: use the resolved runtime context token budget for non-context-engine tool-result overflow checks, so long tool-heavy sessions no longer compact early when `contextTokens` is larger than native `contextWindow`. Fixes #74917. Thanks @kAIborg24.
 - Gateway/systemd: exit with sysexits 78 for supervised lock and `EADDRINUSE` conflicts so `RestartPreventExitStatus=78` stops `Restart=always` restart loops instead of repeatedly reloading plugins against an occupied port. Fixes #75115. Thanks @yhyatt.
 - Agents/runtime: skip blank visible user prompts at the embedded-runner boundary before provider submission while still allowing internal runtime-only turns and media-only prompts, so Telegram/group sessions no longer leak raw empty-input provider errors when replay history exists. Fixes #74137. Thanks @yelog, @Gracker, and @nhaener.

docs/cli/update.md

Lines changed: 6 additions & 2 deletions
@@ -82,7 +82,11 @@ install method aligned:
 - `beta` → prefers npm dist-tag `beta`, but falls back to `latest` when beta is
   missing or older than the current stable release.

-The Gateway core auto-updater (when enabled via config) reuses this same update path.
+The Gateway core auto-updater (when enabled via config) launches the CLI update path
+outside the live Gateway request handler. Control-plane `update.run` package-manager
+updates force a non-deferred update restart after the package swap, because the old
+Gateway process may still have in-memory chunks that point at files removed by the
+new package.

 For package-manager installs, `openclaw update` resolves the target package
 version before invoking the package manager. npm global installs use a staged
@@ -151,7 +155,7 @@ If an exact pinned npm plugin update resolves to an artifact whose integrity dif
 <Note>
 Post-update plugin sync failures fail the update result and stop restart follow-up work. Fix the plugin install or update error, then rerun `openclaw update`.

-When the updated Gateway starts, enabled bundled plugin runtime dependencies are staged before plugin activation. Update-triggered restarts drain any active runtime-dependency staging before closing the Gateway, so service-manager restarts do not interrupt an in-flight npm install.
+When the updated Gateway starts, enabled bundled plugin runtime dependencies are staged before plugin activation. Package-manager `update.run` restarts bypass the normal idle deferral after the package tree has been swapped, so the old process cannot keep lazy-loading removed chunks. Service-manager restarts still drain runtime-dependency staging before closing the Gateway.

 If pnpm bootstrap still fails, the updater stops early with a package-manager-specific error instead of trying `npm run build` inside the checkout.
 </Note>

docs/gateway/protocol.md

Lines changed: 1 addition & 1 deletion
@@ -378,7 +378,7 @@ enumeration of `src/gateway/server-methods/*.ts`.
 - `config.apply` validates + replaces the full config payload.
 - `config.schema` returns the live config schema payload used by Control UI and CLI tooling: schema, `uiHints`, version, and generation metadata, including plugin + channel schema metadata when the runtime can load it. The schema includes field `title` / `description` metadata derived from the same labels and help text used by the UI, including nested object, wildcard, array-item, and `anyOf` / `oneOf` / `allOf` composition branches when matching field documentation exists.
 - `config.schema.lookup` returns a path-scoped lookup payload for one config path: normalized path, a shallow schema node, matched hint + `hintPath`, and immediate child summaries for UI/CLI drill-down. Lookup schema nodes keep the user-facing docs and common validation fields (`title`, `description`, `type`, `enum`, `const`, `format`, `pattern`, numeric/string/array/object bounds, and flags like `additionalProperties`, `deprecated`, `readOnly`, `writeOnly`). Child summaries expose `key`, normalized `path`, `type`, `required`, `hasChildren`, plus the matched `hint` / `hintPath`.
-- `update.run` runs the gateway update flow and schedules a restart only when the update itself succeeded.
+- `update.run` runs the gateway update flow and schedules a restart only when the update itself succeeded. Package-manager updates force a non-deferred update restart after the package swap so the old Gateway process does not keep lazy-loading from a replaced `dist` tree.
 - `update.status` returns the latest cached update restart sentinel, including the post-restart running version when available.
 - `wizard.start`, `wizard.next`, `wizard.status`, and `wizard.cancel` expose the onboarding wizard over WS RPC.

docs/install/updating.md

Lines changed: 7 additions & 0 deletions
@@ -168,6 +168,13 @@ The auto-updater is off by default. Enable it in `~/.openclaw/openclaw.json`:
 The gateway also logs an update hint on startup (disable with `update.checkOnStart: false`).
 For downgrade or incident recovery, set `OPENCLAW_NO_AUTO_UPDATE=1` in the gateway environment to block automatic applies even when `update.auto.enabled` is configured. Startup update hints can still run unless `update.checkOnStart` is also disabled.

+Package-manager updates requested through the live Gateway control-plane handler
+force a non-deferred update restart after the package swap. That avoids leaving
+an old in-memory process around long enough to lazy-load chunks from a package
+tree that has already been replaced. Shell `openclaw update` remains the
+preferred path for supervised installs because it can stop and restart the
+service around the update.
+
 ## After updating

 <Steps>
Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+# Update run package self-upgrade
+
+```yaml qa-scenario
+id: update-run-package-self-upgrade
+title: Update run package self-upgrade
+surface: runtime
+coverage:
+  primary:
+    - runtime.update-run
+  secondary:
+    - runtime.gateway-restart
+    - runtime.package-update
+objective: Verify an agent can self-update an installed OpenClaw package from 2026.4.26 to latest by using the gateway update.run action, then recover through the forced restart.
+successCriteria:
+  - The agent is explicitly instructed to use the gateway tool action update.run instead of shell package-manager commands.
+  - The update request carries a restart note marker that can be observed after the gateway restart.
+  - Gateway and qa-channel return healthy after update.run restarts the process.
+docsRefs:
+  - docs/cli/update.md
+  - docs/install/updating.md
+  - docs/gateway/protocol.md
+codeRefs:
+  - src/agents/tools/gateway-tool.ts
+  - src/gateway/server-methods/update.ts
+  - src/infra/restart.ts
+execution:
+  kind: flow
+  summary: "Opt-in destructive package-update lane: ask the agent to update a 2026.4.26 install to latest via gateway action update.run and verify the restart marker after recovery."
+  config:
+    requiredProviderMode: live-frontier
+    sourceVersion: "2026.4.26"
+    targetTag: latest
+    allowEnv: OPENCLAW_QA_ALLOW_UPDATE_RUN_SELF
+    channelId: qa-room
+```
+
+```yaml qa-flow
+steps:
+  - name: asks the agent to self-update through update.run
+    actions:
+      - if:
+          expr: "env.gateway.runtimeEnv[config.allowEnv] !== '1'"
+        then:
+          - assert: "true"
+        else:
+          - call: waitForGatewayHealthy
+            args:
+              - ref: env
+              - 60000
+          - call: waitForQaChannelReady
+            args:
+              - ref: env
+              - 60000
+          - call: reset
+          - set: sessionKey
+            value:
+              expr: "buildAgentSessionKey({ agentId: 'qa', channel: 'qa-channel', peer: { kind: 'channel', id: config.channelId } })"
+          - call: createSession
+            args:
+              - ref: env
+              - Update run package self-upgrade
+              - ref: sessionKey
+          - call: readEffectiveTools
+            saveAs: tools
+            args:
+              - ref: env
+              - ref: sessionKey
+          - assert:
+              expr: "tools.has('gateway')"
+              message: gateway tool not present for update.run self-upgrade scenario
+          - set: startIndex
+            value:
+              expr: state.getSnapshot().messages.length
+          - set: marker
+            value:
+              expr: "`QA-UPDATE-RUN-${randomUUID().slice(0, 8)}`"
+          - call: startAgentRun
+            saveAs: started
+            args:
+              - ref: env
+              - sessionKey:
+                  ref: sessionKey
+                to:
+                  expr: "`channel:${config.channelId}`"
+                message:
+                  expr: |-
+                    `Update-run self-upgrade QA check. The OpenClaw package under test was installed from openclaw@${config.sourceVersion} and must update itself to openclaw@${config.targetTag}. Use the gateway tool with action=update.run. Do not run npm, pnpm, bun, git pull, or shell package-manager commands yourself. Set note exactly to "${marker} update.run complete" and restartDelayMs to 0 so the post-restart channel message proves recovery.`
+                timeoutMs:
+                  expr: liveTurnTimeoutMs(env, 180000)
+          - call: waitForGatewayHealthy
+            args:
+              - ref: env
+              - 180000
+          - call: waitForQaChannelReady
+            args:
+              - ref: env
+              - 180000
+          - call: waitForOutboundMessage
+            saveAs: outbound
+            args:
+              - ref: state
+              - lambda:
+                  params: [candidate]
+                  expr: "candidate.text.includes(marker)"
+              - expr: liveTurnTimeoutMs(env, 180000)
+              - sinceIndex:
+                  ref: startIndex
+          - call: env.gateway.call
+            saveAs: updateStatus
+            args:
+              - update.status
+              - {}
+              - timeoutMs: 30000
+          - assert:
+              expr: "Boolean(updateStatus?.sentinel)"
+              message:
+                expr: "`update.status did not report a restart sentinel after update.run: ${JSON.stringify(updateStatus)}`"
+    detailsExpr: "env.gateway.runtimeEnv[config.allowEnv] !== '1' ? `skipped destructive package self-update; set ${config.allowEnv}=1 to run` : `runId=${started.runId} marker=${marker} outbound=${outbound.text}`"
+```
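
Reviewer note: the flow proves recovery by planting a random marker in the restart note and then waiting for an outbound channel message that contains it. The marker step can be sketched in isolation as below. The `OutboundMessage` type and both function names are hypothetical and are not taken from the QA harness; only the marker template and the `includes` predicate come from the scenario.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical minimal message shape; the harness snapshot carries more fields.
type OutboundMessage = { text: string };

// Same template as the scenario's `marker` expression.
function makeRestartMarker(): string {
  return `QA-UPDATE-RUN-${randomUUID().slice(0, 8)}`;
}

// Looks only at messages recorded at or after `sinceIndex`, so traffic from
// before the update request cannot satisfy the check.
function findMarkedMessage(
  messages: OutboundMessage[],
  marker: string,
  sinceIndex: number,
): OutboundMessage | undefined {
  return messages.slice(sinceIndex).find((candidate) => candidate.text.includes(marker));
}

const demoMarker = makeRestartMarker();
const demoMessages: OutboundMessage[] = [
  { text: "older traffic" },
  { text: `${demoMarker} update.run complete` },
];
console.log(findMarkedMessage(demoMessages, demoMarker, 1)?.text);
```

Because the marker is random per run, a message left over from an earlier run cannot be mistaken for proof that this restart completed.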

scripts/openclaw-cross-os-release-checks.ts

Lines changed: 1 addition & 29 deletions
@@ -1256,46 +1256,18 @@ export function buildRealUpdateEnv(env) {
   return updateEnv;
 }

-export function verifyPackagedUpgradeUpdateResult(result, options) {
+export function verifyPackagedUpgradeUpdateResult(result, _options) {
   if (result.exitCode === 0) {
     return;
   }

-  let payload = null;
-  try {
-    payload = JSON.parse(result.stdout);
-  } catch {
-    payload = null;
-  }
-
-  const steps = Array.isArray(payload?.steps) ? payload.steps : [];
-  const allStepsSucceeded = steps.every((step) => step?.exitCode === 0);
-  const afterVersion = typeof payload?.after?.version === "string" ? payload.after.version : "";
-  if (
-    payload?.status === "ok" &&
-    afterVersion === options.candidateVersion &&
-    allStepsSucceeded &&
-    isSelfSwappedPackageProcessExit(result.stderr)
-  ) {
-    return;
-  }
-
   throw new Error(
     `Packaged upgrade failed (${result.exitCode}): ${trimForSummary(
       `${result.stdout}\n${result.stderr}`,
     )}`,
   );
 }

-function isSelfSwappedPackageProcessExit(stderr) {
-  return (
-    typeof stderr === "string" &&
-    stderr.includes("[openclaw] Failed to start CLI:") &&
-    stderr.includes("ERR_MODULE_NOT_FOUND") &&
-    /[\\/]node_modules[\\/]openclaw[\\/]dist[\\/]/u.test(stderr)
-  );
-}
-
 export function resolveExplicitBaselineVersion(baselineSpec) {
   const trimmed = baselineSpec.trim();
   if (!trimmed || trimmed === "openclaw@latest") {
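
Reviewer note: with the self-swap exemption removed, any non-zero exit now fails release validation, even when stdout reports `status: "ok"` and stderr shows the stale-chunk `ERR_MODULE_NOT_FOUND` pattern that used to be tolerated. The typed sketch below restates that behavior. The `CommandResult` type and this `trimForSummary` body are assumptions for illustration; the script's own helper is not shown in the diff.

```typescript
// Hypothetical result shape, inferred from the fields the hunk reads.
type CommandResult = { exitCode: number; stdout: string; stderr: string };

// Stand-in for the script's trimForSummary helper (assumed behavior).
function trimForSummary(text: string, max = 400): string {
  const trimmed = text.trim();
  return trimmed.length > max ? `${trimmed.slice(0, max)}...` : trimmed;
}

// Post-change behavior: exit code 0 passes, everything else throws.
function verifyPackagedUpgradeUpdateResult(result: CommandResult): void {
  if (result.exitCode === 0) {
    return;
  }
  throw new Error(
    `Packaged upgrade failed (${result.exitCode}): ${trimForSummary(
      `${result.stdout}\n${result.stderr}`,
    )}`,
  );
}

// A run that the removed branch would have accepted now fails.
try {
  verifyPackagedUpgradeUpdateResult({
    exitCode: 1,
    stdout: JSON.stringify({ status: "ok", steps: [] }),
    stderr: "[openclaw] Failed to start CLI: ERR_MODULE_NOT_FOUND",
  });
} catch (error) {
  console.log((error as Error).message);
}
```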

src/gateway/server-methods/update.test.ts

Lines changed: 28 additions & 1 deletion
@@ -276,7 +276,34 @@ describe("update.run restart scheduling", () => {
     );
   });

-  it("blocks unmanaged global installs before package mutation when restart is unavailable", async () => {
+  it("forces an immediate restart after successful package-manager updates", async () => {
+    resolveUpdateInstallSurfaceMock.mockResolvedValueOnce({
+      kind: "global",
+      mode: "npm",
+      root: "/tmp/openclaw-global",
+      packageRoot: "/tmp/openclaw-global",
+    });
+
+    let payload:
+      | { ok: boolean; result?: { status?: string; reason?: string; mode?: string } }
+      | undefined;
+
+    await invokeUpdateRun({}, (_ok: boolean, response: unknown) => {
+      payload = response as typeof payload;
+    });
+
+    expect(runGatewayUpdateMock).toHaveBeenCalledTimes(1);
+    expect(scheduleGatewaySigusr1RestartMock).toHaveBeenCalledWith(
+      expect.objectContaining({
+        delayMs: 0,
+        reason: "update.run",
+        skipDeferral: true,
+      }),
+    );
+    expect(payload?.ok).toBe(true);
+  });
+
+  it("blocks global package installs when the gateway cannot restart afterward", async () => {
     isRestartEnabledMock.mockReturnValue(false);
     detectRespawnSupervisorMock.mockReturnValue(null);
     resolveUpdateInstallSurfaceMock.mockResolvedValueOnce({

src/gateway/server-methods/update.ts

Lines changed: 3 additions & 1 deletion
@@ -140,11 +140,13 @@ export const updateHandlers: GatewayRequestHandlers = {
     // Only restart the gateway when the update actually succeeded.
     // Restarting after a failed update leaves the process in a broken state
     // (corrupted node_modules, partial builds) and causes a crash loop.
+    const updateWasPackageSwap = result.status === "ok" && result.mode !== "git";
     const restart =
       result.status === "ok"
         ? scheduleGatewaySigusr1Restart({
-            delayMs: restartDelayMs,
+            delayMs: updateWasPackageSwap ? 0 : restartDelayMs,
             reason: "update.run",
+            skipDeferral: updateWasPackageSwap,
             audit: {
               actor: actor.actor,
               deviceId: actor.deviceId,
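
Reviewer note: the decision in this hunk reduces to a small pure function, which makes the three outcomes easier to see: no restart on failure, the caller's delay for git updates, and an immediate non-deferred restart for everything else. This is a sketch only. `planUpdateRestart`, `UpdateResult`, and `RestartPlan` are hypothetical names, and the real call also passes `audit` and other options.

```typescript
// Hypothetical, trimmed-down shapes; the real types carry more fields.
type UpdateResult = { status: string; mode: string };
type RestartPlan = { delayMs: number; reason: string; skipDeferral: boolean };

// Mirrors the hunk: a successful update whose mode is not "git" counts as a
// package swap, which restarts with zero delay and skips idle deferral.
function planUpdateRestart(result: UpdateResult, restartDelayMs: number): RestartPlan | null {
  if (result.status !== "ok") {
    return null; // failed updates never schedule a restart
  }
  const updateWasPackageSwap = result.mode !== "git";
  return {
    delayMs: updateWasPackageSwap ? 0 : restartDelayMs,
    reason: "update.run",
    skipDeferral: updateWasPackageSwap,
  };
}

console.log(planUpdateRestart({ status: "ok", mode: "npm" }, 5000));
console.log(planUpdateRestart({ status: "ok", mode: "git" }, 5000));
console.log(planUpdateRestart({ status: "error", mode: "npm" }, 5000));
```

Note that for package swaps the caller-supplied `restartDelayMs` is ignored, which matches the `delayMs: 0` expectation in `update.test.ts`.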

src/infra/infra-runtime.test.ts

Lines changed: 79 additions & 0 deletions
@@ -483,6 +483,85 @@ describe("infra runtime", () => {
       }
     });

+    it("bypasses the pre-restart deferral check when requested", async () => {
+      const emitSpy = vi.spyOn(process, "emit");
+      const pendingCheck = vi.fn(() => 5);
+      const handler = () => {};
+      process.on("SIGUSR1", handler);
+      try {
+        setPreRestartDeferralCheck(pendingCheck);
+        scheduleGatewaySigusr1Restart({
+          delayMs: 0,
+          reason: "update.run",
+          skipDeferral: true,
+        });
+
+        await vi.advanceTimersByTimeAsync(0);
+
+        expect(pendingCheck).not.toHaveBeenCalled();
+        expect(emitSpy).toHaveBeenCalledWith("SIGUSR1");
+        expect(peekGatewaySigusr1RestartReason()).toBe("update.run");
+      } finally {
+        process.removeListener("SIGUSR1", handler);
+      }
+    });
+
+    it("upgrades an already scheduled restart to bypass deferral", async () => {
+      const emitSpy = vi.spyOn(process, "emit");
+      const pendingCheck = vi.fn(() => 5);
+      const handler = () => {};
+      process.on("SIGUSR1", handler);
+      try {
+        setPreRestartDeferralCheck(pendingCheck);
+        scheduleGatewaySigusr1Restart({ delayMs: 1_000, reason: "config.patch" });
+        const forced = scheduleGatewaySigusr1Restart({
+          delayMs: 1_000,
+          reason: "update.run",
+          skipDeferral: true,
+        });
+
+        expect(forced.coalesced).toBe(false);
+
+        await vi.advanceTimersByTimeAsync(1_000);
+
+        expect(pendingCheck).not.toHaveBeenCalled();
+        expect(emitSpy).toHaveBeenCalledWith("SIGUSR1");
+        expect(peekGatewaySigusr1RestartReason()).toBe("update.run");
+      } finally {
+        process.removeListener("SIGUSR1", handler);
+      }
+    });
+
+    it("bypasses an active restart deferral when a forced restart arrives", async () => {
+      const emitSpy = vi.spyOn(process, "emit");
+      const staleBeforeEmit = vi.fn(async () => {});
+      const handler = () => {};
+      process.on("SIGUSR1", handler);
+      try {
+        setPreRestartDeferralCheck(() => 5);
+        scheduleGatewaySigusr1Restart({
+          delayMs: 0,
+          reason: "config.patch",
+          emitHooks: { beforeEmit: staleBeforeEmit },
+        });
+        await vi.advanceTimersByTimeAsync(0);
+        expect(emitSpy).not.toHaveBeenCalledWith("SIGUSR1");
+
+        const forced = scheduleGatewaySigusr1Restart({
+          delayMs: 0,
+          reason: "update.run",
+          skipDeferral: true,
+        });
+
+        expect(forced.coalesced).toBe(false);
+        expect(emitSpy).toHaveBeenCalledWith("SIGUSR1");
+        expect(staleBeforeEmit).not.toHaveBeenCalled();
+        expect(peekGatewaySigusr1RestartReason()).toBe("update.run");
+      } finally {
+        process.removeListener("SIGUSR1", handler);
+      }
+    });
+
     it("emits SIGUSR1 after the default deferral timeout while work is still pending", async () => {
       const emitSpy = vi.spyOn(process, "emit");
       const handler = () => {};
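
Reviewer note: the three new tests share one rule, namely that a request with `skipDeferral: true` emits without consulting the pre-restart check, while an ordinary request is held back when the check reports pending work. The toy model below captures only that rule. It is not the implementation in `src/infra/restart.ts`: timers, coalescing, and emit hooks are left out, `decideRestart` is a hypothetical name, and reading the check's return value as a count of in-flight work items is an assumption drawn from the tests.

```typescript
type RestartRequest = { reason: string; skipDeferral?: boolean };
type RestartDecision = { action: "emit" | "defer"; reason: string; pendingChecked: boolean };

// `pendingCheck` stands in for the callback registered through
// setPreRestartDeferralCheck (assumed to return a pending-work count).
function decideRestart(
  request: RestartRequest,
  pendingCheck: (() => number) | null,
): RestartDecision {
  if (request.skipDeferral === true || pendingCheck === null) {
    // Forced restarts never call the check at all.
    return { action: "emit", reason: request.reason, pendingChecked: false };
  }
  const pending = pendingCheck();
  return {
    action: pending > 0 ? "defer" : "emit",
    reason: request.reason,
    pendingChecked: true,
  };
}

const busyCheck = (): number => 5; // same stub value the tests use

console.log(decideRestart({ reason: "config.patch" }, busyCheck));
console.log(decideRestart({ reason: "update.run", skipDeferral: true }, busyCheck));
```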
