fix(cli): clarify completion cache timeout message after openclaw update by iot2edge · Pull Request #72850 · openclaw/openclaw

iot2edge · 2026-04-27T14:32:20Z

Fixes #72842

Summary

Problem: When openclaw update finishes, it spawns a child process to refresh the shell completion cache with a 30-second timeout. On slower environments (Debian/Docker, slow disk, large bundled-plugin trees) that child can exceed 30s and is killed. The user sees Completion cache update failed: Error: spawnSync /usr/bin/node ETIMEDOUT — Node's raw error string with no context.
Why it matters: The reporter of [Bug]: Why have the last two upgrades both reported the error: Completion cache update failed: Error: spawnSync /usr/bin/node ETIMEDOUT? #72842 saw this error twice in a row and explicitly asked "why does this happen?" The current message exposes a low-level Node error without telling the user (a) it's non-fatal, (b) what the cache does, or (c) how to recover. It looks alarming for what is actually a benign tab-completion staleness.
What changed: In tryWriteCompletionCache (src/cli/update-cli/shared.ts), when the spawned child returns result.error, detect ETIMEDOUT and surface "timed out after 30s" instead of the raw Error: spawnSync /usr/bin/node ETIMEDOUT. Both the result.error and the non-zero result.status branches now also append a single-line manual-refresh hint: "Shell tab-completion may be stale; refresh manually with: openclaw completion --write-state". Added a regression test in src/cli/update-cli.test.ts that asserts the friendly text is logged and the raw Error: spawnSync string is not.
What did NOT change (scope boundary): The 30-second timeout itself, the spawnSync call shape, the OPENCLAW_COMPLETION_SKIP_PLUGIN_COMMANDS env var, the silent return on error (still non-fatal), JSON-mode silence, and the duplicate-but-separate COMPLETION_CACHE_WRITE_TIMEOUT_MS constant in src/commands/doctor-completion.ts. Did not touch the underlying performance issue (fast plugin-load on cold start) — that is environment-specific and out of scope for a UX fix.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes [Bug]: Why have the last two upgrades both reported the error: Completion cache update failed: Error: spawnSync /usr/bin/node ETIMEDOUT? #72842
Related #
This PR fixes a bug or regression

Root Cause

Root cause: After a successful openclaw update, update-command.ts calls tryWriteCompletionCache(postUpdateRoot, jsonMode). That function spawns node openclaw.mjs completion --write-state with a 30-second timeout. When the child takes longer than 30s (slow disk, bundled plugin tree cold-load, Docker overlayfs), Node's spawnSync kills the child and returns {error: SystemError(code: "ETIMEDOUT")}. The existing handler interpolates that error directly into the user-facing log: Completion cache update failed: ${String(result.error)}. The result string is Error: spawnSync /usr/bin/node ETIMEDOUT — accurate but unhelpful.
Missing detection / guardrail: there was no test asserting the user-visible message text on the timeout/error path; the only existing coverage at src/cli/update-cli.test.ts:534 checked the spawn invocation arguments. So the message wording was never exercised.
Contributing context (if known): the same 30-second timeout pattern is duplicated in src/commands/doctor-completion.ts:19-41, but that path uses result.status === 0 only and doesn't surface the error to the user, so the unhelpful message was Update-only.

Regression Test Plan

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/cli/update-cli.test.ts (in the same describe block as "bounds completion cache refresh during update follow-up").
Scenario the test should lock in:
1. spawnSync returns {error: <ETIMEDOUT>} (mocked once).
2. Logged warning includes "timed out after 30s".
3. Logged warning includes the manual-refresh hint "openclaw completion --write-state".
4. Logged warning does not include the raw "Error: spawnSync" string (regression guard against falling back to String(result.error)).
Why this is the smallest reliable guardrail: tryWriteCompletionCache is a small async function with one external call (spawnSync). Mocking that single seam exercises every branch deterministically; no real subprocess is needed and the test runs in milliseconds.
Existing test that already covers this (if any): None — bounds completion cache refresh during update follow-up only asserts the spawn invocation arguments, not the error path or message text.
If no new test is added, why not: N/A — 1 new test added.

User-visible / Behavior Changes

Before:

Completion cache update failed: Error: spawnSync /usr/bin/node ETIMEDOUT

After:

Completion cache update failed: timed out after 30s. Shell tab-completion may be stale; refresh manually with: openclaw completion --write-state

For non-timeout errors (rare), the message now also appends the manual-refresh hint. For non-zero exit status, same.
JSON mode (--json) still suppresses the message entirely. Behavior for the success path is unchanged.
The function is still non-fatal (returns silently after logging); openclaw update still reports overall success.

Diagram

openclaw update flow

Before:
[update succeeds]
  -> tryWriteCompletionCache spawns child (30s timeout)
  -> child times out (ETIMEDOUT)
  -> log: "Completion cache update failed: Error: spawnSync /usr/bin/node ETIMEDOUT"
  -> user: "is my install broken?"

After:
[update succeeds]
  -> tryWriteCompletionCache spawns child (30s timeout)
  -> child times out (ETIMEDOUT)
  -> log: "Completion cache update failed: timed out after 30s.
          Shell tab-completion may be stale; refresh manually with:
          openclaw completion --write-state"
  -> user knows it's harmless and how to recover

Security Impact

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No (same spawnSync shape)
Data access scope changed? No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Debian (Docker), per the issue reporter
Runtime/container: Node 22+, npm-installed openclaw global
Model/provider: N/A — this is a CLI flow, not a model call
Integration/channel (if any): N/A
Relevant config (redacted): None

Steps

Install openclaw via npm install -g openclaw@latest in an environment where the bundled-plugin-tree cold-load takes >30s (e.g. Docker on slow disk).
Run openclaw update.
Observe the closing log line.

Expected

Update reports overall success.
Closing message: Completion cache update failed: timed out after 30s. Shell tab-completion may be stale; refresh manually with: openclaw completion --write-state.

Actual (before fix)

Update reports overall success.
Closing message: Completion cache update failed: Error: spawnSync /usr/bin/node ETIMEDOUT — opaque and alarming.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

$ pnpm test src/cli/update-cli.test.ts
 Test Files  1 passed (1)
      Tests  63 passed (63)   # 62 existing + 1 new for the friendly-timeout message

pnpm check:changed is green: conflict markers, typecheck, lint, runtime import cycles, plus the various core repo guard checks all pass.

Human Verification (required)

Verified scenarios:
- Targeted vitest run for src/cli/update-cli.test.ts (63/63 pass locally).
- Full pnpm check:changed gate (all lanes green, including the additional core guards that fire when src/cli/** is touched).
- Re-read the function before and after the change to confirm the success and JSON-mode paths are untouched.
Edge cases checked:
- result.error with code: "ETIMEDOUT" → friendly message with manual-refresh hint.
- result.error without code: "ETIMEDOUT" → falls back to String(result.error), still appends manual-refresh hint.
- Non-zero result.status (child ran but exited with error) → keeps original (${stderr}) detail and appends manual-refresh hint.
- JSON mode (jsonMode: true) → no message logged in any branch.
- Successful run (status: 0, no error) → no message logged (unchanged).
What you did not verify:
- Reproducing the original ETIMEDOUT on a real Debian/Docker install. The fix is purely user-facing message text and is fully covered by the unit test mock; reproducing the underlying performance issue requires a slow-disk environment that I do not have access to. The test asserts the exact branch the reporter hit.
- The duplicate COMPLETION_CACHE_WRITE_TIMEOUT_MS in src/commands/doctor-completion.ts. That path doesn't surface a user-facing failure message and is therefore not affected by the bug; harmonizing the two constants would be a separate refactor and is out of scope for this PR.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

(Both will be checked once review activity lands. Currently no bot review conversations on this PR.)

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: a downstream tool that scrapes the existing Error: spawnSync /usr/bin/node ETIMEDOUT string will no longer match.
- Mitigation: this is a warning log for human consumption, not part of any documented machine-readable surface. JSON mode (--json), which is what scriptable callers use, is unchanged and still emits no completion-cache message at all.
Risk: the manual-refresh hint slightly lengthens the warning line on narrow terminals.
- Mitigation: it's still a single line and warning-level (yellow). The added information directly answers the question the reporter asked. If a maintainer prefers a two-line layout, splitting on theme.warn(...) is a one-line follow-up.

greptile-apps · 2026-04-27T14:34:22Z

Greptile Summary

This PR improves the user-facing error message when the shell-completion cache write times out after openclaw update, replacing the opaque Error: spawnSync /usr/bin/node ETIMEDOUT with a friendly "timed out after 30s" string and a manual-refresh hint. It also bundles two unrelated fixes: dynamic 1 M context-window resolution for GitHub Copilot models with -1m/-1m-internal suffixes, and a display-label correction for nested provider model IDs (OpenRouter) in the Telegram extension.

Confidence Score: 4/5

Safe to merge; the core fix is correct and well-tested, with one low-priority question about an "internal" model ID in public defaults.

All three changes are small, targeted, and covered by new tests. The only finding is a P2 question about claude-opus-4.7-1m-internal appearing in the default Copilot model list — not a bug, but worth confirming intent before it ships to all users.

extensions/github-copilot/models-defaults.ts — confirm the "internal" model ID in DEFAULT_MODEL_IDS is intentional.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: extensions/github-copilot/models-defaults.ts
Line: 11

Comment:
**"internal" model ID in public defaults**

`claude-opus-4.7-1m-internal` contains "internal" in its name, which normally signals a model not intended for broad user exposure. The comment above the list says unknown model IDs are harmless (Copilot returns an error and users can remove it), but surfacing an internal-tagged model in the default list could surprise users who see it appear in their model picker or wonder why it errors out. Was this intentional, or should it be gated behind a separate config opt-in?

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "fix(cli): clarify completion cache timeo..." | Re-trigger Greptile}

greptile-apps · 2026-04-27T14:34:26Z

 const DEFAULT_MAX_TOKENS = 8192;

 // Copilot model ids vary by plan/org and can change.
 // We keep this list intentionally broad; if a model isn't available Copilot will
 // return an error and users can remove it from their config.
 const DEFAULT_MODEL_IDS = [
  "claude-opus-4.7",
+  "claude-opus-4.7-1m-internal",


"internal" model ID in public defaults

claude-opus-4.7-1m-internal contains "internal" in its name, which normally signals a model not intended for broad user exposure. The comment above the list says unknown model IDs are harmless (Copilot returns an error and users can remove it), but surfacing an internal-tagged model in the default list could surprise users who see it appear in their model picker or wonder why it errors out. Was this intentional, or should it be gated behind a separate config opt-in?

Prompt To Fix With AI

This is a comment left during a code review. Path: extensions/github-copilot/models-defaults.ts Line: 11 Comment: **"internal" model ID in public defaults** `claude-opus-4.7-1m-internal` contains "internal" in its name, which normally signals a model not intended for broad user exposure. The comment above the list says unknown model IDs are harmless (Copilot returns an error and users can remove it), but surfacing an internal-tagged model in the default list could surprise users who see it appear in their model picker or wonder why it errors out. Was this intentional, or should it be gated behind a separate config opt-in? How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

iot2edge · 2026-04-27T14:44:26Z

claude-opus-4.7-1m-internal is the exact id that GitHub Copilot's /models endpoint advertises (per the evidence in #72805 — the reporter pasted the upstream listing). It's labeled "Internal only" by Copilot's own UI, meaning it's gated by plan/org rather than a hidden test model. Following the file's existing policy — "Copilot model ids vary by plan/org and can change. We keep this list intentionally broad; if a model isn't available Copilot will return an error and users can remove it from their config." — including it is consistent with how o1 and other plan-restricted entries are already shipped. If maintainers prefer to gate it behind a separate opt-in, happy to split.

When the post-update completion cache refresh times out (slow disk, large bundled plugin tree, Docker overlayfs), the user previously saw the opaque 'Completion cache update failed: Error: spawnSync /usr/bin/node ETIMEDOUT'. Detect ETIMEDOUT specifically, surface 'timed out after 30s', and append a manual refresh hint pointing at 'openclaw completion --write-state' so users know it's non-fatal and how to recover. Fixes openclaw#72842

openclaw-barnacle Bot added channel: telegram Channel integration: telegram cli CLI command changes size: S labels Apr 27, 2026

greptile-apps Bot reviewed Apr 27, 2026

View reviewed changes

iot2edge force-pushed the fix/update-completion-cache-timeout-message branch from b3d54e7 to 4161b24 Compare April 27, 2026 14:42

openclaw-barnacle Bot added size: XS and removed channel: telegram Channel integration: telegram size: S labels Apr 27, 2026

aaronescobar09 mentioned this pull request Apr 27, 2026

fix(cli): allow slower completion cache refreshes #72910

Closed

steipete force-pushed the fix/update-completion-cache-timeout-message branch 4 times, most recently from ce9f837 to c6b9e13 Compare April 27, 2026 20:09

steipete force-pushed the fix/update-completion-cache-timeout-message branch from c6b9e13 to cec39ab Compare April 27, 2026 20:12

steipete merged commit 9892838 into openclaw:main Apr 27, 2026
67 of 68 checks passed

clawsweeper Bot mentioned this pull request Apr 30, 2026

[Bug]: Onboarding slowdown in recent versions #74879

Closed

clawsweeper Bot mentioned this pull request May 21, 2026

fix(update): make completion-cache spawn timeout configurable #76227

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(cli): clarify completion cache timeout message after openclaw update#72850

fix(cli): clarify completion cache timeout message after openclaw update#72850
steipete merged 1 commit into
openclaw:mainfrom
iot2edge:fix/update-completion-cache-timeout-message

iot2edge commented Apr 27, 2026

Uh oh!

greptile-apps Bot commented Apr 27, 2026

Uh oh!

greptile-apps Bot Apr 27, 2026

Uh oh!

iot2edge commented Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

iot2edge commented Apr 27, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Root Cause

Regression Test Plan

User-visible / Behavior Changes

Diagram

Security Impact

Repro + Verification

Environment

Steps

Expected

Actual (before fix)

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Risks and Mitigations

Uh oh!

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

Confidence Score: 4/5

Uh oh!

greptile-apps Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

iot2edge commented Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants