fix(cli): clarify completion cache timeout message after openclaw update#72850
Conversation
Greptile SummaryThis PR improves the user-facing error message when the shell-completion cache write times out after Confidence Score: 4/5Safe to merge; the core fix is correct and well-tested, with one low-priority question about an "internal" model ID in public defaults. All three changes are small, targeted, and covered by new tests. The only finding is a P2 question about claude-opus-4.7-1m-internal appearing in the default Copilot model list — not a bug, but worth confirming intent before it ships to all users. extensions/github-copilot/models-defaults.ts — confirm the "internal" model ID in DEFAULT_MODEL_IDS is intentional. Prompt To Fix All With AIThis is a comment left during a code review.
Path: extensions/github-copilot/models-defaults.ts
Line: 11
Comment:
**"internal" model ID in public defaults**
`claude-opus-4.7-1m-internal` contains "internal" in its name, which normally signals a model not intended for broad user exposure. The comment above the list says unknown model IDs are harmless (Copilot returns an error and users can remove it), but surfacing an internal-tagged model in the default list could surprise users who see it appear in their model picker or wonder why it errors out. Was this intentional, or should it be gated behind a separate config opt-in?
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(cli): clarify completion cache timeo..." | Re-trigger Greptile |
| const DEFAULT_MAX_TOKENS = 8192; | ||
|
|
||
| // Copilot model ids vary by plan/org and can change. | ||
| // We keep this list intentionally broad; if a model isn't available Copilot will | ||
| // return an error and users can remove it from their config. | ||
| const DEFAULT_MODEL_IDS = [ | ||
| "claude-opus-4.7", | ||
| "claude-opus-4.7-1m-internal", |
There was a problem hiding this comment.
"internal" model ID in public defaults
claude-opus-4.7-1m-internal contains "internal" in its name, which normally signals a model not intended for broad user exposure. The comment above the list says unknown model IDs are harmless (Copilot returns an error and users can remove it), but surfacing an internal-tagged model in the default list could surprise users who see it appear in their model picker or wonder why it errors out. Was this intentional, or should it be gated behind a separate config opt-in?
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/github-copilot/models-defaults.ts
Line: 11
Comment:
**"internal" model ID in public defaults**
`claude-opus-4.7-1m-internal` contains "internal" in its name, which normally signals a model not intended for broad user exposure. The comment above the list says unknown model IDs are harmless (Copilot returns an error and users can remove it), but surfacing an internal-tagged model in the default list could surprise users who see it appear in their model picker or wonder why it errors out. Was this intentional, or should it be gated behind a separate config opt-in?
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
b3d54e7 to
4161b24
Compare
|
claude-opus-4.7-1m-internal is the exact id that GitHub Copilot's /models endpoint advertises (per the evidence in #72805 — the reporter pasted the upstream listing). It's labeled "Internal only" by Copilot's own UI, meaning it's gated by plan/org rather than a hidden test model. Following the file's existing policy — "Copilot model ids vary by plan/org and can change. We keep this list intentionally broad; if a model isn't available Copilot will return an error and users can remove it from their config." — including it is consistent with how o1 and other plan-restricted entries are already shipped. If maintainers prefer to gate it behind a separate opt-in, happy to split. |
ce9f837 to
c6b9e13
Compare
When the post-update completion cache refresh times out (slow disk, large bundled plugin tree, Docker overlayfs), the user previously saw the opaque 'Completion cache update failed: Error: spawnSync /usr/bin/node ETIMEDOUT'. Detect ETIMEDOUT specifically, surface 'timed out after 30s', and append a manual refresh hint pointing at 'openclaw completion --write-state' so users know it's non-fatal and how to recover. Fixes openclaw#72842
c6b9e13 to
cec39ab
Compare
Fixes #72842
Summary
openclaw updatefinishes, it spawns a child process to refresh the shell completion cache with a 30-second timeout. On slower environments (Debian/Docker, slow disk, large bundled-plugin trees) that child can exceed 30s and is killed. The user seesCompletion cache update failed: Error: spawnSync /usr/bin/node ETIMEDOUT— Node's raw error string with no context.tryWriteCompletionCache(src/cli/update-cli/shared.ts), when the spawned child returnsresult.error, detectETIMEDOUTand surface "timed out after 30s" instead of the rawError: spawnSync /usr/bin/node ETIMEDOUT. Both theresult.errorand the non-zeroresult.statusbranches now also append a single-line manual-refresh hint: "Shell tab-completion may be stale; refresh manually with: openclaw completion --write-state". Added a regression test insrc/cli/update-cli.test.tsthat asserts the friendly text is logged and the rawError: spawnSyncstring is not.spawnSynccall shape, theOPENCLAW_COMPLETION_SKIP_PLUGIN_COMMANDSenv var, the silent return on error (still non-fatal), JSON-mode silence, and the duplicate-but-separateCOMPLETION_CACHE_WRITE_TIMEOUT_MSconstant insrc/commands/doctor-completion.ts. Did not touch the underlying performance issue (fast plugin-load on cold start) — that is environment-specific and out of scope for a UX fix.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause
openclaw update,update-command.tscallstryWriteCompletionCache(postUpdateRoot, jsonMode). That function spawnsnode openclaw.mjs completion --write-statewith a 30-second timeout. When the child takes longer than 30s (slow disk, bundled plugin tree cold-load, Docker overlayfs), Node'sspawnSynckills the child and returns{error: SystemError(code: "ETIMEDOUT")}. The existing handler interpolates that error directly into the user-facing log:Completion cache update failed: ${String(result.error)}. The result string isError: spawnSync /usr/bin/node ETIMEDOUT— accurate but unhelpful.src/cli/update-cli.test.ts:534checked the spawn invocation arguments. So the message wording was never exercised.src/commands/doctor-completion.ts:19-41, but that path usesresult.status === 0only and doesn't surface the error to the user, so the unhelpful message was Update-only.Regression Test Plan
src/cli/update-cli.test.ts(in the samedescribeblock as "bounds completion cache refresh during update follow-up").spawnSyncreturns{error: <ETIMEDOUT>}(mocked once)."timed out after 30s"."openclaw completion --write-state"."Error: spawnSync"string (regression guard against falling back toString(result.error)).tryWriteCompletionCacheis a small async function with one external call (spawnSync). Mocking that single seam exercises every branch deterministically; no real subprocess is needed and the test runs in milliseconds.bounds completion cache refresh during update follow-uponly asserts the spawn invocation arguments, not the error path or message text.User-visible / Behavior Changes
--json) still suppresses the message entirely. Behavior for the success path is unchanged.openclaw updatestill reports overall success.Diagram
Security Impact
spawnSyncshape)Yes, explain risk + mitigation: N/ARepro + Verification
Environment
openclawglobalSteps
openclawvianpm install -g openclaw@latestin an environment where the bundled-plugin-tree cold-load takes >30s (e.g. Docker on slow disk).openclaw update.Expected
Completion cache update failed: timed out after 30s. Shell tab-completion may be stale; refresh manually with: openclaw completion --write-state.Actual (before fix)
Completion cache update failed: Error: spawnSync /usr/bin/node ETIMEDOUT— opaque and alarming.Evidence
pnpm check:changedis green: conflict markers, typecheck, lint, runtime import cycles, plus the various core repo guard checks all pass.Human Verification (required)
src/cli/update-cli.test.ts(63/63 pass locally).pnpm check:changedgate (all lanes green, including the additional core guards that fire whensrc/cli/**is touched).result.errorwithcode: "ETIMEDOUT"→ friendly message with manual-refresh hint.result.errorwithoutcode: "ETIMEDOUT"→ falls back toString(result.error), still appends manual-refresh hint.result.status(child ran but exited with error) → keeps original(${stderr})detail and appends manual-refresh hint.jsonMode: true) → no message logged in any branch.status: 0, no error) → no message logged (unchanged).COMPLETION_CACHE_WRITE_TIMEOUT_MSinsrc/commands/doctor-completion.ts. That path doesn't surface a user-facing failure message and is therefore not affected by the bug; harmonizing the two constants would be a separate refactor and is out of scope for this PR.Review Conversations
(Both will be checked once review activity lands. Currently no bot review conversations on this PR.)
Compatibility / Migration
Risks and Mitigations
Error: spawnSync /usr/bin/node ETIMEDOUTstring will no longer match.--json), which is what scriptable callers use, is unchanged and still emits no completion-cache message at all.theme.warn(...)is a one-line follow-up.