Skip to content

fix: detect and recover from incomplete bundled runtime deps staging#75310

Closed
scottgl9 wants to merge 1 commit intoopenclaw:mainfrom
scottgl9:fix/incomplete-bundled-runtime-deps-staging
Closed

fix: detect and recover from incomplete bundled runtime deps staging#75310
scottgl9 wants to merge 1 commit intoopenclaw:mainfrom
scottgl9:fix/incomplete-bundled-runtime-deps-staging

Conversation

@scottgl9
Copy link
Copy Markdown
Contributor

@scottgl9 scottgl9 commented Apr 30, 2026

What bug this fixes

Closes #75309

When openclaw update times out during the post-install pnpm staging step (ETIMEDOUT), node_modules is left in a corrupt/incomplete state. Both gateway startup (prestageGatewayBundledRuntimeDeps) and openclaw doctor --repair (maybeRepairBundledPluginRuntimeDeps) have an early-return at missing.length === 0 that fires even when the install is corrupt, leaving channels crashing at runtime.

Relationship to upstream refactor

The upstream refactor of isRuntimeDepSatisfied (in bundled-runtime-deps-materialization.ts) now also checks hasInstalledRuntimeDepEntryFiles — verifying that the package's main entry file exists on disk. This catches the specific dotenv case from the original bug report. However, packages that do not declare a main field return true from hasInstalledRuntimeDepEntryFiles unconditionally, so a corrupt install of such a package still passes the sentinel check and missing.length === 0 returns early without repair.

Fix

This PR adds defense-in-depth using isRuntimeDepsPlanMaterialized — the same completeness check the repair function itself uses internally. It verifies both the generated install manifest (package.json with name: "openclaw-runtime-deps-install") and that all package entry files satisfy hasSatisfiedInstallSpecPackages, so corrupt stages are caught regardless of whether a package declares a main field.

hasPreviousIncompleteInstall(installRoot, installSpecs) combines the check:

  1. node_modules exists → there was a previous install attempt
  2. !isRuntimeDepsPlanMaterialized(installRoot, installSpecs) → the install isn't actually complete

At both early-return sites: if incomplete, treat all deps as missing to force a clean reinstall.

Is this the best fix?

Yes. isRuntimeDepsPlanMaterialized is the authoritative completeness check already used by the repair functions — using it to gate the early-return is the minimally invasive fix with the broadest coverage.

Evidence

  • Confirmed root cause and fix on scottgl-aipc after v2026.4.29 upgrade with staging timeout
  • All 7 gateway startup tests pass
  • All 29 doctor repair tests pass
  • New tests added for both code paths

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 1, 2026

Codex review: needs changes before merge.

What this changes:

The PR adds a bundled runtime-deps completion marker, checks gateway startup and openclaw doctor --repair for node_modules without that marker, forces all detected deps through repair when incomplete state is detected, and adds regression tests plus a changelog entry.

Required change before merge:

There is a concrete P2 bug in the PR branch that an automated repair can address in the runtime-deps installer and focused tests; check whether #75183 has merged before starting to avoid duplicate work.

Security review:

Security review cleared: The diff is limited to runtime-deps repair/installer logic, tests, and changelog; it adds no dependencies, workflow permissions, package sources, secret handling, or new execution surface beyond the existing npm/pnpm staging path.

Review findings:

  • [P2] Honor missing completion markers in the installer — src/plugins/bundled-runtime-deps-install.ts:346-350
Review details

Best possible solution:

Keep the bug tracked until the runtime-deps repair path treats a missing completion marker as non-converged all the way through the installer, with regression coverage for the generated-manifest plus no-main package sentinel case in gateway startup and doctor repair. If #75183 lands first, this PR can be closed as superseded by that merged implementation.

Do we have a high-confidence way to reproduce the issue?

Yes. A source-level reproduction can create an install root with node_modules/<dep>/package.json for a package without main, a generated openclaw-runtime-deps-install manifest, missing real package contents, and no .install-complete; the PR calls repair, but the default installer can still return before reinstalling or writing the marker.

Is this the best way to solve the issue?

No. Gateway startup and doctor repair are the right boundaries, and a post-success marker is viable, but the marker must be integrated with the installer/materialization fast path rather than checked only before calling repair.

Full review comments:

  • [P2] Honor missing completion markers in the installer — src/plugins/bundled-runtime-deps-install.ts:346-350
    repairBundledRuntimeDepsInstallRootAsync writes the generated package.json before it calls installBundledRuntimeDepsAsync. For the corrupt no-main package state this PR is trying to recover from, that can make isRuntimeDepsPlanMaterialized(...) true, so the installer returns before reaching the new marker removal/reinstall code. The marker stays absent and the corrupt node_modules tree is reused; gate the materialized fast path on the completion marker or force install when the marker is missing.
    Confidence: 0.93

Overall correctness: patch is incorrect
Overall confidence: 0.9

Acceptance criteria:

  • pnpm test src/plugins/bundled-runtime-deps.test.ts src/commands/doctor-bundled-plugin-runtime-deps.test.ts src/gateway/server-startup-plugins.test.ts
  • pnpm exec oxfmt --check --threads=1 src/plugins/bundled-runtime-deps-install.ts src/plugins/bundled-runtime-deps.ts src/plugins/bundled-runtime-deps-materialization.ts src/plugins/bundled-runtime-deps.test.ts src/commands/doctor-bundled-plugin-runtime-deps.ts src/commands/doctor-bundled-plugin-runtime-deps.test.ts src/gateway/server-startup-plugins.ts src/gateway/server-startup-plugins.test.ts CHANGELOG.md

What I checked:

Likely related people:

  • steipete: Current-main blame for the central runtime-deps materialization, installer, gateway startup, and doctor repair paths points to Peter Steinberger, and the related open fix: simplify bundled runtime dependency repair #75183 by steipete is an active maintainer-side refactor for the same bundled runtime-deps repair behavior. (role: recent maintainer and adjacent owner; confidence: medium; commits: 4987482e4c54; files: src/plugins/bundled-runtime-deps.ts, src/plugins/bundled-runtime-deps-install.ts, src/plugins/bundled-runtime-deps-materialization.ts)

Remaining risk / open question:

Codex review notes: model gpt-5.5, reasoning high; reviewed against 42d73fd955af.

@scottgl9 scottgl9 force-pushed the fix/incomplete-bundled-runtime-deps-staging branch 2 times, most recently from 2d4175f to 2920631 Compare May 1, 2026 00:43
When `openclaw update` times out during the post-install pnpm staging step
(spawnSync ETIMEDOUT), node_modules is left in a corrupt/incomplete state.
Package version sentinels (node_modules/<pkg>/package.json) may already exist
with the correct version, but internal package files can be missing.

The previous missing-dep scan used hasDependencySentinel, which checks only
the version sentinel. Both prestageGatewayBundledRuntimeDeps (gateway startup)
and maybeRepairBundledPluginRuntimeDeps (openclaw doctor --repair) had an early
return that fired when missing.length === 0, even for a corrupt install.

While the upstream refactor of isRuntimeDepSatisfied now also checks that the
package main entry file exists (catching most corrupt cases), packages without
a main field still slip through. This fix adds defense-in-depth by checking
isRuntimeDepsPlanMaterialized, which verifies both the generated install manifest
and all package entry files, so corrupt stages are caught regardless of whether
a package declares a main field.

Add hasPreviousIncompleteInstall(installRoot, installSpecs) to
bundled-runtime-deps that checks node_modules exists AND
!isRuntimeDepsPlanMaterialized. Both early-return sites now check this and
treat all deps as missing when incomplete, forcing a clean reinstall.

Confirmed on scottgl-aipc after v2026.4.29 upgrade with staging ETIMEDOUT.
@scottgl9 scottgl9 force-pushed the fix/incomplete-bundled-runtime-deps-staging branch from 2920631 to a7c5c7d Compare May 1, 2026 05:59
@steipete
Copy link
Copy Markdown
Contributor

steipete commented May 1, 2026

Thanks for jumping on this. We ended up landing the broader runtime-deps repair in #75183, which closes #75309 and covers the partial pnpm tree recovery path from this PR, including the generated-manifest/no-main package hole noted in review. Since this branch is now conflicted and superseded by the merged fix, I’m closing it to keep the queue clean.

@steipete steipete closed this May 1, 2026
@scottgl9
Copy link
Copy Markdown
Contributor Author

scottgl9 commented May 2, 2026

Thanks for jumping on this. We ended up landing the broader runtime-deps repair in #75183, which closes #75309 and covers the partial pnpm tree recovery path from this PR, including the generated-manifest/no-main package hole noted in review. Since this branch is now conflicted and superseded by the merged fix, I’m closing it to keep the queue clean.

Any ETA on when this will make it to the next release version of OpenClaw? I'm still encountering this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

commands Command implementations gateway Gateway runtime size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: gateway startup and doctor --repair miss corrupt bundled runtime deps from interrupted staging

2 participants