fix: detect and recover from incomplete bundled runtime deps staging#75310
fix: detect and recover from incomplete bundled runtime deps staging#75310scottgl9 wants to merge 1 commit intoopenclaw:mainfrom
Conversation
|
Codex review: needs changes before merge. What this changes: The PR adds a bundled runtime-deps completion marker, checks gateway startup and Required change before merge: There is a concrete P2 bug in the PR branch that an automated repair can address in the runtime-deps installer and focused tests; check whether #75183 has merged before starting to avoid duplicate work. Security review: Security review cleared: The diff is limited to runtime-deps repair/installer logic, tests, and changelog; it adds no dependencies, workflow permissions, package sources, secret handling, or new execution surface beyond the existing npm/pnpm staging path. Review findings:
Review detailsBest possible solution: Keep the bug tracked until the runtime-deps repair path treats a missing completion marker as non-converged all the way through the installer, with regression coverage for the generated-manifest plus no-main package sentinel case in gateway startup and doctor repair. If #75183 lands first, this PR can be closed as superseded by that merged implementation. Do we have a high-confidence way to reproduce the issue? Yes. A source-level reproduction can create an install root with Is this the best way to solve the issue? No. Gateway startup and doctor repair are the right boundaries, and a post-success marker is viable, but the marker must be integrated with the installer/materialization fast path rather than checked only before calling repair. Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 42d73fd955af. |
2d4175f to
2920631
Compare
When `openclaw update` times out during the post-install pnpm staging step (spawnSync ETIMEDOUT), node_modules is left in a corrupt/incomplete state. Package version sentinels (node_modules/<pkg>/package.json) may already exist with the correct version, but internal package files can be missing. The previous missing-dep scan used hasDependencySentinel, which checks only the version sentinel. Both prestageGatewayBundledRuntimeDeps (gateway startup) and maybeRepairBundledPluginRuntimeDeps (openclaw doctor --repair) had an early return that fired when missing.length === 0, even for a corrupt install. While the upstream refactor of isRuntimeDepSatisfied now also checks that the package main entry file exists (catching most corrupt cases), packages without a main field still slip through. This fix adds defense-in-depth by checking isRuntimeDepsPlanMaterialized, which verifies both the generated install manifest and all package entry files, so corrupt stages are caught regardless of whether a package declares a main field. Add hasPreviousIncompleteInstall(installRoot, installSpecs) to bundled-runtime-deps that checks node_modules exists AND !isRuntimeDepsPlanMaterialized. Both early-return sites now check this and treat all deps as missing when incomplete, forcing a clean reinstall. Confirmed on scottgl-aipc after v2026.4.29 upgrade with staging ETIMEDOUT.
2920631 to
a7c5c7d
Compare
|
Thanks for jumping on this. We ended up landing the broader runtime-deps repair in #75183, which closes #75309 and covers the partial pnpm tree recovery path from this PR, including the generated-manifest/no-main package hole noted in review. Since this branch is now conflicted and superseded by the merged fix, I’m closing it to keep the queue clean. |
Any ETA on when this will make it to the next release version of OpenClaw? I'm still encountering this issue. |
What bug this fixes
Closes #75309
When
openclaw updatetimes out during the post-install pnpm staging step (ETIMEDOUT),node_modulesis left in a corrupt/incomplete state. Both gateway startup (prestageGatewayBundledRuntimeDeps) andopenclaw doctor --repair(maybeRepairBundledPluginRuntimeDeps) have an early-return atmissing.length === 0that fires even when the install is corrupt, leaving channels crashing at runtime.Relationship to upstream refactor
The upstream refactor of
isRuntimeDepSatisfied(inbundled-runtime-deps-materialization.ts) now also checkshasInstalledRuntimeDepEntryFiles— verifying that the package'smainentry file exists on disk. This catches the specific dotenv case from the original bug report. However, packages that do not declare amainfield returntruefromhasInstalledRuntimeDepEntryFilesunconditionally, so a corrupt install of such a package still passes the sentinel check andmissing.length === 0returns early without repair.Fix
This PR adds defense-in-depth using
isRuntimeDepsPlanMaterialized— the same completeness check the repair function itself uses internally. It verifies both the generated install manifest (package.jsonwithname: "openclaw-runtime-deps-install") and that all package entry files satisfyhasSatisfiedInstallSpecPackages, so corrupt stages are caught regardless of whether a package declares amainfield.hasPreviousIncompleteInstall(installRoot, installSpecs)combines the check:node_modulesexists → there was a previous install attempt!isRuntimeDepsPlanMaterialized(installRoot, installSpecs)→ the install isn't actually completeAt both early-return sites: if incomplete, treat all
depsas missing to force a clean reinstall.Is this the best fix?
Yes.
isRuntimeDepsPlanMaterializedis the authoritative completeness check already used by the repair functions — using it to gate the early-return is the minimally invasive fix with the broadest coverage.Evidence
scottgl-aipcafter v2026.4.29 upgrade with staging timeout