fix(plugins): recover managed npm root after corrupt install failure (#85113)#86122
fix(plugins): recover managed npm root after corrupt install failure (#85113)#86122p3nchan wants to merge 2 commits into
Conversation
…penclaw#85113) On a managed npm root (~/.openclaw/npm), npm Arborist can crash with ERR_INVALID_ARG_TYPE and leave the root half-broken, so post-update official plugin sync fails and disables the plugin; subsequent installs keep hitting the same corrupted root. Add a generic, defensive recovery: on the observed corrupt-root npm error signature, quarantine only the managed root's own node_modules + lockfiles into _openclaw-quarantined-npm-roots/, rebuild once, and retry the install once. Fail closed (preserving the quarantine for inspection) otherwise. Not specialized for any plugin. Also stop printing the misleading "Also not a valid hook pack" message when the primary npm install failed.
…very failure
Address review: the managed-npm-root recovery rebuild-failure error
("managed npm root recovery failed after quarantining…") did not match the
hook-pack-fallback skip predicate, so the misleading "Also not a valid hook
pack" message could still print on that path. Add the recovery-failure prefix
to shouldSkipHookPackFallbackAfterPluginInstallError, add a CLI regression
test for the rebuild-failure path, and document the managed-root no-lock
quarantine assumption.
|
Codex review: needs real behavior proof before merge. Reviewed May 24, 2026, 12:33 PM ET / 16:33 UTC. Summary PR surface: Source +172, Tests +303. Total +475 across 7 files. Reproducibility: no. high-confidence live reproduction was established in this review. The linked issue has concrete logs and current-main source shows no corrupt-root rebuild path, while this PR proves the new behavior only with mocked npm failures. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance: Risk before merge
Maintainer options:
Next step before merge Security Review findings
Review detailsBest possible solution: Keep the bounded managed-root recovery, preserve documented hook-pack fallback semantics, and require redacted real install/update proof before merge. Do we have a high-confidence way to reproduce the issue? No high-confidence live reproduction was established in this review. The linked issue has concrete logs and current-main source shows no corrupt-root rebuild path, while this PR proves the new behavior only with mocked npm failures. Is this the best way to solve the issue? No, not yet. The recovery direction is maintainable, but returning before the hook-pack fallback is tried is not the safest compatible implementation for the documented unified installer. Full review comments:
Overall correctness: patch is incorrect Codex review notes: model gpt-5.5, reasoning high; reviewed against 79ee70c8ad8a. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +172, Tests +303. Total +475 across 7 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
|
Thanks for the PR. Before this can move forward, please add live proof from the affected surface, not just unit tests, mocked tests, or source inspection. A useful proof update should include:
Please redact secrets, tokens, phone numbers, and private message content from logs or screenshots. |
This comment was marked as spam.
This comment was marked as spam.
|
Thanks @p3nchan. I landed a reworked fix on main in a326faa10c. This PR was the closest report/fix shape for the npm managed-root corruption, but current main has since moved managed npm installs to per-plugin project roots, so the flat-root rebuild approach here was no longer the right owner boundary. The landed fix detects npm's ERR_INVALID_ARG_TYPE/"from" argument corruption signature in the per-plugin project, quarantines node_modules and lockfiles, retries once, and keeps rebuilt dependency scanning intact. It also separately covers the Windows native-module EPERM swap cleanup path by keeping the successful new install while surfacing preserved old backups for delayed cleanup. Proof:
Closing this PR as superseded by the landed main fix. |
Summary
Fixes #85113. On a managed npm root (
~/.openclaw/npm), npm's Arborist can crash withERR_INVALID_ARG_TYPEand leave the root half-broken. After that, post-update official plugin sync fails (anddisableOnFailuredisables the plugin), and subsequentplugins install --forcekeeps hitting the same corrupted root.This adds a generic, defensive recovery for the managed npm root, and removes a misleading message on the failure path.
What changed
src/infra/npm-managed-root.tsshouldRebuildManagedNpmRootAfterInstallFailure()— gates recovery on the specific corrupt-root npm signature only (requiresERR_INVALID_ARG_TYPEand"from" argumentandReceived undefinedto co-occur, so registry/network/auth failures don't trigger it).quarantineManagedNpmRootForRebuild()— moves only the managed root's ownnode_modules/package-lock.json/npm-shrinkwrap.jsoninto_openclaw-quarantined-npm-roots/corrupt-*(all destinations inside the managed root; usesfs.rename, which does not follow symlinks; no recursive delete). Root metadata (package.json,.npmrc, pack archives) is preserved.src/plugins/install.ts— on the corrupt signature: quarantine → rebuild once → retry install once. If rebuild or retry still fails, it fails closed, preserving the quarantine and surfacing both the quarantine path and the original npm error.src/cli/plugins-command-helpers.ts/plugins-install-command.ts— when the primary npm install (or the recovery rebuild) fails, no longer attempt the hook-pack fallback or print the misleadingAlso not a valid hook pack. Manifest-type failures still keep the hook-pack fallback.Scope / honesty
ERR_INVALID_ARG_TYPEcrash is live/package/root-state dependent — I did not reproduce it from source. This is framed as defensive recovery for a corrupt managed root; proof is via mocked failures exercising both the success (quarantine → rebuild → retry → success) and fail-closed (retry-after-recovery fails) paths.Tests
All run via
node scripts/run-vitest.mjs <file> -- --reporter=verbose:src/infra/npm-managed-root.test.ts— 20 passedsrc/plugins/install.npm-spec.test.ts— 39 passed (covers quarantine→rebuild→retry success and fail-closed)src/cli/plugins-cli.install.test.ts— 87 passed (incl. new regression: no hook-pack fallback / no misleading message on rebuild-failure)src/cli/update-cli.test.ts— 124 passedpnpm format:checkon touched files — cleanReviewed independently before submission (cross-model review), which caught and fixed a gap where the rebuild-failure error didn't match the hook-pack suppression predicate.