fix(plugins): canonicalize packageRoot before hashing runtime-deps stage key#75048
Conversation
|
Codex review: needs maintainer review before merge. What this changes: The PR canonicalizes the bundled runtime-deps stage-key hash by realpath-normalizing packageRoot, adds symlink regression and release-check coverage, and adds a changelog entry for the Windows bundled-channel ENOENT fix. Maintainer follow-up before merge: This is a protected maintainer PR with no discrete automated repair finding; the next action is normal maintainer review, targeted validation, and merge or requested changes. Security review: Security review cleared: The diff only changes local filesystem path canonicalization, tests, and changelog text, with no dependency, workflow, secret, package-resolution, or code-execution surface added. Review detailsBest possible solution: Land the localized realpath-based stage-key convergence after maintainer review and targeted runtime-deps validation, keeping the fix in bundled runtime-deps hashing instead of broad channel or loader path rewrites. Do we have a high-confidence way to reproduce the issue? Yes. Current main still hashes path.resolve(packageRoot), and the PR's symlinked packageRoot test is a focused unit reproduction for two views of the same install; the live Windows PM2 path was not run in this read-only review. Is this the best way to solve the issue? Yes. Reusing the existing realpathOrResolve helper for the hash input is the narrowest maintainable fix for converging the loader and bundled-channel stage directories, with release-check expectations adjusted to the new canonical behavior. Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 3c4851037b6c. |
1ba4226 to
6a3622c
Compare
ebe9835 to
324859f
Compare
|
Merged via squash.
Thanks @openperf and @vincentkoc ! |
|
Merged as squash commit |
…age key (openclaw#75048) Merged via squash. Prepared head SHA: 324859f Co-authored-by: openperf <80630709+openperf@users.noreply.github.com> Co-authored-by: openperf <80630709+openperf@users.noreply.github.com> Reviewed-by: @openperf
…age key (openclaw#75048) Merged via squash. Prepared head SHA: 324859f Co-authored-by: openperf <80630709+openperf@users.noreply.github.com> Co-authored-by: openperf <80630709+openperf@users.noreply.github.com> Reviewed-by: @openperf
Summary
ENOENTon shared dist chunks under~/.openclaw/plugin-runtime-deps/openclaw-2026.4.27-<hash>/dist/*.js(e.g.channel-reply-pipeline-DIVeGRyD.js,channel-options-UH47ik5d.js,cache-controls-DyMccij0.js). Different processes — and even different code paths inside the same process — derive different<hash>values for the same physical install, so loaders look up files in a stage directory that was populated under a different key. Site of the divergence:src/plugins/bundled-runtime-deps-roots.ts:55-57,createPathHash, which hashespath.resolve(value)instead of the canonical filesystem path.createPathHashdid not canonicalize thepackageRootto its OS-real path before hashing — it only normalized lexically viapath.resolve. The two consumers of this hash arrive with different lexical forms of the same physical directory:src/plugins/loader.ts:1419producespluginRoot = safeRealpathOrResolve(candidate.rootDir)which already resolves symlinks viafs.realpathSync. Walking up topackageRootyields the realpath form (e.g.C:\Users\<u>\.npm\node_modules\openclaw).src/channels/plugins/bundled.ts:206-211callsresolveBundledChannelBoundaryRootwithparams.rootScope.packageRoot, which originates fromOPENCLAW_PACKAGE_ROOTinsrc/channels/plugins/bundled-root.tsand is not realpath-normalized. Walking up yields the symlinked form (e.g.C:\Users\<u>\AppData\Roaming\npm\node_modules\openclaw).Because
path.resolveonly normalizes path components — it does not resolve symlinks, junctions, or drive-letter casing — the same physical install hashes to two different 12-char keys (matching the two values reported in [BUG] 4.27 multi-instance: plugin-runtime-deps hash mismatch causes ENOENT on bundled channel dist files #74963:5f4e2e59ed9aand456555aaef5c). The loader stages dist chunks into hash-A's directory; the channel loader then opens hash-B and getsENOENT. PM2 multi-instance amplifies the symptom because each worker can independently produce a third lexical form via its argv1/cwd context, but the bug also reproduces single-instance whenever an npm symlink, Windows junction, or drive-letter casing variant is in play.packageRootinstead of the lexical form.createPathHashnow delegates path normalization to the file-localrealpathOrResolvehelper (src/plugins/bundled-runtime-deps-roots.ts:240), which callsfs.realpathSync.native(value)and falls back topath.resolve(value)only when the path does not yet exist. This is the same canonicalization already used byresolveExistingExternalBundledRuntimeDepsRootsto detect already-staged package roots, so the file's two normalization sites are now consistent. Side-effect surface stays small:createPathHashis reachable from exactly one caller (resolveExternalBundledRuntimeDepsInstallRoots, line 213); no other call site re-implements the hash format. Existing 4.27 stage directories created with the old (lexical) hash become orphaned on disk after upgrade — they are not referenced by the new key but cause no functional regression; first start after upgrade re-stages once into the canonical directory and all subsequent starts (including all PM2 workers) share it. The fallback topath.resolvepreserves the old behavior in the degenerate case where the package root has been deleted between resolution and hashing.src/plugins/bundled-runtime-deps-roots.ts:createPathHashnow hashesrealpathOrResolve(value)instead ofpath.resolve(value); one-line behavioral change plus a two-line comment explaining why.src/plugins/bundled-runtime-deps.test.ts: new regression teststages bundled runtime deps to the same root for symlinked packageRoot views (issue #74963)— creates a real package root and a sibling symlink to it, callsresolveBundledRuntimeDependencyInstallRootvia both views, and asserts the two install roots are byte-equal and match theopenclaw-<version>-<12-hex>shape. Skipped onwin32to follow the existingitSupportsSymlinksprecedent at line 2938 (Windows symlink creation requires admin in CI).CHANGELOG.mdentry — left for maintainers to place under the active release section per their preferred phrasing/credit format.realpathOrResolveitself, the file lock implementation (bundled-runtime-deps-lock.ts), the dist-mirror cache (bundled-runtime-dist-mirror-cache.ts), the channel loader (src/channels/plugins/bundled.ts), or the manifest discovery flow (src/plugins/loader.ts). The fix is intentionally confined to the hashing seam; it does not also "fix" the lexical inconsistency between the loader and channel call sites because doing so would broaden the surface (hot import paths persrc/channels/CLAUDE.md) without changing the observable outcome — converging the hash already converges the stage directory.path.resolve(installRoot)hash construction inside the test atbundled-runtime-deps.test.ts:2444. That value is computed only as the right-hand side of a.not.toBeassertion (line 2450), so the assertion still holds: the resolver returns the canonicalinstallRoot, which is not equal to any hypothetical fallback root — regardless of whether the fallback is computed lexically or canonically.any, no behavioral change for source-checkout layouts (the source-checkout branch inresolveBundledRuntimeDependencyInstallRootPlan/resolveBundledRuntimeDependencyPackageInstallRootPlanshort-circuits before reachingcreatePathHash).openclaw-2026.4.27-*) so the existing prune logic atpruneUnknownBundledRuntimeDepsRoots(which targetsopenclaw-unknown-*) leaves them alone — same as any prior version's stage directory. Disk cost is bounded and one-time.Reproduction
[channels] failed to load bundled channel <id>: ENOENTfor several channels (Discord, Feishu, Google Chat, iMessage, LINE, IRC).~/.openclaw/plugin-runtime-deps/: two or more directories namedopenclaw-2026.4.27-<12-hex>exist with the same version prefix but different hashes. The chunks named in the ENOENT messages exist under one hash but the failing process is reading from the other.After applying this PR, only one such directory exists per version and all instances/call sites resolve to it.
Equivalent unit-level reproduction (already wired in this PR):
pnpm test src/plugins/bundled-runtime-deps.test.ts -t "issue #74963"— fails onmain(two install roots), passes after the fix.Risk / Mitigation
Mitigation: behavior is identical to any major version's stage directory rotation, no operator action is required. The orphaned directory is version-prefixed so it does not collide with any future hash and does not get pruned by the
openclaw-unknown-*cleanup; operators who care about disk usage canrm -rfit manually or wait for the next major version's natural rotation.realpathSync.nativefailing on a path that does exist: would silently fall back topath.resolve, reproducing the old (buggy) hash. In practicerealpathSync.nativeonly fails on ENOENT/EACCES/loop-detection; thepackageRoothere is the parent of the dist tree we just enumerated, so it is guaranteed to exist at this point in the flow.Mitigation: the fallback preserves the pre-fix behavior, so failure mode is "no improvement", not "new regression". The added regression test exercises the realpath path on Linux/macOS; the Windows-skipped path still benefits from realpath's native libuv resolver in production where
realpathSync.nativeis the canonical Windows realpath./tmp→/private/tmpsymlinking: the new test creates its own intra-tempdir symlink (linkedPackageRoot→realPackageRoot), so it does not rely on/tmpbeing or not being a symlink — both sides go through the samerealpathSync.native.Mitigation: tested mentally against macOS
/var/folders/.../T(os.tmpdir()default) and Linux/tmp; the test asserts equality between two resolver calls, both of which canonicalize identically.fs.realpathSync.nativestat percreatePathHashcall.Mitigation:
createPathHashis invoked only insideresolveExternalBundledRuntimeDepsInstallRoots, which runs at plugin discovery time (cold start), not on the hot request path. Negligible measured impact.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Fixes #74963