
fix(plugins): canonicalize packageRoot before hashing runtime-deps stage key#75048

Merged
openperf merged 5 commits into openclaw:main from openperf:fix/74963-runtime-deps-realpath-hash
May 1, 2026

Conversation

@openperf
Member

Summary

  • Problem: On Windows multi-instance deployments (5x PM2 workers, OpenClaw 2026.4.27, Node v22.x), bundled channels fail to load with ENOENT on shared dist chunks under ~/.openclaw/plugin-runtime-deps/openclaw-2026.4.27-<hash>/dist/*.js (e.g. channel-reply-pipeline-DIVeGRyD.js, channel-options-UH47ik5d.js, cache-controls-DyMccij0.js). Different processes — and even different code paths inside the same process — derive different <hash> values for the same physical install, so loaders look up files in a stage directory that was populated under a different key. Site of the divergence: src/plugins/bundled-runtime-deps-roots.ts:55-57, createPathHash, which hashes path.resolve(value) instead of the canonical filesystem path.
  • Root Cause: createPathHash did not canonicalize the packageRoot to its OS-real path before hashing — it only normalized lexically via path.resolve. The two consumers of this hash arrive with different lexical forms of the same physical directory:
    • Loader path: src/plugins/loader.ts:1419 produces pluginRoot = safeRealpathOrResolve(candidate.rootDir) which already resolves symlinks via fs.realpathSync. Walking up to packageRoot yields the realpath form (e.g. C:\Users\<u>\.npm\node_modules\openclaw).
    • Bundled-channel path: src/channels/plugins/bundled.ts:206-211 calls resolveBundledChannelBoundaryRoot with params.rootScope.packageRoot, which originates from OPENCLAW_PACKAGE_ROOT in src/channels/plugins/bundled-root.ts and is not realpath-normalized. Walking up yields the symlinked form (e.g. C:\Users\<u>\AppData\Roaming\npm\node_modules\openclaw).
      Because path.resolve only normalizes path components (it does not resolve symlinks, junctions, or drive-letter casing), the same physical install hashes to two different 12-char keys. These match the two values reported in issue #74963 ([BUG] 4.27 multi-instance: plugin-runtime-deps hash mismatch causes ENOENT on bundled channel dist files): 5f4e2e59ed9a and 456555aaef5c. The loader stages dist chunks into hash-A's directory; the channel loader then opens hash-B and gets ENOENT. PM2 multi-instance amplifies the symptom because each worker can independently produce a third lexical form via its argv1/cwd context, but the bug also reproduces single-instance whenever an npm symlink, Windows junction, or drive-letter casing variant is in play.
  • Fix: Hash the OS-canonical (realpath) form of packageRoot instead of the lexical form. createPathHash now delegates path normalization to the file-local realpathOrResolve helper (src/plugins/bundled-runtime-deps-roots.ts:240), which calls fs.realpathSync.native(value) and falls back to path.resolve(value) only when the path does not yet exist. This is the same canonicalization already used by resolveExistingExternalBundledRuntimeDepsRoots to detect already-staged package roots, so the file's two normalization sites are now consistent. Side-effect surface stays small: createPathHash is reachable from exactly one caller (resolveExternalBundledRuntimeDepsInstallRoots, line 213); no other call site re-implements the hash format. Existing 4.27 stage directories created with the old (lexical) hash become orphaned on disk after upgrade — they are not referenced by the new key but cause no functional regression; first start after upgrade re-stages once into the canonical directory and all subsequent starts (including all PM2 workers) share it. The fallback to path.resolve preserves the old behavior in the degenerate case where the package root has been deleted between resolution and hashing.
  • What changed:
    • src/plugins/bundled-runtime-deps-roots.ts: createPathHash now hashes realpathOrResolve(value) instead of path.resolve(value); one-line behavioral change plus a two-line comment explaining why.
    • src/plugins/bundled-runtime-deps.test.ts: new regression test "stages bundled runtime deps to the same root for symlinked packageRoot views (issue #74963)". It creates a real package root and a sibling symlink to it, calls resolveBundledRuntimeDependencyInstallRoot via both views, and asserts the two install roots are byte-equal and match the openclaw-<version>-<12-hex> shape. Skipped on win32 to follow the existing itSupportsSymlinks precedent at line 2938 (Windows symlink creation requires admin in CI).
  • What did NOT change (scope boundary):
    • No CHANGELOG.md entry — left for maintainers to place under the active release section per their preferred phrasing/credit format.
    • No change to realpathOrResolve itself, the file lock implementation (bundled-runtime-deps-lock.ts), the dist-mirror cache (bundled-runtime-dist-mirror-cache.ts), the channel loader (src/channels/plugins/bundled.ts), or the manifest discovery flow (src/plugins/loader.ts). The fix is intentionally confined to the hashing seam; it does not also "fix" the lexical inconsistency between the loader and channel call sites because doing so would broaden the surface (hot import paths per src/channels/CLAUDE.md) without changing the observable outcome — converging the hash already converges the stage directory.
    • No change to the existing path.resolve(installRoot) hash construction inside the test at bundled-runtime-deps.test.ts:2444. That value is computed only as the right-hand side of a .not.toBe assertion (line 2450), so the assertion still holds: the resolver returns the canonical installRoot, which is not equal to any hypothetical fallback root — regardless of whether the fallback is computed lexically or canonically.
    • No new env vars, no new public API, no new types, no any, no behavioral change for source-checkout layouts (the source-checkout branch in resolveBundledRuntimeDependencyInstallRootPlan/resolveBundledRuntimeDependencyPackageInstallRootPlan short-circuits before reaching createPathHash).
    • No deprecation, no migration shim. Existing stage dirs from the buggy hash become disk-resident orphans; they are version-prefixed (openclaw-2026.4.27-*) so the existing prune logic at pruneUnknownBundledRuntimeDepsRoots (which targets openclaw-unknown-*) leaves them alone — same as any prior version's stage directory. Disk cost is bounded and one-time.
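
The divergence and the fix described above can be sketched in a few lines. This is a minimal illustration, not the project's code: realpathOrResolve follows the behavior the summary describes (fs.realpathSync.native with a path.resolve fallback), while hashOf, lexicalPathHash, and canonicalPathHash are hypothetical names, and the SHA-256 digest truncated to 12 hex characters is an assumption about the hash format.

```typescript
import * as crypto from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Canonicalize when the path exists; otherwise fall back to lexical resolution.
function realpathOrResolve(value: string): string {
  try {
    return fs.realpathSync.native(value);
  } catch {
    return path.resolve(value);
  }
}

// ASSUMPTION: SHA-256 truncated to 12 hex chars; the real digest may differ.
function hashOf(input: string): string {
  return crypto.createHash("sha256").update(input).digest("hex").slice(0, 12);
}

// Pre-fix behavior: hash the lexical form of the path.
const lexicalPathHash = (value: string): string => hashOf(path.resolve(value));
// Post-fix behavior: hash the OS-canonical form of the path.
const canonicalPathHash = (value: string): string => hashOf(realpathOrResolve(value));

// Build one physical directory and a symlinked second view of it.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "stage-key-"));
const realRoot = path.join(tmp, "real", "openclaw");
const linkedRoot = path.join(tmp, "linked");
fs.mkdirSync(realRoot, { recursive: true });
fs.symlinkSync(realRoot, linkedRoot, "dir");

const lexicalKeysDiffer = lexicalPathHash(realRoot) !== lexicalPathHash(linkedRoot);
const canonicalKeysMatch = canonicalPathHash(realRoot) === canonicalPathHash(linkedRoot);
console.log({ lexicalKeysDiffer, canonicalKeysMatch });

fs.rmSync(tmp, { recursive: true, force: true });
```

The sketch creates a symlink, so on Windows it may need elevated privileges, which is the same constraint that leads the regression test to skip win32.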

Reproduction

  1. On Windows, run 5 OpenClaw 2026.4.27 gateway instances under PM2 against the same install (or, simpler, expose the install via two paths — one canonical and one through an npm/Windows junction).
  2. Boot the gateway; observe [channels] failed to load bundled channel <id>: ENOENT for several channels (Discord, Feishu, Google Chat, iMessage, LINE, IRC).
  3. Inspect ~/.openclaw/plugin-runtime-deps/: two or more directories named openclaw-2026.4.27-<12-hex> exist with the same version prefix but different hashes. The chunks named in the ENOENT messages exist under one hash but the failing process is reading from the other.

After applying this PR, only one such directory exists per version and all instances/call sites resolve to it.
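
Step 3 of the reproduction can be automated with a small script. The following is a rough sketch under stated assumptions: findDivergentStageKeys and the STAGE_DIR pattern are hypothetical helpers written for this illustration, and the pattern only encodes the openclaw-<version>-<12-hex> shape mentioned in the description.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Matches names such as openclaw-2026.4.27-5f4e2e59ed9a (version, then 12 hex chars).
const STAGE_DIR = /^openclaw-(.+)-([0-9a-f]{12})$/;

// Group stage-directory names by version and keep versions with more than one hash.
function findDivergentStageKeys(names: string[]): Map<string, string[]> {
  const byVersion = new Map<string, string[]>();
  for (const name of names) {
    const match = STAGE_DIR.exec(name);
    if (!match) continue; // ignore anything that is not a stage directory
    const version = match[1];
    const hash = match[2];
    byVersion.set(version, [...(byVersion.get(version) ?? []), hash]);
  }
  return new Map(Array.from(byVersion).filter(([, hashes]) => hashes.length > 1));
}

// Stage root location taken from the PR description.
const stageRoot = path.join(os.homedir(), ".openclaw", "plugin-runtime-deps");
const stageNames = fs.existsSync(stageRoot) ? fs.readdirSync(stageRoot) : [];
findDivergentStageKeys(stageNames).forEach((hashes, version) => {
  console.log(`${version}: ${hashes.length} stage keys (${hashes.join(", ")})`);
});
```

On an install upgraded from a pre-fix build, directories orphaned under the old lexical key would still be reported, so a non-empty result there does not by itself indicate that the hash is still diverging.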

Equivalent unit-level reproduction (already wired in this PR): pnpm test src/plugins/bundled-runtime-deps.test.ts -t "issue #74963" — fails on main (two install roots), passes after the fix.

Risk / Mitigation

  • Risk: stage-directory churn on upgrade. Existing 2026.4.27 deployments will create one new realpath-keyed stage directory on first run and leave the old lexical-keyed directory(ies) behind. This is a one-time disk cost (single-digit GB at worst, dependent on the bundled plugin runtime-deps tree size); subsequent starts share the new directory.
    Mitigation: behavior is identical to any major version's stage directory rotation, no operator action is required. The orphaned directory is version-prefixed so it does not collide with any future hash and does not get pruned by the openclaw-unknown-* cleanup; operators who care about disk usage can rm -rf it manually or wait for the next major version's natural rotation.
  • Risk: realpathSync.native failing on a path that does exist. This would silently fall back to path.resolve, reproducing the old (buggy) hash. In practice realpathSync.native only fails on ENOENT/EACCES/loop-detection; the packageRoot here is the parent of the dist tree we just enumerated, so it is guaranteed to exist at this point in the flow.
    Mitigation: the fallback preserves the pre-fix behavior, so failure mode is "no improvement", not "new regression". The added regression test exercises the realpath path on Linux/macOS; the Windows-skipped path still benefits from realpath's native libuv resolver in production where realpathSync.native is the canonical Windows realpath.
  • Risk: test flakiness on macOS due to /tmp → /private/tmp symlinking. The new test creates its own intra-tempdir symlink (linkedPackageRoot → realPackageRoot), so it does not rely on /tmp being or not being a symlink; both sides go through the same realpathSync.native.
    Mitigation: tested mentally against macOS /var/folders/.../T (os.tmpdir() default) and Linux /tmp; the test asserts equality between two resolver calls, both of which canonicalize identically.
  • Risk: performance regression from one synchronous fs.realpathSync.native call per createPathHash invocation.
    Mitigation: createPathHash is invoked only inside resolveExternalBundledRuntimeDepsInstallRoots, which runs at plugin discovery time (cold start), not on the hot request path. Negligible measured impact.
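
The fallback discussed in the second risk can be checked in isolation. This is a small sketch, assuming realpathOrResolve behaves as the summary describes; the temp-directory names are made up for the example.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Same shape as the helper described in the summary: realpath first, lexical fallback.
function realpathOrResolve(value: string): string {
  try {
    return fs.realpathSync.native(value);
  } catch {
    return path.resolve(value);
  }
}

const scratch = fs.mkdtempSync(path.join(os.tmpdir(), "fallback-"));
// Hypothetical package root that was never created (or was deleted before hashing).
const missingRoot = path.join(scratch, "deleted-package-root");

// A missing path cannot be canonicalized, so the lexical form is returned.
const fallbackUsed = realpathOrResolve(missingRoot) === path.resolve(missingRoot);
// An existing path is returned in its canonical form.
const existingIsCanonical = realpathOrResolve(scratch) === fs.realpathSync.native(scratch);
console.log({ fallbackUsed, existingIsCanonical });

fs.rmSync(scratch, { recursive: true, force: true });
```

Because the fallback input is whatever lexical form the caller supplied, a key computed this way is the pre-fix key, which is why the failure mode is described as "no improvement" rather than a new regression.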

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Plugins
  • Bundled runtime deps
  • Channels (downstream consumer of the converged stage directory)

Linked Issue/PR

Fixes #74963

openclaw-barnacle (bot) added the size: XS and maintainer (Maintainer-authored PR) labels on Apr 30, 2026
@clawsweeper
Contributor

clawsweeper Bot commented Apr 30, 2026

Codex review: needs maintainer review before merge.

What this changes:

The PR canonicalizes the bundled runtime-deps stage-key hash by realpath-normalizing packageRoot, adds symlink regression and release-check coverage, and adds a changelog entry for the Windows bundled-channel ENOENT fix.

Maintainer follow-up before merge:

This is a protected maintainer PR with no discrete automated repair finding; the next action is normal maintainer review, targeted validation, and merge or requested changes.

Security review:

Security review cleared: The diff only changes local filesystem path canonicalization, tests, and changelog text, with no dependency, workflow, secret, package-resolution, or code-execution surface added.

Review details

Best possible solution:

Land the localized realpath-based stage-key convergence after maintainer review and targeted runtime-deps validation, keeping the fix in bundled runtime-deps hashing instead of broad channel or loader path rewrites.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main still hashes path.resolve(packageRoot), and the PR's symlinked packageRoot test is a focused unit reproduction for two views of the same install; the live Windows PM2 path was not run in this read-only review.

Is this the best way to solve the issue?

Yes. Reusing the existing realpathOrResolve helper for the hash input is the narrowest maintainable fix for converging the loader and bundled-channel stage directories, with release-check expectations adjusted to the new canonical behavior.

Acceptance criteria:

What I checked:

  • Protected maintainer handling: The provided GitHub context shows authorAssociation MEMBER and the protected maintainer label, so this PR should stay open for explicit maintainer handling. (ebe9835787a6)
  • Current main still hashes lexical packageRoot: createPathHash on current main hashes path.resolve(value), so symlinked, junctioned, or differently cased views of the same package root can still produce different stage keys. (src/plugins/bundled-runtime-deps-roots.ts:55, 3c4851037b6c)
  • Existing local canonicalization helper: The same file already uses realpathOrResolve for existing external runtime-deps root detection, and that helper calls fs.realpathSync.native with a path.resolve fallback. (src/plugins/bundled-runtime-deps-roots.ts:222, 3c4851037b6c)
  • Loader/channel path divergence is plausible: The plugin loader realpaths bundled candidate roots while bundled channel loading passes rootScope.packageRoot into boundary/runtime-deps preparation, matching the PR's described two lexical views. (src/plugins/loader.ts:1419, 3c4851037b6c)
  • PR patch scope is narrow: The remote PR patch changes only the hash input/comment, a bundled runtime-deps symlink regression test, release-check expectations, and one changelog entry; it does not touch workflows, dependencies, lockfiles, install scripts, or publishing metadata. (src/plugins/bundled-runtime-deps-roots.ts:55, ebe9835787a6)
  • Regression coverage targets the reported root cause: The PR adds a unit test that resolves runtime-deps install roots through a real package root and a symlinked package root, then asserts both paths use the same openclaw-<version>-<12-hex> stage directory. (src/plugins/bundled-runtime-deps.test.ts:2645, ebe9835787a6)

Likely related people:

  • steipete: Current-main blame points the runtime-deps resolver, realpath helper, loader/channel path surfaces, release-check sentinel logic, and related tests to Peter Steinberger's recent runtime-deps split/import work; Peter also authored the current main runtime-deps performance change and latest release/changelog surfaces. (role: introduced behavior and recent maintainer; confidence: high; commits: 65c94df872b9, 3c4851037b6c, a448042c2edd; files: src/plugins/bundled-runtime-deps-roots.ts, src/plugins/bundled-runtime-deps.test.ts, scripts/release-check.ts)
  • vincentkoc: Vincent Koc authored recent merged runtime-deps test work on current main, authored the PR's release-check alignment commit, and is assigned in the PR timeline. (role: recent adjacent maintainer and assigned routing owner; confidence: high; commits: e311ffdcb94e, 48a9c2254c7d; files: src/plugins/bundled-runtime-deps.test.ts, test/release-check.test.ts)

Remaining risk / open question:

  • Targeted tests and the live Windows PM2 reproduction were not run during this read-only review, so merge proof still depends on CI or maintainer-run validation.
  • The new symlink regression test skips win32 setup, so Windows junction and drive-casing behavior is supported by code reasoning around fs.realpathSync.native rather than direct local execution here.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 3c4851037b6c.

@openperf openperf force-pushed the fix/74963-runtime-deps-realpath-hash branch from ebe9835 to 324859f Compare May 1, 2026 01:55
@openperf openperf merged commit 4b98f09 into openclaw:main May 1, 2026
90 checks passed
@openperf
Member Author

openperf commented May 1, 2026

Merged via squash.

Thanks @openperf and @vincentkoc!

@openperf openperf deleted the fix/74963-runtime-deps-realpath-hash branch May 1, 2026 01:56
@openperf
Member Author

openperf commented May 1, 2026

Merged as squash commit 4b98f0952934c26a24a25fb5466920e6542ea61f on main. Changelog entry added to ## Unreleased.

lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
…age key (openclaw#75048)

Merged via squash.

Prepared head SHA: 324859f
Co-authored-by: openperf <80630709+openperf@users.noreply.github.com>
Reviewed-by: @openperf
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
…age key (openclaw#75048)

Merged via squash.

Prepared head SHA: 324859f
Co-authored-by: openperf <80630709+openperf@users.noreply.github.com>
Reviewed-by: @openperf

Labels

maintainer Maintainer-authored PR size: S

Development

Successfully merging this pull request may close these issues.

[BUG] 4.27 multi-instance: plugin-runtime-deps hash mismatch causes ENOENT on bundled channel dist files

2 participants