Skip to content

[Bug]: @openclaw/codex peerDependency link not created on install — gateway lanes crash with ERR_MODULE_NOT_FOUND for openclaw/plugin-sdk/* #77959

@LexiPrime

Description

@LexiPrime

Bug type

Regression (worked before, now fails) — same packaging-regression family as #77896 (Matrix) and the cluster of closed issues #71484, #74692, #74899, #75032, #75056, #75206.

Summary

After updating to OpenClaw 2026.5.4 on a Homebrew-managed install, every main-session lane crashes immediately with ERR_MODULE_NOT_FOUND because @openclaw/codex@2026.5.3 cannot resolve openclaw/plugin-sdk/* at runtime. The plugin declares peerDependencies: { "openclaw": ">=2026.5.3" } correctly, but linkOpenClawPeerDependencies() did not create the expected node_modules/openclaw symlink inside the installed plugin directory, so Node's resolver has nothing to walk to.

The result is total main-channel blackout: Telegram, Mission Control chat, and the gateway Control UI all stop receiving/sending messages even though the gateway process itself reports healthy.

Steps to reproduce

  1. Install OpenClaw 2026.5.4 via Homebrew (/opt/homebrew/lib/node_modules/openclaw).
  2. Have @openclaw/codex installed as a plugin into the default plugin npm root (~/.openclaw/npm/node_modules/@openclaw/codex, version 2026.5.3).
  3. Restart the gateway.
  4. Watch any main-session lane fail.

Expected behavior

Plugin install (or post-update reconciliation) should symlink the host openclaw package into the plugin's local node_modules/ per src/plugins/plugin-peer-link.ts → linkOpenClawPeerDependencies(). Main session lanes should boot normally.

Actual behavior

lane task error:
  lane=session:agent:main:main durationMs=663
  error="Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'openclaw'
   imported from /Users/<user>/.openclaw/npm/node_modules/@openclaw/codex/dist/client-chGfNrq5.js"

Filesystem state immediately after the failure:

$ ls -la ~/.openclaw/npm/node_modules/@openclaw/codex/node_modules/
total 0
drwx------  3 user  staff   96 May  5 12:15 .
drwx------  6 user  staff  192 May  4 19:45 ..
drwx------  2 user  staff   64 May  5 10:32 .bin
# NOTE: no `openclaw` symlink — peer link was never written.

@openclaw/codex@2026.5.3 package.json (relevant slice) — peer dep IS declared:

{
  "name": "@openclaw/codex",
  "version": "2026.5.3",
  "type": "module",
  "peerDependencies": { "openclaw": ">=2026.5.3" },
  "devDependencies": { "@openclaw/plugin-sdk": "workspace:*" }
}

The codex plugin's compiled client imports the host SDK by sibling-package name, e.g.:

from "openclaw/plugin-sdk/agent-harness-runtime"
from "openclaw/plugin-sdk/windows-spawn"

…which can only resolve if openclaw is reachable from the plugin's node_modules walk. With the link missing, every import path explodes.

Local workaround

Manually creating the link that linkOpenClawPeerDependencies() was supposed to create restores everything immediately:

ln -s /opt/homebrew/lib/node_modules/openclaw \
      ~/.openclaw/npm/node_modules/openclaw
openclaw gateway restart

(Placing the link at the npm-root level rather than per-plugin works because Node walks ancestor node_modules directories. Per-plugin under @openclaw/codex/node_modules/openclaw works equally.)

To make the workaround survive openclaw update, we re-assert the symlink at the top of our recovery-check cron and a post-upgrade-verify script. With those guards, recovery is automatic within ~5 minutes after any future regression of the same shape.

Diagnosis

Looking at src/plugins/plugin-peer-link.ts:

const hostRoot = resolveOpenClawPackageRootSync({
  argv1: process.argv[1],
  moduleUrl: import.meta.url,
  cwd: process.cwd(),
});
if (!hostRoot) {
  params.logger.warn?.(
    "Could not locate openclaw package root to symlink peerDependencies; ..."
  );
  return;
}

resolveOpenClawPackageRootSync (in src/infra/openclaw-root.ts) walks argv1, moduleUrl, and cwd ancestors looking for a package.json whose name === "openclaw". Two plausible failure modes during plugin install in a Homebrew layout:

  1. npm install runs in a child process whose argv1 is the npm CLI (not the OpenClaw binary), moduleUrl is inside npm's own dist, and cwd is the plugin npm root (~/.openclaw/npm) — none of those ancestor walks lead to /opt/homebrew/lib/node_modules/openclaw. hostRoot resolves to null, the warning is emitted, and the link is silently skipped.
  2. An earlier successful link pointed to a previous Homebrew openclaw directory that was then removed during update, leaving a dangling/missing symlink that no later step rewrites.

Either way the install-time silent skip means a healthy update can leave the system in a hard-broken state with no obvious error other than lane crashes after restart.

Suggested fixes (any one would help; combining is best)

  1. Make linkOpenClawPeerDependencies resilient to detached install contexts. When argv1/moduleUrl walks fail, fall back to:
    • the parent process's executable path,
    • npm root -g + Homebrew's /opt/homebrew/lib/node_modules,
    • an explicit OPENCLAW_HOST_ROOT env var the host can pre-set when shelling out to npm install.
  2. Promote the silent skip to a hard failure when a plugin declares peerDependencies.openclaw and the host root can't be located. Today this is a logger.warn?. that nobody sees in the wild — that's how regressions like [Bug]: Matrix channel missing matrix-js-sdk after 2026.5.4 host npm update #77896 and this one ship without anyone noticing pre-release.
  3. Re-run linkOpenClawPeerDependencies on gateway boot for every installed plugin that declares it (cheap idempotent symlink check), so the system self-heals after a Homebrew openclaw upgrade that moves/replaces the host root.
  4. Add an installer-side smoke test that, after npm install completes for a plugin with peerDependencies.openclaw, asserts the link exists and points at a real directory before declaring install success.

Environment

  • OpenClaw: 2026.5.4 (325df3e)
  • Plugin: @openclaw/codex@2026.5.3
  • Node: v25.8.2
  • OS: macOS (Apple Silicon)
  • Install method: Homebrew global (/opt/homebrew/lib/node_modules/openclaw)
  • Plugin root: ~/.openclaw/npm/node_modules/
  • Gateway port: 18789 (loopback)
  • openclaw gateway status after the fix: Connectivity probe: ok

Related

This appears to be a recurring class bug rather than a one-off — strong candidate for the "promote silent skip to hard failure + boot-time re-link" pattern above.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions