docker: fix self-build breakage after extension migration#48523

Closed
gbballpack wants to merge 1 commit into openclaw:main from gbballpack:fix/docker-self-build-extension-runtime

@gbballpack gbballpack commented Mar 16, 2026

docker: fix self-build breakage after extension migration

Summary

Fix two Docker self-build issues that prevent the gateway from starting
after the channel-to-extension migration (16505718e8, 439c21e078).
These affect anyone building from the stock Dockerfile on current main
and will affect the next tagged release (v2026.3.14+).

The official GHCR image for v2026.3.13 is not affected because channels
had not yet been moved to extensions/ at that point.

Problem

After the channel migration to extensions/, bundled extensions now contain
hundreds of relative ../../../src/ imports that the tsdown bundler resolves
at build time. Two issues in the Docker build pipeline prevent this from
working for self-builders:

  1. OPENCLAW_EXTENSIONS defaults to empty — the ext-deps stage produces
    zero package.json files, so extension npm deps are never installed.
    tsdown silently produces zero extension output. The gateway falls back to
    Jiti transpilation, which fails because /app/src/ is excluded from the
    runtime image.

  2. pnpm prune --prod removes the self-referencing node_modules/openclaw
    symlink: plugins import from openclaw/plugin-sdk/*, which requires
    this link to resolve in the pruned runtime image.

Changes (Dockerfile only)

1) Default OPENCLAW_EXTENSIONS to all bundled extensions

-ARG OPENCLAW_EXTENSIONS=""
+# Default: build all bundled extensions. Pass a space-separated list to
+# build only a subset (e.g. "whatsapp device-pair").
+ARG OPENCLAW_EXTENSIONS="__all__"

Update the ext-deps RUN block so that when OPENCLAW_EXTENSIONS is empty
or "__all__", package manifests are copied for every bundled extension.
Otherwise, preserve the existing explicit subset behavior.
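
The selection rule can be sketched in plain shell. This is a minimal illustration, not the PR's actual RUN block: the helper name `select_extensions` and its directory argument are hypothetical, and only the empty-or-`__all__` condition is taken from the PR.

```shell
#!/bin/sh
# Sketch only: select_extensions is a hypothetical helper, not code from the Dockerfile.
# Prints the names of extensions whose package.json would be copied into ext-deps.
#   $1 = value of OPENCLAW_EXTENSIONS, $2 = path to the extensions/ directory
select_extensions() {
  requested=$1
  ext_root=$2
  if [ -z "$requested" ] || [ "$requested" = "__all__" ]; then
    # Empty or "__all__": take every extension that ships a package.json.
    for manifest in "$ext_root"/*/package.json; do
      [ -f "$manifest" ] || continue
      basename "$(dirname "$manifest")"
    done
  else
    # Explicit subset: keep only the listed names that actually exist.
    for name in $requested; do
      [ -f "$ext_root/$name/package.json" ] && echo "$name"
    done
  fi
  return 0
}

# Demo against a throwaway directory tree.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/telegram" "$demo_root/whatsapp"
: > "$demo_root/telegram/package.json"
: > "$demo_root/whatsapp/package.json"
select_extensions "" "$demo_root"          # empty value: every extension
select_extensions "whatsapp" "$demo_root"  # explicit list: subset only
```

Keeping the empty and `__all__` cases in one branch means the sentinel and the implicit empty override cannot drift apart.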

The empty-string check is critical: docker-setup.sh always passes
--build-arg "OPENCLAW_EXTENSIONS=${OPENCLAW_EXTENSIONS}". When the host
env var is unset, this sends an empty string — overriding the Dockerfile
ARG default. Treating empty as "all" ensures both docker build . and
docker-setup.sh produce working builds.
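
The expansion behaviour can be reproduced without Docker. The sketch below assumes docker-setup.sh builds the flag as quoted above; the helper name `build_arg_for` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: build_arg_for is a hypothetical helper that mirrors the quoted flag.
# Prints the --build-arg value the host shell would hand to docker build.
build_arg_for() {
  # An unset variable expands to an empty string, so the flag still carries
  # "NAME=". The trailing "-" only keeps this sketch safe under `set -u`;
  # the expanded text is the same as for ${OPENCLAW_EXTENSIONS}.
  printf '%s\n' "OPENCLAW_EXTENSIONS=${OPENCLAW_EXTENSIONS-}"
}

unset OPENCLAW_EXTENSIONS
build_arg_for    # unset on the host: nothing follows the "="

OPENCLAW_EXTENSIONS="whatsapp device-pair"
build_arg_for    # set on the host: the list is passed through unchanged
```

Because the flag is always present, the Dockerfile cannot tell "unset on the host" from "explicitly empty", which is why the check has to live inside the RUN block.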

2) Restore self-referencing package link after prune

+# Restore self-referencing package link removed by pnpm prune --prod.
+# Required for openclaw/plugin-sdk/* subpath imports at runtime.
+RUN ln -s /app /app/node_modules/openclaw

Added in the runtime-assets stage after pnpm prune --prod.
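
What the link provides can be checked in a scratch directory. This is a sketch under assumptions: the temporary directory stands in for /app, and `make_self_link` is a hypothetical helper rather than anything in the Dockerfile.

```shell
#!/bin/sh
# Sketch only: the directory layout below is a stand-in for /app, not the real image.
# make_self_link creates <app>/node_modules/<name> pointing back at <app>.
make_self_link() {
  app_dir=$1
  pkg_name=$2
  mkdir -p "$app_dir/node_modules"
  ln -s "$app_dir" "$app_dir/node_modules/$pkg_name"
}

app=$(mktemp -d)
echo '{"name":"openclaw"}' > "$app/package.json"

make_self_link "$app" openclaw

# Through the link, the package's own manifest is reachable under node_modules,
# which is where a bare "openclaw/..." specifier is looked up.
cat "$app/node_modules/openclaw/package.json"
```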

Root cause commits

| Commit | Description | Impact |
| --- | --- | --- |
| 16505718e8 | Move WhatsApp to extensions/ (#45725) | First channel extension with ../../../src/ imports |
| 439c21e078 | Remove channel shim dirs (#45967) | Completes migration; all channels now in extensions/ |
| 57f19f0d5c | Add OPENCLAW_EXTENSIONS build arg (#32223) | Introduced empty default that silently breaks extension compilation |
| b46ac250d1 | WhatsApp: use scoped plugin SDK imports | Made node_modules/openclaw symlink essential |
| e9cf3506fd | Telegram: use scoped plugin SDK imports | Same: all channels now need the self-referencing link |

Verification

# Build with default (all extensions)
docker build -t openclaw:local -f Dockerfile .

# Verify extensions compiled
docker run --rm openclaw:local sh -c \
  'ls /app/dist/extensions/ | wc -l && echo "extensions built"'

# Verify self-ref link
docker run --rm openclaw:local ls -la /app/node_modules/openclaw

# Verify runtime module (fixed independently on main)
docker run --rm openclaw:local ls /app/dist/plugins/runtime/index.js

# Verify gateway starts
docker run --rm openclaw:local timeout 15 node openclaw.mjs gateway \
  --allow-unconfigured 2>&1 | grep -E "listening|error" | head -5
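
Note that the `wc -l && echo` check above prints "extensions built" even when the directory is empty, because `wc` exits successfully with a count of 0. A stricter variant might look like the following sketch; `require_extensions` is a hypothetical helper, not part of the PR.

```shell
#!/bin/sh
# Sketch only: require_extensions is a hypothetical helper, not part of the PR.
# Succeeds only if the given directory contains at least one entry.
require_extensions() {
  ext_dir=$1
  count=0
  for entry in "$ext_dir"/*; do
    [ -e "$entry" ] && count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no extensions found in $ext_dir" >&2
    return 1
  fi
  echo "$count extensions built"
}

# Demo: an empty directory fails, a populated one passes.
empty=$(mktemp -d)
full=$(mktemp -d)
mkdir "$full/whatsapp"
require_extensions "$empty" || echo "empty directory rejected"
require_extensions "$full"
```

Inside the image, the same function body could be passed to `sh -c` with /app/dist/extensions as the argument.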

Backward compatibility

  • Empty OPENCLAW_EXTENSIONS (including the implicit empty string from
    docker-setup.sh when the env var is unset) now builds all extensions
    instead of building none. This is the key behavior change — it turns a
    broken default into a working one.
  • OPENCLAW_EXTENSIONS="__all__" is accepted as an explicit alias for
    the same behavior.
  • OPENCLAW_EXTENSIONS="whatsapp telegram" (etc.) continues to work
    as before — only the listed extensions have their npm deps installed
    at build time. tsdown still compiles all extensions found via
    openclaw.plugin.json; extensions whose third-party deps are missing
    emit unresolved-import warnings but produce working output for any
    imports resolved from the root package deps.
  • No changes to runtime behavior, plugin loading, or config format.
  • No platform-specific changes (Dockerfile runs in Linux container).
  • The symlink ln -s is POSIX standard, compatible with all Docker base images.

Testing

  • docker build with no --build-arg produces working image with all extensions
  • Gateway starts and loads channels without errors
  • WhatsApp channel connects and receives messages
  • docker build --build-arg OPENCLAW_EXTENSIONS="whatsapp device-pair" builds with subset deps
  • docker-setup.sh with unset OPENCLAW_EXTENSIONS env var builds all extensions
  • pnpm exec vitest run --config vitest.gateway.config.ts passes

Related

@openclaw-barnacle openclaw-barnacle Bot added scripts Repository scripts docker Docker and sandbox tooling size: XL labels Mar 16, 2026
@gbballpack gbballpack force-pushed the fix/docker-self-build-extension-runtime branch from d60a0f3 to 5989881 Compare March 16, 2026 22:35
@openclaw-barnacle openclaw-barnacle Bot added size: XS and removed scripts Repository scripts size: XL labels Mar 16, 2026
@gbballpack gbballpack changed the title docker: fix self-build breakage after extension migration docker: fix self-build breakage after channel extension migration Mar 16, 2026
@gbballpack gbballpack marked this pull request as ready for review March 16, 2026 23:02
greptile-apps Bot commented Mar 16, 2026

Contributor

Greptile Summary

This PR fixes three interacting bugs that caused self-built Docker images to fail to start after the channel-to-extension migration on main. The fixes are surgical and well-scoped: updating the OPENCLAW_EXTENSIONS ARG default to auto-detect all extensions, restoring the pnpm self-referencing symlink that pnpm prune --prod strips, and adding a missing tsdown build entry for the plugin runtime module.

Key changes:

  • Dockerfile: OPENCLAW_EXTENSIONS now defaults to "__all__" with shell logic that also handles the empty-string override sent by docker-setup.sh — covering the root cause of the silent no-extensions build.
  • Dockerfile: RUN ln -s /app /app/node_modules/openclaw restores the self-referencing package link needed for openclaw/plugin-sdk/* subpath imports after pruning.
  • tsdown.config.ts: New nodeBuildConfig entry for src/plugins/runtime/index.ts → dist/runtime/index.js, so the lazy runtime resolver finds the compiled module in production where /app/src/ is absent.
  • The stale # Opt-in extension dependencies header comment (line 3) is left unchanged but now contradicts the new "all by default" behaviour.

Confidence Score: 4/5

  • Safe to merge — all three fixes are correct and directly address the described breakage; only minor style/documentation nits remain.
  • The logic of each fix is sound: the shell condition [ -z "$OPENCLAW_EXTENSIONS" ] || [ "$OPENCLAW_EXTENSIONS" = "__all__" ] correctly handles both the docker-setup.sh empty-string override and the new sentinel; the symlink target /app is valid in the final runtime image; and the tsdown entry key "runtime/index" correctly maps to dist/runtime/index.js. The only deductions are for using ln -s instead of the idiomatic ln -sf (inconsistent with line 242) and a stale header comment, neither of which affects correctness.
  • No files require special attention — both Dockerfile and tsdown.config.ts changes are straightforward and low-risk.

Comments Outside Diff (1)

  1. Dockerfile, line 3-4 (link)

    Stale "Opt-in" header comment conflicts with new default

    The header on line 3 still says "Opt-in extension dependencies at build time", but after this PR the behaviour is the opposite — all extensions are built by default, and passing OPENCLAW_EXTENSIONS is now an opt-out mechanism to limit to a subset. The accurate description was added on lines 15–16 just before the ARG, so the old header is now misleading.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: Dockerfile
Line: 117

Comment:
**Use `ln -sf` for consistency and idempotency**

The rest of the Dockerfile uses `ln -sf` (line 242: `ln -sf /app/openclaw.mjs /usr/local/bin/openclaw`), but this line uses plain `ln -s`. While `pnpm prune --prod` currently removes the self-referencing symlink, using `-f` (force) would make this layer idempotent and guard against future `pnpm` version changes that might leave the symlink in place — causing the build to fail with "File exists".

```suggestion
RUN ln -sf /app /app/node_modules/openclaw
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: Dockerfile
Line: 3-4

Comment:
**Stale "Opt-in" header comment conflicts with new default**

The header on line 3 still says "Opt-in extension dependencies at build time", but after this PR the behaviour is the opposite — all extensions are built by default, and passing `OPENCLAW_EXTENSIONS` is now an opt-*out* mechanism to limit to a subset. The accurate description was added on lines 15–16 just before the `ARG`, so the old header is now misleading.

```suggestion
# Extension build control (space-separated directory names, or leave unset for all).
# Example: docker build --build-arg OPENCLAW_EXTENSIONS="diagnostics-otel matrix" .
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 5989881

Comment thread Dockerfile
@@ -98,6 +112,9 @@ RUN pnpm ui:build
FROM build AS runtime-assets
RUN CI=true pnpm prune --prod && \
find dist -type f \( -name '*.d.ts' -o -name '*.d.mts' -o -name '*.d.cts' -o -name '*.map' \) -delete
# Restore self-referencing package link removed by pnpm prune --prod.
# Required for openclaw/plugin-sdk/* subpath imports at runtime.
RUN ln -s /app /app/node_modules/openclaw

Use ln -sf for consistency and idempotency

The rest of the Dockerfile uses ln -sf (line 242: ln -sf /app/openclaw.mjs /usr/local/bin/openclaw), but this line uses plain ln -s. While pnpm prune --prod currently removes the self-referencing symlink, using -f (force) would make this layer idempotent and guard against future pnpm version changes that might leave the symlink in place — causing the build to fail with "File exists".

Suggested change
-RUN ln -s /app /app/node_modules/openclaw
+RUN ln -sf /app /app/node_modules/openclaw
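
Idempotency can be checked locally before relying on it in a layer. The sketch below uses a throwaway directory in place of /app, and `relink_self` is a hypothetical helper. One detail worth verifying against the base image's `ln`: because the existing link points at a directory, `ln -sf` may follow it and place the new link inside that directory rather than replacing it. Adding `-n` (accepted by GNU coreutils, BusyBox, and BSD `ln`) makes the replacement explicit.

```shell
#!/bin/sh
# Sketch only: relink_self is a hypothetical helper; "$app" stands in for /app.
# -s: symbolic, -f: replace an existing entry, -n: do not follow an existing
# link that points at a directory (otherwise the new link may land inside it).
relink_self() {
  app_dir=$1
  mkdir -p "$app_dir/node_modules"
  ln -sfn "$app_dir" "$app_dir/node_modules/openclaw"
}

app=$(mktemp -d)
relink_self "$app"   # first run creates the link
relink_self "$app"   # second run replaces it instead of failing
ls -l "$app/node_modules/openclaw"
```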

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5989881c2f

Comment thread tsdown.config.ts Outdated
// Plugin runtime module — the lazy resolver walks up from import.meta.url
// looking for runtime/index.{ts,js}. In production Docker builds /app/src/
// is absent, so this must be compiled into dist/runtime/index.js.
entry: { "runtime/index": "src/plugins/runtime/index.ts" },

P1: Emit plugin runtime at path loader resolves

This entry builds src/plugins/runtime/index.ts into dist/runtime/index.js, but the runtime resolver only checks src/plugins/runtime/index.ts and dist/plugins/runtime/index.js (src/plugins/loader.ts, resolvePluginRuntimeModulePath, candidates at lines 211-218). In production Docker images where /app/src is absent, that mismatch means the new artifact is never discovered and startup can still fail with Unable to resolve plugin runtime module.
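
The mismatch is easy to reproduce with a small model of the lookup. The sketch below is a shell stand-in for the candidate check described in this comment, not the TypeScript in src/plugins/loader.ts; the function name is hypothetical and only the two candidate paths come from the review.

```shell
#!/bin/sh
# Sketch only: a shell model of the candidate check described in the review,
# not the actual TypeScript in src/plugins/loader.ts.
# Prints the first existing candidate under the given root, or fails.
resolve_plugin_runtime() {
  root=$1
  for candidate in \
    "src/plugins/runtime/index.ts" \
    "dist/plugins/runtime/index.js"; do
    if [ -f "$root/$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "Unable to resolve plugin runtime module" >&2
  return 1
}

# A production-like tree: no src/, artifact emitted at dist/runtime/index.js.
root=$(mktemp -d)
mkdir -p "$root/dist/runtime"
: > "$root/dist/runtime/index.js"
resolve_plugin_runtime "$root" || echo "artifact exists but is not at a checked path"

# Emitting to dist/plugins/runtime/index.js instead makes it discoverable.
mkdir -p "$root/dist/plugins/runtime"
: > "$root/dist/plugins/runtime/index.js"
resolve_plugin_runtime "$root"
```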

@gbballpack gbballpack force-pushed the fix/docker-self-build-extension-runtime branch 2 times, most recently from b071e9e to 90f0ebc Compare March 17, 2026 14:34
@gbballpack gbballpack changed the title docker: fix self-build breakage after channel extension migration docker: fix self-build breakage after extension migration Mar 21, 2026
gbballpack commented Mar 21, 2026

Author

Similar PRs and issues
Most directly related

#48422 — duplicate/adjacent runtime-module breakage you referenced
Closest sibling bug: same breakage window, different missing runtime artifact.

#48447 — fix: mirror bundled extension deps in root package.json (#48189)
Very similar class of bug: Docker/runtime packaging misses extension dependencies, causing Cannot find package ... errors.

Root-cause / enabling PRs
These three are the strongest “similar PRs” to cite because they explain how the breakage got introduced.

#45725 — refactor: move WhatsApp channel implementation to extensions/

#45967 — refactor: remove channel shim directories, point all imports to extensions

#32223 — container builds: opt-in extension deps via OPENCLAW_EXTENSIONS build arg

Same symptom family: packaged/runtime module resolution breaks
These are very good supporting examples because they show the same pattern of extensions depending on ../../../src/... or missing packaged runtime artifacts:

#46609 — Signal plugin failed to load: Cannot find module '../../../src/infra/outbound/send-deps.js'

#47021 — Telegram plugin fails after refactor: Cannot find module '../../../src/infra/outbound/deliver.js'

#41832 — BlueBubbles plugin fails to load: Cannot find module '../../../src/infra/parse-finite-number.js'

#32662 — Nextcloud Talk failed to load: Cannot find module '../../../src/infra/abort-signal.js'

#37915 — Nextcloud Talk plugin fails to load: same abort-signal.js packaging failure

#18846 — llm-task dynamic import fails for pi-embedded-runner.js because source/dist agent entry is missing

@gbballpack gbballpack force-pushed the fix/docker-self-build-extension-runtime branch from 90f0ebc to e849524 Compare March 29, 2026 22:53
@steipete

Contributor

Closing this as implemented after Codex review.

Current main already covers the reported Docker self-build/runtime failure through later plugin runtime-deps and plugin-sdk alias fixes, so this PR's Dockerfile-only workaround is obsolete on main.

What I checked:

  • Bundled plugins now self-repair runtime deps during load: The plugin loader now installs bundled runtime deps for enabled bundled plugins, then either mirrors the runtime root into an external stage dir or injects a packaged openclaw/plugin-sdk/* alias under dist/extensions/node_modules/openclaw when the plugin root is writable. That removes the need for a Dockerfile-level /app/node_modules/openclaw symlink workaround. (src/plugins/loader.ts:2196, 48b9452c0795)
  • Main writes packaged plugin-sdk aliases instead of relying on root symlink: ensureOpenClawPluginSdkAlias writes dist/extensions/node_modules/openclaw/package.json plus plugin-sdk/*.js wrappers, which directly addresses the packaged-install openclaw/plugin-sdk/* resolution problem raised in the PR. (src/plugins/bundled-runtime-root.ts:211, 48b9452c0795)
  • Regression test covers external stage dir + plugin-sdk imports with no openclaw symlink: Current tests load a bundled plugin that imports openclaw/plugin-sdk/text-runtime from an external stage dir and explicitly assert that dist/extensions/node_modules/openclaw does not exist. That is direct evidence the old symlink-based failure mode is fixed on main. (src/plugins/loader.test.ts:1457, 48b9452c0795)
  • Changelog records the packaged-install/plugin-sdk fix on main: Unreleased changelog entry: bundled plugin openclaw/plugin-sdk/* resolution was restored for packaged installs and external runtime-deps stage roots, specifically calling out the old Cannot find package 'openclaw' crash-loop after missing dependency repair. (CHANGELOG.md:104, 48b9452c0795)
  • The original monorepo-only relative-import premise is no longer true for production extension code: A repo-wide search found ../../../src/... imports only in tests/test-support paths, not in shipped extension runtime code. That removes the main build/runtime premise described in the PR body.
  • Docker docs still describe OPENCLAW_EXTENSIONS as optional pre-install behavior: Current docs describe OPENCLAW_EXTENSIONS as an optional build-time pre-install knob, which matches the new runtime-deps repair path rather than the PR's claim that empty/default must be treated as broken. Public docs: docs/install/docker.md. (docs/install/docker.md:121, 48b9452c0795)
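
Why an alias under dist/extensions/node_modules/openclaw is sufficient follows from Node's lookup rule for bare specifiers, which checks node_modules next to the importing file and then in each parent directory. The sketch below models that walk in shell. The helper name and the temporary layout are illustrative, and the real alias contents (package.json plus wrappers) are not reproduced.

```shell
#!/bin/sh
# Sketch only: find_package models Node's upward node_modules search in shell.
# Prints the first <dir>/node_modules/<pkg> found walking up from the start dir.
find_package() {
  dir=$1
  pkg=$2
  while :; do
    if [ -e "$dir/node_modules/$pkg" ]; then
      echo "$dir/node_modules/$pkg"
      return 0
    fi
    parent=$(dirname "$dir")
    [ "$parent" = "$dir" ] && return 1   # reached the filesystem root
    dir=$parent
  done
}

# Hypothetical layout: a bundled plugin next to a packaged alias directory.
root=$(mktemp -d)
mkdir -p "$root/dist/extensions/whatsapp"
mkdir -p "$root/dist/extensions/node_modules/openclaw/plugin-sdk"
find_package "$root/dist/extensions/whatsapp" openclaw
```

The same walk explains the PR's original approach: a link at /app/node_modules/openclaw is simply found a few levels further up.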

So I’m closing this as already implemented rather than keeping a duplicate issue open.

Review notes: reviewed against 4013c658537e; fix evidence: commit 48b9452c0795.

Labels

docker Docker and sandbox tooling size: XS

2 participants