[Bug]: Bundled runtime deps still restage on first user turn in 2026.4.24-beta.2 #71599

@Conan-Scott

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Yes

Summary

Follow-up to #71420. Retested on 2026.4.24-beta.2 in the same restricted OpenShift/Kubernetes-style deployment. The writable external staging/cache work appears improved, but bundled runtime deps still do not converge across gateway startup, control/session, and doctor paths.

On startup, OpenClaw stages the active bundled plugin set and reaches ready. On the first user turn/control path, it stages a broader provider/plugin set again, delaying webchat/control responses by ~50s and leaving the user turn stuck for several minutes. openclaw doctor --fix also repeatedly reports missing bundled runtime deps and reinstalls them on subsequent runs instead of converging.

This looks like the same class of issue as #71420: staged deps are not recognized as already satisfied across all plugin-registry, session-control, and doctor paths.

Steps to reproduce

  1. Run OpenClaw 2026.4.24-beta.2 in a restricted container/Kubernetes/OpenShift-style deployment where the packaged app tree (/app) is not writable by the runtime user.
  2. Use persistent writable OpenClaw state under /home/node/.openclaw.
  3. Start the gateway with bundled plugins enabled, including at least acpx, browser, discord, memory-core, and telegram.
  4. Wait for gateway startup to complete.
  5. Open webchat/control UI and send the first user turn.
  6. Watch the gateway log during the first user turn/control path.
  7. Separately, run openclaw doctor --fix twice and compare whether the second run recognizes bundled runtime deps as already satisfied.

Expected behavior

Bundled runtime deps should stage once into the configured/default writable external stage root, then be reused by later gateway, control/session, and doctor paths.

After the first successful staging/install:

  • first user turn should not trigger another broad bundled runtime-deps install pass;
  • models.list, node.list, and related control calls should not block for ~50s on dependency staging;
  • openclaw doctor --fix should converge, with a subsequent run reporting no missing bundled runtime deps;
  • all paths should derive the same stage-root identity and retained manifest state.
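The expectation above can be sketched as a small model. This is only an illustration of the desired convergence property, not OpenClaw's implementation: `StageRoot`, `missing_specs`, and `stage` are hypothetical names, and the manifest is modelled as a plain set of install specs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StageRoot:
    """Hypothetical in-memory stand-in for one writable stage root."""
    identity: str
    manifest: set[str] = field(default_factory=set)  # retained install specs


def missing_specs(root: StageRoot, wanted: set[str]) -> set[str]:
    """Specs the caller wants that the retained manifest does not list."""
    return wanted - root.manifest


def stage(root: StageRoot, wanted: set[str]) -> set[str]:
    """Record only the missing specs and return them.

    A real implementation would install the packages before recording them.
    """
    todo = missing_specs(root, wanted)
    root.manifest |= todo
    return todo


wanted = {"acpx@0.5.3", "typebox@1.1.31"}

# Startup and a later control path share one identity: the second pass is a no-op.
shared = StageRoot("openclaw-2026.4.24-beta.2-f53b52ad6d21")
startup_installed = stage(shared, wanted)
control_installed = stage(shared, wanted)

# A path that derives a different identity sees an empty manifest and restages.
other = StageRoot("openclaw-unknown-6ff65e466848")
restaged = stage(other, wanted)
```

In this model the second pass against the shared root installs nothing, while the path that resolves a different identity installs everything again, which is the behavior the report describes.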

Actual behavior

On 2026.4.24-beta.2, startup still stages bundled runtime deps and reaches ready after roughly one minute. The first user turn/control path then stages the broader provider/plugin set again and blocks control responses for ~50s. In the observed run, the agent turn stayed stuck for several minutes and produced a stuck-session diagnostic.

Observed startup staging set:

  • acpx
  • browser
  • discord
  • memory-core
  • telegram

Observed first user-turn/control staging set:

  • acpx
  • anthropic
  • brave
  • browser
  • discord
  • google
  • memory-core
  • openai
  • tavily
  • telegram

Observed control-call delay after the first turn/control path:

  • models.list around 49.7s
  • node.list around 49.7s
  • device.pair.list around 49.7s
  • later node.list around 52.7s

Observed stuck-session warning:

  • state=processing
  • age=214s

openclaw doctor --fix also does not converge in this deployment. Each run reports a “Bundled plugin runtime deps are missing” section, then an “Installed bundled plugin deps” section, and during the fix phase logs fresh staging again, for example:

[plugins] acpx staging bundled runtime deps (1 missing, 38 install specs): acpx@0.5.3

Subsequent doctor --fix runs repeat the same missing/install output rather than recognizing the staged deps as already satisfied.
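A rough way to check convergence is to capture two consecutive runs and test whether the second still contains the missing-deps section. The sketch below assumes the heading text quoted above appears verbatim in the command output; `capture_doctor_fix` and `doctor_converged` are hypothetical helper names.

```python
from __future__ import annotations

import subprocess

# Assumption: this heading appears verbatim in the doctor output.
MISSING_MARKER = "Bundled plugin runtime deps are missing"


def capture_doctor_fix() -> str:
    """Run `openclaw doctor --fix` once and return stdout plus stderr."""
    result = subprocess.run(
        ["openclaw", "doctor", "--fix"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr


def doctor_converged(run_outputs: list[str]) -> bool:
    """True when every run after the first omits the missing-deps section."""
    if len(run_outputs) < 2:
        raise ValueError("need at least two runs to judge convergence")
    return all(MISSING_MARKER not in out for out in run_outputs[1:])
```

Calling `doctor_converged([capture_doctor_fix(), capture_doctor_fix()])` inside the pod would be expected to return `False` in the deployment described here.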

OpenClaw version

2026.4.24-beta.2

Operating system

Debian GNU/Linux 12 (GHCR docker image)

Install method

docker/k8s

Model

openai-codex/GPT-5.5

Provider / routing chain

N/A

Additional provider/model setup details

N/A

Logs, screenshots, and evidence

Startup excerpt from 2026.4.24-beta.2:

2026-04-25T22:37:23.180+10:00 [gateway] starting...
2026-04-25T22:37:24.842+10:00 [plugins] acpx staging bundled runtime deps (1 missing, 1 install specs): acpx@0.5.3
2026-04-25T22:37:39.318+10:00 [plugins] acpx installed bundled runtime deps in 14476ms: acpx@0.5.3
2026-04-25T22:37:41.254+10:00 [plugins] browser staging bundled runtime deps (7 missing, 8 install specs): @modelcontextprotocol/sdk@1.29.0, commander@^14.0.3, express@^5.2.1, playwright-core@1.59.1, typebox@1.1.31, undici@8.1.0, ws@^8.20.0
2026-04-25T22:37:44.539+10:00 [plugins] browser installed bundled runtime deps in 3285ms: @modelcontextprotocol/sdk@1.29.0, commander@^14.0.3, express@^5.2.1, playwright-core@1.59.1, typebox@1.1.31, undici@8.1.0, ws@^8.20.0
2026-04-25T22:37:46.379+10:00 [plugins] discord staging bundled runtime deps (8 missing, 13 install specs): @buape/carbon@0.16.0, @discordjs/voice@^0.19.2, discord-api-types@^0.38.47, https-proxy-agent@^9.0.0, opusscript@^0.1.1, typebox@1.1.31, undici@8.1.0, ws@^8.20.0
2026-04-25T22:37:50.718+10:00 [plugins] discord installed bundled runtime deps in 4339ms: @buape/carbon@0.16.0, @discordjs/voice@^0.19.2, discord-api-types@^0.38.47, https-proxy-agent@^9.0.0, opusscript@^0.1.1, typebox@1.1.31, undici@8.1.0, ws@^8.20.0
2026-04-25T22:38:04.036+10:00 [plugins] memory-core staging bundled runtime deps (2 missing, 14 install specs): chokidar@^5.0.0, typebox@1.1.31
2026-04-25T22:38:07.114+10:00 [plugins] memory-core installed bundled runtime deps in 3078ms: chokidar@^5.0.0, typebox@1.1.31
2026-04-25T22:38:09.250+10:00 [plugins] telegram staging bundled runtime deps (5 missing, 17 install specs): @grammyjs/runner@^2.0.3, @grammyjs/transformer-throttler@^1.2.1, grammy@^1.42.0, typebox@1.1.31, undici@8.1.0
2026-04-25T22:38:11.880+10:00 [plugins] telegram installed bundled runtime deps in 2629ms: @grammyjs/runner@^2.0.3, @grammyjs/transformer-throttler@^1.2.1, grammy@^1.42.0, typebox@1.1.31, undici@8.1.0
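For comparing runs, the staging lines can be summarized with a short parser. This is a sketch that assumes the two line formats shown in the excerpt above; `summarize` and `restaged_plugins` are hypothetical helper names, and the sample contains only the acpx and memory-core lines.

```python
import re

STAGING = re.compile(
    r"\[plugins\] (?P<plugin>\S+) staging bundled runtime deps "
    r"\((?P<missing>\d+) missing, (?P<specs>\d+) install specs\)"
)
INSTALLED = re.compile(
    r"\[plugins\] (?P<plugin>\S+) installed bundled runtime deps in (?P<ms>\d+)ms"
)


def summarize(log_text):
    """Return (plugin -> list of missing counts, total install time in ms)."""
    staged = {}
    total_ms = 0
    for line in log_text.splitlines():
        if m := STAGING.search(line):
            staged.setdefault(m["plugin"], []).append(int(m["missing"]))
        elif m := INSTALLED.search(line):
            total_ms += int(m["ms"])
    return staged, total_ms


def restaged_plugins(staged):
    """Plugins that logged more than one staging pass in the given text."""
    return sorted(p for p, counts in staged.items() if len(counts) > 1)


SAMPLE = (
    "2026-04-25T22:37:24.842+10:00 [plugins] acpx staging bundled runtime deps (1 missing, 1 install specs): acpx@0.5.3\n"
    "2026-04-25T22:37:39.318+10:00 [plugins] acpx installed bundled runtime deps in 14476ms: acpx@0.5.3\n"
    "2026-04-25T22:38:04.036+10:00 [plugins] memory-core staging bundled runtime deps (2 missing, 14 install specs): chokidar@^5.0.0, typebox@1.1.31\n"
    "2026-04-25T22:38:07.114+10:00 [plugins] memory-core installed bundled runtime deps in 3078ms: chokidar@^5.0.0, typebox@1.1.31\n"
)

staged, total_ms = summarize(SAMPLE)
```

Across the full startup excerpt, the five reported install durations sum to 27807 ms, out of roughly 48.7 s between the first and last log lines. Feeding a whole gateway log through `restaged_plugins` would list any plugin that staged more than once.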

First user-turn/control path summary from the same run:

# first user-turn/control path staged the broader set again:
[plugins] acpx staging bundled runtime deps ...
[plugins] anthropic staging bundled runtime deps ...
[plugins] brave staging bundled runtime deps ...
[plugins] browser staging bundled runtime deps ...
[plugins] discord staging bundled runtime deps ...
[plugins] google staging bundled runtime deps ...
[plugins] memory-core staging bundled runtime deps ...
[plugins] openai staging bundled runtime deps ...
[plugins] tavily staging bundled runtime deps ...
[plugins] telegram staging bundled runtime deps ...

# control calls delayed around the same window:
[ws] ⇄ res ✓ models.list ~49700ms
[ws] ⇄ res ✓ node.list ~49700ms
[ws] ⇄ res ✓ device.pair.list ~49700ms

# later warning:
[diagnostic] stuck session: sessionId=unknown sessionKey=agent:main:main state=processing age=214s queueDepth=1

Stage-root evidence from the beta 2 pod shows both a versioned beta 2 root and a fresh openclaw-unknown-* root under ~/.openclaw/plugin-runtime-deps:

openclaw-2026.4.24-beta.2-f53b52ad6d21
openclaw-unknown-6ff65e466848


The beta 2 versioned root had a manifest updated at 2026-04-25 23:21:09 +1000 with 38 specs. The beta 2 openclaw-unknown-6ff65e466848 root had a manifest updated at 2026-04-25 23:12:54 +1000 with 11 specs, matching the Discord/Telegram/shared deps subset (@buape/carbon, @discordjs/voice, @grammyjs/runner, grammy, typebox, undici, ws, etc.).

So beta 2 is still creating/updating an openclaw-unknown-* stage root alongside the expected versioned root. That supports the theory that at least one runtime-deps path is deriving a different package/version identity than the main gateway path.
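The split can be checked on other deployments by listing the stage roots and grouping them by name. The sketch below relies only on the directory naming seen above (`openclaw-<version>-<hash>` versus `openclaw-unknown-<hash>`); `classify_stage_roots` is a hypothetical helper, and the demo builds a temporary directory rather than reading a real pod.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def classify_stage_roots(base: Path) -> dict[str, list[str]]:
    """Group stage-root directories under `base` into versioned and unknown."""
    groups: dict[str, list[str]] = {"versioned": [], "unknown": []}
    for entry in sorted(base.iterdir()):
        if not entry.is_dir() or not entry.name.startswith("openclaw-"):
            continue  # ignore anything that is not a stage-root directory
        key = "unknown" if entry.name.startswith("openclaw-unknown-") else "versioned"
        groups[key].append(entry.name)
    return groups


# Demo against a throwaway directory that mirrors the two roots seen in the pod.
with tempfile.TemporaryDirectory() as tmp:
    demo_base = Path(tmp)
    for name in ("openclaw-2026.4.24-beta.2-f53b52ad6d21", "openclaw-unknown-6ff65e466848"):
        (demo_base / name).mkdir()
    demo = classify_stage_roots(demo_base)
```

In the pod, the equivalent call would be `classify_stage_roots(Path.home() / ".openclaw" / "plugin-runtime-deps")`; a non-empty `unknown` group suggests that some path failed to resolve the package version.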

Impact and severity

Affected users: operators running OpenClaw in restricted/non-root/containerized deployments where packaged application paths are not writable and runtime state must live under explicit writable volumes.
Severity: Medium to High for Kubernetes/OpenShift-style deployments.
Consequences:

  • first user turn can hang for several minutes;
  • control UI calls such as models.list and node.list are delayed by ~50s;
  • gateway startup remains slow because active bundled plugins restage deps;
  • doctor --fix does not converge, making the recommended repair path unreliable;
  • users may need to restart pods to recover stuck turns;
  • production-style container deployments remain fragile despite the beta fix.

Additional information

The previous issue, #71420, was closed as fixed on main / next release, but the behavior described above still reproduces on 2026.4.24-beta.2.
