[Bug]: maybeBootstrapChannelPlugin destructively replaces plugin registry, dropping autoStart:false channels #54599

@ecapuano

Describe the bug

maybeBootstrapChannelPlugin() calls loadOpenClawPlugins() with activate: true (default), which creates a fresh empty plugin registry and replaces the global active registry via setActivePluginRegistry(). Channels configured with autoStart: false + httpUrl (externally-managed daemons) may not survive this mid-lifecycle registry rebuild, causing them to silently disappear from the plugin registry. The bootstrapAttempts Set then permanently blocks re-registration for the lifetime of the process.

Result: The message tool and subagent announce steps fail with "Unknown channel: <channel>" or "Channel is unavailable: <channel>", while in-session replies (which use a pre-bound deliveryContext that bypasses the registry) continue working normally. Only a gateway restart recovers the channel.

This appears to be a regression or incomplete fix from the work in PR #23727 / commit 4258a33 (which introduced maybeBootstrapChannelPlugin as a recovery mechanism for #14188). The recovery mechanism itself is now causing collateral damage.

To reproduce

  1. Configure a channel with autoStart: false + httpUrl (e.g., Signal with an external signal-cli daemon)
  2. Set healthMonitor: { enabled: false } for that channel (per recommended config for externally-managed daemons)
  3. Run the gateway for several hours under load, ideally with sustained LLM errors (overloaded/timeout) that trigger model failover
  4. Trigger a subagent run that completes and fires an announce step targeting the autoStart: false channel
  5. Observe: announce step fails with Unknown channel: signal even though the external daemon is healthy and in-session replies work
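
For reference, the channel settings from steps 1 and 2 can be sketched as the object below. The values come from the Environment section of this report; the `channels.signal` nesting and the `signalChannelConfig` name are assumptions for illustration and may not match the real config layout.

```typescript
// Illustrative only: the nesting under channels.signal is an assumption,
// the three settings and their values are the ones used in this report.
const signalChannelConfig = {
  channels: {
    signal: {
      // The signal-cli daemon is managed externally, so the gateway must not start it.
      autoStart: false,
      // Address of the already-running external daemon.
      httpUrl: "http://127.0.0.1:8080",
      // Health monitoring disabled, per the recommended config for external daemons.
      healthMonitor: { enabled: false },
    },
  },
};
```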

Expected behavior

maybeBootstrapChannelPlugin() should recover a missing channel without destroying other channels in the registry. Channels with autoStart: false that were successfully registered at startup should survive mid-lifecycle plugin reloads.

Root cause analysis

Traced through the v2026.3.12 bundle (pi-embedded-CbCYZxIb.js):

1. Destructive registry replacement (loadOpenClawPlugins, line 149373):

  • On cache miss, creates a fresh empty registry via createPluginRegistry() (empty channels: [])
  • Populates it by discovering and loading all plugins
  • Calls activatePluginRegistry() → setActivePluginRegistry(), which replaces the entire global registry (state.registry = registry)
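
A minimal sketch of why this replacement is destructive. This is a simplified model, not the actual bundle code: the `ChannelEntry` shape, the `reloadDestructively` helper, and the `"other"` channel id are hypothetical. Only `createPluginRegistry`, `setActivePluginRegistry`, and `state.registry = registry` are names taken from the trace above.

```typescript
// Simplified, hypothetical model of the registry described above.
interface ChannelEntry { id: string }
interface PluginRegistry { channels: ChannelEntry[] }

const state: { registry: PluginRegistry } = { registry: { channels: [] } };

function createPluginRegistry(): PluginRegistry {
  return { channels: [] }; // fresh registry starts with no channels
}

function setActivePluginRegistry(registry: PluginRegistry): void {
  state.registry = registry; // whole-object replacement, nothing is merged
}

// Startup: both channels are registered.
setActivePluginRegistry({ channels: [{ id: "other" }, { id: "signal" }] });

// Mid-lifecycle reload (hypothetical helper): only the channels that the
// loader re-registers end up in the fresh registry.
function reloadDestructively(registeredOnReload: string[]): void {
  const fresh = createPluginRegistry();
  for (const id of registeredOnReload) {
    fresh.channels.push({ id });
  }
  setActivePluginRegistry(fresh);
}

// Assume the reload path skips "signal" (the autoStart: false channel).
reloadDestructively(["other"]);

const survivingIds = state.registry.channels.map((c) => c.id);
console.log(survivingIds); // "signal" is no longer present
```

Any channel the reload path does not re-register is dropped, because nothing from the previous registry is carried over.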

2. Bootstrap trigger (maybeBootstrapChannelPlugin, lines 8217–8235):

  • Called from resolveOutboundChannelPlugin() when a channel isn't found in the registry
  • Passes config through applyPluginAutoEnable() which may produce a different effective config than startup, causing a cache key mismatch → full reload
  • Does NOT pass activate: false, so the fresh registry replaces the global one
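
The cache key mismatch can be sketched as follows. The real key computation (lines 149125–149143) has not been reproduced here; `cacheKey`, `applyAutoEnable`, the `Config` shape, and the `"other"` plugin id are hypothetical stand-ins that only illustrate how a slightly different effective config misses the cache.

```typescript
// Hypothetical config shape and cache key, for illustration only.
type Config = { plugins: Record<string, { enabled: boolean }> };

function cacheKey(config: Config): string {
  // Serialize plugin ids in sorted order so key order does not affect the result.
  const ids = Object.keys(config.plugins).sort();
  return JSON.stringify(ids.map((id) => [id, config.plugins[id]?.enabled ?? false]));
}

// Stand-in for applyPluginAutoEnable(): enables plugins that were detected
// but not explicitly configured, producing a new effective config.
function applyAutoEnable(config: Config, detected: string[]): Config {
  const plugins: Record<string, { enabled: boolean }> = { ...config.plugins };
  for (const id of detected) {
    if (!(id in plugins)) {
      plugins[id] = { enabled: true };
    }
  }
  return { plugins };
}

const registryCache = new Map<string, string>();

// Startup populates the cache using the startup config.
const startupConfig: Config = { plugins: { signal: { enabled: true } } };
registryCache.set(cacheKey(startupConfig), "startup registry");

// The bootstrap path derives a different effective config, so its key differs.
const effectiveConfig = applyAutoEnable(startupConfig, ["other"]);
const cacheHit = registryCache.has(cacheKey(effectiveConfig));
console.log(cacheHit); // a miss here means a full reload into a fresh registry
```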

3. autoStart:false channels may not survive reload:

  • At gateway startup, autoStart: false + httpUrl channels are registered correctly (plugin points to existing HTTP URL)
  • During mid-lifecycle loadOpenClawPlugins(), the loading path may handle these channels differently (setup-runtime mode at lines 149596–149712 may skip full registration)
  • If the channel isn't fully registered in the new registry, it's lost when the registry is activated

4. bootstrapAttempts locks in the failure (line 8211):

  • After loadOpenClawPlugins() completes without throwing, the channel's attempt key stays in the bootstrapAttempts Set permanently
  • Since the function "succeeded" (no exception), retries are blocked for the lifetime of that registry key
  • The channel remains missing until gateway restart
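
The lock-in can be modeled as below. This is a sketch of the behavior described above, not the bundle code: `maybeBootstrap`, `loadPlugins`, and `registeredChannels` are hypothetical, and the real attempt key is assumed to also include a registry key.

```typescript
// Hypothetical model of the bootstrapAttempts guard.
const bootstrapAttempts = new Set<string>();
const registeredChannels = new Set<string>(); // channels currently in the registry

// Stand-in loader: completes without throwing, but never registers the channel.
function loadPlugins(): void {
  /* intentionally registers nothing */
}

function maybeBootstrap(channel: string, load: () => void): "attempted" | "skipped" {
  const key = channel; // assumption: the real key also includes the registry key
  if (bootstrapAttempts.has(key)) {
    return "skipped"; // a previous attempt blocks every retry
  }
  bootstrapAttempts.add(key);
  try {
    load();
  } catch {
    bootstrapAttempts.delete(key); // only an exception clears the key
  }
  return "attempted";
}

const firstCall = maybeBootstrap("signal", loadPlugins);
const secondCall = maybeBootstrap("signal", loadPlugins);
const stillMissing = !registeredChannels.has("signal");
console.log(firstCall, secondCall, stillMissing);
```

The first call counts as a success because nothing threw, so the second call is skipped even though the channel is still missing.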

Suggested fix

One or more of:

  1. Make bootstrap additive, not destructive: maybeBootstrapChannelPlugin() should add the missing channel to the existing registry rather than replacing the entire registry. Pass activate: false and manually merge the result.

  2. Preserve existing channels during reload: If loadOpenClawPlugins() must rebuild the registry, carry forward channel entries from the previous registry that aren't being explicitly reloaded.

  3. Fix bootstrapAttempts recovery: If the target channel is still missing from the registry after loadOpenClawPlugins() completes, delete the attempt key from bootstrapAttempts (treat as failure, not success). Currently only exceptions trigger cleanup.

  4. Ensure autoStart:false channels survive reload: The mid-lifecycle loading path should handle autoStart: false + httpUrl channels identically to the startup path — register the plugin pointing to the HTTP URL without requiring daemon startup.
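
A rough sketch of fixes 1 and 3 combined, under the same simplified registry model. `mergeChannels`, `bootstrapAdditively`, `activeRegistry`, and the loader callbacks are hypothetical names; the loader stands in for a loadOpenClawPlugins() call made with activate: false, whose exact return shape is an assumption.

```typescript
// Hypothetical, simplified model; not the actual OpenClaw types.
interface ChannelEntry { id: string }
interface PluginRegistry { channels: ChannelEntry[] }

// Fix 1: keep every existing channel and add only the ones that are missing.
function mergeChannels(active: PluginRegistry, loaded: PluginRegistry): PluginRegistry {
  const byId = new Map<string, ChannelEntry>();
  for (const c of active.channels) byId.set(c.id, c);
  for (const c of loaded.channels) {
    if (!byId.has(c.id)) byId.set(c.id, c);
  }
  return { channels: Array.from(byId.values()) };
}

const attempts = new Set<string>();
let activeRegistry: PluginRegistry = { channels: [{ id: "signal" }] };

function bootstrapAdditively(
  channel: string,
  loadWithoutActivating: () => PluginRegistry,
): boolean {
  if (attempts.has(channel)) return false;
  attempts.add(channel);
  let found = false;
  try {
    activeRegistry = mergeChannels(activeRegistry, loadWithoutActivating());
    found = activeRegistry.channels.some((c) => c.id === channel);
  } catch {
    found = false;
  }
  // Fix 3: a bootstrap that did not produce the channel counts as a failure,
  // so the attempt key is removed and a later call may retry.
  if (!found) attempts.delete(channel);
  return found;
}

// The loader returns only the missing channel; "signal" must survive the merge.
const recovered = bootstrapAdditively("other", () => ({ channels: [{ id: "other" }] }));
// The loader finds nothing; the attempt key must not stay locked.
const failed = bootstrapAdditively("missing", () => ({ channels: [] }));
const finalIds = activeRegistry.channels.map((c) => c.id);
console.log(recovered, failed, finalIds, attempts.has("missing"));
```

Existing entries win on conflict here, which matches fix 1. Fix 2 would instead let freshly loaded entries replace their older counterparts while still carrying the rest forward.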

Environment

  • OpenClaw version: v2026.3.12
  • OS: Ubuntu 24.04 (Linux 6.8.0-106-generic)
  • Channel: Signal via signal-cli with autoStart: false, httpUrl: "http://127.0.0.1:8080", healthMonitor: { enabled: false }
  • Observed: 2 occurrences across 2 gateway processes (2026-03-24, 2026-03-25), each after 5–16 hours of uptime
  • Correlation: Both occurrences preceded by sustained Anthropic API overloaded errors and long-running subagent waits (2.5–4 min)

Source code references (v2026.3.12 bundle)

| Component | File | Lines |
| --- | --- | --- |
| setActivePluginRegistry() | dist/runtime-Iz8uZ7EU.js | 38–42 |
| resolveOutboundChannelPlugin() | dist/pi-embedded-CbCYZxIb.js | 8244–8256 |
| maybeBootstrapChannelPlugin() | dist/pi-embedded-CbCYZxIb.js | 8217–8235 |
| bootstrapAttempts Set | dist/pi-embedded-CbCYZxIb.js | 8211 |
| loadOpenClawPlugins() | dist/pi-embedded-CbCYZxIb.js | 149373–149820 |
| Cache key computation | dist/pi-embedded-CbCYZxIb.js | 149125–149143 |