Capability-provider lookups bypass cache when plugins.entries is non-empty (~25-30s latency per turn) #73793

@poolside-ventures

Bug type

Behavior bug (incorrect output/state without crash) — performance regression-class issue (latency stacking).

Beta release blocker

No

Summary

resolvePluginCapabilityProviders bypasses its active-registry cache on every call when cfg.plugins.entries is non-empty, causing 4–5 full loadOpenClawPlugins cycles per agent turn (~5–6s each), adding ~25–30s of wall-clock latency to every turn even when the underlying LLM call completes in <100ms.

Steps to reproduce

  1. Run any gateway whose openclaw.json contains at least one plugin in plugins.entries (true for any user who has installed a plugin via openclaw plugins install …). Confirmed against 2026.4.26-f53b52ad6d21.
  2. Connect a third-party channel plugin that exports a default register(api) function and logs on entry. (Reproduced with scope-openclaw 0.35.4, which logs register #N timing per call with the trimmed openclaw runtime call stack.)
  3. Send a single inbound message to the gateway.
  4. Observe that register() is called 4× in a single turn, ~5–6s apart, each time with the stack runPluginRegisterSync ← loadOpenClawPlugins ← resolveRuntimePluginRegistry ← resolvePluginCapabilityProviders (capability-provider-runtime.ts:316 in source).

Observed log (top of stack on each repeat):

register #2  ← ensureRuntimePluginsLoaded            (initial)
register #3  ← resolvePluginCapabilityProviders      (+~6s)
register #4  ← resolvePluginCapabilityProviders      (+~6s)
register #5  ← resolvePluginCapabilityProviders      (+~6s)

Expected behavior

After the initial loadOpenClawPlugins cycle has populated the active plugin registry, subsequent calls to resolvePluginCapabilityProviders for capability keys whose providers already exist in the active registry should reuse it without re-running the full plugin load + register cycle. The runtime already invalidates the active registry on real config changes, so the early return should be safe.

Actual behavior

The early-return at src/plugins/capability-provider-runtime.ts:326-333 is gated on:

if (
  activeProviders.length > 0 &&
  params.key !== "memoryEmbeddingProviders" &&
  params.key !== "speechProviders" &&
  !hasExplicitPluginConfig(params.cfg?.plugins)   // ← always false when plugins.entries non-empty
) {
  return activeProviders.map(...);
}

hasExplicitPluginConfig (src/plugins/config-normalization-shared.ts:162) returns true for any non-empty plugins.entries:

if (plugins.entries && Object.keys(plugins.entries).length > 0) {
  return true;
}

Since installing a plugin populates plugins.entries, this gate fails for effectively every production user. Each capability lookup falls through to resolveRuntimePluginRegistry(loadOptions) → loadOpenClawPlugins(loadOptions) → re-import + manifest validation + register() across all loaded plugins.
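
The effect of the gate can be reproduced in isolation. The following is a minimal sketch, not the actual OpenClaw source: PluginsConfig, canReuseActiveRegistry, and the shape of the example entry are assumptions, and this hasExplicitPluginConfig models only the plugins.entries branch quoted above.

```typescript
// Simplified stand-in; the real config type in OpenClaw has more fields (assumption).
type PluginsConfig = { entries?: Record<string, unknown> };

// Models only the plugins.entries branch quoted above, not the full helper.
function hasExplicitPluginConfig(plugins?: PluginsConfig): boolean {
  if (plugins?.entries && Object.keys(plugins.entries).length > 0) {
    return true;
  }
  return false;
}

// Hypothetical helper mirroring the condition on the early return.
function canReuseActiveRegistry(
  activeProviderCount: number,
  key: string,
  plugins?: PluginsConfig,
): boolean {
  return (
    activeProviderCount > 0 &&
    key !== "memoryEmbeddingProviders" &&
    key !== "speechProviders" &&
    !hasExplicitPluginConfig(plugins)
  );
}

// No installed plugins: the cached registry is reused.
const reuseWithoutEntries = canReuseActiveRegistry(3, "imageGenerationProviders", {});

// One installed plugin (entry shape is illustrative): the gate fails
// even though providers are already present in the active registry.
const reuseWithEntries = canReuseActiveRegistry(3, "imageGenerationProviders", {
  entries: { scope: {} },
});

console.log(reuseWithoutEntries, reuseWithEntries); // true false
```

With any non-empty entries object the last clause is false, so the early return is unreachable no matter how many providers are already cached.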

The hot path on a single turn invokes lookups for imageGenerationProviders, videoGenerationProviders, musicGenerationProviders, mediaUnderstandingProviders, realtimeVoiceProviders, realtimeTranscriptionProviders, etc. (each with its own caller in src/{image,video,music}-generation/provider-registry.ts, src/media-understanding/provider-{capability-,}registry.ts, etc.). Each one cache-misses through this gate.

Environment

  • OpenClaw 2026.4.26-f53b52ad6d21 (production gateway)
  • Node v24.14.0
  • Linux x64 (Elestio-hosted Docker container)
  • Active plugins: memory-core, scope (channel)

Logs / evidence

Per-turn timing from instrumented plugin (truncated):

17:28:21.299  scope-openclaw register #2 timing: module-load=83955ms log=2ms channel=0ms provider=0ms hooks=0ms tools+on=0ms total=2ms
17:28:21.300  scope-openclaw register #2 caller: runPluginRegisterSync ← loadOpenClawPlugins ← resolveRuntimePluginRegistry ← ensureRuntimePluginsLoaded
17:28:36.472  scope-openclaw register #3 total=2ms
17:28:36.473  scope-openclaw register #3 caller: runPluginRegisterSync ← loadOpenClawPlugins ← resolveRuntimePluginRegistry ← resolvePluginCapabilityProviders
17:28:42.711  scope-openclaw register #4 total=2ms
17:28:42.713  scope-openclaw register #4 caller: runPluginRegisterSync ← loadOpenClawPlugins ← resolveRuntimePluginRegistry ← resolvePluginCapabilityProviders
17:28:48.597  scope-openclaw register #5 total=2ms
17:28:48.598  scope-openclaw register #5 caller: runPluginRegisterSync ← loadOpenClawPlugins ← resolveRuntimePluginRegistry ← resolvePluginCapabilityProviders

The module-load value grows monotonically (84s → 99s → 105s → 111s), confirming that the JS module is the same instance across calls, so this is not a chokidar / dynamic-import issue. The plugin-side register() body itself runs in 2ms; the 5–6s per call is spent entirely inside the runtime's loadOpenClawPlugins cycle.
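
The gaps between register() calls can be computed directly from the log timestamps above. This is a small sketch for checking the arithmetic; toMillis and the variable names are illustrative, and the timestamps are copied from the timing lines.

```typescript
// Timestamps of the four "register #N" timing lines, copied from the log above.
const registerTimestamps = ["17:28:21.299", "17:28:36.472", "17:28:42.711", "17:28:48.597"];

// Convert HH:MM:SS.mmm to milliseconds since midnight.
function toMillis(stamp: string): number {
  const [hours, minutes, seconds] = stamp.split(":").map(Number);
  return Math.round(((hours * 60 + minutes) * 60 + seconds) * 1000);
}

// Gap between each register() call and the one before it.
const gapsMs = registerTimestamps
  .slice(1)
  .map((stamp, i) => toMillis(stamp) - toMillis(registerTimestamps[i]));

// Wall-clock time from the first logged call to the last.
const totalMs = gapsMs.reduce((sum, gap) => sum + gap, 0);

console.log(gapsMs);  // [ 15173, 6239, 5886 ]
console.log(totalMs); // 27298
```

In this excerpt the first gap is about 15.2s and the following two are about 6s each, for roughly 27.3s between register #2 and register #5.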

Proposed fix

Drop the !hasExplicitPluginConfig gate on this early-return. memoryEmbeddingProviders and speechProviders continue to fall through (their lookups must reconcile against cfg-level provider preferences). For every other capability key, when activeProviders.length > 0 the active registry is already authoritative — the runtime rebuilds it on config-affecting changes, so reusing it here is safe regardless of cfg.plugins.entries.
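
A rough model of the change, under stated assumptions: resolveProvidersSketch, simulateFullPluginLoad, and countLoads are hypothetical names, the full load is replaced by a counter, and every listed key is assumed to be looked up once per turn with providers already in the active registry. It is not the patch itself.

```typescript
// Counts how many times the expensive path would run (stand-in for
// resolveRuntimePluginRegistry → loadOpenClawPlugins).
let fullLoadCount = 0;

function simulateFullPluginLoad(): string[] {
  fullLoadCount += 1;
  return ["provider-from-reload"];
}

// Hypothetical model of the lookup, with the gate switchable for comparison.
function resolveProvidersSketch(
  key: string,
  activeProviders: string[],
  hasExplicitConfig: boolean,
  applyProposedFix: boolean,
): string[] {
  // These two keys keep falling through so they can reconcile against cfg preferences.
  const needsCfgReconcile = key === "memoryEmbeddingProviders" || key === "speechProviders";
  // Proposed fix: ignore explicit plugin config when deciding whether to reuse.
  const blockedByConfig = applyProposedFix ? false : hasExplicitConfig;
  if (activeProviders.length > 0 && !needsCfgReconcile && !blockedByConfig) {
    return activeProviders; // reuse the active registry
  }
  return simulateFullPluginLoad();
}

// Capability keys named in the issue as being on the per-turn hot path.
const hotPathKeys = [
  "imageGenerationProviders",
  "videoGenerationProviders",
  "musicGenerationProviders",
  "mediaUnderstandingProviders",
  "realtimeVoiceProviders",
  "realtimeTranscriptionProviders",
];

// Run one simulated turn with plugins.entries non-empty and count full loads.
function countLoads(applyProposedFix: boolean): number {
  fullLoadCount = 0;
  for (const key of hotPathKeys) {
    resolveProvidersSketch(key, ["cached-provider"], true, applyProposedFix);
  }
  return fullLoadCount;
}

const loadsBefore = countLoads(false);
const loadsAfter = countLoads(true);
console.log(loadsBefore, loadsAfter); // 6 0
```

The counts describe this model only, not the measured turn. In the real runtime a key with no providers in the active registry would still fall through to a full load, as would memoryEmbeddingProviders and speechProviders.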

PR with fix + regression test: poolside-ventures/openclaw#fix/capability-provider-cache-bypass (will link separately once filed).
