Skip to content

perf(agents/runtime): short-circuit ensureRuntimePluginsLoaded when active registry exists#74118

Closed
poolside-ventures wants to merge 1 commit into
openclaw:mainfrom
poolside-ventures:perf/ensure-runtime-plugins-skip-when-active
Closed

perf(agents/runtime): short-circuit ensureRuntimePluginsLoaded when active registry exists#74118
poolside-ventures wants to merge 1 commit into
openclaw:mainfrom
poolside-ventures:perf/ensure-runtime-plugins-skip-when-active

Conversation

@poolside-ventures

Copy link
Copy Markdown

Summary

  • Problem: ensureRuntimePluginsLoaded (src/agents/runtime-plugins.ts:6) is called from dispatchReplyFromConfig on every inbound message and rebuilds the entire plugin registry on each call. It builds a 3-field options object (config, workspaceDir, runtimeOptions) and hands it to resolveRuntimePluginRegistry. getCompatibleActivePluginRegistry's strict cacheKey equality comparison fails — boot's options set has 9+ fields (onlyPluginIds, activationSourceConfig, autoEnabledReasons, etc.), so the two hashes always differ. The fall-through runs a full loadOpenClawPlugins: Jiti-loads every plugin module, re-validates manifests, re-runs each plugin's register().
  • Why it matters: ~5–6s of wasted wall-clock per inbound message on hosted gateways. The active registry is already a valid answer; rebuilding it produces no useful state change. Repro and instrumented stack traces in ensureRuntimePluginsLoaded cache-misses every inbound dispatch (~5-6s wasted per message) #74117.
  • What changed: Fast path in ensureRuntimePluginsLoaded — if getActivePluginRegistry() is populated, return immediately. The function's intent is to ensure plugins are loaded; if they already are, the goal is met. Updates the existing test (which was asserting one call — that assertion was wrong; the prior behavior reactivated every time) into a proper regression test that asserts zero calls when the active registry exists.
  • What did NOT change (scope boundary): Behavior when no active registry exists (still builds load options and calls resolveRuntimePluginRegistry). loadGatewayPlugins's boot path. getCompatibleActivePluginRegistry's strict-equality logic — that's a more intrusive change to file separately if desired. Plugin reconfiguration paths that explicitly invalidate the active registry (setActivePluginRegistry, gateway restart on config write) — those continue to work as before.

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution

Linked Issue/PR

Root Cause

ensureRuntimePluginsLoaded is the simplest member of a family of "ensure registry is loaded" helpers (cf. ensurePluginRegistryLoaded in plugins/runtime/runtime-registry-loader.ts, which has its own scope-tracking short-circuit via pluginRegistryLoaded rank). The agent-side ensureRuntimePluginsLoaded is a thin wrapper that defers all caching responsibility to resolveRuntimePluginRegistry → getCompatibleActivePluginRegistry. That worked when the cache-key derivation matched between boot and dispatch. It stopped working when the boot path grew additional options (onlyPluginIds, activationSourceConfig, etc.) that the dispatch path doesn't supply, but the strict-equality check was never relaxed to a "is-compatible-subset" check.

A more thorough fix would be to make getCompatibleActivePluginRegistry accept subset-compatibility, but that's a larger surface area touching multiple call sites. This PR takes the minimal approach for this specific call site: short-circuit at the helper level. The function's contract is "ensure plugins are loaded", not "always rebuild the registry".

Tests

  • Updated src/agents/runtime-plugins.test.ts:
    • Added getActivePluginRegistry mock to the existing vi.mock("../plugins/runtime.js", ...) block.
    • Renamed and rewrote the existing test (does not reactivate plugins when a process already has an active registry — which was incorrectly asserting one call) into short-circuits without rebuilding load options when an active registry exists, asserting zero calls to resolveRuntimePluginRegistry.
    • Other tests (resolves runtime plugins through the shared runtime helper when no active registry is present, etc.) now mock getActivePluginRegistry to return undefined so they exercise the fall-through path.

Verification

$ pnpm vitest run src/agents/runtime-plugins.test.ts
 Test Files  2 passed (2)
      Tests  8 passed (8)

…ctive registry exists

ensureRuntimePluginsLoaded is called from dispatchReplyFromConfig on
every inbound message and was rebuilding the entire plugin registry
on each call.

Root cause: it builds a 3-field options object (config, workspaceDir,
runtimeOptions) and calls resolveRuntimePluginRegistry. That ends up
in getCompatibleActivePluginRegistry, which does a strict cacheKey
equality check against the active registry's cache key. The boot path
(loadGatewayPlugins) populates the active registry with a 9+ field
options set (onlyPluginIds, activationSourceConfig, autoEnabledReasons,
coreGatewayHandlers, ...). The dispatch-time hash and the boot-time
hash always differ, the equality check always fails, and the dispatcher
falls through to a full loadOpenClawPlugins — re-importing every
plugin module, re-validating manifests, and re-running each plugin's
register(). On hosted gateways with plugins.entries populated, that's
~5–6s of wasted wall-clock per inbound message even when the active
registry was already a valid answer.

The function exists to *ensure* runtime plugins are loaded — if the
active registry is already populated, the goal is already met.
Plugin reconfiguration (installs / uninstalls / auto-enable changes)
already invalidates the active registry through other code paths
(setActivePluginRegistry, gateway restart on config write), so a stale
active registry is not a concern here.

Updates the existing "does not reactivate plugins when a process
already has an active registry" test (which was asserting one call —
incorrect; the prior code reactivated every time) into a regression
test that asserts zero calls when getActivePluginRegistry returns
truthy.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XS labels Apr 29, 2026
@greptile-apps

greptile-apps Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a fast-path short-circuit in ensureRuntimePluginsLoaded that returns immediately when getActivePluginRegistry() is populated, fixing a ~5–6s per-inbound-message regression on hosted gateways caused by a strict cacheKey equality mismatch between the boot-time (9+ fields) and dispatch-time (3-field) options sets. The logic and tests are correct for all current code paths.

Confidence Score: 4/5

Safe to merge; the short-circuit is correct for all current call paths, with one minor architectural fragility worth noting.

Only a P2 style/robustness concern: the truthy registry check does not distinguish a properly-loaded registry from an empty placeholder created by requireActivePluginRegistry. No present-day code path triggers the problematic sequence, but the guard could silently misbehave if future callers introduce one.

src/agents/runtime-plugins.ts — the short-circuit guard on line 31

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/runtime-plugins.ts
Line: 31-33

Comment:
**Truthy check includes empty placeholder registries**

`getActivePluginRegistry()` returns any non-null `PluginRegistry`, including the empty object that `requireActivePluginRegistry()` allocates (via `createEmptyPluginRegistry()`) as a zero-plugin placeholder when called before any real load. If any code path invokes `requireActivePluginRegistry` before the first `ensureRuntimePluginsLoaded` call in a non-gateway process (e.g. an early `listChannelSetupPlugins` call during channel setup), this short-circuit would fire on that placeholder and silently skip real plugin loading.

This is not a bug in the current call graph — `dispatch-from-config.ts` calls `ensureRuntimePluginsLoaded` at line 391, before any hook runner or channel-setup operations — but the guard is fragile against future callers. A more future-proof check would be `getActivePluginRegistry()?.plugins.length > 0`, or a dedicated `hasActivePluginRegistryBeenLoaded()` boolean, to distinguish the gateway boot registry from a zero-plugin placeholder.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "perf(agents/runtime): short-circuit ensu..." | Re-trigger Greptile

Comment on lines +31 to +33
if (getActivePluginRegistry()) {
return;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Truthy check includes empty placeholder registries

getActivePluginRegistry() returns any non-null PluginRegistry, including the empty object that requireActivePluginRegistry() allocates (via createEmptyPluginRegistry()) as a zero-plugin placeholder when called before any real load. If any code path invokes requireActivePluginRegistry before the first ensureRuntimePluginsLoaded call in a non-gateway process (e.g. an early listChannelSetupPlugins call during channel setup), this short-circuit would fire on that placeholder and silently skip real plugin loading.

This is not a bug in the current call graph — dispatch-from-config.ts calls ensureRuntimePluginsLoaded at line 391, before any hook runner or channel-setup operations — but the guard is fragile against future callers. A more future-proof check would be getActivePluginRegistry()?.plugins.length > 0, or a dedicated hasActivePluginRegistryBeenLoaded() boolean, to distinguish the gateway boot registry from a zero-plugin placeholder.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/runtime-plugins.ts
Line: 31-33

Comment:
**Truthy check includes empty placeholder registries**

`getActivePluginRegistry()` returns any non-null `PluginRegistry`, including the empty object that `requireActivePluginRegistry()` allocates (via `createEmptyPluginRegistry()`) as a zero-plugin placeholder when called before any real load. If any code path invokes `requireActivePluginRegistry` before the first `ensureRuntimePluginsLoaded` call in a non-gateway process (e.g. an early `listChannelSetupPlugins` call during channel setup), this short-circuit would fire on that placeholder and silently skip real plugin loading.

This is not a bug in the current call graph — `dispatch-from-config.ts` calls `ensureRuntimePluginsLoaded` at line 391, before any hook runner or channel-setup operations — but the guard is fragile against future callers. A more future-proof check would be `getActivePluginRegistry()?.plugins.length > 0`, or a dedicated `hasActivePluginRegistryBeenLoaded()` boolean, to distinguish the gateway boot registry from a zero-plugin placeholder.

How can I resolve this? If you propose a fix, please make it concise.

@clawsweeper

clawsweeper Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs changes before merge.

What this changes:

The PR adds an early return in ensureRuntimePluginsLoaded when an active plugin registry exists and updates the unit test to assert the runtime resolver is not called in that case.

Required change before merge:

The remaining blockers are narrow and automatable: preserve workspace compatibility in the fast path, add focused regression coverage, and add the required changelog entry.

Security review:

Security review cleared: The diff does not touch CI, dependencies, package resolution, secrets, permissions, or other supply-chain-sensitive surfaces; the concern found is functional workspace compatibility.

Review findings:

  • [P2] Preserve workspace compatibility in the fast path — src/agents/runtime-plugins.ts:31-33
  • [P3] Add the required changelog entry — src/agents/runtime-plugins.ts:31-33
Review details

Best possible solution:

Keep the performance direction, but make the short-circuit compatibility-aware: normalize the requested workspace first, reuse the active registry only when it is scoped to the same workspace/runtime surface, and fall through for mismatched workspaces or placeholder registries. Add a regression test for the mismatch case and the required changelog entry.

Do we have a high-confidence way to reproduce the issue?

Yes. A focused unit test can set an active registry for workspace A, call ensureRuntimePluginsLoaded with workspace B, and assert resolveRuntimePluginRegistry is still called; current source shows workspace roots are part of plugin registry compatibility.

Is this the best way to solve the issue?

No. The current PR is the narrowest idea for the hosted dispatch hot path, but the implementation should compare active-registry workspace compatibility or use an equivalent loader helper before skipping resolution.

Full review comments:

  • [P2] Preserve workspace compatibility in the fast path — src/agents/runtime-plugins.ts:31-33
    This returns as soon as any active registry exists, before resolving or comparing params.workspaceDir. Gateway startup stores the active registry for the default workspace, while inbound dispatch calls this helper with the session agent workspace, and workspace plugin roots are part of the loader cache key. A non-default agent with workspace plugins would skip resolveRuntimePluginRegistry and never load the registry for that workspace.
    Confidence: 0.87
  • [P3] Add the required changelog entry — src/agents/runtime-plugins.ts:31-33
    This is a user-facing performance bug fix and the PR changes only runtime code plus tests. Repo policy requires a CHANGELOG.md entry for user-facing fixes/perf changes before merge.
    Confidence: 0.9

Overall correctness: patch is incorrect
Overall confidence: 0.86

Acceptance criteria:

  • pnpm test src/agents/runtime-plugins.test.ts
  • pnpm exec oxfmt --check --threads=1 src/agents/runtime-plugins.ts src/agents/runtime-plugins.test.ts CHANGELOG.md
  • pnpm check:changed in Testbox before maintainer handoff

What I checked:

  • PR fast path: The proposed patch adds if (getActivePluginRegistry()) return; before resolving params.workspaceDir or building load options, so any active registry suppresses the runtime resolver. (src/agents/runtime-plugins.ts:31, bcd6e4a3c020)
  • Dispatch passes session workspace: Current main computes the session agent workspace and passes it into ensureRuntimePluginsLoaded for inbound dispatch. (src/auto-reply/reply/dispatch-from-config.ts:402, efe6b37407a2)
  • Gateway startup uses default workspace: Gateway startup derives the default agent workspace before loading startup plugin runtime, so the active registry can be scoped to a different workspace than a later session agent. (src/gateway/server-startup-plugins.ts:221, efe6b37407a2)
  • Loader treats workspace as compatibility input: The loader cache key includes the workspace plugin root and hasExplicitCompatibilityInputs treats workspaceDir as compatibility-shaping input, so current behavior intentionally distinguishes registries across workspaces. (src/plugins/loader.ts:672, efe6b37407a2)
  • Active registry workspace is tracked: setActivePluginRegistry stores the active workspace and getActivePluginRegistryWorkspaceDir exposes it, giving the fast path an existing way to avoid cross-workspace reuse. (src/plugins/runtime.ts:125, efe6b37407a2)
  • Workspace plugins are documented cache scope: Plugin architecture docs say compatibility checks must include plugin discovery roots such as the default agent workspace because workspace plugins are part of the metadata scope. Public docs: docs/plugins/architecture.md. (docs/plugins/architecture.md:154, efe6b37407a2)

Likely related people:

  • steipete: Recent commits maintained src/agents/runtime-plugins.ts, plugin loader hot paths, and gateway startup/runtime dependency behavior around this surface. (role: recent maintainer; confidence: high; commits: 29ed5266bf0a, 9ca1f1a64e7a, 661af2acd394; files: src/agents/runtime-plugins.ts, src/plugins/loader.ts, src/gateway/server-plugins.ts)
  • vincentkoc: History shows repeated work on plugin runtime state, workspace propagation, and gateway plugin startup scope that directly frame this cache-compatibility issue. (role: plugin runtime maintainer; confidence: high; commits: 4c11a520a82c, dc8b881c1166, d2ce3e9accc6; files: src/plugins/runtime.ts, src/plugins/loader.ts, src/gateway/server-plugins.ts)
  • gumadeiras: Commits added the runtime registry compatibility helper and shared runtime registry resolution used by ensureRuntimePluginsLoaded. (role: introduced adjacent behavior; confidence: medium; commits: fd0aac297c96, ee7f5825c85e; files: src/agents/runtime-plugins.ts, src/plugins/loader.ts)
  • jzakirov: The workspaceDir propagation work is directly relevant because the proposed fast path bypasses workspace-sensitive registry resolution. (role: workspace compatibility contributor; confidence: medium; commits: ffb5b99114bf; files: src/plugins/runtime.ts, src/gateway/server-plugins.ts)

Remaining risk / open question:

  • The exact latency win still needs a post-fix benchmark on the hosted gateway path after preserving workspace compatibility.

Codex review notes: model gpt-5.5, reasoning high; reviewed against efe6b37407a2.

@steipete

steipete commented May 2, 2026

Copy link
Copy Markdown
Contributor

Thanks for the focused PR and root-cause writeup. I rechecked this after pulling current main (6a54aac4892c), and the same bug class has now landed via #76004 / 8283c5d6cc3fa4f066789e2d650ef78370d3c117.

Current main keeps the important part of this fix while addressing the review blocker about workspace compatibility:

  • src/agents/runtime-plugins.ts now routes through ensureStandaloneRuntimePluginRegistryLoaded(...) instead of directly rebuilding the runtime registry.
  • src/plugins/runtime/standalone-runtime-registry-loader.ts first asks getLoadedRuntimePluginRegistry(...) for a compatible already-loaded registry before calling loadOpenClawPlugins(...).
  • src/plugins/active-runtime-registry.ts validates requested workspace and required plugin ids before reusing the active registry.

Closing this PR as superseded by the landed main-branch fix. The linked issue #74117 is now closeable from current-main proof.

@steipete steipete closed this May 2, 2026
quangtran88 added a commit to quangtran88/openclaw that referenced this pull request May 11, 2026
The first inbound message after boot triggers a full `loadOpenClawPlugins`
reload via `ensureStandaloneRuntimePluginRegistryLoaded` even when
`loadGatewayStartupPluginRuntime` has already populated an active registry
for the same workspace and plugin set. The strict cache-key match in
`getLoadedRuntimePluginRegistry(loadOptions)` returns `undefined` because
the dispatch-path callers (`ensureRuntimePluginsLoaded`) build a 3-field
load-options object while gateway-startup builds a 9+ field one — their
hashes never match.

Heap profiling (Node `--heap-prof`) on a clean prod-mirror host shows
~25 MB cumulative allocation under `dispatchReplyFromConfig →
ensureRuntimePluginsLoaded → ensureStandaloneRuntimePluginRegistryLoaded
→ loadOpenClawPlugins` on the first DM, costing the request ~4.4s of
extra dispatch latency before the LLM call even fires.

After the strict cache-key miss, re-query
`getLoadedRuntimePluginRegistry` without `loadOptions` so we take the
workspace + plugin-id branch on the active surface registry. This still
enforces workspace compatibility (via the existing
`getActivePluginRegistryWorkspaceDir` check) and still calls
`registryContainsPluginIds`, so we reuse a warm registry only when it
genuinely covers the request. Falls through to `loadOpenClawPlugins` for
workspace mismatch or missing plugins.

Refs openclaw#80682. Related (closed without merge): openclaw#74118.

Verification:
- Three regression tests added covering the cache-key-miss reuse path,
  workspace mismatch, and missing-plugin fall-through.
- Heap-snapshot diff (SIGUSR2 before/after one fresh DM, registry warm)
  shows retained delta = 73 KB; the 22 MB dispatch allocation that the
  first-DM-per-process pays is 99.7% transient, confirming the bottleneck
  is the redundant `loadOpenClawPlugins` call itself.
quangtran88 added a commit to quangtran88/openclaw that referenced this pull request May 12, 2026
The first inbound message after boot triggers a full `loadOpenClawPlugins`
reload via `ensureStandaloneRuntimePluginRegistryLoaded` even when
`loadGatewayStartupPluginRuntime` has already populated an active registry
for the same workspace and plugin set. The strict cache-key match in
`getLoadedRuntimePluginRegistry(loadOptions)` returns `undefined` because
the dispatch-path callers (`ensureRuntimePluginsLoaded`) build a 3-field
load-options object while gateway-startup builds a 9+ field one — their
hashes never match.

Heap profiling (Node `--heap-prof`) on a clean prod-mirror host shows
~25 MB cumulative allocation under `dispatchReplyFromConfig →
ensureRuntimePluginsLoaded → ensureStandaloneRuntimePluginRegistryLoaded
→ loadOpenClawPlugins` on the first DM, costing the request ~4.4s of
extra dispatch latency before the LLM call even fires.

After the strict cache-key miss, re-query
`getLoadedRuntimePluginRegistry` without `loadOptions` so we take the
workspace + plugin-id branch on the active surface registry. This still
enforces workspace compatibility (via the existing
`getActivePluginRegistryWorkspaceDir` check) and still calls
`registryContainsPluginIds`, so we reuse a warm registry only when it
genuinely covers the request. Falls through to `loadOpenClawPlugins` for
workspace mismatch or missing plugins.

Refs openclaw#80682. Related (closed without merge): openclaw#74118.

Verification:
- Three regression tests added covering the cache-key-miss reuse path,
  workspace mismatch, and missing-plugin fall-through.
- Heap-snapshot diff (SIGUSR2 before/after one fresh DM, registry warm)
  shows retained delta = 73 KB; the 22 MB dispatch allocation that the
  first-DM-per-process pays is 99.7% transient, confirming the bottleneck
  is the redundant `loadOpenClawPlugins` call itself.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: XS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ensureRuntimePluginsLoaded cache-misses every inbound dispatch (~5-6s wasted per message)

2 participants