perf(agents/runtime): short-circuit ensureRuntimePluginsLoaded when active registry exists#74118
Conversation
…ctive registry exists ensureRuntimePluginsLoaded is called from dispatchReplyFromConfig on every inbound message and was rebuilding the entire plugin registry on each call. Root cause: it builds a 3-field options object (config, workspaceDir, runtimeOptions) and calls resolveRuntimePluginRegistry. That ends up in getCompatibleActivePluginRegistry, which does a strict cacheKey equality check against the active registry's cache key. The boot path (loadGatewayPlugins) populates the active registry with a 9+ field options set (onlyPluginIds, activationSourceConfig, autoEnabledReasons, coreGatewayHandlers, ...). The dispatch-time hash and the boot-time hash always differ, the equality check always fails, and the dispatcher falls through to a full loadOpenClawPlugins — re-importing every plugin module, re-validating manifests, and re-running each plugin's register(). On hosted gateways with plugins.entries populated, that's ~5–6s of wasted wall-clock per inbound message even when the active registry was already a valid answer. The function exists to *ensure* runtime plugins are loaded — if the active registry is already populated, the goal is already met. Plugin reconfiguration (installs / uninstalls / auto-enable changes) already invalidates the active registry through other code paths (setActivePluginRegistry, gateway restart on config write), so a stale active registry is not a concern here. Updates the existing "does not reactivate plugins when a process already has an active registry" test (which was asserting one call — incorrect; the prior code reactivated every time) into a regression test that asserts zero calls when getActivePluginRegistry returns truthy. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Greptile SummaryThis PR adds a fast-path short-circuit in Confidence Score: 4/5Safe to merge; the short-circuit is correct for all current call paths, with one minor architectural fragility worth noting. Only a P2 style/robustness concern: the truthy registry check does not distinguish a properly-loaded registry from an empty placeholder created by src/agents/runtime-plugins.ts — the short-circuit guard on line 31 Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/agents/runtime-plugins.ts
Line: 31-33
Comment:
**Truthy check includes empty placeholder registries**
`getActivePluginRegistry()` returns any non-null `PluginRegistry`, including the empty object that `requireActivePluginRegistry()` allocates (via `createEmptyPluginRegistry()`) as a zero-plugin placeholder when called before any real load. If any code path invokes `requireActivePluginRegistry` before the first `ensureRuntimePluginsLoaded` call in a non-gateway process (e.g. an early `listChannelSetupPlugins` call during channel setup), this short-circuit would fire on that placeholder and silently skip real plugin loading.
This is not a bug in the current call graph — `dispatch-from-config.ts` calls `ensureRuntimePluginsLoaded` at line 391, before any hook runner or channel-setup operations — but the guard is fragile against future callers. A more future-proof check would be `getActivePluginRegistry()?.plugins.length > 0`, or a dedicated `hasActivePluginRegistryBeenLoaded()` boolean, to distinguish the gateway boot registry from a zero-plugin placeholder.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "perf(agents/runtime): short-circuit ensu..." | Re-trigger Greptile |
| if (getActivePluginRegistry()) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Truthy check includes empty placeholder registries
getActivePluginRegistry() returns any non-null PluginRegistry, including the empty object that requireActivePluginRegistry() allocates (via createEmptyPluginRegistry()) as a zero-plugin placeholder when called before any real load. If any code path invokes requireActivePluginRegistry before the first ensureRuntimePluginsLoaded call in a non-gateway process (e.g. an early listChannelSetupPlugins call during channel setup), this short-circuit would fire on that placeholder and silently skip real plugin loading.
This is not a bug in the current call graph — dispatch-from-config.ts calls ensureRuntimePluginsLoaded at line 391, before any hook runner or channel-setup operations — but the guard is fragile against future callers. A more future-proof check would be getActivePluginRegistry()?.plugins.length > 0, or a dedicated hasActivePluginRegistryBeenLoaded() boolean, to distinguish the gateway boot registry from a zero-plugin placeholder.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/runtime-plugins.ts
Line: 31-33
Comment:
**Truthy check includes empty placeholder registries**
`getActivePluginRegistry()` returns any non-null `PluginRegistry`, including the empty object that `requireActivePluginRegistry()` allocates (via `createEmptyPluginRegistry()`) as a zero-plugin placeholder when called before any real load. If any code path invokes `requireActivePluginRegistry` before the first `ensureRuntimePluginsLoaded` call in a non-gateway process (e.g. an early `listChannelSetupPlugins` call during channel setup), this short-circuit would fire on that placeholder and silently skip real plugin loading.
This is not a bug in the current call graph — `dispatch-from-config.ts` calls `ensureRuntimePluginsLoaded` at line 391, before any hook runner or channel-setup operations — but the guard is fragile against future callers. A more future-proof check would be `getActivePluginRegistry()?.plugins.length > 0`, or a dedicated `hasActivePluginRegistryBeenLoaded()` boolean, to distinguish the gateway boot registry from a zero-plugin placeholder.
How can I resolve this? If you propose a fix, please make it concise.|
Codex review: needs changes before merge. What this changes: The PR adds an early return in Required change before merge: The remaining blockers are narrow and automatable: preserve workspace compatibility in the fast path, add focused regression coverage, and add the required changelog entry. Security review: Security review cleared: The diff does not touch CI, dependencies, package resolution, secrets, permissions, or other supply-chain-sensitive surfaces; the concern found is functional workspace compatibility. Review findings:
Review detailsBest possible solution: Keep the performance direction, but make the short-circuit compatibility-aware: normalize the requested workspace first, reuse the active registry only when it is scoped to the same workspace/runtime surface, and fall through for mismatched workspaces or placeholder registries. Add a regression test for the mismatch case and the required changelog entry. Do we have a high-confidence way to reproduce the issue? Yes. A focused unit test can set an active registry for workspace A, call Is this the best way to solve the issue? No. The current PR is the narrowest idea for the hosted dispatch hot path, but the implementation should compare active-registry workspace compatibility or use an equivalent loader helper before skipping resolution. Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against efe6b37407a2. |
|
Thanks for the focused PR and root-cause writeup. I rechecked this after pulling current Current
Closing this PR as superseded by the landed main-branch fix. The linked issue #74117 is now closeable from current-main proof. |
The first inbound message after boot triggers a full `loadOpenClawPlugins` reload via `ensureStandaloneRuntimePluginRegistryLoaded` even when `loadGatewayStartupPluginRuntime` has already populated an active registry for the same workspace and plugin set. The strict cache-key match in `getLoadedRuntimePluginRegistry(loadOptions)` returns `undefined` because the dispatch-path callers (`ensureRuntimePluginsLoaded`) build a 3-field load-options object while gateway-startup builds a 9+ field one — their hashes never match. Heap profiling (Node `--heap-prof`) on a clean prod-mirror host shows ~25 MB cumulative allocation under `dispatchReplyFromConfig → ensureRuntimePluginsLoaded → ensureStandaloneRuntimePluginRegistryLoaded → loadOpenClawPlugins` on the first DM, costing the request ~4.4s of extra dispatch latency before the LLM call even fires. After the strict cache-key miss, re-query `getLoadedRuntimePluginRegistry` without `loadOptions` so we take the workspace + plugin-id branch on the active surface registry. This still enforces workspace compatibility (via the existing `getActivePluginRegistryWorkspaceDir` check) and still calls `registryContainsPluginIds`, so we reuse a warm registry only when it genuinely covers the request. Falls through to `loadOpenClawPlugins` for workspace mismatch or missing plugins. Refs openclaw#80682. Related (closed without merge): openclaw#74118. Verification: - Three regression tests added covering the cache-key-miss reuse path, workspace mismatch, and missing-plugin fall-through. - Heap-snapshot diff (SIGUSR2 before/after one fresh DM, registry warm) shows retained delta = 73 KB; the 22 MB dispatch allocation that the first-DM-per-process pays is 99.7% transient, confirming the bottleneck is the redundant `loadOpenClawPlugins` call itself.
The first inbound message after boot triggers a full `loadOpenClawPlugins` reload via `ensureStandaloneRuntimePluginRegistryLoaded` even when `loadGatewayStartupPluginRuntime` has already populated an active registry for the same workspace and plugin set. The strict cache-key match in `getLoadedRuntimePluginRegistry(loadOptions)` returns `undefined` because the dispatch-path callers (`ensureRuntimePluginsLoaded`) build a 3-field load-options object while gateway-startup builds a 9+ field one — their hashes never match. Heap profiling (Node `--heap-prof`) on a clean prod-mirror host shows ~25 MB cumulative allocation under `dispatchReplyFromConfig → ensureRuntimePluginsLoaded → ensureStandaloneRuntimePluginRegistryLoaded → loadOpenClawPlugins` on the first DM, costing the request ~4.4s of extra dispatch latency before the LLM call even fires. After the strict cache-key miss, re-query `getLoadedRuntimePluginRegistry` without `loadOptions` so we take the workspace + plugin-id branch on the active surface registry. This still enforces workspace compatibility (via the existing `getActivePluginRegistryWorkspaceDir` check) and still calls `registryContainsPluginIds`, so we reuse a warm registry only when it genuinely covers the request. Falls through to `loadOpenClawPlugins` for workspace mismatch or missing plugins. Refs openclaw#80682. Related (closed without merge): openclaw#74118. Verification: - Three regression tests added covering the cache-key-miss reuse path, workspace mismatch, and missing-plugin fall-through. - Heap-snapshot diff (SIGUSR2 before/after one fresh DM, registry warm) shows retained delta = 73 KB; the 22 MB dispatch allocation that the first-DM-per-process pays is 99.7% transient, confirming the bottleneck is the redundant `loadOpenClawPlugins` call itself.
Summary
ensureRuntimePluginsLoaded(src/agents/runtime-plugins.ts:6) is called fromdispatchReplyFromConfigon every inbound message and rebuilds the entire plugin registry on each call. It builds a 3-field options object (config,workspaceDir,runtimeOptions) and hands it toresolveRuntimePluginRegistry.getCompatibleActivePluginRegistry's strictcacheKeyequality comparison fails — boot's options set has 9+ fields (onlyPluginIds,activationSourceConfig,autoEnabledReasons, etc.), so the two hashes always differ. The fall-through runs a fullloadOpenClawPlugins: Jiti-loads every plugin module, re-validates manifests, re-runs each plugin'sregister().ensureRuntimePluginsLoaded— ifgetActivePluginRegistry()is populated, return immediately. The function's intent is to ensure plugins are loaded; if they already are, the goal is met. Updates the existing test (which was asserting one call — that assertion was wrong; the prior behavior reactivated every time) into a proper regression test that asserts zero calls when the active registry exists.resolveRuntimePluginRegistry).loadGatewayPlugins's boot path.getCompatibleActivePluginRegistry's strict-equality logic — that's a more intrusive change to file separately if desired. Plugin reconfiguration paths that explicitly invalidate the active registry (setActivePluginRegistry, gateway restart on config write) — those continue to work as before.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause
ensureRuntimePluginsLoadedis the simplest member of a family of "ensure registry is loaded" helpers (cf.ensurePluginRegistryLoadedinplugins/runtime/runtime-registry-loader.ts, which has its own scope-tracking short-circuit viapluginRegistryLoadedrank). The agent-sideensureRuntimePluginsLoadedis a thin wrapper that defers all caching responsibility toresolveRuntimePluginRegistry → getCompatibleActivePluginRegistry. That worked when the cache-key derivation matched between boot and dispatch. It stopped working when the boot path grew additional options (onlyPluginIds,activationSourceConfig, etc.) that the dispatch path doesn't supply, but the strict-equality check was never relaxed to a "is-compatible-subset" check.A more thorough fix would be to make
getCompatibleActivePluginRegistryaccept subset-compatibility, but that's a larger surface area touching multiple call sites. This PR takes the minimal approach for this specific call site: short-circuit at the helper level. The function's contract is "ensure plugins are loaded", not "always rebuild the registry".Tests
src/agents/runtime-plugins.test.ts:getActivePluginRegistrymock to the existingvi.mock("../plugins/runtime.js", ...)block.does not reactivate plugins when a process already has an active registry— which was incorrectly asserting one call) intoshort-circuits without rebuilding load options when an active registry exists, asserting zero calls toresolveRuntimePluginRegistry.resolves runtime plugins through the shared runtime helper when no active registry is present, etc.) now mockgetActivePluginRegistryto returnundefinedso they exercise the fall-through path.Verification