Plugin registry LRU cache evictions from cron scope-key proliferation cause 22-35s synchronous event loop blocks #75851

@jared-rebel

Summary

On deployments with many enabled cron jobs, the plugin registry LRU cache fills with scope-key variants, causing periodic 22–35s synchronous event loop blocks when a workspace entry is evicted and must cold-reload.

Root cause

pluginLoaderCacheState in loader-CLyHx60E.js is a 128-slot LRU keyed by:

workspace :: global :: stock :: JSON(plugins + installs + loadPaths) :: scopeKey :: ...

The scopeKey component varies per invocation based on onlyPluginIds. On a deployment with 64 enabled cron jobs, each cron session can request a different plugin subset, generating a distinct (workspace × scope) cache key. With enough cron jobs firing in a short window, the 128 slots fill with scope variants and LRU-evict the main workspace's warm entry.
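
The eviction mechanics can be reproduced in isolation. The following is a minimal sketch, not the actual pluginLoaderCacheState code: `TinyLru`, `cacheKey`, `simulateCronBurst`, and the `/srv/main` path are hypothetical, and the key is reduced to workspace plus scope for brevity. It assumes a plain 128-slot LRU with no pinning.

```typescript
// Minimal LRU built on Map insertion order; NOT the real pluginLoaderCacheState.
class TinyLru<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  set(key: string, value: V): void {
    // Delete first so that re-setting a key moves it to the most-recent end.
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in insertion order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}

// Hypothetical key builder: only the scope component differs between calls.
function cacheKey(workspace: string, onlyPluginIds: string[] | null): string {
  const scopeKey = onlyPluginIds === null ? "*" : [...onlyPluginIds].sort().join(",");
  return `${workspace}::${scopeKey}`;
}

// Returns true if the full-scope entry is still cached after the burst.
function simulateCronBurst(distinctScopes: number): boolean {
  const cache = new TinyLru<string>(128);
  const mainKey = cacheKey("/srv/main", null);
  cache.set(mainKey, "full registry");
  for (let i = 0; i < distinctScopes; i++) {
    cache.set(cacheKey("/srv/main", [`plugin-${i}`]), `scoped registry ${i}`);
  }
  return cache.has(mainKey);
}
```

In this model, `simulateCronBurst(127)` leaves the full-scope entry in place, while `simulateCronBurst(128)` evicts it, so the next full-scope request would pay for a cold load.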

The cold reload involves synchronous jiti TypeScript compilation + V8 JIT for the full workspace plugin set — 22–35s that blocks the event loop entirely, causing P99 spikes visible to all concurrent WebSocket connections.

Evidence

Production trace (OpenClaw 2026.4.29, 64 enabled cron jobs, Linux VPS):

22:42:14  runtime-plugins:5ms    ← warm (just loaded at startup)
22:45:02  runtime-plugins:3ms    ← warm
22:49:38  runtime-plugins:3ms    ← warm
22:54:58  runtime-plugins:3ms    ← warm  (session f89b801e)
23:01:03  runtime-plugins:27299ms ← COLD LOAD after cron burst
23:02:41  runtime-plugins:3ms    ← warm again
23:06:47  runtime-plugins:35059ms ← COLD LOAD — same session f89b801e that was warm at 22:54

The 23:01 and 23:06 cold loads happen after the scheduler fires multiple crons in the same window (SMS poller every 3min, reply-triage every 5min, watchdogs every 30min, etc.). The resulting scope-key churn rotates the main workspace entry out of the LRU.

The P99 event loop delay spike during a cold load:

eventLoopDelayP99Ms=16064  eventLoopUtilization=1  active=1

Followed by a command lane timeout on the in-flight session.

Why it's painful

  • The cold reload is synchronous — it blocks the entire Node.js event loop for 22–35s, affecting all concurrent connections, not just the affected cron session.
  • It recurs unpredictably: any run that follows a cron burst large enough to fill the LRU triggers it.
  • There is no signal in the logs that an LRU eviction occurred; the only symptom is a sudden runtime-plugins spike on a session that was previously warm.
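
Until the loader reports evictions itself, warm-to-cold transitions can be picked out of the timing lines after the fact. This is a rough sketch that assumes the `runtime-plugins:<n>ms` line shape from the trace above; `parseSamples`, `findColdLoads`, and the 1000 ms threshold are illustrative choices, not part of OpenClaw.

```typescript
interface PluginLoadSample {
  time: string;
  durationMs: number;
}

// Parses lines shaped like "23:01:03  runtime-plugins:27299ms"; other lines are ignored.
function parseSamples(log: string): PluginLoadSample[] {
  const pattern = /^(\d{2}:\d{2}:\d{2})\s+runtime-plugins:(\d+)ms/;
  const samples: PluginLoadSample[] = [];
  for (const line of log.split("\n")) {
    const match = pattern.exec(line.trim());
    if (match !== null) {
      samples.push({ time: match[1], durationMs: Number(match[2]) });
    }
  }
  return samples;
}

// Flags samples at or above the threshold whose predecessor was below it (warm -> cold).
function findColdLoads(samples: PluginLoadSample[], thresholdMs: number = 1000): PluginLoadSample[] {
  return samples.filter(
    (sample, i) => i > 0 && sample.durationMs >= thresholdMs && samples[i - 1].durationMs < thresholdMs,
  );
}

// Timings copied from the production trace in this issue.
const trace = [
  "22:42:14  runtime-plugins:5ms",
  "22:45:02  runtime-plugins:3ms",
  "22:49:38  runtime-plugins:3ms",
  "22:54:58  runtime-plugins:3ms",
  "23:01:03  runtime-plugins:27299ms",
  "23:02:41  runtime-plugins:3ms",
  "23:06:47  runtime-plugins:35059ms",
].join("\n");

const coldLoads = findColdLoads(parseSamples(trace));
```

On the trace above, `coldLoads` contains the 23:01:03 and 23:06:47 samples. A threshold-based heuristic cannot tell an eviction from a legitimate first load, which is why an explicit eviction log line would still be preferable.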

Proposed fixes

Option A (preferred): Pre-warm configured agent workspaces at gateway startup

On startup, iterate over all configured agent IDs, resolve their workspace dirs, and eagerly populate the LRU with the full-scope registry for each. This puts main and any auxiliary agent workspaces in cache from the start; for them to stay cached regardless of cron scope churn, the pre-warmed entries would likely also need to be pinned or otherwise exempt from eviction, since the trace above shows an entry loaded at startup being evicted later. Cost: ~25s at startup (acceptable, since the first cron already incurs it). Benefit: eliminates mid-session cold loads for configured agents.
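
A possible shape for Option A, as a sketch under stated assumptions: `PrewarmDeps`, `PinnedRegistries`, and the `/srv/agents/...` paths are hypothetical, and the pre-warmed entries are held in a separate pinned map that is consulted before the LRU so that scope churn cannot evict them. The pinning is an assumption that goes beyond the proposal text.

```typescript
// Hypothetical dependencies; the real gateway's resolver and loader are not shown in this issue.
interface PrewarmDeps<R> {
  resolveWorkspaceDir(agentId: string): string;
  loadFullScopeRegistry(workspaceDir: string): R; // the expensive synchronous cold load
}

class PinnedRegistries<R> {
  private readonly pinned = new Map<string, R>();

  // Loads each configured agent's workspace once; returns the dirs that were cold-loaded.
  prewarm(agentIds: string[], deps: PrewarmDeps<R>): string[] {
    const loaded: string[] = [];
    for (const agentId of agentIds) {
      const dir = deps.resolveWorkspaceDir(agentId);
      if (this.pinned.has(dir)) continue; // agents may share a workspace
      this.pinned.set(dir, deps.loadFullScopeRegistry(dir));
      loaded.push(dir);
    }
    return loaded;
  }

  // Intended to be checked before the LRU, so pinned entries never depend on LRU state.
  get(workspaceDir: string): R | undefined {
    return this.pinned.get(workspaceDir);
  }
}

// Usage with stand-in dependencies; a string stands in for a loaded registry.
const registries = new PinnedRegistries<string>();
const loadedDirs = registries.prewarm(["main", "haiku-utility", "main"], {
  resolveWorkspaceDir: (agentId) => `/srv/agents/${agentId}`,
  loadFullScopeRegistry: (dir) => `registry for ${dir}`,
});
```

The duplicate `main` entry is skipped, so each workspace is cold-loaded at most once. Memory held by pinned registries is never reclaimed, which seems acceptable for a small, configured set of agents.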

Option B: Make the cold load async / off the main thread

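Move jiti compilation and V8 JIT warmup to a worker thread, or break the load into chunks that yield back to the event loop between steps (chaining promise microtasks alone would not help, because the microtask queue drains before pending I/O is serviced), so the event loop remains responsive during the 22–35s load. Harder to implement, but it would eliminate the event loop blocking entirely, regardless of cache state.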
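
The chunked variant of Option B could look roughly like this. It is a sketch only: `loadPluginsCooperatively` and `loadOne` are hypothetical names, and it assumes the load can be split per plugin. Each individual compile still blocks for its own duration, so a single very large plugin would still need a worker thread.

```typescript
// Resolves on a later timer tick, letting pending I/O and other callbacks run first.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(() => resolve(), 0);
  });
}

// `loadOne` stands in for one plugin's synchronous compile step (hypothetical).
async function loadPluginsCooperatively<P>(
  pluginIds: string[],
  loadOne: (id: string) => P,
): Promise<P[]> {
  const loaded: P[] = [];
  for (const id of pluginIds) {
    loaded.push(loadOne(id)); // blocks, but only for one plugin at a time
    await yieldToEventLoop(); // hand control back before the next plugin
  }
  return loaded;
}

// Usage: record when each stand-in compile actually runs.
const compiled: string[] = [];
const pending = loadPluginsCooperatively(["a", "b", "c"], (id) => {
  compiled.push(id);
  return id.toUpperCase();
});
// Only the first plugin has been compiled by the time the call returns.
const compiledBeforeFirstYield = compiled.length;
```

Because the function yields after every plugin, WebSocket traffic can be serviced between compile steps. The total load time does not shrink, and callers have to await the result instead of receiving the registry synchronously.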

Option C: Scope-insensitive warming for known agents

When a request comes in for a workspace that is in the LRU under a different scope key, serve the full-scope registry (warm) rather than doing a cold load for the requested scope. This requires checking whether a superset-scoped entry is available before declaring a cache miss.
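
The superset check in Option C might be sketched as below. `CachedRegistry`, `findCoveringEntry`, the `null` convention for full scope, and the plugin IDs are all assumptions made for illustration; the real cache stores scope inside a composite string key rather than as a set.

```typescript
interface CachedRegistry<R> {
  workspace: string;
  scope: ReadonlySet<string> | null; // null marks a full-scope entry (assumed convention)
  registry: R;
}

// Returns the first cached entry for the workspace whose scope covers every requested plugin.
function findCoveringEntry<R>(
  entries: CachedRegistry<R>[],
  workspace: string,
  onlyPluginIds: string[],
): CachedRegistry<R> | undefined {
  for (const entry of entries) {
    if (entry.workspace !== workspace) continue;
    const scope = entry.scope;
    const covers = scope === null || onlyPluginIds.every((id) => scope.has(id));
    if (covers) return entry;
  }
  return undefined; // genuine miss: nothing cached can serve this request
}

// Usage with stand-in registries and hypothetical plugin IDs.
const cachedEntries: CachedRegistry<string>[] = [
  { workspace: "/srv/main", scope: new Set(["sms"]), registry: "sms-only" },
  { workspace: "/srv/main", scope: null, registry: "full" },
];
const exactHit = findCoveringEntry(cachedEntries, "/srv/main", ["sms"]);
const supersetHit = findCoveringEntry(cachedEntries, "/srv/main", ["triage"]);
const miss = findCoveringEntry(cachedEntries, "/srv/other", ["sms"]);
```

Here the `triage` request is served from the full-scope entry instead of triggering a cold load. Whether callers can safely receive more plugins than they asked for is a design question this sketch leaves open.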

Workaround

Disable any non-essential cron jobs that run on auxiliary agents (e.g., haiku-utility). This reduces how often the LRU churns but does not eliminate the root cause.

Related: #67000 (session reuse for embedded agents) touches adjacent territory but focuses on session lifecycle rather than the plugin registry cache specifically.
