Gateway CPU pegged at ~99% with plugins.bundledDiscovery: "compat" — manifest-registry hot loop on idle #79806

@JanPlessow

Summary

With plugins.bundledDiscovery: "compat" (the legacy default flagged by openclaw doctor), the gateway pegs ~1 CPU core at steady state with zero active runs. The CPU profile shows the time going to the manifest-registry-installed, discovery, and boundary-path modules, which walk bundled plugin manifests in a hot loop. Setting plugins.bundledDiscovery: "allowlist" returns the gateway to ~5% CPU, with profiles showing 96% idle. I would not expect compat to carry a runtime cost this large.

Environment

  • OpenClaw 2026.5.7 (commit eeef486)
  • Node 24.14.0 on macOS (Darwin 25.3.0, arm64)
  • 9 plugins enabled, 84 disabled bundled plugins
  • plugins.allow set to the 9 enabled plugins
  • plugins.bundledDiscovery initially "compat", then "allowlist"

Observed behavior — compat mode

  • Gateway uptime ~1h, no active runs (hasActiveRun=false for all 146 sessions in sessions.list)
  • ps: 99.8% CPU sustained, 716 MB RSS
  • /readyz returned eventLoop.degraded=false (binary check), but gateway call health returned eventLoop.degraded=true with utilization=1, cpuCoreRatio≈1.0
  • gateway call health itself took 4.8 seconds to respond
  • Under load, delayP99Ms reached 1626 ms and delayMaxMs peaked at 84 seconds; clients saw this as "closed before connect" (255 in 24h) and "handshake timeout" (118 in 24h)
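
For context on the health fields above, Node exposes comparable measurements through perf_hooks. The sketch below assumes (the issue does not confirm this) that utilization and the delay percentiles correspond to performance.eventLoopUtilization() and monitorEventLoopDelay(). The names sampleEventLoop and LoopSnapshot are hypothetical and are not OpenClaw APIs.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

interface LoopSnapshot {
  utilization: number; // 0..1 share of the window the loop spent busy
  delayP99Ms: number;
  delayMaxMs: number;
}

// Sample event-loop health over a fixed window.
// Field names mirror the issue's health output; how OpenClaw actually
// computes them is an assumption.
async function sampleEventLoop(windowMs: number): Promise<LoopSnapshot> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  const start = performance.eventLoopUtilization();

  await new Promise<void>((resolve) => setTimeout(resolve, windowMs));

  histogram.disable();
  // Passing the earlier reading returns the delta for this window only.
  const elu = performance.eventLoopUtilization(start);

  return {
    utilization: elu.utilization,
    delayP99Ms: histogram.percentile(99) / 1e6, // histogram values are in nanoseconds
    delayMaxMs: histogram.max / 1e6,
  };
}
```

On a healthy idle process the utilization should sit near 0. A value of 1, as reported here, means the loop never went idle during the window, which is consistent with synchronous fs work running back to back.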

CPU profile (V8 inspector via --inspect, 20s window, 5,059 samples, no graphs running):

 34.7%  (idle)
 13.9%  lstat                                               (native fs)
 10.7%  open                                                (native fs)
  5.1%  readFileUtf8                                        (native fs)
  3.5%  path.resolve
  2.9%  stat
  2.4%  existsSync
  2.1%  realpathSync
  1.1%  readJsonFileSync                  openclaw/json-files
  0.9%  isPathInside                      openclaw/boundary-path
  0.7%  resolveInstalledManifestRegistryIndexFingerprint    openclaw/manifest-registry-installed
  0.5%  resolveBoundaryPathSync           openclaw/boundary-path
  0.5%  resolveInstalledPackageMetadata   openclaw/manifest-registry-installed
  0.4%  loadPluginManifest                openclaw/discovery
  0.4%  loadInstalledPluginIndexInstallRecordsSync  openclaw/manifest-registry
  0.4%  readTrustedPackageManifest        openclaw/discovery
  0.4%  safeRealpathSync                  openclaw/discovery
  0.3%  readJsonObjectFileSync            openclaw/manifest-registry
  0.3%  advanceCanonicalCursorForSegment  openclaw/boundary-path
  0.3%  getResult                         openclaw/plugin-cache-primitives

About 37% of total samples are fs calls (lstat + open + readFileUtf8 + stat + existsSync + realpathSync = 37.1%), which is more than half of the 65.3% non-idle time. The named JS frames all belong to plugin-discovery and manifest-registry code paths. None are provider plugins or in-flight tool calls; this is plugin-manifest scanning, running continuously.
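
The percentages above can be reproduced from a saved .cpuprofile with a few lines of code. This is a minimal sketch that counts samples per function name; it assumes the profile contains the samples array (Chrome DevTools and the V8 inspector normally write it) and it ignores timeDeltas, so the figures are sample shares and not exact wall time. The names selfTimeByFunction and shareOf are illustrative.

```typescript
// Minimal shape of a V8 .cpuprofile file (only the fields used here).
interface ProfileNode {
  id: number;
  callFrame: { functionName: string; url: string };
}

interface CpuProfile {
  nodes: ProfileNode[];
  samples: number[]; // id of the node on top of the stack for each sample
}

interface SelfTimeRow {
  name: string;
  samples: number;
  percent: number;
}

// Aggregate self-time per function name as a percentage of all samples.
function selfTimeByFunction(profile: CpuProfile): SelfTimeRow[] {
  const nameById = new Map<number, string>();
  for (const node of profile.nodes) {
    nameById.set(node.id, node.callFrame.functionName || "(anonymous)");
  }

  const counts = new Map<string, number>();
  for (const id of profile.samples) {
    const name = nameById.get(id) ?? "(unknown)";
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }

  const total = profile.samples.length;
  return [...counts.entries()]
    .map(([name, samples]) => ({
      name,
      samples,
      percent: total === 0 ? 0 : (samples / total) * 100,
    }))
    .sort((a, b) => b.samples - a.samples);
}

// Sum the share of a chosen set of function names, e.g. the fs calls.
function shareOf(rows: SelfTimeRow[], names: string[]): number {
  const wanted = new Set(names);
  return rows
    .filter((row) => wanted.has(row.name))
    .reduce((sum, row) => sum + row.percent, 0);
}
```

Load the profile with JSON.parse on the file contents, pass it to selfTimeByFunction, then call shareOf with the fs function names from the listing to get the combined figure.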

After — plugins.bundledDiscovery: "allowlist"

Same workload, 30s window, 8,051 samples:

 96.0%  (idle)
  1.9%  (program)
  0.5%  lstat
  0.2%  open
  0.1%  realpathSync
  0.1%  readFileUtf8
  0.1%  existsSync
  0.0%  readJsonObjectFileSync   openclaw/manifest-registry  (4 samples)
  ...all manifest-registry / discovery hot frames are gone

/readyz:

  • Before: cpuCoreRatio=1.0, delayP99Ms=1626
  • After: cpuCoreRatio≈0.05, delayP99Ms=22

gateway call health durationMs: 4826 → ~150.

Reproducer

  1. Install OpenClaw with the default plugins.bundledDiscovery: "compat" (or set it explicitly).
  2. Configure plugins.allow with a small subset (~9 plugins). Leave the other ~84 bundled plugins disabled but installed.
  3. Run the gateway with --inspect=127.0.0.1:9229. Don't trigger any model calls.
  4. Profile with Chrome DevTools (or similar CDP client) for 30s.
  5. Observe steady-state CPU at ~100% and the manifest-registry/discovery functions dominating self-time.
  6. openclaw config set plugins.bundledDiscovery allowlist, restart gateway, profile again. CPU drops to ~5%, hot frames are gone.

Suggested fix direction

I don't know the codebase well enough to be prescriptive, but the symptom suggests that bundled-plugin discovery in compat mode re-walks every bundled plugin's manifest on a tight schedule (or worse, on every gateway request) instead of caching the result and invalidating it on change. Possible directions:

  • Cache bundled-plugin discovery results across the lifetime of the gateway with explicit invalidation on plugin change, or
  • Make compat mode log a deprecation warning and shorten the discovery loop, or
  • Default bundledDiscovery to "allowlist" for new installs and let compat be opt-in for legacy configs that need it.
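
The first option could look roughly like the sketch below: run the scan once, hand out the cached result, and rescan only after an explicit invalidation. Everything here is hypothetical (DiscoveryCache, DiscoveredPlugin, and the scan callback are not OpenClaw names), and it says nothing about how the real discovery code is structured.

```typescript
// Hypothetical discovery result; the real OpenClaw types are not shown in this issue.
interface DiscoveredPlugin {
  id: string;
  manifestPath: string;
}

// Memoizes an expensive discovery scan until invalidate() is called.
class DiscoveryCache {
  private readonly scan: () => DiscoveredPlugin[];
  private cached: DiscoveredPlugin[] | undefined;

  constructor(scan: () => DiscoveredPlugin[]) {
    this.scan = scan;
  }

  // Returns the cached result, running the scan only when nothing is cached.
  get(): DiscoveredPlugin[] {
    if (this.cached === undefined) {
      this.cached = this.scan();
    }
    return this.cached;
  }

  // Call when a plugin is installed, removed, or reconfigured.
  invalidate(): void {
    this.cached = undefined;
  }
}
```

With this shape, an idle gateway would touch the filesystem once at startup and again only after invalidate() is called, so the steady-state cost no longer depends on how many disabled bundled plugins are installed.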

Worth noting that openclaw doctor already flags this with: "plugins.allow is restrictive, but bundled provider discovery is still in legacy compatibility mode. ... set plugins.bundledDiscovery to 'allowlist' after confirming omitted bundled providers are intentionally blocked." — which is exactly the right advice. The CPU cost just isn't proportional to "legacy compatibility".

Thanks

Thanks for the consistent fixes around plugin discovery / event-loop saturation in 2026.5.5–5.7 — happy to provide the raw .cpuprofile files if useful.
