Bug type
Crash (process/app exits or hangs)
Beta release blocker
No
Summary
resolveExternalCatalogPreferOver performs uncached synchronous disk reads (3 files per call) inside an O(N^2) loop over plugin auto-enable candidates, causing openclaw doctor to hang indefinitely at 100% CPU when agents.list contains ~50+ agents with per-agent model overrides.
Steps to reproduce
- Configure openclaw.json with:
  - Multiple auth profiles (synthetic, openrouter, together, zai, minimax)
  - Corresponding models.providers entries
  - agents.list with ~130 agents, each declaring model: { primary: "provider/model", fallbacks: [...] }
  - plugins.entries enabling telegram, discord, and provider plugins
  - channels with telegram and discord enabled
- Run openclaw doctor
- Observe: process hangs after displaying the "Plugins" box, consuming 100% CPU indefinitely.
Minimal repro: any config with ~50+ agents.list entries each declaring model overrides
using 3+ different provider prefixes (e.g. synthetic/, openrouter/, together/) should trigger the issue.
Removing all agents.list[].model fields resolves the hang.
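For anyone trying to reproduce this without the original config, a repro-sized agents.list could be generated with a small Node script along these lines. This is only a sketch: the `id` field, the `example-model` names, and the helper `buildAgentsList` are placeholders I made up for illustration, not confirmed OpenClaw schema. The output would still need to be merged into a real openclaw.json by hand.

```javascript
// Hypothetical generator for a repro-sized agents.list.
// ASSUMPTIONS: the `id` field and the "example-model" names are placeholders;
// only the model: { primary, fallbacks } shape comes from the report.
function buildAgentsList(count, providers) {
  return Array.from({ length: count }, (_, i) => {
    // Rotate through the provider prefixes so several are in use.
    const primary = providers[i % providers.length];
    const fallback = providers[(i + 1) % providers.length];
    return {
      id: `agent-${i}`,
      model: {
        primary: `${primary}/example-model`,
        fallbacks: [`${fallback}/example-model`],
      },
    };
  });
}

const agents = { list: buildAgentsList(50, ["synthetic", "openrouter", "together"]) };

// Print a JSON fragment that can be pasted into the agents section.
console.log(JSON.stringify({ agents }, null, 2));
```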
Expected behavior
openclaw doctor completes all steps and exits within a reasonable time regardless of the number of configured agents.
Actual behavior
Process hangs after the Plugins step. strace shows a tight loop of synchronous reads of ~/.openclaw/mpm/plugins.json, ~/.openclaw/mpm/catalog.json, and ~/.openclaw/plugins/catalog.json repeating indefinitely. Process must be killed manually. On a 16 GB machine, the process reached 682 MB RSS before being killed, suggesting a possible memory leak in the loop as well.
OpenClaw version
2026.4.11
Operating system
Linux Mint 22.3 (x86_64), Node v22.22.0
Install method
npm global
Model
N/A (bug is in config resolution, not model calls)
Provider / routing chain
N/A (hang occurs before any provider calls)
Additional provider/model setup details
Config uses 5 custom provider prefixes across agents:
- synthetic (OpenAI-compat, api.synthetic.new)
- openrouter
- together
- zai (OpenAI-compat, api.z.ai)
- minimax (Anthropic-compat, api.minimax.io)
Each of ~130 agents declares model.primary + 1-2 fallbacks using these providers.
agents.defaults.model also declares a primary + fallbacks.
plugins.entries explicitly enables telegram, discord, minimax, synthetic, openrouter, together, zai.
Logs, screenshots, and evidence
strace output showing the tight read loop:
access("/home/user/.openclaw/mpm/plugins.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/mpm/plugins.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "[]\n", 8192) = 3
read(24, "", 8192) = 0
close(24) = 0
access("/home/user/.openclaw/mpm/catalog.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/mpm/catalog.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "{}\n", 8192) = 3
read(24, "", 8192) = 0
close(24) = 0
access("/home/user/.openclaw/plugins/catalog.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/plugins/catalog.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "{}\n", 8192) = 3
read(24, "", 8192) = 0
close(24) = 0
[repeats indefinitely]
Root cause traced to `plugin-auto-enable-rMc8VJBA.js`:
- `materializePluginAutoEnableCandidatesInternal` iterates all candidates (line ~587)
- For each candidate, `shouldSkipPreferredPluginAutoEnable` iterates all *other* candidates (line ~235)
- For each pair, `resolvePreferredOverIds` calls `resolveExternalCatalogPreferOver` (line ~232)
- `resolveExternalCatalogPreferOver` calls `fs.existsSync` and `fs.readFileSync` synchronously on each of the 3 catalog files, with no caching (lines 207-216)
- Total file reads: O(N^2 * 3) where N = number of auto-enable candidates derived from model refs
Confirmed workaround: adding a `Map`-based memo cache on `channelId` to `resolveExternalCatalogPreferOver` resolves the hang completely. Doctor completes in seconds with the full 130-agent config.
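To make the read count concrete, here is a simplified model of the nested loop described above. It does not touch the disk and does not use the real OpenClaw functions: `countCatalogReads` and `lookupPreferOver` are hypothetical stand-ins, and which id of each pair is actually looked up is an assumption. It only counts how many file reads the loop structure would issue with and without a memo.

```javascript
// Simplified model of the call structure; names are hypothetical stand-ins.
const CATALOG_FILES = 3; // files read per uncached lookup, per the strace output

function countCatalogReads(candidateIds, { memoize }) {
  let reads = 0;
  const cache = new Map();

  // Stand-in for resolveExternalCatalogPreferOver: every uncached call
  // reads each catalog file once.
  function lookupPreferOver(channelId) {
    if (memoize && cache.has(channelId)) return cache.get(channelId);
    reads += CATALOG_FILES;
    const result = [];
    if (memoize) cache.set(channelId, result);
    return result;
  }

  // Outer loop: every candidate. Inner loop: every *other* candidate.
  for (const candidate of candidateIds) {
    for (const other of candidateIds) {
      if (other === candidate) continue;
      lookupPreferOver(other); // ASSUMPTION: the lookup is keyed on the other id
    }
  }
  return reads;
}

const ids = Array.from({ length: 10 }, (_, i) => `plugin-${i}`);
console.log(countCatalogReads(ids, { memoize: false })); // 270 = 10 * 9 * 3
console.log(countCatalogReads(ids, { memoize: true }));  // 30  = 10 * 3
```

With unique ids the uncached count grows as 3 * N * (N - 1), while the memoised count stays at 3 * N.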
Impact and severity
- Affected: Any user with a large multi-agent config using per-agent model overrides across multiple providers.
- Severity: Blocks workflow. openclaw doctor and openclaw gateway restart both hang, making the system unusable until model overrides are removed.
- Frequency: 100% reproducible with ~50+ agents declaring model overrides.
- Consequence: Unable to start or restart the gateway, run doctor, or use the system at all without stripping per-agent model config.
Additional information
Suggested fix: memoise resolveExternalCatalogPreferOver by channelId. The external catalog files do not change during a single process invocation, so caching is safe. Patch applied locally and confirmed working:
// Process-wide memo. Keyed on channelId only, which assumes env (and so the
// catalog paths) stays the same for the lifetime of the process.
const _externalCatalogPreferOverCache = new Map();
function resolveExternalCatalogPreferOver(channelId, env) {
  if (_externalCatalogPreferOverCache.has(channelId)) return _externalCatalogPreferOverCache.get(channelId);
  for (const rawPath of resolveExternalCatalogPaths(env)) {
    const resolved = resolveUserPath(rawPath, env);
    if (!fs.existsSync(resolved)) continue;
    try {
      const channel = parseExternalCatalogChannelEntries(JSON.parse(fs.readFileSync(resolved, "utf-8"))).find((entry) => entry.id === channelId);
      if (channel) {
        _externalCatalogPreferOverCache.set(channelId, channel.preferOver);
        return channel.preferOver;
      }
    } catch {
      // Unreadable or malformed catalog file: fall through to the next path.
    }
  }
  // No catalog lists this channel: cache the empty result too, so misses are not re-read.
  const _result = [];
  _externalCatalogPreferOverCache.set(channelId, _result);
  return _result;
}
An additional improvement would be to also cache resolveExternalCatalogPaths and the parsed file contents, since those are invariant within a process run and currently re-read for every unique channelId.
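One way the file-level caching could look is sketched below. This is not taken from the OpenClaw source: `readCatalogOnce` and `parsedCatalogCache` are hypothetical names, and the sketch assumes, as stated above, that the catalog files do not change during a process run. It caches misses (missing or malformed files) as `null` so they are not retried either.

```javascript
const fs = require("node:fs");

// Hypothetical helper: parse each catalog file at most once per process.
// ASSUMPTION: catalog files are not modified while the process is running.
const parsedCatalogCache = new Map();

function readCatalogOnce(resolvedPath) {
  if (parsedCatalogCache.has(resolvedPath)) return parsedCatalogCache.get(resolvedPath);
  let parsed = null; // null marks "missing or unparsable"
  try {
    parsed = JSON.parse(fs.readFileSync(resolvedPath, "utf-8"));
  } catch {
    // Missing file or invalid JSON: keep null and cache the miss as well.
  }
  parsedCatalogCache.set(resolvedPath, parsed);
  return parsed;
}
```

A per-channelId lookup could then search the cached parsed catalogs, so the total number of disk reads is bounded by the number of catalog paths rather than by the number of distinct channel ids.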
Bisection results confirming the trigger:
- plugins.enabled = false -> doctor completes (auto-enable skipped entirely)
- agents.list = [] + del(agents.defaults.model) -> doctor completes
- agents.defaults.model alone (no agents.list models) -> doctor completes
- agents.defaults.models catalog alone (2 entries) -> doctor completes
- Full agents.list with per-agent models -> hangs