Reliability suggestion: active-memory blocks replies and QMD boot initialization can overload gateway in 2026.4.24
Summary
On a multi-agent OpenClaw gateway, enabling the official active-memory plugin can make normal replies slow or unreliable. The main issue is that active-memory currently runs a full embedded agent/model call inside before_prompt_build, using the active conversation model by default, and waits for it before the actual user reply is built.
In the same environment, QMD memory startup initialization arms memory managers for all configured agents on gateway boot. Each manager can start its own boot update. This is useful, but when many agents are configured it can create a burst of QMD work at startup. Combined with active-memory running per user message, the gateway can experience high CPU, long response latency, and timeout cascades.
This is not a transcript duplication issue. It is a separate reliability/defaults concern.
Environment
- OpenClaw version: 2026.4.24
- Platform: macOS ARM64
- Runtime: Gateway with multiple agents
- Active model observed: openai-codex/gpt-5.5
- QMD backend enabled for memory search
- Relevant settings:
  - memory.qmd.update.onBoot = true
  - memory.qmd.update.embedInterval = 30m
  - memory.qmd.limits.timeoutMs = 40000
  - active-memory.timeoutMs = 30000
  - active-memory.queryMode = recent
  - active-memory.maxSummaryChars = 220
Observed behavior
When active-memory was enabled, every eligible interactive user message triggered lines like:
```text
active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 start timeoutMs=30000 queryChars=1505
active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 done status=timeout elapsedMs=40808 summaryChars=0
active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 done status=timeout elapsedMs=63548 summaryChars=0
active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 done status=empty elapsedMs=29864 summaryChars=0
```
So a 30s active-memory timeout produced roughly 30–64s of extra latency per message (29864, 40808 and 63548 ms in the samples above) and often returned no usable memory (summaryChars=0 in every sample shown).
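The same numbers can be pulled out of a longer log with a few lines of code. This is a hypothetical helper (summarizeActiveMemoryLog is not part of OpenClaw) that assumes the "done" line format shown above and takes the configured timeout as a parameter instead of reading it from the "start" line.

```javascript
// Hypothetical helper: summarize active-memory "done" lines in the log format shown above.
function summarizeActiveMemoryLog(lines, timeoutMs) {
  const doneRe = /active-memory: agent=\S+ .*\bdone status=(\w+) elapsedMs=(\d+) summaryChars=(\d+)/;
  const byStatus = {};
  const overrunsMs = [];
  let total = 0;
  let usable = 0;
  for (const line of lines) {
    const m = doneRe.exec(line);
    if (!m) continue; // "start" lines and unrelated output are skipped
    const [, status, elapsed, chars] = m;
    total += 1;
    byStatus[status] = (byStatus[status] ?? 0) + 1;
    const elapsedMs = Number(elapsed);
    if (elapsedMs > timeoutMs) overrunsMs.push(elapsedMs - timeoutMs); // time spent past the budget
    if (Number(chars) > 0) usable += 1; // summaryChars=0 means nothing was injected
  }
  return { total, byStatus, usable, overrunsMs };
}

const sampleLog = [
  "active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 start timeoutMs=30000 queryChars=1505",
  "active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 done status=timeout elapsedMs=40808 summaryChars=0",
  "active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 done status=timeout elapsedMs=63548 summaryChars=0",
  "active-memory: agent=tino session=... activeProvider=openai-codex activeModel=gpt-5.5 done status=empty elapsedMs=29864 summaryChars=0",
];

console.log(summarizeActiveMemoryLog(sampleLog, 30000));
```

On the four sample lines it counts three completed recalls (two timeout, one empty), none with a usable summary, and overruns of 10808 ms and 33548 ms past the 30000 ms budget.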
Separately, on gateway boot/restart, logs repeatedly showed:
```text
qmd memory startup initialization armed for 10 agents: "tino", "jonathan", "betalpha-social", "jccat", "analyst", "news", "reporter", "social", "travel-pm", "family-pm"
```
This means boot initialization is not limited to the currently active agent or session: it initializes QMD memory for every configured agent that has memory search enabled and uses the QMD backend.
Current implementation notes
active-memory blocks prompt construction
extensions/active-memory/index.js registers:
```javascript
api.on("before_prompt_build", async (event, ctx) => {
  // ...
  const result = await maybeResolveActiveRecall(...);
  if (!result.summary) return;
  return { prependContext: promptPrefix };
});
```
maybeResolveActiveRecall(...) calls runRecallSubagent(...).
runRecallSubagent(...) uses:
```javascript
params.api.runtime.agent.runEmbeddedPiAgent({
  provider: modelRef.provider,
  model: modelRef.model,
  timeoutMs: params.config.timeoutMs,
  toolsAllow: ["memory_search", "memory_get"],
  bootstrapContextMode: "lightweight",
  silentExpected: true,
  // ...
});
```
The model defaults to the current run model / agent primary model unless active-memory.config.model is explicitly set. In this environment, that meant openai-codex/gpt-5.5 ran the per-message memory recall subagent.
QMD onBoot initializes all agents
server-startup-memory-kT6lKCrb.js does:
```javascript
const agentIds = listAgentIds(params.cfg);
for (const agentId of agentIds) {
  if (!resolveMemorySearchConfig(params.cfg, agentId)) continue;
  const resolved = resolveActiveMemoryBackendConfig({ cfg: params.cfg, agentId });
  if (resolved.backend !== "qmd") continue;
  await getActiveMemorySearchManager({ cfg: params.cfg, agentId });
  armedAgentIds.push(agentId);
}
```
Each QMD manager can then run a boot update because qmd.update.onBoot is true. In qmd-manager-LLKxprVD.js, initialize(...) starts:
```javascript
if (this.qmd.update.onBoot) {
  const bootRun = this.runUpdate("boot", true);
  // ...
}
```
QMD update queueing is per qmdDir, so different agents can still start separate boot updates. The embed lock is global, but the update phase can still create a startup burst across agents.
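To make the last point concrete, here is a small model of that locking arrangement. It is a hypothetical simulation, not the qmd-manager code: SerialQueue, bootUpdate and the per-agent directory paths are made up (it assumes each agent has its own qmdDir), and the real update and embed work is replaced by timers. With four agents in four directories it should reach four concurrent update phases but only one concurrent embed.

```javascript
// Hypothetical model of the locking described above, not the actual qmd-manager code:
// update runs are serialized per qmdDir, the embed phase is serialized globally.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

class SerialQueue {
  constructor() {
    this.tail = Promise.resolve();
  }
  run(task) {
    const result = this.tail.then(() => task());
    this.tail = result.catch(() => {}); // a failed task must not wedge the queue
    return result;
  }
}

const dirQueues = new Map(); // qmdDir -> SerialQueue (per-directory update queueing)
const embedLock = new SerialQueue(); // one global embed lock
const active = { update: 0, embed: 0 };
const peak = { update: 0, embed: 0 };

async function phase(kind, ms) {
  active[kind] += 1;
  peak[kind] = Math.max(peak[kind], active[kind]);
  await sleep(ms); // stands in for real QMD work
  active[kind] -= 1;
}

function bootUpdate(qmdDir) {
  if (!dirQueues.has(qmdDir)) dirQueues.set(qmdDir, new SerialQueue());
  return dirQueues.get(qmdDir).run(async () => {
    await phase("update", 50); // only serialized against the same qmdDir
    await embedLock.run(() => phase("embed", 20)); // serialized across all agents
  });
}

async function main() {
  const agents = ["tino", "jonathan", "analyst", "news"];
  await Promise.all(agents.map((id) => bootUpdate(`/tmp/qmd/${id}`)));
  return peak;
}

main().then((p) => console.log(p));
```

The global lock only protects the embed step, so the number of simultaneous update phases grows with the number of distinct qmdDir values, which is the startup burst described above.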
Impact
- User replies are delayed before the actual model run begins.
- A failed/empty memory lookup can add tens of seconds while providing no useful context.
- Gateway CPU can spike during boot or restart when many agents arm QMD memory managers.
- Active-memory and QMD startup work can overlap, creating timeout cascades.
- Operators may think normal chat or model latency is broken, when the delay is pre-prompt memory recall.
Why this is surprising
The feature name suggests a lightweight memory retrieval layer, but the default behavior is closer to: run another full embedded LLM turn before each eligible user reply, using the same active model unless configured otherwise.
That may be powerful, but it is unsafe as a default for slow/expensive models or high-traffic agents.
Suggested fixes for next release
1. Make active-memory fail-open and non-blocking by default
Do not block the actual user reply on active-memory unless explicitly configured.
Possible modes:
- mode: "nonblocking" (default): start recall opportunistically; only inject if it returns very quickly.
- mode: "blocking" (opt-in): current behavior, for operators who want maximum recall.
- deadlineMs: hard budget for pre-prompt recall, with a default of perhaps 1000–3000ms.
If recall misses the deadline, skip injection and let the user reply proceed.
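A minimal sketch of those modes, assuming the recall can be started as a promise-returning function. mode and deadlineMs follow the proposal above; settleRecall, recallWithDeadline and beforePromptBuild are hypothetical names, and the returned prependContext simply carries the summary rather than the plugin's real promptPrefix.

```javascript
// Hypothetical sketch of "nonblocking" vs "blocking" pre-prompt recall, failing open.
function settleRecall(startRecall) {
  return Promise.resolve()
    .then(startRecall)
    .then(
      (summary) => ({ status: summary ? "ok" : "empty", summary: summary || "" }),
      () => ({ status: "error", summary: "" }), // a failed recall never fails the reply
    );
}

async function recallWithDeadline(startRecall, deadlineMs) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ status: "deadline", summary: "" }), deadlineMs);
  });
  try {
    // Whichever settles first wins; a late recall keeps running but is no longer awaited.
    return await Promise.race([settleRecall(startRecall), deadline]);
  } finally {
    clearTimeout(timer);
  }
}

async function beforePromptBuild(startRecall, { mode = "nonblocking", deadlineMs = 2000 } = {}) {
  const result =
    mode === "blocking"
      ? await settleRecall(startRecall) // opt-in: wait for the full recall, as today
      : await recallWithDeadline(startRecall, deadlineMs);
  if (!result.summary) return undefined; // skip injection; the user reply proceeds
  return { prependContext: result.summary };
}
```

The recall is not cancelled when the deadline wins; it just stops gating the reply. Cancelling it is a separate concern (fix 3), and its late result could be reused via a cache (fix 4).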
2. Use a cheap/fast recall model by default, not the current conversation model
If no active-memory.config.model is set, default to a lightweight model profile rather than ctx.modelProviderId/ctx.modelId or the agent primary model.
At minimum, warn loudly when active-memory inherits a slow/high-cost model.
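One way to express that default, as a sketch: LIGHTWEIGHT_RECALL_MODEL is a placeholder (the issue does not name a specific fast model), resolveRecallModel is a hypothetical function, and policy: "inherit" is a hypothetical switch that keeps today's inheritance but emits the warning suggested above.

```javascript
// Hypothetical resolver for the recall model. The placeholder mirrors the
// "<fast-cheap-default>" value used in the safer profile later in this issue.
const LIGHTWEIGHT_RECALL_MODEL = "<fast-cheap-default>";

function resolveRecallModel({ configModel, conversationModel, policy = "lightweight", warn = console.warn }) {
  if (configModel) return { model: configModel, source: "config" }; // explicit choice always wins
  if (policy === "inherit") {
    // Minimum fix: keep the current inheritance, but make the cost visible.
    warn(
      `active-memory: inheriting ${conversationModel} for per-message recall; ` +
        "set active-memory.config.model to a faster model",
    );
    return { model: conversationModel, source: "inherited" };
  }
  return { model: LIGHTWEIGHT_RECALL_MODEL, source: "default" }; // proposed default
}

console.log(resolveRecallModel({ conversationModel: "openai-codex/gpt-5.5" }));
```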
3. Enforce hard timeout cancellation
Observed elapsed time exceeded the configured timeoutMs substantially (30000 configured; 40808 and 63548 observed, overruns of about 11s and 34s). The abort signal may not stop the embedded run promptly.
The active-memory timeout should be a hard wall-clock budget for the pre-prompt hook.
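A sketch of what a hard wall-clock budget could look like: the hook gets its answer at timeoutMs even if the underlying run ignores cancellation. runWithHardTimeout is hypothetical, and runEmbedded stands in for the embedded agent call; how the real runEmbeddedPiAgent receives its abort signal is not shown in the excerpt above, so passing an AbortSignal as the only argument is an assumption.

```javascript
// Hypothetical hard budget for the pre-prompt hook. The returned promise settles by
// timeoutMs regardless of whether runEmbedded reacts to the abort signal.
async function runWithHardTimeout(runEmbedded, timeoutMs) {
  const controller = new AbortController();
  let timer;
  const budget = new Promise((resolve) => {
    timer = setTimeout(() => {
      controller.abort(new Error(`active-memory recall exceeded ${timeoutMs}ms`)); // best-effort cancel
      resolve({ status: "timeout", summary: "" }); // the hook stops waiting either way
    }, timeoutMs);
  });
  const run = Promise.resolve()
    .then(() => runEmbedded(controller.signal))
    .then(
      (summary) => ({ status: summary ? "ok" : "empty", summary: summary || "" }),
      () => ({ status: controller.signal.aborted ? "timeout" : "error", summary: "" }),
    );
  try {
    return await Promise.race([run, budget]);
  } finally {
    clearTimeout(timer);
  }
}
```

Resolving the budget promise inside the timer callback is what makes the limit "hard": the hook no longer depends on the embedded run noticing the abort, which is the failure mode the 40808 / 63548 ms samples suggest.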
4. Add concurrency limits for active-memory recall
Per agent/session limits would prevent multiple simultaneous recall subagents from stacking during active chat bursts.
Suggested defaults:
- one active-memory recall per agent
- one active-memory recall per session
- drop or reuse cached result when a recall is already running
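A sketch of those three defaults together, assuming recalls are keyed by agent id and a session key: a second request for the same session reuses the running recall, a request for another session of a busy agent is dropped, and a recent non-empty result is served from a cache. RecallGuard is hypothetical; cacheTtlMs matches the key used in the safer profile below, but its semantics here are assumed.

```javascript
// Hypothetical guard: one recall per agent, one per session, with a short result cache.
class RecallGuard {
  constructor({ cacheTtlMs = 60000, now = Date.now } = {}) {
    this.cacheTtlMs = cacheTtlMs;
    this.now = now;
    this.cache = new Map(); // sessionKey -> { summary, at }
    this.inFlightBySession = new Map(); // sessionKey -> pending recall promise
    this.busyAgents = new Set(); // agentIds with a recall running
  }

  recall(agentId, sessionKey, runRecall) {
    const cached = this.cache.get(sessionKey);
    if (cached && this.now() - cached.at < this.cacheTtlMs) {
      return Promise.resolve({ status: "cached", summary: cached.summary });
    }
    const running = this.inFlightBySession.get(sessionKey);
    if (running) return running; // same session: reuse instead of stacking a second subagent
    if (this.busyAgents.has(agentId)) {
      return Promise.resolve({ status: "dropped", summary: "" }); // agent busy: skip memory this turn
    }
    this.busyAgents.add(agentId);
    const pending = Promise.resolve()
      .then(runRecall)
      .then((summary) => {
        if (summary) this.cache.set(sessionKey, { summary, at: this.now() });
        return { status: summary ? "ok" : "empty", summary: summary || "" };
      })
      .catch(() => ({ status: "error", summary: "" })) // fail open
      .finally(() => {
        this.busyAgents.delete(agentId);
        this.inFlightBySession.delete(sessionKey);
      });
    this.inFlightBySession.set(sessionKey, pending);
    return pending;
  }
}
```

During a chat burst this caps recall subagents at one per agent; the tradeoff is that a second session on a busy agent gets no injected memory for that turn.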
5. Add QMD boot concurrency control / startup jitter across agents
qmd memory startup initialization should avoid boot-time bursts across all configured agents.
Possible approaches:
- global max concurrent QMD boot updates, default 1
- jitter per agent
- lazy-initialize QMD manager on first memory search instead of arming every agent on boot
- a separate onBootAgents allowlist, or onBoot: "activeOnly"
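The first two approaches can be combined in a small scheduler. This is a sketch under assumptions: runBootUpdatesLimited is hypothetical, runBootUpdate stands in for a QMD manager's boot update, maxConcurrent plays the role of the proposed maxConcurrentBootUpdates, and the jitter range is arbitrary. Lazy initialization or an onBoot allowlist would instead shrink the list of agent ids passed in.

```javascript
// Hypothetical boot scheduler: at most maxConcurrent QMD boot updates run at once, and
// each start is delayed by a random per-agent jitter. runBootUpdate is a stand-in.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runBootUpdatesLimited(
  agentIds,
  runBootUpdate,
  { maxConcurrent = 1, maxJitterMs = 0, random = Math.random } = {},
) {
  const queue = [...agentIds];
  const results = [];
  async function worker() {
    while (queue.length > 0) {
      const agentId = queue.shift();
      await sleep(Math.floor(random() * maxJitterMs)); // spread out boot starts
      try {
        await runBootUpdate(agentId);
        results.push({ agentId, ok: true });
      } catch (err) {
        // One failed boot update should not stop the remaining agents.
        results.push({ agentId, ok: false, error: String(err) });
      }
    }
  }
  const workerCount = Math.max(1, Math.min(maxConcurrent, queue.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: with maxConcurrent 1, boot updates run one after another.
runBootUpdatesLimited(
  ["tino", "jonathan", "betalpha-social"],
  async (id) => {
    console.log(`boot update: ${id}`);
    await sleep(10);
  },
  { maxConcurrent: 1, maxJitterMs: 250 },
).then((results) => console.log(results.every((r) => r.ok)));
```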
6. Make the operational cost visible in status/doctor
openclaw status / doctor could flag:
- active-memory is enabled and inherits a slow primary model
- active-memory timeout is high
- QMD onBoot is enabled for many agents
- active-memory is returning mostly timeout/empty results
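A doctor check along these lines could be a pure function over config and recent recall outcomes. The input shape, the function name activeMemoryDoctor and the thresholds are illustrative assumptions, not an existing OpenClaw API; "slow primary model" is reduced to "no explicit model set", since the issue does not define a slowness signal. Fed the environment from this issue, it should report all four findings.

```javascript
// Hypothetical doctor-style checks. Config shape and thresholds are assumptions
// loosely based on the keys quoted in this issue.
function activeMemoryDoctor({ activeMemory, qmd, agentCount, recentStatuses }, thresholds = {}) {
  const { maxTimeoutMs = 5000, maxBootAgents = 3, maxBadRatio = 0.5 } = thresholds;
  const findings = [];
  if (activeMemory.enabled && !activeMemory.model) {
    findings.push("active-memory inherits the conversation/primary model; set active-memory.config.model");
  }
  if (activeMemory.enabled && activeMemory.timeoutMs > maxTimeoutMs) {
    findings.push(`active-memory timeoutMs=${activeMemory.timeoutMs} exceeds ${maxTimeoutMs}`);
  }
  if (qmd.update.onBoot === true && agentCount > maxBootAgents) {
    findings.push(`QMD onBoot is enabled for ${agentCount} agents`);
  }
  const bad = recentStatuses.filter((s) => s === "timeout" || s === "empty").length;
  if (recentStatuses.length > 0 && bad / recentStatuses.length > maxBadRatio) {
    findings.push(`active-memory returned timeout/empty for ${bad}/${recentStatuses.length} recent recalls`);
  }
  return findings;
}

console.log(
  activeMemoryDoctor({
    activeMemory: { enabled: true, model: undefined, timeoutMs: 30000 },
    qmd: { update: { onBoot: true } },
    agentCount: 10,
    recentStatuses: ["timeout", "timeout", "empty"],
  }),
);
```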
Recommended safer default profile
For multi-agent gateways, a safer profile might be:
```json
{
  "plugins": {
    "entries": {
      "active-memory": {
        "enabled": true,
        "config": {
          "enabled": true,
          "mode": "nonblocking",
          "timeoutMs": 2000,
          "model": "<fast-cheap-default>",
          "queryMode": "message",
          "recentUserTurns": 1,
          "recentAssistantTurns": 0,
          "cacheTtlMs": 60000
        }
      }
    }
  },
  "memory": {
    "qmd": {
      "update": {
        "onBoot": "activeOnly",
        "embedInterval": "60m",
        "maxConcurrentBootUpdates": 1
      }
    }
  }
}
```
Exact schema names can differ; the point is safer semantics.
Workaround used locally
Active-memory was disabled and memory-core dreaming was kept disabled. With those disabled, gateway CPU and reply latency returned to a stable baseline, while manual memory_search / QMD queries still worked.
This suggests the issue is not QMD search itself, but the combination of blocking per-message active-memory embedded LLM recall and broad QMD boot/update work.