Environment
- OpenClaw version: 2026.4.25
- QMD version: 2.1.0 (bab86d5)
- Platform: Ubuntu 24.04, GCP e2-standard-4 (4 vCPU, 16 GB RAM)
- Workspace size: 37+ Markdown files in memory root +
memory/ tree
- GGUF model: default (
embeddinggemma-300m-qat-Q8_0.gguf, ~0.6 GB, auto-downloaded)
- searchMode:
query (hybrid BM25 + vector)
Observed behavior
After enabling hybrid search (searchMode: "query"), the gateway emits this warning repeatedly every 2–4 minutes:
{"subsystem":"memory","message":"qmd embed failed (boot): Error: qmd embed timed out after 120000ms; backing off for 60s"}
{"subsystem":"memory","message":"qmd embed failed (interval): Error: qmd embed timed out after 120000ms; backing off for 120s"}
The embed process runs at 100–186% CPU on 4 vCPU for ~2 minutes before being killed. Peak RAM during GGUF model load: 9.6 GB. The embed never completes — every attempt times out at exactly 120 s. Vector search is effectively disabled.
Root cause
The GGUF embedding model takes 3–4 minutes on a 4-core CPU to embed a 37-file workspace. The default memory.qmd.update.embedTimeoutMs is 120 s — less than the actual embed duration.
The fix (memory.qmd.update.embedTimeoutMs: 600000) is only discoverable via openclaw config schema or the memory config reference page. The error message does not mention it.
Inconsistency with built-in engine
agents.defaults.memorySearch.sync.embeddingBatchTimeoutSeconds correctly differentiates:
- 600 s for local/self-hosted providers (
local, ollama, lmstudio)
- 120 s for hosted providers (OpenAI, Gemini, etc.)
memory.qmd.update.embedTimeoutMs applies the same 120 s default regardless of whether the embedding model is a local GGUF or a hosted API. For local GGUF workloads, 120 s is consistently insufficient on commodity hardware.
Suggested fixes (pick one or combine)
-
Increase the default for embedTimeoutMs to 600 s (matching the built-in engine's local-provider default), or detect when searchMode is vsearch/query with the default GGUF model and apply a longer default automatically.
-
Improve the error message to mention the fix:
qmd embed timed out after 120000ms — consider increasing memory.qmd.update.embedTimeoutMs (current default: 120000)
-
Add to the QMD troubleshooting docs a dedicated entry:
qmd embed timed out after 120000ms? Increase memory.qmd.update.embedTimeoutMs. Default is 120 000 ms. On commodity hardware with the default GGUF model and a 30–50 file workspace, embedding takes 3–5 minutes. Set to 600000 or higher.
Workaround
{
memory: {
qmd: {
update: {
embedTimeoutMs: 600000, // 10 minutes — sufficient for ~40 files on 4-core CPU
},
},
},
}
Or use a smaller/faster GGUF model via QMD_EMBED_MODEL:
export QMD_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"
Additional context
The gateway remains healthy throughout (Slack connected, Atlassian/GWS MCP tools working). The embed failures are WARN-level and non-blocking. However, with the default 120 s timeout, vector search is effectively permanently disabled on any commodity host with a medium or larger workspace, with no clear path to resolution from the error message alone.
Environment
memory/treeembeddinggemma-300m-qat-Q8_0.gguf, ~0.6 GB, auto-downloaded)query(hybrid BM25 + vector)Observed behavior
After enabling hybrid search (
searchMode: "query"), the gateway emits this warning repeatedly every 2–4 minutes:The embed process runs at 100–186% CPU on 4 vCPU for ~2 minutes before being killed. Peak RAM during GGUF model load: 9.6 GB. The embed never completes — every attempt times out at exactly 120 s. Vector search is effectively disabled.
Root cause
The GGUF embedding model takes 3–4 minutes on a 4-core CPU to embed a 37-file workspace. The default
memory.qmd.update.embedTimeoutMsis 120 s — less than the actual embed duration.The fix (
memory.qmd.update.embedTimeoutMs: 600000) is only discoverable viaopenclaw config schemaor the memory config reference page. The error message does not mention it.Inconsistency with built-in engine
agents.defaults.memorySearch.sync.embeddingBatchTimeoutSecondscorrectly differentiates:local,ollama,lmstudio)memory.qmd.update.embedTimeoutMsapplies the same 120 s default regardless of whether the embedding model is a local GGUF or a hosted API. For local GGUF workloads, 120 s is consistently insufficient on commodity hardware.Suggested fixes (pick one or combine)
Increase the default for
embedTimeoutMsto 600 s (matching the built-in engine's local-provider default), or detect whensearchModeisvsearch/querywith the default GGUF model and apply a longer default automatically.Improve the error message to mention the fix:
Add to the QMD troubleshooting docs a dedicated entry:
Workaround
Or use a smaller/faster GGUF model via
QMD_EMBED_MODEL:Additional context
The gateway remains healthy throughout (Slack connected, Atlassian/GWS MCP tools working). The embed failures are WARN-level and non-blocking. However, with the default 120 s timeout, vector search is effectively permanently disabled on any commodity host with a medium or larger workspace, with no clear path to resolution from the error message alone.