Summary
When memory.backend = "qmd" is enabled, OpenClaw spawns a fresh qmd query process for each memory_search call. This causes ~19 second latency per search because QMD must cold-load three GGUF models on every invocation:
- embeddinggemma-300M (query embedding)
- qwen3-reranker-0.6b (reranking)
- Qwen3-0.6B (query expansion/HyDE)
Observed Behavior
$ time qmd query "test query" --limit 3
...
qmd query "test query" --limit 3 6.82s user 1.15s system 41% cpu 19.095 total
CPU utilization is only 41%, so most of the wall-clock time appears to go to loading models from disk rather than computing.
By contrast, qmd vsearch (vector-only, 1 model) takes ~3s, and qmd search (BM25, no models) takes ~0.2s.
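As a quick check on the 41% figure, the arithmetic behind the `time` output can be reproduced directly. This is only a sketch using the three numbers reported above. Attributing the off-CPU remainder to model loading is the interpretation made in this issue, not something the calculation proves.

```python
# Figures copied from the `time` output above.
user_s, system_s, wall_s = 6.82, 1.15, 19.095

cpu_s = user_s + system_s        # seconds actually spent executing on a CPU
cpu_fraction = cpu_s / wall_s    # the ratio the shell reports as "41% cpu"
off_cpu_s = wall_s - cpu_s       # seconds spent waiting (disk I/O, page faults, ...)

print(f"cpu: {cpu_s:.2f}s ({cpu_fraction:.1%}), off-cpu: {off_cpu_s:.2f}s")
```

Roughly 8 of the 19 seconds are CPU time, which leaves about 11 seconds in which the process was waiting rather than computing.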
Proposed Solution
QMD already exposes an MCP server (qmd mcp) that keeps models warm between queries. Instead of spawning fresh qmd query processes, OpenClaw could:
- Start qmd mcp as a persistent sidecar (similar to how other MCP servers are managed)
- Send search requests via MCP protocol instead of CLI spawn
- Models stay loaded → queries drop from ~19s to ~2-3s
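The sidecar idea above can be sketched as a small client that spawns the server once and reuses it for every call. This is a minimal illustration, not OpenClaw's implementation: it assumes qmd mcp speaks MCP over stdio as newline-delimited JSON-RPC 2.0, the class name McpSidecar is hypothetical, and the argument names passed to the QMD tools are guesses that would need checking against the server's tool schemas. Timeouts and concurrent requests are left out.

```python
import json
import subprocess


class McpSidecar:
    """Keeps one long-lived MCP server process and reuses it for every call."""

    def __init__(self, command):
        self.command = command  # e.g. ["qmd", "mcp"] (assumed stdio transport)
        self.proc = None
        self.next_id = 0

    def _ensure_started(self):
        # Spawn lazily, and respawn if the sidecar has exited.
        if self.proc is not None and self.proc.poll() is None:
            return
        self.proc = subprocess.Popen(
            self.command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )
        # MCP handshake: an initialize request, then an initialized notification.
        self._request("initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "sidecar-sketch", "version": "0"},
        })
        self._send({"jsonrpc": "2.0", "method": "notifications/initialized"})

    def _send(self, message):
        self.proc.stdin.write(json.dumps(message) + "\n")
        self.proc.stdin.flush()

    def _request(self, method, params):
        self.next_id += 1
        request_id = self.next_id
        self._send({"jsonrpc": "2.0", "id": request_id,
                    "method": method, "params": params})
        # Skip blank lines and notifications until the matching response arrives.
        while True:
            line = self.proc.stdout.readline()
            if line == "":
                raise RuntimeError("sidecar closed its stdout")
            if not line.strip():
                continue
            reply = json.loads(line)
            if reply.get("id") == request_id:
                if "error" in reply:
                    raise RuntimeError(reply["error"])
                return reply["result"]

    def call_tool(self, name, arguments):
        self._ensure_started()
        return self._request("tools/call", {"name": name, "arguments": arguments})

    def close(self):
        if self.proc is not None and self.proc.poll() is None:
            self.proc.stdin.close()  # EOF on stdin asks the server to exit
            self.proc.wait(timeout=5)
        self.proc = None


# Hypothetical usage; the "query" argument name is an assumption:
#   sidecar = McpSidecar(["qmd", "mcp"])
#   sidecar.call_tool("qmd_query", {"query": "test query"})
```

Only the first call pays the spawn and model-load cost. Later calls write one JSON line to the same process, which is where the expected drop from ~19s to ~2-3s would come from.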
The QMD MCP server exposes these tools:
- qmd_search - BM25 keyword search
- qmd_vsearch - Vector semantic search
- qmd_query - Hybrid search with reranking
- qmd_get - Document retrieval
Alternative: Add vsearch mode option
A simpler alternative would be to add a config option such as memory.qmd.searchMode: "vsearch" that uses vector-only search (~3s) instead of the full query pipeline (~19s). This trades some quality (no reranking or query expansion) for a roughly 6x speedup.
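One way such an option could be wired in is a lookup from the configured mode to the qmd subcommand used for the CLI spawn. The sketch below is hypothetical: the function and table names are invented for illustration, the set of accepted mode values is a guess, and it assumes vsearch and search accept the same --limit flag shown for qmd query above.

```python
import shlex

# Hypothetical mapping from a memory.qmd.searchMode value to a qmd subcommand.
SEARCH_MODE_SUBCOMMANDS = {
    "query": "query",      # hybrid: query expansion + embedding + reranking
    "vsearch": "vsearch",  # vector-only, one model
    "search": "search",    # BM25 keyword search, no models
}


def build_qmd_command(search_mode, query, limit=3):
    """Return the argv list for one search in the configured mode."""
    try:
        subcommand = SEARCH_MODE_SUBCOMMANDS[search_mode]
    except KeyError:
        raise ValueError(f"unsupported searchMode: {search_mode!r}") from None
    # Assumes every subcommand takes --limit like `qmd query` does.
    return ["qmd", subcommand, query, "--limit", str(limit)]


# Print the command line that would be spawned for vector-only search.
print(shlex.join(build_qmd_command("vsearch", "test query")))
```

Rejecting unknown values early keeps a typo in the config from silently falling back to the slow hybrid pipeline.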
Environment
- OpenClaw 2026.2.2
- QMD (latest from tobi/qmd)
- macOS (Apple Silicon)
- Config: memory.backend: "qmd", memory.qmd.limits.timeoutMs: 20000