[Feature]: Use QMD MCP server mode for warm model caching #9048

@meretrout

Summary

When memory.backend = "qmd" is enabled, OpenClaw spawns a fresh qmd query process for each memory_search call. This adds roughly 19 seconds of latency per search, because QMD must cold-load three GGUF models on every invocation:

  • embeddinggemma-300M (query embedding)
  • qwen3-reranker-0.6b (reranking)
  • Qwen3-0.6B (query expansion/HyDE)

Observed Behavior

$ time qmd query "test query" --limit 3
...
qmd query "test query" --limit 3  6.82s user 1.15s system 41% cpu 19.095 total

CPU utilization is only 41% (about 8s of CPU time over 19s of wall-clock time), so most of the run is spent waiting, most likely on loading models from disk rather than on computing.
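As a quick check of that reading, the utilization figure can be recomputed from the `time` output. This is only arithmetic on the numbers quoted above; attributing the off-CPU time to model loading is an inference, not something `time` reports.

```python
def cpu_utilization(user_s: float, system_s: float, wall_s: float) -> float:
    """Fraction of wall-clock time the process spent on-CPU."""
    return (user_s + system_s) / wall_s


# Figures copied from the `time qmd query ...` output above.
user_s, system_s, wall_s = 6.82, 1.15, 19.095

util = cpu_utilization(user_s, system_s, wall_s)
off_cpu_s = wall_s - (user_s + system_s)

print(f"{util:.1%} on-CPU, {off_cpu_s:.1f}s off-CPU")
```

Roughly 11 of the 19 seconds are spent off-CPU, which is the portion a warm process would be expected to cut the most.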

By contrast, qmd vsearch (vector-only, 1 model) takes ~3s, and qmd search (BM25, no models) takes ~0.2s.

Proposed Solution

QMD already exposes an MCP server (qmd mcp) that keeps models warm between queries. Instead of spawning fresh qmd query processes, OpenClaw could:

  1. Start qmd mcp as a persistent sidecar (similar to how other MCP servers are managed)
  2. Send search requests via MCP protocol instead of CLI spawn
  3. With the models kept loaded between calls, queries should drop from ~19s to ~2-3s
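The sidecar idea in the steps above could look roughly like the sketch below: one long-lived child process that is started on first use and reused for every later request. This is a minimal illustration, not OpenClaw's process manager. The `Sidecar` class is hypothetical, it assumes newline-delimited JSON over stdio, and it omits the MCP initialize handshake, timeouts, and concurrency control that a real client needs.

```python
import json
import subprocess
from typing import Optional, Sequence


class Sidecar:
    """Hypothetical helper: keeps one child process alive and talks to it over stdio."""

    def __init__(self, argv: Sequence[str]) -> None:
        self.argv = list(argv)  # e.g. ["qmd", "mcp"] in the proposal above
        self.proc: Optional[subprocess.Popen] = None
        self.spawn_count = 0

    def _ensure_started(self) -> subprocess.Popen:
        # Spawn only when there is no live child; reusing it keeps models warm.
        if self.proc is None or self.proc.poll() is not None:
            self.proc = subprocess.Popen(
                self.argv,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
            )
            self.spawn_count += 1
        return self.proc

    def request(self, message: dict) -> dict:
        """Send one JSON message as a line and read one JSON line back."""
        proc = self._ensure_started()
        proc.stdin.write(json.dumps(message) + "\n")
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())

    def close(self) -> None:
        # Closing stdin signals EOF so a well-behaved child can exit.
        if self.proc is not None and self.proc.poll() is None:
            self.proc.stdin.close()
            self.proc.wait(timeout=5)
            self.proc.stdout.close()
        self.proc = None
```

The point of the design is that `spawn_count` stays at 1 no matter how many searches are issued, so the model-loading cost is paid once per session instead of once per memory_search call.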

The QMD MCP server exposes these tools:

  • qmd_search - BM25 keyword search
  • qmd_vsearch - Vector semantic search
  • qmd_query - Hybrid search with reranking
  • qmd_get - Document retrieval
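For illustration, a call to one of these tools would be sent as an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below only builds that message. The tool name comes from the list above, but the argument names `query` and `limit` are assumptions, since the tool's input schema is not shown here; a real client should read the schema from the server's `tools/list` response.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tools_call(tool: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Argument names are hypothetical; check the server's advertised schema.
request = tools_call("qmd_query", {"query": "test query", "limit": 3})

# Over the stdio transport, each message is written as a single line of JSON.
line = json.dumps(request) + "\n"
print(line, end="")
```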

Alternative: Add vsearch mode option

A simpler alternative would be to add a config option such as memory.qmd.searchMode: "vsearch" that uses vector-only search (~3s) instead of the full query pipeline (~19s). This trades some quality (no reranking or query expansion) for a roughly 6x speedup.
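A rough sketch of how such an option might be wired in: the configured mode selects which qmd subcommand is spawned. The function name `build_qmd_argv` is hypothetical, and it assumes `search` and `vsearch` accept the same `--limit` flag that `qmd query` is shown using above, which would need to be confirmed against the QMD CLI.

```python
from typing import List

# Subcommands mentioned in this issue, with the latencies observed above.
SEARCH_MODES = {
    "search": "BM25 only, no models (~0.2s)",
    "vsearch": "vector only, one model (~3s)",
    "query": "hybrid with reranking and query expansion (~19s cold)",
}


def build_qmd_argv(search_mode: str, query: str, limit: int) -> List[str]:
    """Map a searchMode config value to the qmd command line to spawn."""
    if search_mode not in SEARCH_MODES:
        raise ValueError(f"unsupported searchMode: {search_mode!r}")
    # Assumption: every subcommand takes --limit the way `qmd query` does.
    return ["qmd", search_mode, query, "--limit", str(limit)]


print(build_qmd_argv("vsearch", "test query", 3))
```

Defaulting the option to "query" would preserve today's behavior, so the faster modes stay opt-in.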

Environment

  • OpenClaw 2026.2.2
  • QMD (latest from tobi/qmd)
  • macOS (Apple Silicon)
  • Config: memory.backend: "qmd", memory.qmd.limits.timeoutMs: 20000

Labels: enhancement
