Skip to content

memory.qmd.update.embedTimeoutMs default (120 s) is too low for local GGUF; error message doesn't surface the fix #74204

@Skeptomenos

Description

@Skeptomenos

Environment

  • OpenClaw version: 2026.4.25
  • QMD version: 2.1.0 (bab86d5)
  • Platform: Ubuntu 24.04, GCP e2-standard-4 (4 vCPU, 16 GB RAM)
  • Workspace size: 37+ Markdown files in memory root + memory/ tree
  • GGUF model: default (embeddinggemma-300m-qat-Q8_0.gguf, ~0.6 GB, auto-downloaded)
  • searchMode: query (hybrid BM25 + vector)

Observed behavior

After enabling hybrid search (searchMode: "query"), the gateway emits this warning repeatedly every 2–4 minutes:

{"subsystem":"memory","message":"qmd embed failed (boot): Error: qmd embed timed out after 120000ms; backing off for 60s"}
{"subsystem":"memory","message":"qmd embed failed (interval): Error: qmd embed timed out after 120000ms; backing off for 120s"}

The embed process runs at 100–186% CPU on 4 vCPU for ~2 minutes before being killed. Peak RAM during GGUF model load: 9.6 GB. The embed never completes — every attempt times out at exactly 120 s. Vector search is effectively disabled.

Root cause

The GGUF embedding model takes 3–4 minutes on a 4-core CPU to embed a 37-file workspace. The default memory.qmd.update.embedTimeoutMs is 120 s — less than the actual embed duration.

The fix (memory.qmd.update.embedTimeoutMs: 600000) is only discoverable via openclaw config schema or the memory config reference page. The error message does not mention it.

Inconsistency with built-in engine

agents.defaults.memorySearch.sync.embeddingBatchTimeoutSeconds correctly differentiates:

  • 600 s for local/self-hosted providers (local, ollama, lmstudio)
  • 120 s for hosted providers (OpenAI, Gemini, etc.)

memory.qmd.update.embedTimeoutMs applies the same 120 s default regardless of whether the embedding model is a local GGUF or a hosted API. For local GGUF workloads, 120 s is consistently insufficient on commodity hardware.

Suggested fixes (pick one or combine)

  1. Increase the default for embedTimeoutMs to 600 s (matching the built-in engine's local-provider default), or detect when searchMode is vsearch/query with the default GGUF model and apply a longer default automatically.

  2. Improve the error message to mention the fix:

    qmd embed timed out after 120000ms — consider increasing memory.qmd.update.embedTimeoutMs (current default: 120000)
    
  3. Add to the QMD troubleshooting docs a dedicated entry:

    qmd embed timed out after 120000ms? Increase memory.qmd.update.embedTimeoutMs. Default is 120 000 ms. On commodity hardware with the default GGUF model and a 30–50 file workspace, embedding takes 3–5 minutes. Set to 600000 or higher.

Workaround

{
  memory: {
    qmd: {
      update: {
        embedTimeoutMs: 600000,  // 10 minutes — sufficient for ~40 files on 4-core CPU
      },
    },
  },
}

Or use a smaller/faster GGUF model via QMD_EMBED_MODEL:

export QMD_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"

Additional context

The gateway remains healthy throughout (Slack connected, Atlassian/GWS MCP tools working). The embed failures are WARN-level and non-blocking. However, with the default 120 s timeout, vector search is effectively permanently disabled on any commodity host with a medium or larger workspace, with no clear path to resolution from the error message alone.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions