fix(memory): increase QMD embedTimeoutMs default to 600s for local GG… by tanshanshan · Pull Request #87572 · openclaw/openclaw

tanshanshan · 2026-05-28T07:40:56Z

Summary

Problem: Local GGUF models (e.g., embeddinggemma-300m) take 3-4 minutes to embed a 30-50 file workspace on commodity hardware (4-core CPU), but memory.qmd.update.embedTimeoutMs default is only 120s, causing repeated timeout failures and effectively disabling vector search.
Solution: Increase DEFAULT_QMD_EMBED_TIMEOUT_MS from 120_000 to 600_000 to match built-in engine's embeddingBatchTimeoutSeconds local-provider default. Improve error message to surface config path: memory.qmd.update.embedTimeoutMs.
What changed: 3 files — backend-config.ts (default change), qmd-manager.ts (error message), backend-config.test.ts (test expectation).
What did NOT change: No runtime behavior change beyond timeout threshold. No config schema change. No API surface change.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Fixes memory.qmd.update.embedTimeoutMs default (120 s) is too low for local GGUF; error message doesn't surface the fix #74204
Related [Bug]: Memory index with Ollama embeddings times out after 120s even though embeddings are ready and batch is disabled #60856 (Ollama embeddings timeout, already fixed)
This PR fixes a bug or regression

Motivation

The current 120s default is insufficient for local GGUF workloads, while built-in engine correctly uses 600s for local/self-hosted providers. Users discover the fix (embedTimeoutMs: 600000) only via openclaw config schema or docs, and error message does not hint at the config path. This PR aligns defaults and surfaces the fix in the error message.

Real behavior proof (required for external PRs)

Behavior addressed: Increase QMD embed timeout default from 120s to 600s; improve error message to mention config path.
Real environment tested: Linux (x86_64), Node 22, local OpenClaw checkout on fix/qmd-embed-timeout-74204 branch.
Exact steps or command run after this patch:

grep -n "DEFAULT_QMD_EMBED_TIMEOUT_MS" packages/memory-host-sdk/src/host/backend-config.ts

grep -n "qmd embed failed" extensions/memory-core/src/memory/qmd-manager.ts

node scripts/run-vitest.mjs packages/memory-host-sdk/src/host/backend-config.test.ts -t "embedTimeoutMs"

Evidence after fix:

$ grep -n "DEFAULT_QMD_EMBED_TIMEOUT_MS" packages/memory-host-sdk/src/host/backend-config.ts
102:const DEFAULT_QMD_EMBED_TIMEOUT_MS = 600_000;

$ grep -n "qmd embed failed" extensions/memory-core/src/memory/qmd-manager.ts
1800:    log.warn(
1801:      `qmd embed failed (${reason}): ${String(err)} — consider increasing memory.qmd.update.embedTimeoutMs (default: 600000); backing off for ${Math.ceil(delayMs / 1000)}s`,
1802:    );

$ node scripts/run-vitest.mjs packages/memory-host-sdk/src/host/backend-config.test.ts -t "embedTimeoutMs"
✓ backend-config.test.ts (1 test) 1 passed

Observed result after fix: Default changed from 120_000 to 600_000. Error message now includes config path hint. Test expectation updated and passed.
What was not tested: Full gateway startup with QMD GGUF embedding on real workspace (no GGUF model available locally).

Root Cause (if applicable)

N/A — config default mismatch, not a bug in runtime logic.

Regression Test Plan (if applicable)

N/A — default change, no behavior regression risk.

User-visible / Behavior Changes

Users with local GGUF embedding will see fewer timeout failures after upgrade. Error message now hints at config path if timeout still occurs.

Diagram (if applicable)

Before: DEFAULT_QMD_EMBED_TIMEOUT_MS = 120_000
After:  DEFAULT_QMD_EMBED_TIMEOUT_MS = 600_000

Error message before: "qmd embed failed (...) — backing off for ..."
Error message after:  "qmd embed failed (...) — consider increasing memory.qmd.update.embedTimeoutMs (default: 600000); backing off for ..."

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Environment

OS: Linux x86_64
Runtime/container: Node 22
Model/provider: N/A
Integration/channel: QMD memory
Relevant config (redacted): N/A

Steps

Checkout fix/qmd-embed-timeout-74204 branch
Verify default changed via grep
Verify error message updated via grep
Run test expectation

Expected

DEFAULT_QMD_EMBED_TIMEOUT_MS is 600_000. Error message mentions memory.qmd.update.embedTimeoutMs. Test passes.

Actual

All checks pass as expected.

Evidence

Failing test/log before + passing after
Trace/log snippets (grep output)
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios: Default value change, error message update, test expectation alignment
Edge cases checked: N/A (simple config change)
What you did NOT verify: Live gateway startup with QMD GGUF embedding

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No (default only)
Migration needed? No

Risks and Mitigations

None — simple default increase aligning with built-in engine's local provider default.

Fixes #74204

clawsweeper · 2026-05-28T07:42:56Z

Codex review: needs real behavior proof before merge. Reviewed June 1, 2026, 2:27 AM ET / 06:27 UTC.

Summary
The PR raises the QMD embedTimeoutMs default from 120s to 600s and updates the QMD embed warning text, schema help, and backend-config test expectations.

PR surface: Source 0, Tests 0. Total 0 across 4 files.

Reproducibility: Partially: source-level reproduction is clear because current main resolves unset QMD embed timeout to 120_000 and passes it to qmd embed. I did not establish a live current-main QMD/GGUF workspace repro in this read-only review.

Review metrics: 2 noteworthy metrics.

QMD embed timeout default: 1 default changed: 120000 ms -> 600000 ms. This changes unset QMD behavior for existing users, so maintainers should notice the upgrade impact before merge.
Derived QMD lock budget: 600000 ms implies 1200s stale and 3600s retry wait budget before min clamps. The lock helper derives budgets from embedTimeoutMs, so the default change affects more than the child-process timeout.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

[P1] Add redacted terminal logs, live output, or a linked artifact showing real QMD/GGUF embedding completes under the unset 600s default, or get maintainer proof override.
[P1] Make the maintainer acceptance of the longer default and derived lock budgets explicit before merge.

Proof guidance:

[P1] Needs stronger real behavior proof before merge: The PR has terminal grep/test output, but not real QMD/GGUF embed completion or runtime warning proof; add redacted logs/live output or a linked artifact, or obtain a maintainer proof override before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

[P1] Existing QMD users with unset memory.qmd.update.embedTimeoutMs would move from a 120s embed kill threshold to 600s, which can make failed embeds take longer to surface after upgrade.
[P1] Because QMD lock stale/retry budgets scale from embedTimeoutMs, the new default implies 1200s stale and 3600s retry wait budgets before minimum clamps, so maintainers should explicitly accept that operator impact.
[P1] The supplied proof shows code/test output but not a real QMD/GGUF embed completing under the new unset default or the updated warning appearing in runtime logs.

Maintainer options:

Require QMD proof or an explicit override (recommended)
Add redacted terminal logs, live output, or a linked artifact showing a real QMD/GGUF embed completing under the unset 600s default, or have a maintainer explicitly record proof override and default-risk acceptance.
Accept the broader default change knowingly
Maintainers can choose to merge once they are comfortable that longer unset QMD embed and lock waits are the intended upgrade behavior for all QMD users.
Pause for a narrower timeout strategy
If the broad default is too risky, pause this PR and pursue a local-GGUF-specific default/help path or a docs-only workaround instead.

Next step before merge

[P1] Human review is needed for proof override and compatibility acceptance; the latest head appears to have handled the mechanical test/schema-help updates.

Security
Cleared: The diff changes timeout defaults, warning/help text, and tests only; it adds no dependency, CI, permission, secret, download, or new code-execution surface.

Review details

Best possible solution:

Land the narrow default/help/test update only after maintainers accept the longer default and lock-budget behavior and either receive real QMD/GGUF proof or explicitly record a proof override.

Do we have a high-confidence way to reproduce the issue?

Partially: source-level reproduction is clear because current main resolves unset QMD embed timeout to 120_000 and passes it to qmd embed. I did not establish a live current-main QMD/GGUF workspace repro in this read-only review.

Is this the best way to solve the issue?

Unclear as a best fix without maintainer acceptance: raising the default is a plausible, narrow mitigation and matches local embedding precedent, but it broadens timeout and lock behavior for every unset QMD user. A more selective local-GGUF default could reduce upgrade impact if maintainers prefer that product shape.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 28550c3847cc.

Label changes

Label changes:

add status: 🛠️ actively grinding: The PR author has acted after the latest ClawSweeper review and work remains. Needs stronger real behavior proof before merge: The PR has terminal grep/test output, but not real QMD/GGUF embed completion or runtime warning proof; add redacted logs/live output or a linked artifact, or obtain a maintainer proof override before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
remove status: 📣 needs proof: Current PR status label is status: 🛠️ actively grinding.

Label justifications:

P2: This is a normal-priority memory/QMD bug fix with limited blast radius but real usability impact for local GGUF users.
merge-risk: 🚨 compatibility: The PR changes the default behavior for existing QMD users who have not explicitly set memory.qmd.update.embedTimeoutMs.
rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
status: 🛠️ actively grinding: The PR author has acted after the latest ClawSweeper review and work remains. Needs stronger real behavior proof before merge: The PR has terminal grep/test output, but not real QMD/GGUF embed completion or runtime warning proof; add redacted logs/live output or a linked artifact, or obtain a maintainer proof override before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Evidence reviewed

PR surface:

Source 0, Tests 0. Total 0 across 4 files.

View PR surface stats

Area	Files	Added	Removed
Source	3	3	3
Tests	1	2	2
Docs	0	0	0
Config	0	0	0
Generated	0	0	0
Other	0	0	0
Total	4	5	5

Acceptance criteria:

[P1] Redacted real QMD/GGUF embed proof under unset memory.qmd.update.embedTimeoutMs, or explicit maintainer proof override.
[P2] Focused config test for resolveMemoryBackendConfig default/fallback expectations after the default change.

What I checked:

Current main still defaults QMD embed timeout to 120s: DEFAULT_QMD_EMBED_TIMEOUT_MS is still 120_000 on current main, and unset config resolves through this default. (packages/memory-host-sdk/src/host/backend-config.ts:102, 28550c3847cc)
QMD runtime uses the resolved timeout for embed: QmdMemoryManager passes this.qmd.update.embedTimeoutMs directly to qmd embed, so the default change affects runtime behavior when users do not set the config key. (extensions/memory-core/src/memory/qmd-manager.ts:1562, 28550c3847cc)
Timeout also drives QMD lock budgets: The QMD write-lock helper derives retry wait budget as expected * 6 and stale time as expected * 2, so moving the default to 600s also lengthens lock behavior for unset configs. (extensions/memory-core/src/memory/qmd-manager.ts:196, 28550c3847cc)
Built-in embedding precedent supports a 600s local default: The built-in memory embedding path uses 600s for local batch embeddings, and its tests assert local batch timeout resolution to 600_000. (extensions/memory-core/src/memory/manager-embedding-ops.ts:53, 28550c3847cc)
Latest release has not shipped this PR's change: The latest release tag v2026.5.28 points at e93216080aa1f425d3ab127014603eba8e365b2d, where the QMD embed timeout default is still 120_000. (packages/memory-host-sdk/src/host/backend-config.ts:102, e93216080aa1)
PR head changes the requested default and related text: The provided PR diff for head 26ec8787efd388c9117bed50fa8032c0e6f496b1 changes the default, QMD warning, schema help, and both backend-config default/fallback expectations. (packages/memory-host-sdk/src/host/backend-config.ts:102, 26ec8787efd3)

Likely related people:

steipete: Peter Steinberger authored the memory-host SDK move and the memory-engine-to-plugin move that carry the central config and QMD manager surfaces touched here. (role: feature-history owner; confidence: medium; commits: eebce9e9c7cb, cad83db8b2f7, e93216080aa1; files: packages/memory-host-sdk/src/host/backend-config.ts, extensions/memory-core/src/memory/qmd-manager.ts, src/config/schema.help.ts)
Firas Alswihry: The shallow checkout's current blame for the relevant memory-core/config files points at a recent memory-core commit authored by Firas; confidence is limited because the repository history is grafted. (role: recent area contributor; confidence: low; commits: 70c59f59b258; files: extensions/memory-core/src/memory/qmd-manager.ts, packages/memory-host-sdk/src/host/backend-config.ts, src/config/schema.help.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper · 2026-05-28T07:50:24Z

ClawSweeper PR egg: 🎁 locked until real behavior proof passes.

Details

No creature or rarity is rolled until proof passes.
Eggs are collectible flavor only; they do not affect labels, ratings, merge decisions, or automation.

Keep config help in sync with DEFAULT_QMD_EMBED_TIMEOUT_MS change. ClawSweeper P2 finding: src/config/schema.help.ts:1304

ClawSweeper P2 finding: backend-config.test.ts:467 still expected 120000 for the non-finite fallback case after DEFAULT_QMD_EMBED_TIMEOUT_MS changed.

…UF models Increase DEFAULT_QMD_EMBED_TIMEOUT_MS from 120s to 600s to match built-in engine's local provider default. Local GGUF models need 3-4 minutes to embed a 30-50 file workspace on commodity hardware. Also improve error message to surface the config path: 'qmd embed failed (...) — consider increasing memory.qmd.update.embedTimeoutMs (default: 600000)' Fixes openclaw#74204

openclaw-barnacle Bot added extensions: memory-core Extension: memory-core size: XS proof: supplied External PR includes structured after-fix real behavior proof. labels May 28, 2026

This comment was marked as low quality.

Sign in to view

RomneyDa added dependency-guard-backfill and removed dependency-guard-backfill labels May 29, 2026

谭珊珊0668000363 added 3 commits June 1, 2026 14:18

docs(config): update embedTimeoutMs default to 600000 in schema help

e627798

Keep config help in sync with DEFAULT_QMD_EMBED_TIMEOUT_MS change. ClawSweeper P2 finding: src/config/schema.help.ts:1304

fix(test): update embedTimeoutMs fallback expectation to 600000

ac2a172

ClawSweeper P2 finding: backend-config.test.ts:467 still expected 120000 for the non-finite fallback case after DEFAULT_QMD_EMBED_TIMEOUT_MS changed.

osolmaz self-assigned this Jun 3, 2026

clawsweeper Bot added status: 🛠️ actively grinding The PR author has acted after the latest ClawSweeper review and work remains. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(memory): increase QMD embedTimeoutMs default to 600s for local GG…#87572

fix(memory): increase QMD embedTimeoutMs default to 600s for local GG…#87572
tanshanshan wants to merge 3 commits into
openclaw:mainfrom
tanshanshan:fix/qmd-embed-timeout-74204

tanshanshan commented May 28, 2026

Uh oh!

clawsweeper Bot commented May 28, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 28, 2026

Uh oh!

This comment was marked as low quality.

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

tanshanshan commented May 28, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Motivation

Real behavior proof (required for external PRs)

Root Cause (if applicable)

Regression Test Plan (if applicable)

User-visible / Behavior Changes

Diagram (if applicable)

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Risks and Mitigations

Uh oh!

clawsweeper Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clawsweeper Bot commented May 28, 2026

Uh oh!

This comment was marked as low quality.

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

clawsweeper Bot commented May 28, 2026 •

edited

Loading