Skip to content

fix(memory): increase QMD embedTimeoutMs default to 600s for local GG…#87572

Open
tanshanshan wants to merge 3 commits into
openclaw:mainfrom
tanshanshan:fix/qmd-embed-timeout-74204
Open

fix(memory): increase QMD embedTimeoutMs default to 600s for local GG…#87572
tanshanshan wants to merge 3 commits into
openclaw:mainfrom
tanshanshan:fix/qmd-embed-timeout-74204

Conversation

@tanshanshan

Copy link
Copy Markdown

Summary

  • Problem: Local GGUF models (e.g., embeddinggemma-300m) take 3-4 minutes to embed a 30-50 file workspace on commodity hardware (4-core CPU), but memory.qmd.update.embedTimeoutMs default is only 120s, causing repeated timeout failures and effectively disabling vector search.
  • Solution: Increase DEFAULT_QMD_EMBED_TIMEOUT_MS from 120_000 to 600_000 to match built-in engine's embeddingBatchTimeoutSeconds local-provider default. Improve error message to surface config path: memory.qmd.update.embedTimeoutMs.
  • What changed: 3 files — backend-config.ts (default change), qmd-manager.ts (error message), backend-config.test.ts (test expectation).
  • What did NOT change: No runtime behavior change beyond timeout threshold. No config schema change. No API surface change.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Motivation

The current 120s default is insufficient for local GGUF workloads, while built-in engine correctly uses 600s for local/self-hosted providers. Users discover the fix (embedTimeoutMs: 600000) only via openclaw config schema or docs, and error message does not hint at the config path. This PR aligns defaults and surfaces the fix in the error message.

Real behavior proof (required for external PRs)

  • Behavior addressed: Increase QMD embed timeout default from 120s to 600s; improve error message to mention config path.
  • Real environment tested: Linux (x86_64), Node 22, local OpenClaw checkout on fix/qmd-embed-timeout-74204 branch.
  • Exact steps or command run after this patch:

grep -n "DEFAULT_QMD_EMBED_TIMEOUT_MS" packages/memory-host-sdk/src/host/backend-config.ts

grep -n "qmd embed failed" extensions/memory-core/src/memory/qmd-manager.ts

node scripts/run-vitest.mjs packages/memory-host-sdk/src/host/backend-config.test.ts -t "embedTimeoutMs"

  • Evidence after fix:
$ grep -n "DEFAULT_QMD_EMBED_TIMEOUT_MS" packages/memory-host-sdk/src/host/backend-config.ts
102:const DEFAULT_QMD_EMBED_TIMEOUT_MS = 600_000;

$ grep -n "qmd embed failed" extensions/memory-core/src/memory/qmd-manager.ts
1800:    log.warn(
1801:      `qmd embed failed (${reason}): ${String(err)} — consider increasing memory.qmd.update.embedTimeoutMs (default: 600000); backing off for ${Math.ceil(delayMs / 1000)}s`,
1802:    );

$ node scripts/run-vitest.mjs packages/memory-host-sdk/src/host/backend-config.test.ts -t "embedTimeoutMs"
✓ backend-config.test.ts (1 test) 1 passed
  • Observed result after fix: Default changed from 120_000 to 600_000. Error message now includes config path hint. Test expectation updated and passed.
  • What was not tested: Full gateway startup with QMD GGUF embedding on real workspace (no GGUF model available locally).

Root Cause (if applicable)

N/A — config default mismatch, not a bug in runtime logic.

Regression Test Plan (if applicable)

N/A — default change, no behavior regression risk.

User-visible / Behavior Changes

Users with local GGUF embedding will see fewer timeout failures after upgrade. Error message now hints at config path if timeout still occurs.

Diagram (if applicable)

Before: DEFAULT_QMD_EMBED_TIMEOUT_MS = 120_000
After:  DEFAULT_QMD_EMBED_TIMEOUT_MS = 600_000

Error message before: "qmd embed failed (...) — backing off for ..."
Error message after:  "qmd embed failed (...) — consider increasing memory.qmd.update.embedTimeoutMs (default: 600000); backing off for ..."

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux x86_64
  • Runtime/container: Node 22
  • Model/provider: N/A
  • Integration/channel: QMD memory
  • Relevant config (redacted): N/A

Steps

  1. Checkout fix/qmd-embed-timeout-74204 branch
  2. Verify default changed via grep
  3. Verify error message updated via grep
  4. Run test expectation

Expected

DEFAULT_QMD_EMBED_TIMEOUT_MS is 600_000. Error message mentions memory.qmd.update.embedTimeoutMs. Test passes.

Actual

All checks pass as expected.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets (grep output)
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: Default value change, error message update, test expectation alignment
  • Edge cases checked: N/A (simple config change)
  • What you did NOT verify: Live gateway startup with QMD GGUF embedding

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No (default only)
  • Migration needed? No

Risks and Mitigations

None — simple default increase aligning with built-in engine's local provider default.

Fixes #74204

@openclaw-barnacle openclaw-barnacle Bot added extensions: memory-core Extension: memory-core size: XS proof: supplied External PR includes structured after-fix real behavior proof. labels May 28, 2026
@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 1, 2026, 2:27 AM ET / 06:27 UTC.

Summary
The PR raises the QMD embedTimeoutMs default from 120s to 600s and updates the QMD embed warning text, schema help, and backend-config test expectations.

PR surface: Source 0, Tests 0. Total 0 across 4 files.

Reproducibility: Partially: source-level reproduction is clear because current main resolves unset QMD embed timeout to 120_000 and passes it to qmd embed. I did not establish a live current-main QMD/GGUF workspace repro in this read-only review.

Review metrics: 2 noteworthy metrics.

  • QMD embed timeout default: 1 default changed: 120000 ms -> 600000 ms. This changes unset QMD behavior for existing users, so maintainers should notice the upgrade impact before merge.
  • Derived QMD lock budget: 600000 ms implies 1200s stale and 3600s retry wait budget before min clamps. The lock helper derives budgets from embedTimeoutMs, so the default change affects more than the child-process timeout.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted terminal logs, live output, or a linked artifact showing real QMD/GGUF embedding completes under the unset 600s default, or get maintainer proof override.
  • [P1] Make the maintainer acceptance of the longer default and derived lock budgets explicit before merge.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: The PR has terminal grep/test output, but not real QMD/GGUF embed completion or runtime warning proof; add redacted logs/live output or a linked artifact, or obtain a maintainer proof override before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] Existing QMD users with unset memory.qmd.update.embedTimeoutMs would move from a 120s embed kill threshold to 600s, which can make failed embeds take longer to surface after upgrade.
  • [P1] Because QMD lock stale/retry budgets scale from embedTimeoutMs, the new default implies 1200s stale and 3600s retry wait budgets before minimum clamps, so maintainers should explicitly accept that operator impact.
  • [P1] The supplied proof shows code/test output but not a real QMD/GGUF embed completing under the new unset default or the updated warning appearing in runtime logs.

Maintainer options:

  1. Require QMD proof or an explicit override (recommended)
    Add redacted terminal logs, live output, or a linked artifact showing a real QMD/GGUF embed completing under the unset 600s default, or have a maintainer explicitly record proof override and default-risk acceptance.
  2. Accept the broader default change knowingly
    Maintainers can choose to merge once they are comfortable that longer unset QMD embed and lock waits are the intended upgrade behavior for all QMD users.
  3. Pause for a narrower timeout strategy
    If the broad default is too risky, pause this PR and pursue a local-GGUF-specific default/help path or a docs-only workaround instead.

Next step before merge

  • [P1] Human review is needed for proof override and compatibility acceptance; the latest head appears to have handled the mechanical test/schema-help updates.

Security
Cleared: The diff changes timeout defaults, warning/help text, and tests only; it adds no dependency, CI, permission, secret, download, or new code-execution surface.

Review details

Best possible solution:

Land the narrow default/help/test update only after maintainers accept the longer default and lock-budget behavior and either receive real QMD/GGUF proof or explicitly record a proof override.

Do we have a high-confidence way to reproduce the issue?

Partially: source-level reproduction is clear because current main resolves unset QMD embed timeout to 120_000 and passes it to qmd embed. I did not establish a live current-main QMD/GGUF workspace repro in this read-only review.

Is this the best way to solve the issue?

Unclear as a best fix without maintainer acceptance: raising the default is a plausible, narrow mitigation and matches local embedding precedent, but it broadens timeout and lock behavior for every unset QMD user. A more selective local-GGUF default could reduce upgrade impact if maintainers prefer that product shape.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 28550c3847cc.

Label changes

Label changes:

  • add status: 🛠️ actively grinding: The PR author has acted after the latest ClawSweeper review and work remains. Needs stronger real behavior proof before merge: The PR has terminal grep/test output, but not real QMD/GGUF embed completion or runtime warning proof; add redacted logs/live output or a linked artifact, or obtain a maintainer proof override before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • remove status: 📣 needs proof: Current PR status label is status: 🛠️ actively grinding.

Label justifications:

  • P2: This is a normal-priority memory/QMD bug fix with limited blast radius but real usability impact for local GGUF users.
  • merge-risk: 🚨 compatibility: The PR changes the default behavior for existing QMD users who have not explicitly set memory.qmd.update.embedTimeoutMs.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • status: 🛠️ actively grinding: The PR author has acted after the latest ClawSweeper review and work remains. Needs stronger real behavior proof before merge: The PR has terminal grep/test output, but not real QMD/GGUF embed completion or runtime warning proof; add redacted logs/live output or a linked artifact, or obtain a maintainer proof override before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source 0, Tests 0. Total 0 across 4 files.

View PR surface stats
Area Files Added Removed Net
Source 3 3 3 0
Tests 1 2 2 0
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 4 5 5 0

Acceptance criteria:

  • [P1] Redacted real QMD/GGUF embed proof under unset memory.qmd.update.embedTimeoutMs, or explicit maintainer proof override.
  • [P2] Focused config test for resolveMemoryBackendConfig default/fallback expectations after the default change.

What I checked:

Likely related people:

  • steipete: Peter Steinberger authored the memory-host SDK move and the memory-engine-to-plugin move that carry the central config and QMD manager surfaces touched here. (role: feature-history owner; confidence: medium; commits: eebce9e9c7cb, cad83db8b2f7, e93216080aa1; files: packages/memory-host-sdk/src/host/backend-config.ts, extensions/memory-core/src/memory/qmd-manager.ts, src/config/schema.help.ts)
  • Firas Alswihry: The shallow checkout's current blame for the relevant memory-core/config files points at a recent memory-core commit authored by Firas; confidence is limited because the repository history is grafted. (role: recent area contributor; confidence: low; commits: 70c59f59b258; files: extensions/memory-core/src/memory/qmd-manager.ts, packages/memory-host-sdk/src/host/backend-config.ts, src/config/schema.help.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. labels May 28, 2026
@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg: 🎁 locked until real behavior proof passes.

Details
  • No creature or rarity is rolled until proof passes.
  • Eggs are collectible flavor only; they do not affect labels, ratings, merge decisions, or automation.

@tanshanshan

This comment was marked as low quality.

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 29, 2026
谭珊珊0668000363 added 3 commits June 1, 2026 14:18
Keep config help in sync with DEFAULT_QMD_EMBED_TIMEOUT_MS change.
ClawSweeper P2 finding: src/config/schema.help.ts:1304
ClawSweeper P2 finding: backend-config.test.ts:467 still expected 120000
for the non-finite fallback case after DEFAULT_QMD_EMBED_TIMEOUT_MS changed.
…UF models

Increase DEFAULT_QMD_EMBED_TIMEOUT_MS from 120s to 600s to match
built-in engine's local provider default. Local GGUF models need
3-4 minutes to embed a 30-50 file workspace on commodity hardware.

Also improve error message to surface the config path:
'qmd embed failed (...) — consider increasing memory.qmd.update.embedTimeoutMs (default: 600000)'

Fixes openclaw#74204
@osolmaz osolmaz self-assigned this Jun 3, 2026
@clawsweeper clawsweeper Bot added status: 🛠️ actively grinding The PR author has acted after the latest ClawSweeper review and work remains. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: memory-core Extension: memory-core merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: XS status: 🛠️ actively grinding The PR author has acted after the latest ClawSweeper review and work remains.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

memory.qmd.update.embedTimeoutMs default (120 s) is too low for local GGUF; error message doesn't surface the fix

3 participants