fix(memory): preserve qmd lexical search for hyphenated queries by giodl73-repo · Pull Request #81423 · openclaw/openclaw

giodl73-repo · 2026-05-13T13:46:47Z

Summary

Preserve the raw user query for QMD lexical (lex) searches so dashed identifiers, dates, and version strings can still match exactly.
Normalize only the QMD semantic sub-searches (vec / hyde in the v2 qmd.query searches array, and the vector-only vsearch payload) by replacing word-internal hyphens with spaces.
Add focused coverage for hybrid and vector-only QMD MCP payloads.
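Here is a minimal sketch of the semantic-only normalization described above. It assumes a single regex pass that replaces a hyphen with a space only when a word character sits on both sides. The function name `normalizeSemanticQuery` is hypothetical, and this is not the code in qmd-manager.ts.

```typescript
// Hypothetical helper, not the actual qmd-manager.ts implementation.
// Replaces word-internal hyphens ("sqlite-vec", "2026-05-04") with spaces.
// A hyphen counts as word-internal here when a \w character is on both sides;
// the lookahead checks the right-hand character without consuming it.
function normalizeSemanticQuery(query: string): string {
  return query.replace(/(\w)-(?=\w)/g, "$1 ");
}

const raw = "sqlite-vec backend health 2026-05-04 multi-agent";
const semantic = normalizeSemanticQuery(raw);

// The lex sub-search keeps `raw`; vec/hyde/vsearch would receive `semantic`.
console.log(semantic); // sqlite vec backend health 2026 05 04 multi agent
```

In this sketch, a leading hyphen such as "-draft" is not word-internal, so it is left as-is.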

Tests

node scripts/test-projects.mjs extensions/memory-core/src/memory/qmd-manager.test.ts --reporter=verbose -t "hyphenated tokens|vector-only QMD"
node scripts/test-projects.mjs extensions/memory-core/src/memory/search-manager.test.ts --reporter=verbose -t "falls back to builtin search when qmd fails with sqlite busy|keeps original qmd error when fallback manager initialization fails"
pnpm check:changed
git diff --check

Full qmd-manager.test.ts run on Windows reached 104/105 passing; the remaining failure was the existing symlink-permission case (EPERM creating a symlink), unrelated to this change.

Real behavior proof

Behavior or issue addressed: memory_search should not lose QMD lexical recall just because a query contains hyphenated tokens such as sqlite-vec, 2026-05-04, or multi-agent, while semantic QMD searches should avoid word-internal hyphens that QMD v2.0.1 treats as NOT-operator syntax.
Real environment tested: WSL Ubuntu-24.04 source worktrees created from upstream/main and PR head 0f7460c4d99d028e07a629bfd178afc105a6c308.
Exact steps or command run after this patch: created clean detached worktrees for upstream/main and fork/fix-qmd-hyphenated-memory-search; in each worktree ran node --import ./node_modules/tsx/dist/loader.mjs ./probe-qmd-hyphenated-search.mjs. The probe imported the real QmdMemoryManager and called the real buildV2Searches() payload-construction seam for hybrid query and vector-only vsearch payloads.
Evidence after fix (linked artifact and copied console output):

Fresh proof artifact: https://raw.githubusercontent.com/giodl73-repo/openclaw/proof-artifacts/pr-81423-fresh/pr-81423/pr-81423-qmd-hyphenated-search-before-after-proof.png
Proof summary: https://raw.githubusercontent.com/giodl73-repo/openclaw/proof-artifacts/pr-81423-fresh/pr-81423/summary.txt

BEFORE upstream/main:
status=0
lexPreserved=true
semanticNormalized=false
vectorOnlyNormalized=false
lexQuery=sqlite-vec backend health 2026-05-04 multi-agent
vecQuery=sqlite-vec backend health 2026-05-04 multi-agent
hydeQuery=sqlite-vec backend health 2026-05-04 multi-agent
vectorOnlyQuery=sqlite-vec backend health
ok=false

AFTER PR #81423 head:
status=0
lexPreserved=true
semanticNormalized=true
vectorOnlyNormalized=true
lexQuery=sqlite-vec backend health 2026-05-04 multi-agent
vecQuery=sqlite vec backend health 2026 05 04 multi agent
hydeQuery=sqlite vec backend health 2026 05 04 multi agent
vectorOnlyQuery=sqlite vec backend health
ok=true

Observed result after fix: upstream/main preserves the lexical query but sends hyphenated text to semantic vec, hyde, and vector-only payloads. PR head preserves the raw lexical branch while normalizing only semantic payloads, so exact lexical recall remains available and QMD semantic validation no longer sees word-internal hyphens.
What was not tested: no live QMD 2.0.1 daemon or full memory_search command against an installed QMD index was run in this environment. The proof verifies the runtime payload-construction seam that feeds QMD MCP calls.
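The payload split that the probe checks can be sketched as follows. The `QmdSearch` shape (`type` plus `query`) and the `buildSearches` function are assumptions made for illustration. They approximate the real `buildV2Searches()` seam and QMD's v2 schema but are not copies of either. Vector-only mode is modeled as a single vec search, following the review's description of the vector-only tests.

```typescript
// Hypothetical shapes; QMD's real v2 `searches` schema may differ.
type QmdSearchType = "lex" | "vec" | "hyde";
interface QmdSearch {
  type: QmdSearchType;
  query: string;
}

// Same assumed normalizer: word-internal hyphens become spaces.
function normalizeSemanticQuery(query: string): string {
  return query.replace(/(\w)-(?=\w)/g, "$1 ");
}

// Hybrid mode sends lex + vec + hyde; vector-only mode sends just vec.
function buildSearches(query: string, mode: "hybrid" | "vector-only"): QmdSearch[] {
  // Compute the semantic form once and reuse it for every semantic sub-search.
  const semantic = normalizeSemanticQuery(query);
  if (mode === "vector-only") {
    return [{ type: "vec", query: semantic }];
  }
  return [
    { type: "lex", query }, // raw text, so exact matches on "sqlite-vec" still work
    { type: "vec", query: semantic },
    { type: "hyde", query: semantic },
  ];
}

const hybrid = buildSearches("sqlite-vec backend health 2026-05-04 multi-agent", "hybrid");
const vectorOnly = buildSearches("sqlite-vec backend health", "vector-only");

// Mirrors the probe's flags: lex preserved, semantic payloads free of word-internal hyphens.
const lexPreserved = hybrid.find((s) => s.type === "lex")?.query.includes("sqlite-vec") ?? false;
const semanticNormalized = hybrid
  .filter((s) => s.type !== "lex")
  .every((s) => !/\w-\w/.test(s.query));
const vectorOnlyNormalized = vectorOnly.every((s) => !/\w-\w/.test(s.query));

// Expected: all three flags true, matching the AFTER output above.
console.log({ lexPreserved, semanticNormalized, vectorOnlyNormalized });
```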

clawsweeper · 2026-05-13T13:50:34Z

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by maintainer comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors can comment @clawsweeper re-review or @clawsweeper re-run on their own open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
This PR changes memory-core QMD v2 payload construction so lex keeps raw hyphenated queries while vec/hyde/vsearch use a semantic-normalized query, adds focused tests, and adds a changelog entry.

Reproducibility: yes, at source level. Current main sends raw hyphenated queries to QMD vec/hyde/vsearch, QMD v2.0.1 rejects semantic queries matching /-\w/, and the fallback wrapper then switches the request to the builtin index.
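To see why the raw query fails, here is a sketch of the validation rule as this review describes it. QMD v2.0.1 rejects vec/hyde queries that match /-\w/ or /-"/, while lex is validated separately. The `validateSemanticQuery` name and the error text are hypothetical, and QMD's actual code in src/store.ts may be structured differently.

```typescript
// Hypothetical re-creation of the described check; not QMD's source.
function validateSemanticQuery(query: string): void {
  // Per the review: vec/hyde queries containing "-<word char>" or '-"' are rejected,
  // because QMD treats that syntax as a NOT operator.
  if (/-\w/.test(query) || /-"/.test(query)) {
    throw new Error(`semantic query rejected: ${JSON.stringify(query)}`);
  }
}

function accepts(query: string): boolean {
  try {
    validateSemanticQuery(query);
    return true;
  } catch {
    return false;
  }
}

console.log(accepts("sqlite-vec backend health")); // false: raw text is rejected
console.log(accepts("sqlite vec backend health")); // true: normalized text passes
```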

Real behavior proof
Sufficient (linked_artifact): The PR includes a linked before/after runtime-seam proof artifact showing QmdMemoryManager payload construction preserving lex and normalizing semantic payloads, plus post-refresh validation reported by the contributor.

Next step before merge
No automated repair is needed; the protected maintainer label and clean diff leave this for explicit maintainer review or landing.

Security
Cleared: The diff only changes memory query payload construction, focused tests, and changelog text, with no dependency, workflow, script, secret, or package-resolution changes.

Review details

Best possible solution:

Land this cleaned semantic-only QMD normalization after explicit maintainer review, keeping raw lex recall, focused regression coverage, and the linked issue-closing behavior intact.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main sends raw hyphenated queries to QMD vec/hyde/vsearch, QMD v2.0.1 rejects semantic queries matching /-\w/, and the fallback wrapper switches the request to the builtin index.

Is this the best way to solve the issue?

Yes. Normalizing only QMD semantic sub-search payloads while preserving raw lex is the narrowest maintainable fix and avoids broader fallback-policy changes.

Acceptance criteria:

node scripts/test-projects.mjs extensions/memory-core/src/memory/qmd-manager.test.ts --reporter=verbose -t "hyphenated tokens|vector-only QMD"
node scripts/test-projects.mjs extensions/memory-core/src/memory/search-manager.test.ts --reporter=verbose -t "falls back to builtin search when qmd fails with sqlite busy|keeps original qmd error when fallback manager initialization fails"
pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/memory-core/src/memory/qmd-manager.ts extensions/memory-core/src/memory/qmd-manager.test.ts
git diff --check
pnpm check:changed

What I checked:

Current main QMD payload construction: Current main builds QMD v2 searches with the same raw query for lex, vec, and hyde, so before this PR hyphenated tokens are sent unchanged to the semantic QMD sub-searches. (extensions/memory-core/src/memory/qmd-manager.ts:1951, 6ebc5e471929)
Current main fallback behavior: When the primary QMD manager throws, the fallback wrapper marks it failed and switches the request to the builtin index, matching the linked issue's total-QMD-recall loss path. (extensions/memory-core/src/memory/search-manager.ts:431, 6ebc5e471929)
QMD dependency contract: QMD v2.0.1 validates vec/hyde queries with /-\w/ and /-"/ rejection while lex validation is separate, so preserving lex and normalizing only semantic sub-searches matches the dependency boundary. (tobi/qmd/src/store.ts:2599)
PR head implementation: The current PR head computes a semantic query once, uses it for vec/hyde/vsearch, keeps the raw query for lex, and uses a lookahead normalizer that handles chained tokens such as sqlite-vec-qmd. (extensions/memory-core/src/memory/qmd-manager.ts:1955, 5e8277e7c809)
PR regression coverage: The PR adds focused tests asserting raw lex plus normalized vec/hyde payloads for hybrid search and normalized vec payloads for vector-only search. (extensions/memory-core/src/memory/qmd-manager.test.ts:2701, 5e8277e7c809)
Real behavior proof artifact: The linked proof summary and inspected image show before/after QmdMemoryManager payload construction: current main preserves lex but leaves semantic/vector-only hyphens, while the PR behavior preserves lex and normalizes semantic payloads. (0f7460c4d99d)
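The fallback behavior checked above works like this: the primary QMD manager throws, the wrapper marks it failed, and the request switches to the builtin index. The sketch below models that sequence. The `SearchBackend` interface and `FallbackSearch` class are hypothetical stand-ins, not the real search-manager.ts wrapper. The sticky failure flag is an assumption of this sketch. The example only shows why one rejected semantic sub-search loses all QMD results, lexical ones included.

```typescript
// Hypothetical stand-ins; the real memory-core managers expose a different API.
interface SearchBackend {
  name: string;
  search(query: string): Promise<string[]>;
}

class FallbackSearch {
  private readonly primary: SearchBackend;
  private readonly fallback: SearchBackend;
  // Sticky flag in this sketch: once QMD throws, later calls go straight to the fallback.
  private primaryFailed = false;

  constructor(primary: SearchBackend, fallback: SearchBackend) {
    this.primary = primary;
    this.fallback = fallback;
  }

  async search(query: string): Promise<{ backend: string; hits: string[] }> {
    if (!this.primaryFailed) {
      try {
        return { backend: this.primary.name, hits: await this.primary.search(query) };
      } catch {
        // Any primary error discards the whole QMD request, lex results included.
        this.primaryFailed = true;
      }
    }
    return { backend: this.fallback.name, hits: await this.fallback.search(query) };
  }
}

// Stand-in QMD that rejects hyphenated semantic text, like the unpatched path.
const qmd: SearchBackend = {
  name: "qmd",
  async search(query) {
    if (/-\w/.test(query)) throw new Error("semantic query rejected");
    return [`qmd hit: ${query}`];
  },
};
const builtin: SearchBackend = {
  name: "builtin",
  async search(query) {
    return [`builtin hit: ${query}`];
  },
};

new FallbackSearch(qmd, builtin)
  .search("sqlite-vec backend health")
  .then((result) => console.log(result.backend)); // builtin
```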

Likely related people:

vincentkoc: Vincent Koc has a dense recent history in memory-core QMD compatibility, query semantics, collection recovery, and test hardening around the affected files. (role: recent area contributor; confidence: high; commits: 5707038e6c5a, 098f4eeebbed, 7c9108aaf7d0; files: extensions/memory-core/src/memory/qmd-manager.ts, extensions/memory-core/src/memory/qmd-manager.test.ts)
armanddp: Armand du Plessis authored the QMD 1.1+ mcporter compatibility change that introduced the current buildV2Searches typed payload seam. (role: introduced behavior; confidence: high; commits: b888741462c8; files: extensions/memory-core/src/memory/qmd-manager.ts, extensions/memory-core/src/memory/qmd-manager.test.ts)
steipete: Peter Steinberger appears in recent memory-core history, including the fallback manager history and high shortlog activity for the central memory files. (role: recent area contributor; confidence: medium; commits: cad83db8b2f7, 5dbc969b469c; files: extensions/memory-core/src/memory/search-manager.ts, extensions/memory-core/src/memory/qmd-manager.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6ebc5e471929.

giodl73-repo · 2026-05-16T22:46:29Z

Updated #81423 with a fresh deterministic proof image and gate-friendly proof section.

The proof compares upstream/main against PR head 0f7460c4d99d028e07a629bfd178afc105a6c308 using the real QmdMemoryManager.buildV2Searches() payload-construction seam:

Before: lexical search preserves the raw hyphenated query, but semantic vec/hyde and vector-only payloads also keep hyphens.
After: lexical search still preserves raw hyphenated text, while semantic payloads normalize word-internal hyphens.

Fresh proof image: https://raw.githubusercontent.com/giodl73-repo/openclaw/proof-artifacts/pr-81423-fresh/pr-81423/pr-81423-qmd-hyphenated-search-before-after-proof.png
Proof summary: https://raw.githubusercontent.com/giodl73-repo/openclaw/proof-artifacts/pr-81423-fresh/pr-81423/summary.txt

giodl73-repo · 2026-05-17T16:38:29Z

Updated this branch against current main and resolved the CHANGELOG.md conflict cleanly.

What changed in the refresh:

Kept CHANGELOG.md on current main and retained only the intended Memory/QMD fix entry.
Tightened the semantic-only QMD hyphen normalizer so chained hyphenated tokens like sqlite-vec-qmd normalize completely for vec/hyde/vsearch payloads.
Preserved raw lexical QMD query text for lex searches.
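A lookahead matters for chained tokens for the following reason. A pattern that consumes the characters on both sides of a hyphen cannot reuse the right-hand character as the left side of the next hyphen, so it misses hyphens between single-character segments. This sketch contrasts the two approaches. Both function names are hypothetical, and the PR's actual regex is not reproduced here.

```typescript
// Consumes both neighbors: the "y" in "x-y-z" is used up by the first match.
function normalizeConsuming(query: string): string {
  return query.replace(/(\w)-(\w)/g, "$1 $2");
}

// Lookahead leaves the right-hand neighbor in place for the next match.
function normalizeLookahead(query: string): string {
  return query.replace(/(\w)-(?=\w)/g, "$1 ");
}

for (const q of ["sqlite-vec-qmd", "x-y-z", "v-1-2-beta"]) {
  console.log(q, "=>", normalizeConsuming(q), "|", normalizeLookahead(q));
}
```

In this sketch, multi-character chains such as "sqlite-vec-qmd" normalize the same way with either pattern. The difference appears with single-character segments: "x-y-z" becomes "x y-z" with the consuming form and "x y z" with the lookahead.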

Validation run after the final rebase:

node scripts/test-projects.mjs extensions/memory-core/src/memory/qmd-manager.test.ts --reporter=verbose -t "hyphenated tokens|vector-only QMD"
node scripts/test-projects.mjs extensions/memory-core/src/memory/search-manager.test.ts --reporter=verbose -t "falls back to builtin search when qmd fails with sqlite busy|keeps original qmd error when fallback manager initialization fails"
pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/memory-core/src/memory/qmd-manager.ts extensions/memory-core/src/memory/qmd-manager.test.ts
git diff --check
pnpm check:changed


giodl73-repo mentioned this pull request May 13, 2026

memory_search: qmd validation rejects hyphenated tokens, causes total fallback to builtin index #81328

Closed

openclaw-barnacle Bot added the labels extensions: memory-core, size: S, and maintainer (Maintainer-authored PR) on May 13, 2026

clawsweeper Bot mentioned this pull request May 13, 2026

fix(memory): sanitize word-internal hyphens in qmd search queries #81336

Closed


giodl73-repo force-pushed the fix-qmd-hyphenated-memory-search branch 3 times, most recently from 0e92a4c to a24c410 on May 16, 2026 06:04

giodl73-repo marked this pull request as ready for review May 16, 2026 06:09

giodl73-repo force-pushed the fix-qmd-hyphenated-memory-search branch 5 times, most recently from 0cc0b00 to 0f7460c on May 16, 2026 14:16

clawsweeper Bot added the labels proof: sufficient (ClawSweeper judged the real behavior proof convincing) and P2 (Normal backlog priority with limited blast radius) on May 16, 2026

fix(memory): preserve qmd lexical search for hyphenated queries

5e8277e

giodl73-repo force-pushed the fix-qmd-hyphenated-memory-search branch from 0f7460c to 5e8277e on May 17, 2026 16:38

clawsweeper Bot added the label impact:session-state (Session, memory, transcript, context, or agent state can drift or corrupt) on May 17, 2026

giodl73-repo merged commit 44c3d8e into openclaw:main May 17, 2026
113 checks passed

github-actions Bot mentioned this pull request May 17, 2026

📡 Upstream Digest — 2026-05-17 18:56 UTC curtismercier/openclaw-mods#885

Open

clawsweeper Bot mentioned this pull request May 18, 2026

Feature Request: Memory Trust Tagging by Source #7707

Open

Commits in other forks that referenced this pull request, each titled "fix(memory): preserve qmd lexical search for hyphenated queries (open… …claw#81423)":

woodygreen pushed 291541d to woodygreen/openclaw on May 18, 2026

galiniliev pushed 5f20cda to galiniliev/openclaw on May 20, 2026

SebTardif pushed a654f98, e3d75b5, and f468ef6 to SebTardif/openclaw on May 24, 2026

github-actions Bot pushed 39a05b4 to Desicool/openclaw on May 24, 2026

galiniliev pushed a609f8f to galiniliev/openclaw on May 25, 2026

SebTardif pushed 9c7b47d, 5e5f469, and c228e3a to SebTardif/openclaw on May 26, 2026

jameslcowan pushed 0f9be2f to jameslcowan/openclaw on Jun 2, 2026

SYU8384 pushed 15fae23 to SYU8384/openclaw on Jun 3, 2026

sablehead pushed cce90a3 to sablehead/openclaw on Jun 10, 2026

giodl73-repo merged 1 commit into openclaw:main from giodl73-repo:fix-qmd-hyphenated-memory-search

giodl73-repo commented May 13, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 13, 2026 •

edited

Loading

Uh oh!

giodl73-repo commented May 16, 2026

Uh oh!

giodl73-repo commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

giodl73-repo commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Tests

Real behavior proof

Uh oh!

clawsweeper Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

giodl73-repo commented May 16, 2026

Uh oh!

giodl73-repo commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

giodl73-repo commented May 13, 2026 •

edited

Loading

clawsweeper Bot commented May 13, 2026 •

edited

Loading