
fix(memory): preserve qmd lexical search for hyphenated queries #81423

Merged
giodl73-repo merged 1 commit into openclaw:main from giodl73-repo:fix-qmd-hyphenated-memory-search
May 17, 2026

@giodl73-repo

@giodl73-repo giodl73-repo commented May 13, 2026

Contributor

Summary

  • Preserve the raw user query for QMD lexical (lex) searches so dashed identifiers, dates, and version strings can still match exactly.
  • Normalize only QMD semantic sub-searches (vec / hyde) by replacing word-internal hyphens with spaces before sending the v2 qmd.query searches array.
  • Add focused coverage for hybrid and vector-only QMD MCP payloads.
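
The split described in the list above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `QmdSearch` type, `normalizeSemanticQuery`, and `buildSearchesSketch` are hypothetical names, and the `type` / `query` shape of each `searches` entry is an assumption.

```typescript
// Hypothetical shape of one entry in the v2 qmd.query `searches` array (assumed).
type QmdSearch = { type: "lex" | "vec" | "hyde"; query: string };

// Replace each hyphen that sits between two word characters with a space.
// The lookahead inspects the following character without consuming it.
function normalizeSemanticQuery(query: string): string {
  return query.replace(/(\w)-(?=\w)/g, "$1 ");
}

// Hybrid payload: lex keeps the raw text, vec and hyde share the normalized text.
function buildSearchesSketch(rawQuery: string): QmdSearch[] {
  const semanticQuery = normalizeSemanticQuery(rawQuery);
  return [
    { type: "lex", query: rawQuery },
    { type: "vec", query: semanticQuery },
    { type: "hyde", query: semanticQuery },
  ];
}

const searches = buildSearchesSketch("sqlite-vec backend health 2026-05-04 multi-agent");
// searches[0].query is the raw input;
// searches[1].query and searches[2].query are "sqlite vec backend health 2026 05 04 multi agent"
```

Computing the semantic query once and reusing it keeps the vec and hyde entries identical, so any divergence between them would come from QMD rather than from payload construction.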

Fixes #81328.

Tests

  • node scripts/test-projects.mjs extensions/memory-core/src/memory/qmd-manager.test.ts --reporter=verbose -t "hyphenated tokens|vector-only QMD"
  • node scripts/test-projects.mjs extensions/memory-core/src/memory/search-manager.test.ts --reporter=verbose -t "falls back to builtin search when qmd fails with sqlite busy|keeps original qmd error when fallback manager initialization fails"
  • pnpm check:changed
  • git diff --check

Full qmd-manager.test.ts run on Windows reached 104/105 passing; the remaining failure was the existing symlink-permission case (EPERM creating a symlink), unrelated to this change.

Real behavior proof

  • Behavior or issue addressed: memory_search should not lose QMD lexical recall just because a query contains hyphenated tokens such as sqlite-vec, 2026-05-04, or multi-agent, while semantic QMD searches should avoid word-internal hyphens that QMD v2.0.1 treats as NOT-operator syntax.
  • Real environment tested: WSL Ubuntu-24.04 source worktrees created from upstream/main and PR head 0f7460c4d99d028e07a629bfd178afc105a6c308.
  • Exact steps or command run after this patch: created clean detached worktrees for upstream/main and fork/fix-qmd-hyphenated-memory-search; in each worktree ran node --import ./node_modules/tsx/dist/loader.mjs ./probe-qmd-hyphenated-search.mjs. The probe imported the real QmdMemoryManager and called the real buildV2Searches() payload-construction seam for hybrid query and vector-only vsearch payloads.
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output):

PR #81423 QMD hyphenated search before/after proof

Fresh proof artifact: https://raw.githubusercontent.com/giodl73-repo/openclaw/proof-artifacts/pr-81423-fresh/pr-81423/pr-81423-qmd-hyphenated-search-before-after-proof.png
Proof summary: https://raw.githubusercontent.com/giodl73-repo/openclaw/proof-artifacts/pr-81423-fresh/pr-81423/summary.txt

BEFORE upstream/main:
status=0
lexPreserved=true
semanticNormalized=false
vectorOnlyNormalized=false
lexQuery=sqlite-vec backend health 2026-05-04 multi-agent
vecQuery=sqlite-vec backend health 2026-05-04 multi-agent
hydeQuery=sqlite-vec backend health 2026-05-04 multi-agent
vectorOnlyQuery=sqlite-vec backend health
ok=false

AFTER PR #81423 head:
status=0
lexPreserved=true
semanticNormalized=true
vectorOnlyNormalized=true
lexQuery=sqlite-vec backend health 2026-05-04 multi-agent
vecQuery=sqlite vec backend health 2026 05 04 multi agent
hydeQuery=sqlite vec backend health 2026 05 04 multi agent
vectorOnlyQuery=sqlite vec backend health
ok=true
  • Observed result after fix: upstream/main preserves the lexical query but sends hyphenated text to semantic vec, hyde, and vector-only payloads. PR head preserves the raw lexical branch while normalizing only semantic payloads, so exact lexical recall remains available and QMD semantic validation no longer sees word-internal hyphens.
  • What was not tested: no live QMD 2.0.1 daemon or full memory_search command against an installed QMD index was run in this environment. The proof verifies the runtime payload-construction seam that feeds QMD MCP calls.

@clawsweeper

clawsweeper Bot commented May 13, 2026

Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by maintainer comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors can comment @clawsweeper re-review or @clawsweeper re-run on their own open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
This PR changes memory-core QMD v2 payload construction so lex keeps raw hyphenated queries while vec/hyde/vsearch use a semantic-normalized query, adds focused tests, and adds a changelog entry.

Reproducibility: yes, at source level. Current main sends raw hyphenated queries to QMD vec/hyde/vsearch, QMD v2.0.1 rejects semantic queries matching /-\w/, and the fallback wrapper switches the request to the builtin index.

Real behavior proof
Sufficient (linked_artifact): The PR includes a linked before/after runtime-seam proof artifact showing QmdMemoryManager payload construction preserving lex and normalizing semantic payloads, plus post-refresh validation reported by the contributor.

Next step before merge
No automated repair is needed; the protected maintainer label and clean diff leave this for explicit maintainer review or landing.

Security
Cleared: The diff only changes memory query payload construction, focused tests, and changelog text, with no dependency, workflow, script, secret, or package-resolution changes.

Review details

Best possible solution:

Land this cleaned semantic-only QMD normalization after explicit maintainer review, keeping raw lex recall, focused regression coverage, and the linked issue-closing behavior intact.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main sends raw hyphenated queries to QMD vec/hyde/vsearch, QMD v2.0.1 rejects semantic queries matching /-\w/, and the fallback wrapper switches the request to the builtin index.
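
For illustration, the rejection condition can be written as a small predicate. Only the `/-\w/` pattern comes from the review text; the function name is hypothetical, and how QMD applies the pattern internally is not shown in this thread.

```typescript
// Hypothetical predicate mirroring the /-\w/ check described in the review.
// It returns true when a hyphen is immediately followed by a word character.
function wouldRejectSemanticQuery(query: string): boolean {
  return /-\w/.test(query);
}

// Raw hyphenated text matches the pattern.
const rawRejected = wouldRejectSemanticQuery("sqlite-vec backend health");

// The same text with word-internal hyphens replaced by spaces does not.
const normalizedRejected = wouldRejectSemanticQuery("sqlite vec backend health");
```

Under this reading, `rawRejected` is `true` and `normalizedRejected` is `false`, which is why sending the raw query to vec/hyde/vsearch leads to the error path.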

Is this the best way to solve the issue?

Yes. Normalizing only QMD semantic sub-search payloads while preserving raw lex is the narrowest maintainable fix and avoids broader fallback-policy changes.

Acceptance criteria:

  • node scripts/test-projects.mjs extensions/memory-core/src/memory/qmd-manager.test.ts --reporter=verbose -t "hyphenated tokens|vector-only QMD"
  • node scripts/test-projects.mjs extensions/memory-core/src/memory/search-manager.test.ts --reporter=verbose -t "falls back to builtin search when qmd fails with sqlite busy|keeps original qmd error when fallback manager initialization fails"
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/memory-core/src/memory/qmd-manager.ts extensions/memory-core/src/memory/qmd-manager.test.ts
  • git diff --check
  • pnpm check:changed

What I checked:

  • Current main QMD payload construction: Current main builds QMD v2 searches with the same raw query for lex, vec, and hyde, so hyphenated tokens are still sent to semantic QMD sub-searches before this PR. (extensions/memory-core/src/memory/qmd-manager.ts:1951, 6ebc5e471929)
  • Current main fallback behavior: When the primary QMD manager throws, the fallback wrapper marks it failed and switches the request to the builtin index, matching the linked issue's total-QMD-recall loss path. (extensions/memory-core/src/memory/search-manager.ts:431, 6ebc5e471929)
  • QMD dependency contract: QMD v2.0.1 validates vec/hyde queries with /-\w/ and /-"/ rejection while lex validation is separate, so preserving lex and normalizing only semantic sub-searches matches the dependency boundary. (tobi/qmd/src/store.ts:2599)
  • PR head implementation: The current PR head computes a semantic query once, uses it for vec/hyde/vsearch, keeps the raw query for lex, and uses a lookahead normalizer that handles chained tokens such as sqlite-vec-qmd. (extensions/memory-core/src/memory/qmd-manager.ts:1955, 5e8277e7c809)
  • PR regression coverage: The PR adds focused tests asserting raw lex plus normalized vec/hyde payloads for hybrid search and normalized vec payloads for vector-only search. (extensions/memory-core/src/memory/qmd-manager.test.ts:2701, 5e8277e7c809)
  • Real behavior proof artifact: The linked proof summary and inspected image show before/after QmdMemoryManager payload construction: current main preserves lex but leaves semantic/vector-only hyphens, while the PR behavior preserves lex and normalizes semantic payloads. (0f7460c4d99d)
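
The fallback path noted in the list above can be sketched like this. It is a simplified, hypothetical wrapper, not the code in search-manager.ts: the `SearchManager` interface and `withFallback` are assumed names, the sketch is synchronous for brevity, and leaving the primary marked as failed for later calls is an assumption.

```typescript
// Hypothetical minimal interface; the real managers expose more than this.
interface SearchManager {
  search(query: string): string[];
}

// Wrap a primary manager so that a thrown error marks it failed and the
// request is answered by the fallback manager instead.
function withFallback(primary: SearchManager, fallback: SearchManager): SearchManager {
  let primaryFailed = false;
  return {
    search(query: string): string[] {
      if (!primaryFailed) {
        try {
          return primary.search(query);
        } catch {
          // Assumption: once the primary throws, it is not retried.
          primaryFailed = true;
        }
      }
      return fallback.search(query);
    },
  };
}
```

With a wrapper of this kind, a single rejected semantic sub-search is enough to lose all results from the primary index, which matches the total-recall-loss path described for the linked issue.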

Likely related people:

  • vincentkoc: Vincent Koc has a dense recent history in memory-core QMD compatibility, query semantics, collection recovery, and test hardening around the affected files. (role: recent area contributor; confidence: high; commits: 5707038e6c5a, 098f4eeebbed, 7c9108aaf7d0; files: extensions/memory-core/src/memory/qmd-manager.ts, extensions/memory-core/src/memory/qmd-manager.test.ts)
  • armanddp: Armand du Plessis authored the QMD 1.1+ mcporter compatibility change that introduced the current buildV2Searches typed payload seam. (role: introduced behavior; confidence: high; commits: b888741462c8; files: extensions/memory-core/src/memory/qmd-manager.ts, extensions/memory-core/src/memory/qmd-manager.test.ts)
  • steipete: Peter Steinberger appears in recent memory-core history, including the fallback manager history and high shortlog activity for the central memory files. (role: recent area contributor; confidence: medium; commits: cad83db8b2f7, 5dbc969b469c; files: extensions/memory-core/src/memory/search-manager.ts, extensions/memory-core/src/memory/qmd-manager.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6ebc5e471929.

@giodl73-repo giodl73-repo force-pushed the fix-qmd-hyphenated-memory-search branch 3 times, most recently from 0e92a4c to a24c410 Compare May 16, 2026 06:04
@giodl73-repo giodl73-repo marked this pull request as ready for review May 16, 2026 06:09
@giodl73-repo giodl73-repo force-pushed the fix-qmd-hyphenated-memory-search branch 5 times, most recently from 0cc0b00 to 0f7460c Compare May 16, 2026 14:16
@giodl73-repo

Contributor Author

Updated #81423 with a fresh deterministic proof image and gate-friendly proof section.

The proof compares upstream/main against PR head 0f7460c4d99d028e07a629bfd178afc105a6c308 using the real QmdMemoryManager.buildV2Searches() payload-construction seam:

  • Before: lexical search preserves the raw hyphenated query, but semantic vec/hyde and vector-only payloads also keep hyphens.
  • After: lexical search still preserves raw hyphenated text, while semantic payloads normalize word-internal hyphens.

Fresh proof image: https://raw.githubusercontent.com/giodl73-repo/openclaw/proof-artifacts/pr-81423-fresh/pr-81423/pr-81423-qmd-hyphenated-search-before-after-proof.png
Proof summary: https://raw.githubusercontent.com/giodl73-repo/openclaw/proof-artifacts/pr-81423-fresh/pr-81423/summary.txt

@clawsweeper clawsweeper Bot added the proof: sufficient (ClawSweeper judged the real behavior proof convincing.) and P2 (Normal backlog priority with limited blast radius.) labels May 16, 2026
@giodl73-repo giodl73-repo force-pushed the fix-qmd-hyphenated-memory-search branch from 0f7460c to 5e8277e Compare May 17, 2026 16:38
@giodl73-repo

Contributor Author

Updated this branch against current main and resolved the CHANGELOG.md conflict cleanly.

What changed in the refresh:

  • Kept CHANGELOG.md on current main and retained only the intended Memory/QMD fix entry.
  • Tightened the semantic-only QMD hyphen normalizer so chained hyphenated tokens like sqlite-vec-qmd normalize completely for vec/hyde/vsearch payloads.
  • Preserved raw lexical QMD query text for lex searches.

Validation run after the final rebase:

  • node scripts/test-projects.mjs extensions/memory-core/src/memory/qmd-manager.test.ts --reporter=verbose -t "hyphenated tokens|vector-only QMD"
  • node scripts/test-projects.mjs extensions/memory-core/src/memory/search-manager.test.ts --reporter=verbose -t "falls back to builtin search when qmd fails with sqlite busy|keeps original qmd error when fallback manager initialization fails"
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/memory-core/src/memory/qmd-manager.ts extensions/memory-core/src/memory/qmd-manager.test.ts
  • git diff --check
  • pnpm check:changed

@clawsweeper clawsweeper Bot added the impact:session-state (Session, memory, transcript, context, or agent state can drift or corrupt.) label May 17, 2026
@giodl73-repo giodl73-repo merged commit 44c3d8e into openclaw:main May 17, 2026
113 checks passed
woodygreen pushed a commit to woodygreen/openclaw that referenced this pull request May 18, 2026
frankhli843 added a commit to gemmaclaw/gemmaclaw that referenced this pull request May 19, 2026
* fix(gateway): clear CLI bindings on session reset

* fix(gateway): preserve spawned sessions in configured lists

* fix(channels): clear canonical stale routes

* fix(telegram): preserve forum topic origin targets

* fix(agents): skip fallback for session coordination errors

* fix(agents): persist subagent registry before returning accepted (openclaw#83132) (openclaw#83238)

* fix(memory): catch up stale sessions on startup (openclaw#82341)

* fix(memory): preserve qmd lexical search for hyphenated queries (openclaw#81423)

* fix(anthropic): preserve Claude image capability (openclaw#83756)

* fix(agents): exclude tool result details from guard budget (openclaw#75525)

* fix(provider): use Together video API endpoint

* fix(telegram): preserve implicit default account (openclaw#82794)

* fix(gateway): allow trusted-proxy local-direct password fallback (openclaw#82953)

* fix(discord): return subagent thread delivery origin

* fix: add missing prerequisites for upstream-ported fixes

Add SessionWriteLockTimeoutError class and hasSessionWriteLockTimeout
helper needed by the ported fix(agents) skip-fallback commit. Remove
route property references from session-delivery.ts that don't exist in
gemmaclaw's SessionEntry type. Add authorizePasswordAuth helper that was
present in upstream but missing from gemmaclaw's auth.ts.

* fix: remove route assertions incompatible with gemmaclaw SessionEntry

Remove test assertions using .route property that exists in upstream's
SessionEntry type but not in gemmaclaw's, restoring typecheck green.

* fix(memory-core): yield event loop during fallback vector search (openclaw#81172) (openclaw#83758)

Summary:
- The branch changes memory-core fallback vector search to scan chunks in 256-row rowid batches with `setImmediate` yields, updates regression tests, and adds a changelog entry.
- Reproducibility: yes, from source and supplied live output. Current main synchronously scans fallback vector ... and the PR body shows the before/after heartbeat behavior through the actual `searchVector` fallback path.

Automerge notes:
- PR branch already contained follow-up commit before automerge: test(memory-core): add boundary, parity, and concurrent-insert covera…
- PR branch already contained follow-up commit before automerge: fix(memory-core): yield event loop during fallback vector search (#81…

Validation:
- ClawSweeper review passed for head 0ede3d7.
- Required merge gates passed before the squash merge.

Prepared head SHA: 0ede3d7
Review: openclaw#83758 (comment)

Co-authored-by: NW <nitinwadhawan66@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>

* fix(subagents): collect unresolved announce batches (openclaw#83701)

Summary:
- The PR changes collect-mode follow-up queue routing so unresolved-origin items can batch with a single resolved route and later compatible items can resume batching after a true cross-channel drain.
- Reproducibility: yes, at source level: current main treats unkeyed-plus-same-keyed queue items as cross-chan ... failing path is directly visible in `src/utils/queue-helpers.ts` and `src/auto-reply/reply/queue/drain.ts`.

Automerge notes:
- PR branch already contained follow-up commit before automerge: Merge remote-tracking branch 'origin/main' into maint-83701-20260518

Validation:
- ClawSweeper review passed for head e6ad029.
- Required merge gates passed before the squash merge.

Prepared head SHA: e6ad029
Review: openclaw#83701 (comment)

Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>

* fix(config): accept gateway remote port

* fix: restore Array<{}> closing bracket in manager-search.ts

Cherry-pick 68b3729 accidentally dropped the '>' from '}>',
producing a syntax error. Restore '}>;' as it was in origin/main.

* fix: add remotePort to GatewayRemoteConfig and GatewayRemoteConfigSchema

* fix(agents): prioritize manual session turns (openclaw#82765)

* fix(agents): prioritize manual session turns

* docs: update changelog for session priority

---------

Co-authored-by: Galin Iliev <Galin.Iliev@microsoft.com>

* revert: fix(agents): prioritize manual session turns (openclaw#82765) - upstream deps not in gemmaclaw

* fix: resolve undefined variable errors in cherry-picked extension code

* fix(tui): preserve draft while chat is busy

* fix(tui): add pendingChatRunId to TuiStateAccess for cherry-picked tui commit

* fix(memory-wiki): make wiki_lint tool output path-safe (openclaw#83687)

* fix(ui): render session-scoped tool events (openclaw#83734)

* chore: regenerate base config schema after upstream cherry-picks

* fix(agents): add persistSubagentRunsToDiskOrThrow to subagent-registry test mock

New export added to subagent-registry-state.ts was missing from the
vi.mock definition, causing all tests in the suite to skip and the
module to fail to load.

* fix(telegram): wire buildTelegramInboundOriginTarget into session context

Cherry-pick 675e053 added the helper and the test assertion but did not
update bot-message-context.session.ts to use it. OriginatingTo now
correctly includes :topic:<id> for forum groups.

* fix(memory): correct session path format in startup-catchup test

sessionPathForFile returns sessions/<basename> (no agent dir), but the
cherry-picked test used sessions/main/<basename>. The clean-file test
always failed because the path mismatch made every file look unindexed.

* fix(together): update video generation test URL from v1 to v2

The source uses TOGETHER_VIDEO_BASE_URL = https://api.together.xyz/v2
but the cherry-picked test still asserted the old v1 URL.

---------

Co-authored-by: nitinjwadhawan <nitinwadhawan66@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Co-authored-by: Galin Iliev <iliev@galcho.com>
Co-authored-by: Galin Iliev <Galin.Iliev@microsoft.com>
Co-authored-by: Harry Xie <harryhsieh963@yahoo.com>
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 20, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026

Labels

  • extensions: memory-core (Extension: memory-core)
  • impact:session-state (Session, memory, transcript, context, or agent state can drift or corrupt.)
  • maintainer (Maintainer-authored PR)
  • P2 (Normal backlog priority with limited blast radius.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • size: S

Development

Successfully merging this pull request may close these issues.

memory_search: qmd validation rejects hyphenated tokens, causes total fallback to builtin index

1 participant