fix(memory): preserve qmd lexical search for hyphenated queries #81423
Codex review: needs maintainer review before merge. Workflow note: Future ClawSweeper reviews update this same comment in place.
Summary
Reproducibility: yes. at source level. Current main sends raw hyphenated queries to QMD vec/hyde/vsearch, QMD v2.0.1 rejects semantic queries matching
Real behavior proof
Next step before merge
Security
Review details
Best possible solution: Land this cleaned semantic-only QMD normalization after explicit maintainer review, keeping raw lex recall, focused regression coverage, and the linked issue-closing behavior intact.
Do we have a high-confidence way to reproduce the issue? Yes, at source level. Current main sends raw hyphenated queries to QMD vec/hyde/vsearch, QMD v2.0.1 rejects semantic queries matching
Is this the best way to solve the issue? Yes. Normalizing only QMD semantic sub-search payloads while preserving raw lex is the narrowest maintainable fix and avoids broader fallback-policy changes.
Acceptance criteria:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 6ebc5e471929.
Force-pushed from 0e92a4c to a24c410.
Force-pushed from 0cc0b00 to 0f7460c.
Updated #81423 with a fresh deterministic proof image and gate-friendly proof section. The proof compares
Fresh proof image: https://raw.githubusercontent.com/giodl73-repo/openclaw/proof-artifacts/pr-81423-fresh/pr-81423/pr-81423-qmd-hyphenated-search-before-after-proof.png
Force-pushed from 0f7460c to 5e8277e.
Updated this branch against current main and resolved the CHANGELOG.md conflict cleanly. What changed in the refresh:
Validation run after the final rebase:
* fix(gateway): clear CLI bindings on session reset
* fix(gateway): preserve spawned sessions in configured lists
* fix(channels): clear canonical stale routes
* fix(telegram): preserve forum topic origin targets
* fix(agents): skip fallback for session coordination errors
* fix(agents): persist subagent registry before returning accepted (openclaw#83132) (openclaw#83238)
* fix(memory): catch up stale sessions on startup (openclaw#82341)
* fix(memory): preserve qmd lexical search for hyphenated queries (openclaw#81423)
* fix(anthropic): preserve Claude image capability (openclaw#83756)
* fix(agents): exclude tool result details from guard budget (openclaw#75525)
* fix(provider): use Together video API endpoint
* fix(telegram): preserve implicit default account (openclaw#82794)
* fix(gateway): allow trusted-proxy local-direct password fallback (openclaw#82953)
* fix(discord): return subagent thread delivery origin
* fix: add missing prerequisites for upstream-ported fixes
  Add SessionWriteLockTimeoutError class and hasSessionWriteLockTimeout helper needed by the ported fix(agents) skip-fallback commit. Remove route property references from session-delivery.ts that don't exist in gemmaclaw's SessionEntry type. Add authorizePasswordAuth helper that was present in upstream but missing from gemmaclaw's auth.ts.
* fix: remove route assertions incompatible with gemmaclaw SessionEntry
  Remove test assertions using .route property that exists in upstream's SessionEntry type but not in gemmaclaw's, restoring typecheck green.
* fix(memory-core): yield event loop during fallback vector search (openclaw#81172) (openclaw#83758)
  Summary:
  - The branch changes memory-core fallback vector search to scan chunks in 256-row rowid batches with `setImmediate` yields, updates regression tests, and adds a changelog entry.
  - Reproducibility: yes. from source and supplied live output. Current main synchronously scans fallback vector ... and the PR body shows the before/after heartbeat behavior through the actual `searchVector` fallback path.
  Automerge notes:
  - PR branch already contained follow-up commit before automerge: test(memory-core): add boundary, parity, and concurrent-insert covera…
  - PR branch already contained follow-up commit before automerge: fix(memory-core): yield event loop during fallback vector search (#81…
  Validation:
  - ClawSweeper review passed for head 0ede3d7.
  - Required merge gates passed before the squash merge.
  Prepared head SHA: 0ede3d7
  Review: openclaw#83758 (comment)
  Co-authored-by: NW <nitinwadhawan66@gmail.com>
  Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
  Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
  Approved-by: takhoffman
  Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
* fix(subagents): collect unresolved announce batches (openclaw#83701)
  Summary:
  - The PR changes collect-mode follow-up queue routing so unresolved-origin items can batch with a single resolved route and later compatible items can resume batching after a true cross-channel drain.
  - Reproducibility: yes. at source level: current main treats unkeyed-plus-same-keyed queue items as cross-chan ... failing path is directly visible in `src/utils/queue-helpers.ts` and `src/auto-reply/reply/queue/drain.ts`.
  Automerge notes:
  - PR branch already contained follow-up commit before automerge: Merge remote-tracking branch 'origin/main' into maint-83701-20260518
  Validation:
  - ClawSweeper review passed for head e6ad029.
  - Required merge gates passed before the squash merge.
  Prepared head SHA: e6ad029
  Review: openclaw#83701 (comment)
  Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>
  Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
  Approved-by: takhoffman
  Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
* fix(config): accept gateway remote port
* fix: restore Array<{}> closing bracket in manager-search.ts
  Cherry-pick 68b3729 accidentally dropped the '>' from '}>', producing a syntax error. Restore '}>;' as it was in origin/main.
* fix: add remotePort to GatewayRemoteConfig and GatewayRemoteConfigSchema
* fix(agents): prioritize manual session turns (openclaw#82765)
  * fix(agents): prioritize manual session turns
  * docs: update changelog for session priority
  ---------
  Co-authored-by: Galin Iliev <Galin.Iliev@microsoft.com>
* revert: fix(agents): prioritize manual session turns (openclaw#82765) - upstream deps not in gemmaclaw
* fix: resolve undefined variable errors in cherry-picked extension code
* fix(tui): preserve draft while chat is busy
* fix(tui): add pendingChatRunId to TuiStateAccess for cherry-picked tui commit
* fix(memory-wiki): make wiki_lint tool output path-safe (openclaw#83687)
* fix(ui): render session-scoped tool events (openclaw#83734)
* chore: regenerate base config schema after upstream cherry-picks
* fix(agents): add persistSubagentRunsToDiskOrThrow to subagent-registry test mock
  New export added to subagent-registry-state.ts was missing from the vi.mock definition, causing all tests in the suite to skip and the module to fail to load.
* fix(telegram): wire buildTelegramInboundOriginTarget into session context
  Cherry-pick 675e053 added the helper and the test assertion but did not update bot-message-context.session.ts to use it. OriginatingTo now correctly includes :topic:<id> for forum groups.
* fix(memory): correct session path format in startup-catchup test
  sessionPathForFile returns sessions/<basename> (no agent dir), but the cherry-picked test used sessions/main/<basename>. The clean-file test always failed because the path mismatch made every file look unindexed.
* fix(together): update video generation test URL from v1 to v2
  The source uses TOGETHER_VIDEO_BASE_URL = https://api.together.xyz/v2 but the cherry-picked test still asserted the old v1 URL.
---------
Co-authored-by: nitinjwadhawan <nitinwadhawan66@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Co-authored-by: Galin Iliev <iliev@galcho.com>
Co-authored-by: Galin Iliev <Galin.Iliev@microsoft.com>
Co-authored-by: Harry Xie <harryhsieh963@yahoo.com>
Summary
- Preserve the raw query for QMD lexical (`lex`) searches so dashed identifiers, dates, and version strings can still match exactly.
- Normalize QMD semantic searches (`vec`/`hyde`) by replacing word-internal hyphens with spaces before sending the v2 `qmd.query` `searches` array.
Fixes #81328.
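A minimal sketch of the behavior described in the summary, not the PR's actual code. The names `SubSearch`, `normalizeSemanticQuery`, and `buildSearchesSketch`, and the `type`/`query` field shape, are hypothetical; the real payload is built by `buildV2Searches()` and may differ. The sketch also assumes ASCII letters and digits when deciding whether a hyphen is word-internal.

```typescript
// Hypothetical types and names, for illustration only.
type SearchKind = "lex" | "vec" | "hyde";

interface SubSearch {
  type: SearchKind;
  query: string;
}

// Replace a hyphen with a space only when it sits between two ASCII
// letters or digits. A leading hyphen such as "-draft" is left untouched.
function normalizeSemanticQuery(query: string): string {
  return query.replace(/([A-Za-z0-9])-(?=[A-Za-z0-9])/g, "$1 ");
}

// The lexical entry keeps the raw text; the semantic entries get the
// normalized text.
function buildSearchesSketch(rawQuery: string): SubSearch[] {
  const semantic = normalizeSemanticQuery(rawQuery);
  return [
    { type: "lex", query: rawQuery },
    { type: "vec", query: semantic },
    { type: "hyde", query: semantic },
  ];
}

const example = buildSearchesSketch("sqlite-vec 2026-05-04");
console.log(example);
```

Keeping the normalization in one small function makes it easy to apply it to the semantic entries only, which is the narrow scope this PR aims for.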
Tests
- `node scripts/test-projects.mjs extensions/memory-core/src/memory/qmd-manager.test.ts --reporter=verbose -t "hyphenated tokens|vector-only QMD"`
- `node scripts/test-projects.mjs extensions/memory-core/src/memory/search-manager.test.ts --reporter=verbose -t "falls back to builtin search when qmd fails with sqlite busy|keeps original qmd error when fallback manager initialization fails"`
- `pnpm check:changed`
- `git diff --check`
Full `qmd-manager.test.ts` run on Windows reached 104/105 passing; the remaining failure was the existing symlink-permission case (`EPERM` creating a symlink), unrelated to this change.
Real behavior proof
- `memory_search` should not lose QMD lexical recall just because a query contains hyphenated tokens such as `sqlite-vec`, `2026-05-04`, or `multi-agent`, while semantic QMD searches should avoid word-internal hyphens that QMD v2.0.1 treats as NOT-operator syntax.
- `upstream/main` and PR head `0f7460c4d99d028e07a629bfd178afc105a6c308`.
- `upstream/main` and `fork/fix-qmd-hyphenated-memory-search`; in each worktree ran `node --import ./node_modules/tsx/dist/loader.mjs ./probe-qmd-hyphenated-search.mjs`. The probe imported the real `QmdMemoryManager` and called the real `buildV2Searches()` payload-construction seam for hybrid `query` and vector-only `vsearch` payloads.
- `upstream/main` preserves the lexical query but sends hyphenated text to semantic `vec`, `hyde`, and vector-only payloads. PR head preserves the raw lexical branch while normalizing only semantic payloads, so exact lexical recall remains available and QMD semantic validation no longer sees word-internal hyphens.
- No `memory_search` command against an installed QMD index was run in this environment. The proof verifies the runtime payload-construction seam that feeds QMD MCP calls.
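The before/after comparison above can be pictured as a check over the constructed payloads. This is a hedged sketch, not the actual `probe-qmd-hyphenated-search.mjs` script: `ProbeSearch`, `semanticEntriesWithHyphens`, and the sample payloads are hypothetical, and the `type`/`query` field shape is an assumption about what the payload looks like.

```typescript
// Hypothetical payload entry shape, for illustration only.
interface ProbeSearch {
  type: string;
  query: string;
}

// A hyphen with an ASCII letter or digit on both sides.
// No "g" flag, so repeated .test() calls carry no state.
const WORD_INTERNAL_HYPHEN = /[A-Za-z0-9]-[A-Za-z0-9]/;

// Return the non-lexical entries whose query still contains a
// word-internal hyphen. Lexical entries are allowed to keep raw text.
function semanticEntriesWithHyphens(searches: ProbeSearch[]): ProbeSearch[] {
  return searches.filter(
    (s) => s.type !== "lex" && WORD_INTERNAL_HYPHEN.test(s.query),
  );
}

// Illustrative payloads only; these are not captured probe output.
const beforeSketch: ProbeSearch[] = [
  { type: "lex", query: "multi-agent" },
  { type: "vec", query: "multi-agent" },
  { type: "hyde", query: "multi-agent" },
];
const afterSketch: ProbeSearch[] = [
  { type: "lex", query: "multi-agent" },
  { type: "vec", query: "multi agent" },
  { type: "hyde", query: "multi agent" },
];

console.log(semanticEntriesWithHyphens(beforeSketch).length); // 2
console.log(semanticEntriesWithHyphens(afterSketch).length); // 0
```

A check of this shape only inspects payload construction; like the proof above, it says nothing about how an installed QMD index responds.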