fix(search): default FTS5 multi-keyword queries to OR instead of AND #9651
Open
memphislee09-source wants to merge 1 commit into
FTS5 treats space-separated terms as AND by default, which makes multi-keyword session search (e.g. '特洛伊 海伦') return almost nothing, since few messages contain ALL search terms simultaneously.

Add Step 7 to _sanitize_fts5_query that automatically joins plain terms with OR when no explicit boolean operators (AND/OR/NOT) are present. Quoted phrases are preserved as single tokens during conversion.

Before: '特洛伊 海伦' → implicit AND → 0 results
After: '特洛伊 OR 海伦' → 5+ results from matching sessions
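The difference between implicit AND and explicit OR can be reproduced with a small in-memory database. This is only an illustrative sketch: the `messages` table, its `content` column, and the sample rows are assumptions made for the demo and are not taken from `hermes_state.py`. It also assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3

# Hypothetical demo table; the real schema in hermes_state.py may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(content)")
conn.executemany(
    "INSERT INTO messages(content) VALUES (?)",
    [
        ("The Trojan war lasted ten years",),
        ("Helen was the queen of Sparta",),
        ("Paris took Helen to the Trojan city",),
        ("Unrelated note about docker",),
    ],
)


def count_matches(query: str) -> int:
    """Return how many rows match the given FTS5 query string."""
    row = conn.execute(
        "SELECT count(*) FROM messages WHERE messages MATCH ?", (query,)
    ).fetchone()
    return row[0]


# Space-separated terms: every term must appear in the same row.
implicit_and = count_matches("Trojan Helen")
# Explicit OR: a row matches if it contains either term.
explicit_or = count_matches("Trojan OR Helen")

print(implicit_and, explicit_or)
```

With these sample rows, only the row mentioning both terms satisfies the space-separated query, while the OR query also picks up the rows that mention just one of them.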
Problem
FTS5 treats space-separated terms as AND by default. This makes multi-keyword session search (e.g. `特洛伊 海伦` ("Troy Helen"), `Trojan war Helen`) return almost nothing, since few individual messages contain ALL search terms simultaneously. Users expect recall-style search to match ANY term, not ALL.
Root Cause
`_sanitize_fts5_query` in `hermes_state.py` passes queries straight through to FTS5 MATCH, which interprets spaces as implicit AND. A query like `特洛伊 海伦` requires both terms in the same message, so it returns only 0-1 results instead of the expected 5+.
Fix
Added Step 7 to `_sanitize_fts5_query`: when no explicit boolean operators (AND/OR/NOT) are present, automatically join space-separated terms with `OR`. Quoted phrases are preserved as single tokens.

| Before | After |
|---|---|
| `特洛伊 海伦` → implicit AND → 0 results | `特洛伊 OR 海伦` → 5+ results |
| `"exact phrase"` → preserved ✅ | `"exact phrase"` → preserved ✅ |
| `python NOT java` → preserved ✅ | `python NOT java` → preserved ✅ |
| `docker OR kubernetes` → preserved ✅ | `docker OR kubernetes` → preserved ✅ |

Changes
- `hermes_state.py`: Added Step 7 OR conversion in `_sanitize_fts5_query`
- `tests/test_hermes_state.py`: Updated test expectation for `hello world` → `hello OR world`
Testing
`session_search('Trojan war Helen 特洛伊 海伦')` now returns 2 sessions instead of 0
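
For reviewers who want to try the conversion rule without the rest of the sanitizer, the behavior described under Fix could be sketched roughly as follows. This is not the code from the commit: `join_terms_with_or` is a hypothetical standalone helper, and it assumes the earlier sanitization steps have already balanced quotes and stripped special characters.

```python
import re

# A token is either a double-quoted phrase or a run of non-whitespace.
_TOKEN_RE = re.compile(r'"[^"]*"|\S+')
# Uppercase keywords that FTS5 treats as boolean operators.
_OPERATORS = {"AND", "OR", "NOT"}


def join_terms_with_or(query: str) -> str:
    """Hypothetical sketch of the Step 7 rule, not the actual implementation.

    Joins plain terms with OR unless the query already contains an
    explicit boolean operator. Quoted phrases stay intact as one token.
    """
    tokens = _TOKEN_RE.findall(query)
    # A single term or phrase has nothing to join.
    if len(tokens) < 2:
        return query
    # Respect queries where the user already chose the operators.
    if any(tok in _OPERATORS for tok in tokens):
        return query
    return " OR ".join(tokens)


print(join_terms_with_or("特洛伊 海伦"))
print(join_terms_with_or('"exact phrase" docker'))
print(join_terms_with_or("python NOT java"))
```

Only exact uppercase `AND`, `OR`, and `NOT` tokens count as operators in this sketch, so a lowercase word such as `not` is treated as an ordinary search term and joined with the others.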