Bug
buildFTS5Query() in store.ts mishandles hyphenated identifiers in lex queries. A bare DEC-0054 is parsed as negation (DEC minus 0054). A quoted "DEC-0054" is sanitized to dec0054, which doesn't match the FTS5 unicode61 index tokens (dec, 0054).
Related: #414 (same hyphen issue, but for vec/hyde queries)
Reproduction
lex "DEC-0054" → 0 results (quoted phrase — sanitizer strips hyphen, becomes "dec0054")
lex DEC-0054 → 0 results (bare term — parsed as "DEC" minus "0054", negation)
lex "dec 0054" → works ✅ (workaround — but nobody types IDs this way)
Expected
"DEC-0054" and DEC-0054 should both find documents containing DEC-0054.
Root Cause
Two issues in buildFTS5Query():
- Bare terms: DEC-0054 hits the negation branch; the hyphen is interpreted as the -term operator.
- Quoted phrases: "DEC-0054" goes through sanitizeFTS5Term(), which strips all non-letter/non-number characters (/[^\p{L}\p{N}']/gu), producing dec0054. The FTS5 unicode61 tokenizer splits on hyphens at index time into separate tokens (dec, 0054), but the query sanitizer concatenates them.
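The mismatch can be illustrated with a minimal sketch. Both functions below are hypothetical stand-ins, not the actual store.ts code: sanitizeLikeCurrent() applies the regex quoted above (lowercasing is assumed from the reported output), and approximateUnicode61() is only a rough approximation of the tokenizer that should hold for ASCII identifiers.

```typescript
// Hypothetical reconstruction of the sanitizer described above; the real
// sanitizeFTS5Term() in store.ts may differ. Lowercasing is an assumption
// inferred from the reported output ("dec0054").
function sanitizeLikeCurrent(term: string): string {
  return term.replace(/[^\p{L}\p{N}']/gu, "").toLowerCase();
}

// Rough approximation of unicode61 for ASCII identifiers: any run of
// non-alphanumeric characters separates tokens, and tokens are case-folded.
function approximateUnicode61(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0);
}

const queryToken = sanitizeLikeCurrent("DEC-0054");   // "dec0054"
const indexTokens = approximateUnicode61("DEC-0054"); // ["dec", "0054"]

// The single query token is not among the index tokens, hence 0 results.
const matches = indexTokens.includes(queryToken);     // false
```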
Proposed Fix
- Quoted phrases: split on [\s-]+ instead of \s+, so "DEC-0054" becomes the phrase "dec 0054"
- Bare terms: when a term contains internal hyphens (not leading), split into multiple positive terms: DEC-0054 → "dec"* "0054"*
This aligns query tokenization with unicode61 index tokenization.
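A minimal sketch of the proposed behaviour, assuming the fix can be isolated into small helpers. The names splitLikeIndex(), buildPhrase(), and buildBareTerm() are hypothetical and do not exist in store.ts; the real change would live inside buildFTS5Query().

```typescript
// Hypothetical helpers illustrating the proposed fix; not the actual store.ts code.

// Split a raw term on whitespace and hyphens, then strip remaining
// punctuation from each piece, mirroring how unicode61 splits "DEC-0054".
function splitLikeIndex(raw: string): string[] {
  return raw
    .split(/[\s-]+/)
    .map((part) => part.replace(/[^\p{L}\p{N}']/gu, "").toLowerCase())
    .filter((part) => part.length > 0);
}

// Quoted phrase: the pieces stay adjacent inside one FTS5 phrase.
function buildPhrase(raw: string): string {
  return `"${splitLikeIndex(raw).join(" ")}"`;
}

// Bare term with internal hyphens: each piece becomes a positive prefix term.
// A leading hyphen still means negation, so it is left to the existing branch.
function buildBareTerm(raw: string): string | null {
  if (raw.startsWith("-")) return null; // negation, handled elsewhere
  return splitLikeIndex(raw)
    .map((token) => `"${token}"*`)
    .join(" ");
}

buildPhrase("DEC-0054");        // "dec 0054" (including the double quotes)
buildBareTerm("DEC-0054");      // "dec"* "0054"*
buildBareTerm("CVE-2024-1234"); // "cve"* "2024"* "1234"*
buildBareTerm("-draft");        // null
```

Only a leading hyphen is treated as an operator here; whether other FTS5 syntax needs similar care is outside the scope of this sketch.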
Impact
Hyphenated identifiers are standard in technical knowledge bases: RFC-0011, DEC-0054, GDE-0003, PIO-225, CVE-2024-1234. All are unsearchable via lex without manually replacing hyphens with spaces.
Environment
- QMD: v2.0.1
- Platform: macOS (Apple Silicon)
- Node: v24.2.0
- Installed via: npm