Lex search fails on hyphenated identifiers (e.g. DEC-0054, RFC-0011) #417

@fxstein

Bug

buildFTS5Query() in store.ts mishandles hyphenated identifiers in lex queries. A bare DEC-0054 is parsed as a negation (DEC minus 0054). A quoted "DEC-0054" is sanitized to dec0054, which does not match the tokens the FTS5 unicode61 tokenizer stored in the index (dec, 0054).

Related: #414 (same hyphen issue, but for vec/hyde queries)

Reproduction

lex "DEC-0054"   → 0 results   (quoted phrase — sanitizer strips hyphen, becomes "dec0054")
lex DEC-0054     → 0 results   (bare term — parsed as "DEC" minus "0054", negation)
lex "dec 0054"   → works ✅    (workaround — but nobody types IDs this way)

Expected

"DEC-0054" and DEC-0054 should both find documents containing DEC-0054.

Root Cause

Two issues in buildFTS5Query():

  1. Bare terms: DEC-0054 hits the negation branch, because the hyphen is interpreted as the -term operator.

  2. Quoted phrases: "DEC-0054" goes through sanitizeFTS5Term(), which strips every character that is not a letter, number, or apostrophe (/[^\p{L}\p{N}']/gu), producing dec0054. At index time the FTS5 unicode61 tokenizer splits on the hyphen and stores separate tokens (dec, 0054), whereas the query sanitizer fuses them into a single token that is not in the index.
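
The mismatch in point 2 can be reproduced outside the store. This is a minimal sketch, not the actual code from store.ts: only the regex is taken from this issue. sanitizeSketch and approxUnicode61Tokens are hypothetical names, the lowercasing step is assumed from the dec0054 output above, and the tokenizer function is only a rough approximation of unicode61 (it splits on anything that is not a letter or number and ignores diacritic folding).

```typescript
// Hypothetical stand-in for sanitizeFTS5Term(); only the regex comes from the issue.
// Lowercasing is an assumption based on the "dec0054" output reported above.
function sanitizeSketch(term: string): string {
  return term.replace(/[^\p{L}\p{N}']/gu, "").toLowerCase();
}

// Rough approximation of unicode61 for simple input:
// every run of non-letter, non-number characters is treated as a separator.
function approxUnicode61Tokens(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0);
}

const queryTerm = sanitizeSketch("DEC-0054");           // "dec0054"
const indexTokens = approxUnicode61Tokens("DEC-0054");  // ["dec", "0054"]

// The fused query term is not among the indexed tokens, hence 0 results.
console.log(queryTerm, indexTokens, indexTokens.includes(queryTerm));
```

The last value logged is false: the query asks for a token that was never written to the index.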

Proposed Fix

  • Quoted phrases: split on [\s-]+ instead of \s+, so "DEC-0054" becomes phrase "dec 0054"
  • Bare terms: when a term contains internal hyphens (not leading), split into multiple positive terms: DEC-0054 → "dec"* "0054"*

This aligns query tokenization with unicode61 index tokenization.
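
A rough sketch of both changes, assuming the sanitizer regex quoted under Root Cause. splitHyphenated, phraseQuery and bareTermQuery are hypothetical helper names for illustration, not functions that exist in store.ts, and the real buildFTS5Query() may be structured differently.

```typescript
// Hypothetical helper: split on whitespace and hyphens, then sanitize each part
// with the same character class the issue quotes for sanitizeFTS5Term().
function splitHyphenated(term: string): string[] {
  return term
    .split(/[\s-]+/)
    .map((part) => part.replace(/[^\p{L}\p{N}']/gu, "").toLowerCase())
    .filter((part) => part.length > 0);
}

// Quoted phrase: "DEC-0054" becomes the FTS5 phrase "dec 0054".
function phraseQuery(quoted: string): string {
  return `"${splitHyphenated(quoted).join(" ")}"`;
}

// Bare term with internal hyphens: each part becomes a positive prefix term.
// A leading hyphen still means negation, so this sketch refuses to handle it.
function bareTermQuery(term: string): string {
  if (term.startsWith("-")) {
    throw new Error("leading hyphen is negation; handle it before calling this");
  }
  return splitHyphenated(term)
    .map((part) => `"${part}"*`)
    .join(" ");
}

console.log(phraseQuery("DEC-0054"));        // "dec 0054"
console.log(bareTermQuery("DEC-0054"));      // "dec"* "0054"*
console.log(bareTermQuery("CVE-2024-1234")); // "cve"* "2024"* "1234"*
```

Joining the bare-term parts with a space relies on FTS5 treating adjacent terms as an implicit AND, so a document must contain every part but not necessarily next to each other. The phrase form is stricter and requires the tokens to be adjacent and in order.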

Impact

Hyphenated identifiers are standard in technical knowledge bases: RFC-0011, DEC-0054, GDE-0003, PIO-225, CVE-2024-1234. All are unsearchable via lex without manually replacing hyphens with spaces.

Environment

  • QMD: v2.0.1
  • Platform: macOS (Apple Silicon)
  • Node: v24.2.0
  • Installed via: npm
