Problem
codedb search <term> returns no results for identifier-shaped terms that clearly exist in the index, even though codedb word <term> finds them with 1000+ hits. Full-text (trigram) search appears to be blind to identifiers, while the word (inverted index) lookup is not.
Repro (openclaw, 13,654 files indexed)
$ codedb <root> search openrouter
✗ no results for "openrouter"
$ codedb <root> word openrouter
✓ 1227 hits for 'openrouter'
$ codedb <root> search imessage
✗ no results for "imessage"
# but common words work fine:
$ codedb <root> search function
✓ 50 results
$ codedb <root> search import
✓ 50 results
So search works for common words but returns nothing for identifiers/feature names — exactly the terms a code-intelligence query is most likely to use.
Impact
Any consumer that uses search for concept/identifier retrieval gets an empty set. In engram (which re-ranks codedb search candidates) this made retrieval silently degenerate on identifier-heavy repos — fetchQuery surfaced zero candidates and had to fall back to word. It also shows up in the engram benchmark as "the changed file isn't a lexical hit" misses (e.g. searchContent, symbol, validation, correctness on this very repo).
Likely cause
Trigram search and the word inverted index diverge: identifiers (camelCase, long/low-frequency tokens) appear in word but not in search results. Either they aren't trigram-indexed/scored, or search filters them out.
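One way to narrow this down is an invariant check: any term the word index knows (of length 3 or more) should also be findable through the trigram path. The sketch below uses toy in-memory indexes to illustrate the check. It is not codedb's implementation, and every name in it (build_indexes, trigram_search, find_divergent_terms) is hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of lowercase 3-character substrings of text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


def build_indexes(docs):
    """Build toy word and trigram indexes, each mapping key -> set of doc ids."""
    word_index = defaultdict(set)
    trigram_index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
            word_index[word.lower()].add(doc_id)
        for gram in trigrams(text):
            trigram_index[gram].add(doc_id)
    return word_index, trigram_index


def trigram_search(trigram_index, docs, term):
    """Intersect posting lists for the term's trigrams, then verify by substring."""
    grams = trigrams(term)
    if not grams:
        return set()
    candidates = set.intersection(*(trigram_index.get(g, set()) for g in grams))
    return {d for d in candidates if term.lower() in docs[d].lower()}


def find_divergent_terms(word_index, trigram_index, docs):
    """List terms the word index has but trigram search cannot find."""
    return sorted(
        w for w in word_index
        if len(w) >= 3 and not trigram_search(trigram_index, docs, w)
    )
```

With consistent indexes, find_divergent_terms returns an empty list. A non-empty list would point at the terms that are missing from, or filtered out of, the trigram side.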
Suggested fix
Have search consult the word inverted index (union, or fall back when trigram is empty) for identifier-like queries — or ensure identifiers are trigram-indexed so search and word agree on what exists.
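A minimal sketch of the fallback/union option, written in Python for illustration only. The two callables stand in for the trigram search and the word lookup. Their signatures, the mode parameter, and the identifier regex are all assumptions, not codedb's actual API.

```python
import re

# Assumption: "identifier-like" means a single token of letters, digits, underscores.
IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def is_identifier_like(query):
    """True when the query looks like one identifier rather than a phrase."""
    return bool(IDENTIFIER_RE.match(query))


def search_with_word_fallback(query, trigram_search, word_lookup, mode="fallback"):
    """Combine trigram hits with word-index hits for identifier-like queries.

    mode="fallback": consult the word index only when trigram search is empty.
    mode="union":    always merge word-index hits after the trigram hits.
    """
    hits = list(trigram_search(query))
    if not is_identifier_like(query):
        return hits
    if mode == "fallback" and hits:
        return hits
    # Merge while preserving order and dropping duplicates.
    seen = set(hits)
    merged = list(hits)
    for hit in word_lookup(query):
        if hit not in seen:
            seen.add(hit)
            merged.append(hit)
    return merged
```

The fallback mode leaves results unchanged for queries that already work and only pays for the extra lookup when search comes back empty. The union mode costs more per query but keeps search and word in agreement on what exists.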
Surfaced while running an ALMA-style retrieval experiment with engram over codedb features.