codedb search returns no results for identifier terms that word indexes #547

@justrach

Problem

codedb search <term> returns no results for identifier-shaped terms that clearly exist in the index — even though codedb word <term> finds them with 1000+ hits. So full-text (trigram) search is blind to identifiers, while word (inverted index) is not.

Repro (openclaw, 13,654 files indexed)

$ codedb <root> search openrouter
✗ no results for "openrouter"
$ codedb <root> word openrouter
✓ 1227 hits for 'openrouter'

$ codedb <root> search imessage
✗ no results for "imessage"

# but common words work fine:
$ codedb <root> search function     ✓ 50 results
$ codedb <root> search import        ✓ 50 results

So search works for common words but returns nothing for identifiers/feature names — exactly the terms a code-intelligence query is most likely to use.
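To see how widespread the gap is, the same comparison can be run over a list of terms. The sketch below is only an illustration: `find_divergent_terms`, `search_count` and `word_count` are hypothetical names, and the two count functions are stand-ins fed with the numbers from the repro above rather than real calls into codedb.

```python
from typing import Callable, Iterable, List, Tuple


def find_divergent_terms(
    terms: Iterable[str],
    search_count: Callable[[str], int],
    word_count: Callable[[str], int],
) -> List[Tuple[str, int]]:
    """Return (term, word_hits) for terms that word finds but search does not."""
    divergent = []
    for term in terms:
        if search_count(term) > 0:
            continue  # search already returns something, nothing to report
        word_hits = word_count(term)
        if word_hits > 0:
            divergent.append((term, word_hits))
    return divergent


# Stand-in data copied from the repro; a real harness would query codedb here.
_SEARCH_RESULTS = {"openrouter": 0, "function": 50, "import": 50}
_WORD_HITS = {"openrouter": 1227}

divergent = find_divergent_terms(
    ["openrouter", "function", "import"],
    search_count=lambda term: _SEARCH_RESULTS[term],
    word_count=lambda term: _WORD_HITS[term],
)
print(divergent)
```

Any term that ends up in `divergent` is one that the inverted index knows about but full-text search cannot surface.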

Impact

Any consumer that uses search for concept/identifier retrieval gets an empty set. In engram (which re-ranks codedb search candidates) this made retrieval silently degenerate on identifier-heavy repos — fetchQuery surfaced zero candidates and had to fall back to word. It also shows up in the engram benchmark as "the changed file isn't a lexical hit" misses (e.g. searchContent, symbol, validation, correctness on this very repo).

Likely cause

Trigram search and the word inverted index diverge: identifiers (camelCase, long/low-frequency tokens) appear in word but not in search results. Either they aren't trigram-indexed/scored, or search filters them out.
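For reference, a plain trigram index has no inherent reason to miss an identifier. The toy below is not codedb's implementation; it is a minimal sketch of the general technique (index every 3-character window, intersect posting sets, then verify the substring), with made-up file names and contents. If something this simple finds `openrouter`, the gap is more likely in filtering, scoring, or which tokens get indexed.

```python
from collections import defaultdict
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    """All lowercase 3-character windows of text."""
    lowered = text.lower()
    return {lowered[i:i + 3] for i in range(len(lowered) - 2)}


def build_trigram_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each trigram to the set of paths whose content contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, content in docs.items():
        for gram in trigrams(content):
            index[gram].add(path)
    return index


def trigram_search(index: Dict[str, Set[str]], docs: Dict[str, str], query: str) -> List[str]:
    """Intersect posting sets for the query's trigrams, then confirm the substring."""
    grams = trigrams(query)
    if not grams:  # queries shorter than 3 characters have no trigrams
        return []
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    needle = query.lower()
    return sorted(p for p in candidates if needle in docs[p].lower())


# Hypothetical two-file corpus, for illustration only.
docs = {
    "providers/openrouter.ts": "export const openrouterClient = createClient()",
    "util.ts": "export function route() {}",
}
index = build_trigram_index(docs)
print(trigram_search(index, docs, "openrouter"))
```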

Suggested fix

Have search consult the word inverted index (union, or fall back when trigram is empty) for identifier-like queries — or ensure identifiers are trigram-indexed so search and word agree on what exists.
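The two variants of the first option (fall back when empty, or union) could look roughly like this. It is a sketch under assumptions: `trigram_search` and `word_lookup` are hypothetical callables that each return a list of file paths, not codedb's actual internal API, and the fallback here triggers on any empty trigram result rather than only on identifier-like queries.

```python
from typing import Callable, List

Lookup = Callable[[str], List[str]]


def search_with_word_fallback(query: str, trigram_search: Lookup, word_lookup: Lookup) -> List[str]:
    """Use trigram results when there are any; otherwise consult the word index."""
    hits = trigram_search(query)
    if hits:
        return hits
    return word_lookup(query)


def search_union(query: str, trigram_search: Lookup, word_lookup: Lookup) -> List[str]:
    """Merge both result lists, trigram hits first, dropping duplicates."""
    seen = set()
    merged = []
    for path in [*trigram_search(query), *word_lookup(query)]:
        if path not in seen:
            seen.add(path)
            merged.append(path)
    return merged


# Hypothetical stand-ins for the two indexes.
def fake_trigram(query: str) -> List[str]:
    return ["a.ts", "b.ts"] if query == "function" else []


def fake_word(query: str) -> List[str]:
    return {"openrouter": ["providers/openrouter.ts"], "function": ["b.ts", "c.ts"]}.get(query, [])


print(search_with_word_fallback("openrouter", fake_trigram, fake_word))
print(search_union("function", fake_trigram, fake_word))
```

The fallback keeps today's behaviour for queries that already work and only changes the empty case; the union is more thorough but would need a decision on how to rank word-only hits against trigram hits.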

Surfaced while running an ALMA-style retrieval experiment with engram over codedb features.
