Found while building engram's codedb bridge. Sibling of #547 (single-token identifier terms that search missed but word caught — this is the multi-word analogue, and word doesn't catch it either).
Problem
A multi-word query returns zero results from both search and word, even when every individual token has plentiful hits. There is no per-token fallback, so phrase-shaped queries silently dead-end.
$ codedb ~/openclaw search "gateway websocket reconnect"
✗ no results for "gateway websocket reconnect"
$ codedb ~/openclaw search gateway
✓ 50 results for "gateway" ⚡ 3.3ms
$ codedb ~/openclaw search websocket
✓ 50 results
$ codedb ~/openclaw word "gateway websocket reconnect"
✗ no hits for 'gateway websocket reconnect'
- Expected: a multi-word query surfaces files matching the tokens (AND, or OR with files matching more tokens ranked higher).
- Actual: 0 results unless the exact phrase appears verbatim.
Why this matters for the MCP/agent use case
Agents naturally pass task phrases ("gateway websocket reconnect", "fix auth session timeout"), not single identifiers. Via codedb_search over MCP this returns an empty result set with no hint that tokenizing would have worked — the agent concludes the codebase has nothing relevant. engram now works around it by splitting the query into salient tokens and merging per-token search calls client-side, but every consumer shouldn't have to reimplement that.
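For reference, the client-side workaround described above looks roughly like the sketch below. It is illustrative only and is not engram's actual code: `search_fn` is a hypothetical callable standing in for one single-token search call (returning file paths), and the `STOPWORDS` set is an assumed definition of "salient".

```python
import re
from typing import Callable, Dict, List

# Assumed stopword list; what counts as "salient" is not specified in this issue.
STOPWORDS = {"a", "an", "the", "fix", "for", "in", "of", "on", "to"}


def salient_tokens(query: str) -> List[str]:
    """Split on non-identifier characters, drop stopwords and duplicates, keep order."""
    seen = set()
    tokens: List[str] = []
    for tok in re.split(r"[^A-Za-z0-9_]+", query.lower()):
        if tok and tok not in STOPWORDS and tok not in seen:
            seen.add(tok)
            tokens.append(tok)
    return tokens


def merge_per_token(
    query: str, search_fn: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Run one search per token and map each file path to the tokens that hit it."""
    merged: Dict[str, List[str]] = {}
    for tok in salient_tokens(query):
        for path in search_fn(tok):
            hits = merged.setdefault(path, [])
            if tok not in hits:
                hits.append(tok)
    return merged
```

Every consumer that wants phrase-shaped queries to work currently needs some version of this, one round trip per token.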
Suggested fix
When a query contains whitespace and the exact phrase has no (or very few) hits, tokenize it and run per-token matching, ranking files that hit more tokens higher. A (tokenized) note in the output would keep the fallback transparent. Alternatively, add an explicit --any/--all flag, but the silent-empty default is the trap.
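A minimal sketch of that fallback, to make the proposed behaviour concrete. It says nothing about codedb's internals: `search_fn` is a hypothetical stand-in for the existing exact-match search (query in, file paths out), and `min_hits` is an assumed threshold for what counts as "thin".

```python
from typing import Callable, Dict, List, Tuple


def search_with_fallback(
    query: str,
    search_fn: Callable[[str], List[str]],
    min_hits: int = 1,
) -> Tuple[List[str], bool]:
    """Return (paths, tokenized); tokenized is True when the fallback ran."""
    phrase_hits = search_fn(query)
    tokens = list(dict.fromkeys(query.split()))  # dedupe tokens, keep order
    if len(tokens) < 2 or len(phrase_hits) >= min_hits:
        return phrase_hits, False

    # Count how many distinct tokens hit each file.
    counts: Dict[str, int] = {}
    for tok in tokens:
        for path in set(search_fn(tok)):
            counts[path] = counts.get(path, 0) + 1

    # More tokens matched ranks higher; ties are broken by path for stable output.
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    # Any exact-phrase hits stay at the top, followed by the tokenized matches.
    return phrase_hits + [p for p in ranked if p not in phrase_hits], True
```

A caller can then print the (tokenized) note whenever the second value is True, so an agent sees that the results came from per-token matching rather than an exact phrase match.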
Seen on codedb 0.2.5824 (macOS, warm daemon, 19.7k-file indexed repo).