Context
A deep investigation into LCM's search architecture reveals that the recency-only ordering in lcm_grep was a deliberate design choice, not an oversight. The DAG structure itself is the relevance filter — terms that survive multi-level compression are already high-signal. The "smart" retrieval path (lcm_expand_query) uses sub-agent LLM reasoning for relevance, which is far more sophisticated than BM25.
However, for users with very long histories (200K+ messages), the inability to surface relevant older summaries when token budgets are tight is a real gap. FTS5 already computes BM25 scores — they're retrieved from the DB but discarded at sort time.
The Three Search Paths
| Tool |
Purpose |
Sort Order |
Who Uses It |
lcm_grep |
Keyword lookup → returns IDs for follow-up |
Recency |
Main agent (routing) |
lcm_describe |
Inspect a summary node's subtree structure |
N/A |
Sub-agent |
lcm_expand_query |
Autonomous sub-agent does multi-step retrieval |
Sub-agent's judgment |
Main agent (deep recall) |
lcm_expand |
Low-level DAG traversal |
By depth |
Sub-agent only |
Why Recency Was Chosen (Intentional)
- DAG = relevance filter. Terms surviving multi-level compression are already high-signal. BM25 on top double-counts.
- lcm_grep is a routing tool. Returns IDs + snippets for follow-up, not final answers. Recency is a reasonable routing default.
- lcm_expand_query handles "smart" retrieval. Sub-agent LLM reasoning >> BM25 heuristics.
Where BM25 Would Help (Without Conflicting)
1. Add configurable sort to lcm_grep (PRIMARY RECOMMENDATION)
Add a sort parameter: "recency" (default), "relevance" (ORDER BY FTS5 rank), or "hybrid".
- Preserves backward compatibility (recency default)
- RAG+LCM hybrid users get relevance option
- Pure LCM users keep current behavior
- FTS5 rank is already computed — just needs to be used in ORDER BY
2. Flip sort priority in describeAndExpand (INTERNAL OPTIMIZATION)
expansion.ts line 186-194 currently sorts recency-first, rank-as-tiebreaker. Flip to rank-first, recency-tiebreaker. This determines which summaries consume the expansion token budget — putting the most relevant ones first improves expand quality.
3. Fix phrase matching in fts5-sanitize.ts
"error handling" is split into "error" "handling" (AND of two terms) instead of preserved as a phrase. Small sanitizer fix.
4. Add periodic FTS optimization
After compaction, call INSERT INTO messages_fts(messages_fts) VALUES('optimize') to defragment. Prevents gradual search degradation.
5. Add messages_fts_cjk trigram table
CJK trigram search only covers summaries. Messages fall back to slow LIKE.
Where BM25 Could Hurt
- Stale depth-2 summaries displacing fresh context — BM25 doesn't understand DAG depth
- Double-counting importance — DAG compression + BM25 frequency compounds bias toward older compressed nodes
- Breaking agent expectations — agents expect chronological grep results
LCM vs RAG
| Aspect |
RAG |
LCM |
| Retrieval |
Vector similarity (finds paraphrases) |
Keyword + DAG traversal + sub-agent LLM |
| Ranking |
Embedding distance |
Recency + BM25-lite (assembler only) |
| Losslessness |
Lossy (chunks may miss context) |
Lossless (every message recoverable) |
| Smart retrieval |
Top-K by similarity |
Sub-agent with multi-step reasoning |
LCM's key advantage: every message is recoverable via DAG traversal. RAG loses context at chunking boundaries. LCM's gap: keyword search can't find semantic matches (synonyms, paraphrases).
Proposed Implementation
Files to change
src/tools/lcm-grep-tool.ts — add sort parameter to schema
src/retrieval.ts — add sort option to GrepInput, apply in grep()
src/expansion.ts — flip sort priority in describeAndExpand
src/store/fts5-sanitize.ts — fix phrase matching
src/db/migration.ts — add messages_fts_cjk table + periodic FTS optimize
Configuration
The sort parameter would be per-tool-call (agent chooses per query), not a global config. This lets the agent use recency for "what just happened?" and relevance for "find the discussion about X from last week."
Related
Context
A deep investigation into LCM's search architecture reveals that the recency-only ordering in
lcm_grepwas a deliberate design choice, not an oversight. The DAG structure itself is the relevance filter — terms that survive multi-level compression are already high-signal. The "smart" retrieval path (lcm_expand_query) uses sub-agent LLM reasoning for relevance, which is far more sophisticated than BM25.However, for users with very long histories (200K+ messages), the inability to surface relevant older summaries when token budgets are tight is a real gap. FTS5 already computes BM25 scores — they're retrieved from the DB but discarded at sort time.
The Three Search Paths
lcm_greplcm_describelcm_expand_querylcm_expandWhy Recency Was Chosen (Intentional)
Where BM25 Would Help (Without Conflicting)
1. Add configurable
sorttolcm_grep(PRIMARY RECOMMENDATION)Add a
sortparameter:"recency"(default),"relevance"(ORDER BY FTS5 rank), or"hybrid".2. Flip sort priority in
describeAndExpand(INTERNAL OPTIMIZATION)expansion.tsline 186-194 currently sorts recency-first, rank-as-tiebreaker. Flip to rank-first, recency-tiebreaker. This determines which summaries consume the expansion token budget — putting the most relevant ones first improves expand quality.3. Fix phrase matching in
fts5-sanitize.ts"error handling"is split into"error" "handling"(AND of two terms) instead of preserved as a phrase. Small sanitizer fix.4. Add periodic FTS optimization
After compaction, call
INSERT INTO messages_fts(messages_fts) VALUES('optimize')to defragment. Prevents gradual search degradation.5. Add
messages_fts_cjktrigram tableCJK trigram search only covers summaries. Messages fall back to slow LIKE.
Where BM25 Could Hurt
LCM vs RAG
LCM's key advantage: every message is recoverable via DAG traversal. RAG loses context at chunking boundaries. LCM's gap: keyword search can't find semantic matches (synonyms, paraphrases).
Proposed Implementation
Files to change
src/tools/lcm-grep-tool.ts— addsortparameter to schemasrc/retrieval.ts— add sort option toGrepInput, apply ingrep()src/expansion.ts— flip sort priority indescribeAndExpandsrc/store/fts5-sanitize.ts— fix phrase matchingsrc/db/migration.ts— addmessages_fts_cjktable + periodic FTS optimizeConfiguration
The
sortparameter would be per-tool-call (agent chooses per query), not a global config. This lets the agent use recency for "what just happened?" and relevance for "find the discussion about X from last week."Related