feat: configurable search sort (recency/relevance/hybrid) + phrase matching fix#296
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves lcm_grep search quality by adding a configurable sort order (recency/relevance/hybrid) and updating FTS5 query sanitization to preserve user-quoted phrases as phrase queries.
Changes:
- Added
sortparameter tolcm_greptool schema and forwarded it into retrieval. - Implemented
recency/relevance/hybridresult sorting inRetrievalEngine.grep. - Updated FTS5 query sanitization to keep
"..."groups intact as phrases rather than splitting on whitespace.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/tools/lcm-grep-tool.ts | Adds sort param to tool schema and forwards it to retrieval. |
| src/retrieval.ts | Introduces sort modes for grep results (recency/relevance/hybrid). |
| src/store/fts5-sanitize.ts | Preserves user-quoted phrases during FTS5 query sanitization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Part of the LCM Performance & Cache Optimization Sprint — see #297 for the full tracking issue linking all 5 PRs. |
Addresses PR Martian-Engineering#296 review comments: 1. Hybrid sort math: FTS5 rank is negative (more negative = better). Old formula multiplied negative rank by age factor, making older results rank BETTER (inverted). Fixed: negate rank to get positive relevance score, divide by age penalty, negate result for ascending sort. 2. Sort scope: documented that sort is applied post-LIMIT — the store queries ORDER BY created_at DESC LIMIT ?, so relevance/hybrid modes only re-rank within the top-N-by-recency window. Known limitation. 3. Added 4 phrase matching tests to fts5-sanitize.test.ts: - Multi-word quoted phrase preserved - Multiple quoted phrases preserved - Mixed quoted and unquoted terms - Empty quoted phrase Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Addresses PR Martian-Engineering#296 review round 3: 1. Relevance sort: added recency tiebreaker when FTS5 ranks are equal, preventing implementation-dependent ordering. 2. Sort mode validation: relevance/hybrid silently fall back to recency when mode is regex (FTS5 rank not available). Previously accepted without effect. 3. Removed unused 'remaining' variable in fts5-sanitize.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // AGE_DECAY_RATE: ~1% relevance penalty per 10 hours of age. | ||
| // At 100h: 10% penalty. At 1000h (~6 weeks): 50% penalty. |
There was a problem hiding this comment.
The AGE_DECAY_RATE comment math is slightly off: with agePenalty = 1 + ageHours * 0.001, at 100h the relevance becomes 1/1.1 (~9.1% reduction), not 10%. Either adjust the wording (e.g., “~9%”) or tweak the rate/formula to match the stated percentages so future tuning is based on correct intuition.
| // AGE_DECAY_RATE: ~1% relevance penalty per 10 hours of age. | |
| // At 100h: 10% penalty. At 1000h (~6 weeks): 50% penalty. | |
| // AGE_DECAY_RATE: ~1% increase in the age penalty per 10 hours of age. | |
| // At 100h: ~9% relevance reduction. At 1000h (~6 weeks): 50% reduction. |
There was a problem hiding this comment.
Valid — the exact math at 100h is 1/(1+0.1) = 9.1% reduction, not 10%. Will fix the comment to say '~9%' instead of '~10%'. The rate itself is correct — just the human-readable approximation was rounded.
…e matching Two search quality improvements: 1. Add sort parameter to lcm_grep: "recency" (default, preserves current behavior), "relevance" (FTS5 BM25 rank — best match first), or "hybrid" (blends relevance with recency). FTS5 already computes BM25 scores on every search but they were being ignored. The "recency" default preserves the original design intent — the DAG structure already filters for importance through compression, and lcm_expand_query handles smart retrieval via sub-agent LLM reasoning. But for users with 200K+ message histories, relevance sorting surfaces important older matches that recency-only misses. 2. Fix phrase matching in fts5-sanitize.ts: user-quoted phrases like "error handling" were being split into "error" AND "handling" (two separate terms). Now preserved as a single phrase match. Fixes Martian-Engineering#293 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses PR Martian-Engineering#296 review comments: 1. Hybrid sort math: FTS5 rank is negative (more negative = better). Old formula multiplied negative rank by age factor, making older results rank BETTER (inverted). Fixed: negate rank to get positive relevance score, divide by age penalty, negate result for ascending sort. 2. Sort scope: documented that sort is applied post-LIMIT — the store queries ORDER BY created_at DESC LIMIT ?, so relevance/hybrid modes only re-rank within the top-N-by-recency window. Known limitation. 3. Added 4 phrase matching tests to fts5-sanitize.test.ts: - Multi-word quoted phrase preserved - Multiple quoted phrases preserved - Mixed quoted and unquoted terms - Empty quoted phrase Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses PR Martian-Engineering#296 review round 3: 1. Relevance sort: added recency tiebreaker when FTS5 ranks are equal, preventing implementation-dependent ordering. 2. Sort mode validation: relevance/hybrid silently fall back to recency when mode is regex (FTS5 rank not available). Previously accepted without effect. 3. Removed unused 'remaining' variable in fts5-sanitize.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ription Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move lcm_grep sort handling into the FTS query layer so relevance and hybrid ordering are applied before LIMIT truncates candidates. Add a changeset, update agent-facing grep guidance and delegated retrieval prompts, and cover the new behavior with focused tests. Regeneration-Prompt: |\n Address the PR review findings on the search-quality work for lcm_grep. Preserve the new sort parameter and quoted-phrase behavior, but fix the implementation so full-text relevance and hybrid ordering happen before LIMIT is applied instead of reranking a recency-truncated window in memory. Keep regex and fallback behavior intact. Also add the missing changeset because this is user-facing plugin behavior. Finally inspect the prompt and tool-instruction surfaces that teach the model how to use LCM recall tools, and update them so agents know when to prefer full_text, when to quote exact phrases, and when relevance or hybrid sorting is appropriate. Add regression tests for the ranking path and prompt guidance.
fff8b3a to
d582a4c
Compare
|
Thank you! |
Summary
Two search quality improvements for `lcm_grep`:
Fixes #293
Why This Matters
FTS5 already computes BM25 relevance scores on every search, but they were being discarded at sort time. On conversations with 200K+ messages, recency-only sorting buries relevant older matches while surfacing recent-but-tangential hits.
Design Rationale (from LCM architecture analysis)
The `"recency"` default is preserved because LCM's design intentionally relies on:
The `"relevance"` option is for users with very long histories where the routing function benefits from surfacing the best keyword match, not just the most recent one. The agent chooses per-query — use recency for "what just happened?" and relevance for "find the discussion about X from last week."
Changes
Test plan