
feat(memory): migrate recall search from LIKE to SQLite FTS5 #436

Merged
Aaronontheweb merged 4 commits into dev from claude-wt-fts5 on Mar 26, 2026


@Aaronontheweb

Collaborator

Summary

  • Replace LIKE '%term%' substring queries with FTS5 MATCH across all 4 memory recall search paths (SearchByPlanInternalAsync, SearchAutoRecallDocumentsAsync, SearchMemoriesAsync, FindCandidatesByContentAsync)
  • Add memory_documents_fts and memory_records_fts virtual tables with Porter stemmer tokenization (porter unicode61 remove_diacritics 2)
  • FTS tables are rebuilt from the base tables on each startup; runtime mutations only INSERT into them, so stale ("phantom") FTS rows left behind by updates are cleaned up at the next restart
  • BM25 relevance scoring with column weights (title=10, body=1, aliases=5, facets=3) replaces naive LIKE-hit counting
  • CTE-based queries overfetch candidates from FTS, then JOIN with the base tables for policy filtering (domain, boundary, audience, sensitivity, recall_mode, expiry)
  • Post-SQL scoring pipeline (DeterministicCandidateSelector + RecallRank) unchanged
  • Dead TokenizeQuery method and private StopWords field removed; shared TextTokenizer remains for query pre-processing
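A minimal sketch of the rebuild-on-startup / INSERT-only design, using Python's built-in `sqlite3` module and assuming its bundled SQLite has FTS5 compiled in. Only the `memory_documents_fts` name and the tokenizer string come from this PR; the base table `memory_documents`, its columns, the `doc_id UNINDEXED` link column, and the helpers `rebuild_fts`, `update_body`, and `matching_ids` are hypothetical. It shows how a stale row keeps matching old text until the next rebuild.

```python
import sqlite3

# Hypothetical FTS schema: indexed text columns plus an unindexed link back to the base row.
FTS_DDL = (
    "CREATE VIRTUAL TABLE memory_documents_fts USING fts5("
    "title, body, doc_id UNINDEXED, "
    "tokenize='porter unicode61 remove_diacritics 2')"
)


def rebuild_fts(conn: sqlite3.Connection) -> None:
    """Startup step: drop the FTS index and repopulate it from the base table."""
    conn.execute("DROP TABLE IF EXISTS memory_documents_fts")
    conn.execute(FTS_DDL)
    conn.execute(
        "INSERT INTO memory_documents_fts (title, body, doc_id) "
        "SELECT title, body, id FROM memory_documents"
    )


def update_body(conn: sqlite3.Connection, doc_id: int, body: str) -> None:
    """Runtime mutation: update the base row, then INSERT (never UPDATE/DELETE) into FTS."""
    conn.execute("UPDATE memory_documents SET body = ? WHERE id = ?", (body, doc_id))
    conn.execute(
        "INSERT INTO memory_documents_fts (title, body, doc_id) "
        "SELECT title, body, id FROM memory_documents WHERE id = ?",
        (doc_id,),
    )


def matching_ids(conn: sqlite3.Connection, query: str) -> list:
    """Distinct base-row ids whose FTS rows match the query."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM memory_documents_fts "
        "WHERE memory_documents_fts MATCH ?",
        (query,),
    )
    return sorted(int(r[0]) for r in rows)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE memory_documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    conn.execute(
        "INSERT INTO memory_documents VALUES (1, 'Deploy notes', 'Staging uses blue-green rollout')"
    )
    rebuild_fts(conn)
    update_body(conn, 1, "Staging now uses canary rollout")
    # The old FTS row is still present, so "blue" matches a phantom.
    before = (matching_ids(conn, "blue"), matching_ids(conn, "canary"))
    rebuild_fts(conn)  # simulated restart
    after = (matching_ids(conn, "blue"), matching_ids(conn, "canary"))
    conn.close()
    return before, after
```

The trade-off is that writes stay cheap and never touch existing FTS rows, while a document may briefly match terms it no longer contains; that window closes at the next startup rebuild.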

Closes #412

Test plan

  • Full solution builds with 0 warnings, 0 errors
  • All 689 existing tests pass (including 103 memory-related)
  • Slopwatch: 0 new violations
  • Manual: start daemon, trigger memory writes via Slack, verify find_memories returns FTS-quality results

…t search

Replace LIKE '%term%' substring queries with FTS5 MATCH for all memory
recall paths. This improves search precision (word-boundary matching
instead of substring), adds BM25 relevance scoring, and provides
index-backed queries that scale with document count.
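The precision difference can be sketched with a throwaway in-memory database (Python `sqlite3`, assuming FTS5 is compiled in). The `docs`/`docs_fts` tables, the sample rows, and `compare_like_vs_fts5` are illustrative only; the tokenizer string matches the one listed in the summary. `LIKE '%cat%'` hits "concatenate" as a substring, while FTS5 matches whole tokens, with Porter stemming folding "Cats" into "cat" and `remove_diacritics 2` letting "cafe" match "café".

```python
import sqlite3

ROWS = [
    (1, "We should concatenate the logs before shipping."),
    (2, "The cat sat on the deploy script."),
    (3, "Cats keep knocking the build server offline."),
    (4, "Meeting moved to the café."),
]


def compare_like_vs_fts5(term: str):
    """Return (ids matched by LIKE substring search, ids matched by FTS5 MATCH)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    conn.execute(
        "CREATE VIRTUAL TABLE docs_fts USING fts5("
        "body, tokenize='porter unicode61 remove_diacritics 2')"
    )
    conn.executemany("INSERT INTO docs (id, body) VALUES (?, ?)", ROWS)
    conn.executemany("INSERT INTO docs_fts (rowid, body) VALUES (?, ?)", ROWS)

    # Old approach: substring match (case-insensitive for ASCII, no word boundaries).
    like_ids = [r[0] for r in conn.execute(
        "SELECT id FROM docs WHERE body LIKE ? ORDER BY id", (f"%{term}%",))]
    # New approach: token match after stemming and diacritic folding.
    fts_ids = [r[0] for r in conn.execute(
        "SELECT rowid FROM docs_fts WHERE docs_fts MATCH ? ORDER BY rowid", (term,))]
    conn.close()
    return like_ids, fts_ids
```

For example, `compare_like_vs_fts5("cat")` drops the "concatenate" false positive, and `compare_like_vs_fts5("cafe")` finds the accented row that `LIKE` misses.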

Changes:
- Add memory_documents_fts and memory_records_fts virtual tables with
  Porter stemmer tokenization, rebuilt on each startup
- Rewrite SearchByPlanInternalAsync, SearchAutoRecallDocumentsAsync,
  SearchMemoriesAsync, and FindCandidatesByContentAsync to use CTE-based
  FTS5 MATCH with bm25() ranking and policy-filter JOINs
- Add FTS insert-on-write to all mutation paths (UpsertDocumentAsync,
  UpdateDocumentTextAsync, SupersedeRecordAsync, both batch methods)
- Remove dead TokenizeQuery method and private StopWords set (shared
  TextTokenizer remains for query pre-processing)
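A sketch of the query shape described above: free-text input is reduced to quoted tokens joined with `OR`, a CTE overfetches the best `bm25()` hits from FTS using the title=10, body=1, aliases=5, facets=3 weights, and the outer query JOINs to the base table for policy filtering. Everything concrete here is an assumption for illustration: the base-table schema, the two policy columns shown (`audience`, `expires_at`), the stop-word list, the 4x overfetch factor, and the helpers `to_match_expression`, `search`, and `build_demo_store`. It is not the PR's `TextTokenizer` or its actual SQL.

```python
import re
import sqlite3

# Hypothetical stop-word list; the shared TextTokenizer's real list is not shown in the PR.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "or"}


def to_match_expression(text: str) -> str:
    """Turn free text into an FTS5 query: quoted word tokens joined with OR."""
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]
    # Quoting is defensive: each token is treated as a literal string, not query syntax.
    return " OR ".join(f'"{t}"' for t in tokens)


SEARCH_SQL = """
WITH hits AS (
    SELECT doc_id,
           bm25(memory_documents_fts, 10.0, 1.0, 5.0, 3.0) AS score  -- title, body, aliases, facets
    FROM memory_documents_fts
    WHERE memory_documents_fts MATCH :query
    ORDER BY score
    LIMIT :overfetch
)
SELECT d.id, d.title, MIN(h.score) AS score
FROM hits AS h
JOIN memory_documents AS d ON d.id = h.doc_id
WHERE d.audience = :audience
  AND (d.expires_at IS NULL OR d.expires_at > :now)
GROUP BY d.id
ORDER BY score
LIMIT :limit
"""


def search(conn, text, audience, now, limit=10):
    query = to_match_expression(text)
    if not query:
        return []  # nothing left after stop-word removal
    params = {"query": query, "overfetch": limit * 4,
              "audience": audience, "now": now, "limit": limit}
    return conn.execute(SEARCH_SQL, params).fetchall()


def build_demo_store():
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE memory_documents (
        id INTEGER PRIMARY KEY, title TEXT, body TEXT, aliases TEXT,
        facets TEXT, audience TEXT, expires_at INTEGER)""")
    conn.execute("""CREATE VIRTUAL TABLE memory_documents_fts USING fts5(
        title, body, aliases, facets, doc_id UNINDEXED,
        tokenize='porter unicode61 remove_diacritics 2')""")
    docs = [
        (1, "Deploy checklist", "Run migrations before the rollout", "", "ops", "team", None),
        (2, "Lunch order", "The deploy went fine, order pizza", "", "social", "team", None),
        (3, "Deploy keys", "Rotate deploy keys monthly", "", "security", "private", None),
        (4, "Old deploy plan", "Superseded rollout notes", "", "ops", "team", 100),
    ] + [(i, f"Note {i}", "Unrelated standup notes", "", "misc", "team", None)
         for i in range(5, 10)]
    conn.executemany("INSERT INTO memory_documents VALUES (?, ?, ?, ?, ?, ?, ?)", docs)
    conn.execute("""INSERT INTO memory_documents_fts (title, body, aliases, facets, doc_id)
        SELECT title, body, aliases, facets, id FROM memory_documents""")
    return conn
```

`bm25()` returns numerically smaller values for better matches, so both levels sort ascending; here the title hit in row 1 outranks the body hit in row 2, while rows 3 (wrong audience) and 4 (expired) are dropped by the JOIN filter. `GROUP BY d.id` with `MIN(h.score)` collapses any duplicate FTS rows for the same document into its best score.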

Closes #412

@Aaronontheweb added the memory label (Memory formation, recall, curation pipeline) Mar 26, 2026
@Aaronontheweb marked this pull request as ready for review March 26, 2026 16:39
@Aaronontheweb enabled auto-merge (squash) March 26, 2026 16:40
@Aaronontheweb merged commit 0836e75 into dev Mar 26, 2026
3 checks passed
@Aaronontheweb deleted the claude-wt-fts5 branch March 26, 2026 16:47

Labels

memory Memory formation, recall, curation pipeline


Development

Successfully merging this pull request may close these issues.

feat(memory): migrate recall search from LIKE to SQLite FTS5 full-text search
