[Bug]: memory_search results dominated by session transcripts, memory files never returned #19913

@alextempr

Summary

When using QMD as the memory backend (memory.backend: "qmd"), the memory_search tool returns results almost exclusively from sessions-* collections. Files in the memory-dir-* and memory-root-* collections are systematically excluded from the top results, even when they contain exact matches for the query.

Steps to reproduce

  1. Configure memory.backend: "qmd" with sessions.enabled: true
  2. Have memory files (memory/*.md) with specific facts (e.g., "Nicole and Daniele first autonomous play session")
  3. Use plugins that inject text via before_agent_start hooks (e.g., critical-rules, facts injection)
  4. Run memory_search with a query matching content in memory files

Expected behavior

Results should include matches from the memory-dir-* collection (the actual memory files), not exclusively from session transcripts.

Actual behavior

All top results come from the sessions-* collections. Memory files with exact matches for the query don't appear in the top 6 results at all.

OpenClaw version

2026.2.17

Operating system

macOS (Apple Silicon M4)

Install method

mac app + QMD (installed via bun), local embeddings (EmbeddingGemma 300M), sessions enabled

Logs, screenshots, and evidence

Impact and severity

Affected: Any user with QMD backend + sessions enabled + plugins that inject boilerplate
Severity: High — memory_search is unreliable; curated memory files are effectively invisible
Frequency: 100% reproducible with this config
Consequence: Agent cannot recall stored facts, relies solely on session context

Additional information

Root Cause Analysis

  • Session transcripts contain plugin-injected boilerplate (<critical-rules>, <relevant-facts>) at every turn
  • This repeated text inflates session relevance scores in hybrid search (vector + BM25 + reranking)
  • With maxResults: 6, memory files are pushed out entirely
  • BM25-only search (qmd search) correctly ranks memory files first — the issue is in hybrid query mode
  • The FTS fallback added in #18304 ("feat: FTS fallback + query expansion for memory search") only triggers on zero results, not on poor-quality results
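
To make the first point concrete, here is a minimal sketch of a pre-indexing filter that drops the injected blocks from a transcript turn. It assumes the blocks appear as paired tags (<critical-rules>...</critical-rules> and <relevant-facts>...</relevant-facts>); the report only names the opening tags, so the closing form is a guess. The function name strip_injected_blocks is hypothetical and is not part of OpenClaw or QMD.

```python
import re

# Tag names come from the report above. The paired open/close form
# is an ASSUMPTION; adjust the pattern to the real injected markup.
BOILERPLATE_TAGS = ("critical-rules", "relevant-facts")

_BOILERPLATE_RE = re.compile(
    r"<(?P<tag>" + "|".join(re.escape(t) for t in BOILERPLATE_TAGS) + r")>"
    r".*?</(?P=tag)>",
    re.DOTALL,  # injected blocks may span several lines
)


def strip_injected_blocks(turn_text: str) -> str:
    """Hypothetical helper: remove plugin-injected blocks from one turn."""
    stripped = _BOILERPLATE_RE.sub("", turn_text)
    # Collapse runs of blank lines left behind by the removed blocks.
    return re.sub(r"\n{3,}", "\n\n", stripped).strip()
```

Applied to every turn before a session is indexed, a filter like this would leave only the conversational text, so the repeated boilerplate could no longer contribute to session scores.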

Workaround

I am currently using a separate FTS5 facts database for structured recall, but this doesn't fix the core memory_search tool.

Suggested Fixes

  1. Strip plugin-injected boilerplate from session transcripts before indexing
  2. Add collection-level boosting (memory files should rank higher than sessions)
  3. Increase default maxResults or make it configurable per-collection
  4. Trigger FTS fallback when results are all from one collection type
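
Fixes 2 and 4 could look roughly like the sketch below. Everything here is illustrative: the Hit type, the function names, and the boost weights are hypothetical and do not reflect the actual OpenClaw or QMD code. Only the collection prefixes (sessions-, memory-dir-, memory-root-) come from this report.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    collection: str  # e.g. "sessions-main" or "memory-dir-main"
    path: str
    score: float     # hybrid relevance score, higher is better


# ASSUMED weights, chosen only for illustration.
COLLECTION_BOOSTS = {
    "memory-dir-": 1.5,
    "memory-root-": 1.5,
    "sessions-": 1.0,
}


def collection_kind(collection: str) -> str:
    """Map a collection name to its type prefix (or itself if unknown)."""
    for prefix in COLLECTION_BOOSTS:
        if collection.startswith(prefix):
            return prefix
    return collection


def rerank_with_boost(hits: list[Hit], max_results: int = 6) -> list[Hit]:
    """Fix 2: multiply each score by its collection weight, then truncate."""
    def boosted(hit: Hit) -> float:
        return hit.score * COLLECTION_BOOSTS.get(collection_kind(hit.collection), 1.0)

    return sorted(hits, key=boosted, reverse=True)[:max_results]


def needs_fallback(hits: list[Hit]) -> bool:
    """Fix 4: fall back on zero results OR when one collection type fills the list."""
    kinds = {collection_kind(h.collection) for h in hits}
    return len(kinds) <= 1
```

With six session hits scored around 0.75 to 0.80 and one memory file scored 0.60, the unboosted top 6 would contain only sessions and needs_fallback would return True. After rerank_with_boost, the memory file would rank first under these assumed weights.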

Labels: bug
