feat(memory): add recall observability for precision diagnosis #561

@Aaronontheweb

Summary

The automatic memory recall pipeline lacks sufficient logging to diagnose precision issues. When irrelevant memories are injected into context, there is no structured trace showing why those candidates were selected over others.

Current State

The only recall log is turn_memory_recall, which emits:

  • degraded (bool)
  • stage (string)
  • durationMs (int)
  • itemCount (int)
  • itemIds (comma-separated)

The SQLiteMemoryRecallCoordinator does emit memory_retrieval_request_plan and memory_retrieval_candidate_selection structured logs, but these do not appear in the daemon log for the observed sessions. Possible causes are that they are emitted at a different log level than the daemon sink accepts, or that the structured log context does not propagate.
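The log-level hypothesis can be illustrated with a small, language-neutral sketch. It is written in Python purely for illustration and does not reflect the daemon's actual logging stack. It assumes, without confirmation, that the two coordinator events are emitted at debug level; the logger name and helper function are hypothetical.

```python
import io
import logging


def emitted_events(sink_level: int) -> list[str]:
    """Return the event names that reach a sink configured at sink_level."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setLevel(sink_level)

    log = logging.getLogger("recall.level_demo")  # hypothetical logger name
    log.handlers = [handler]
    log.setLevel(logging.DEBUG)  # the logger itself lets everything through
    log.propagate = False

    log.info("turn_memory_recall")
    # Assumption: these two events are emitted at debug level.
    log.debug("memory_retrieval_request_plan")
    log.debug("memory_retrieval_candidate_selection")

    # The default handler format writes one message per line.
    return stream.getvalue().split()


if __name__ == "__main__":
    print(emitted_events(logging.INFO))
    print(emitted_events(logging.DEBUG))
```

If the daemon sink behaves like the INFO case, only turn_memory_recall would be visible, which matches the observed sessions. It would not rule out the context-propagation explanation.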

Required Observability

Per-turn recall trace (always logged)

  • FTS query terms used
  • Number of raw FTS candidates returned
  • Number of candidates after audience/boundary filtering
  • Number of candidates after selector scoring
  • Final selected item IDs with their scores
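One possible shape for the per-turn trace listed above, as a minimal Python sketch. The event name turn_memory_recall_trace, the RecallTrace type, and all field names are hypothetical and chosen only to mirror the five bullets; the real pipeline may name and structure them differently.

```python
import json
import logging
from dataclasses import asdict, dataclass

logger = logging.getLogger("memory.recall")  # hypothetical logger name


@dataclass
class RecallTrace:
    """Hypothetical per-turn trace record; one field per requested item."""
    fts_query_terms: list[str]
    raw_candidate_count: int        # raw FTS candidates returned
    filtered_candidate_count: int   # after audience/boundary filtering
    scored_candidate_count: int     # after selector scoring
    selected: dict[str, float]      # final item ID -> score


def log_recall_trace(trace: RecallTrace) -> str:
    """Emit the trace as a single JSON line at info level and return it."""
    payload = json.dumps(
        {"event": "turn_memory_recall_trace", **asdict(trace)},
        sort_keys=True,
    )
    logger.info(payload)
    return payload


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_recall_trace(RecallTrace(["deploy", "rollback"], 40, 12, 5, {"mem-17": 0.82}))
```

Emitting all counts in one event keeps the funnel (raw, filtered, scored, selected) readable from a single log line per turn.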

Detailed candidate trace (debug-level)

  • Per-candidate: ID, title, score breakdown (lexical, facet, anchor, soft-scope, domain affinity)
  • Rejected candidates: top N runners-up with their scores (to understand near-misses)
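The debug-level trace could look roughly like the following Python sketch. The types, the memory_candidate_trace event name, and the assumption that the total score is a plain sum of the five components are all hypothetical; the actual selector may weight or combine components differently.

```python
import logging
from dataclasses import asdict, dataclass

logger = logging.getLogger("memory.recall")  # hypothetical logger name


@dataclass
class ScoreBreakdown:
    lexical: float
    facet: float
    anchor: float
    soft_scope: float
    domain_affinity: float

    @property
    def total(self) -> float:
        # Assumption: unweighted sum of the components.
        return (self.lexical + self.facet + self.anchor
                + self.soft_scope + self.domain_affinity)


@dataclass
class Candidate:
    id: str
    title: str
    scores: ScoreBreakdown


def trace_candidates(candidates, selected_count, runner_up_count=3):
    """Rank candidates, log selected items and the top N rejected runners-up."""
    ranked = sorted(candidates, key=lambda c: c.scores.total, reverse=True)
    selected = ranked[:selected_count]
    runners_up = ranked[selected_count:selected_count + runner_up_count]

    # Skip building the per-candidate payloads unless debug is enabled.
    if logger.isEnabledFor(logging.DEBUG):
        for status, group in (("selected", selected), ("rejected", runners_up)):
            for c in group:
                logger.debug(
                    "memory_candidate_trace status=%s id=%s title=%r total=%.3f breakdown=%s",
                    status, c.id, c.title, c.scores.total, asdict(c.scores),
                )
    return selected, runners_up
```

Logging only the top N rejected candidates bounds log volume while still showing how close the near-misses were to the selection cutoff.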

Recall quality metrics (periodic/stats)

  • Average candidate pool size per recall
  • Score distribution (min/max/median of selected vs rejected)
  • Progressive recall exhaustion rate (how often all candidates were already injected)
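The three metrics above could be accumulated with something like the following Python sketch. The RecallStats type and its method names are hypothetical, and "exhausted" is assumed to mean that every candidate for a recall had already been injected.

```python
from dataclasses import dataclass, field
from statistics import mean, median


@dataclass
class RecallStats:
    """Hypothetical in-memory accumulator, flushed periodically to stats."""
    pool_sizes: list[int] = field(default_factory=list)
    selected_scores: list[float] = field(default_factory=list)
    rejected_scores: list[float] = field(default_factory=list)
    recalls: int = 0
    exhausted_recalls: int = 0

    def record(self, pool_size, selected_scores, rejected_scores,
               all_already_injected):
        """Record one recall; all_already_injected marks an exhausted recall."""
        self.recalls += 1
        self.pool_sizes.append(pool_size)
        self.selected_scores.extend(selected_scores)
        self.rejected_scores.extend(rejected_scores)
        if all_already_injected:
            self.exhausted_recalls += 1

    def snapshot(self) -> dict:
        def dist(scores):
            if not scores:
                return None
            return {"min": min(scores), "max": max(scores),
                    "median": median(scores)}

        return {
            "avg_pool_size": mean(self.pool_sizes) if self.pool_sizes else 0.0,
            "selected": dist(self.selected_scores),
            "rejected": dist(self.rejected_scores),
            "exhaustion_rate": (self.exhausted_recalls / self.recalls
                                if self.recalls else 0.0),
        }
```

Comparing the selected and rejected distributions side by side shows whether the selector separates the two groups cleanly or whether their score ranges overlap.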

Why This Matters

Without this data, diagnosing recall precision issues (#559) requires manually querying the SQLite database and cross-referencing with daemon logs. The retrieval planner, candidate selector, and score function are all opaque at runtime.

Related

Labels

  • context-pipeline: LLM context assembly (prompt layers, dynamic injection, memory recall, temporal grounding)
  • memory: Memory formation, recall, curation pipeline
  • stats: Stats, metrics, and telemetry
