Summary
mergeHybridResults computes a per-result vectorScore (cosine similarity) and textScore (BM25) but discards both when building the final result object; only the weighted combined score survives. Plugins hooking after_tool_call for memory_search receive the full result payload, including score, but cannot tell whether a result's relevance came from semantic similarity or keyword matching.
Problem to solve
There is currently no way to benchmark memory retrieval quality across embedding models, providers, and query patterns over time. The combined score alone doesn't tell you whether a result ranked high because of strong vector similarity, strong keyword overlap, or a mix. That distinction matters when evaluating whether to change embedding models, adjust hybrid weights, or diagnose why an agent is retrieving irrelevant context.
The data already exists at merge time; it's two fields that get dropped in mergeHybridResults before the result is returned.
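To illustrate why the combined score is ambiguous, here is a small sketch. It assumes the combined score is a linear weighted sum and that both component scores are normalized to the 0..1 range; the weights (0.7 / 0.3) and the `combinedScore` helper are hypothetical and not taken from the actual implementation.

```typescript
// Hypothetical weights; the real hybrid weights depend on configuration.
const VECTOR_WEIGHT = 0.7;
const TEXT_WEIGHT = 0.3;

// Assumed linear combination of the two component scores.
function combinedScore(vectorScore: number, textScore: number): number {
  return VECTOR_WEIGHT * vectorScore + TEXT_WEIGHT * textScore;
}

// Result A: strong semantic match, no keyword overlap.
const semanticOnly = combinedScore(0.9, 0.0);
// Result B: moderate semantic match plus strong keyword overlap.
const mixed = combinedScore(0.6, 0.7);

// Both format to "0.63", so a consumer that only sees `score`
// cannot distinguish the two cases.
console.log(semanticOnly.toFixed(2), mixed.toFixed(2));
```

Under these assumptions, two results with very different retrieval characteristics are indistinguishable once the component scores are dropped.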
Proposed solution
Carry vectorScore and textScore through the merge result and add them as optional fields on MemorySearchResult.
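A minimal sketch of what this could look like. The real signature of mergeHybridResults and the full shape of MemorySearchResult are not shown in this issue, so everything here other than the names vectorScore, textScore, and score is an assumption: the `ScoredHit` type, the `id` and `snippet` fields, the `weights` parameter, the linear combination, and the `mergeHybridResultsSketch` function name are all hypothetical.

```typescript
// Hypothetical result shape; only score, vectorScore and textScore
// are named in the issue. The two new fields are optional so existing
// consumers are unaffected.
interface MemorySearchResult {
  id: string;
  snippet: string;
  score: number;
  vectorScore?: number;
  textScore?: number;
}

// Hypothetical input shape for a hit from either search backend.
interface ScoredHit {
  id: string;
  snippet: string;
  score: number;
}

function mergeHybridResultsSketch(
  vectorHits: ScoredHit[],
  textHits: ScoredHit[],
  weights: { vector: number; text: number },
): MemorySearchResult[] {
  const byId = new Map<string, { snippet: string; vectorScore: number; textScore: number }>();

  for (const hit of vectorHits) {
    byId.set(hit.id, { snippet: hit.snippet, vectorScore: hit.score, textScore: 0 });
  }
  for (const hit of textHits) {
    const existing = byId.get(hit.id);
    if (existing) {
      existing.textScore = hit.score;
    } else {
      byId.set(hit.id, { snippet: hit.snippet, vectorScore: 0, textScore: hit.score });
    }
  }

  return Array.from(byId.entries())
    .map(([id, entry]) => ({
      id,
      snippet: entry.snippet,
      // Assumed linear weighted combination.
      score: weights.vector * entry.vectorScore + weights.text * entry.textScore,
      // Carried through instead of being dropped.
      vectorScore: entry.vectorScore,
      textScore: entry.textScore,
    }))
    .sort((a, b) => b.score - a.score);
}
```

Because the new fields are optional, a plugin hooking after_tool_call could read them when present and fall back to score alone otherwise.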
Alternatives considered
A manual patch was considered, but it would be overwritten by any version upgrade.
Impact
Affected: users
Severity: blocks evaluation
Frequency: every memory search
Consequence: manual workaround
Evidence/examples
No response
Additional information
No response