Problem
Memory recall runs on every FireLlmCall(), including follow-up calls after
each tool result within a single turn. In a 14-iteration tool chain, the same
3 memories get injected 14 times, burning tokens without adding information.
Evidence
Session D0AC6CKBK5K/1774102235.362309:
- 144 LLM calls, 144 memory recalls (1:1)
- doc-04c4f3f82b244cefbf9e4cceff057f00 injected in 76 of 144 calls
- Same 3-item recall set repeated across every tool iteration within a turn
- The user hasn't said anything new between tool iterations — the recall
query is identical, so the results are identical
Token cost
At 3 memories × ~200 tokens each × 14 iterations = ~8,400 tokens of memory content per
multi-step turn, of which ~7,800 (the 13 repeats after the first injection) are wasted.
Across a 144-call session, these repeats accumulate and contribute to faster
context growth → earlier compaction → context loss.
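A quick check of the arithmetic, using the issue's own approximations (3 memories, ~200 tokens each, 14 tool iterations). This is an illustrative Python sketch with a hypothetical function name; it treats the first injection in a turn as useful and every later copy as redundant:

```python
def memory_token_cost(memories: int, tokens_per_memory: int, iterations: int) -> tuple[int, int]:
    """Return (total memory tokens, redundant memory tokens) for one multi-step turn."""
    per_injection = memories * tokens_per_memory
    total = per_injection * iterations
    # Only the first injection adds information; every later one repeats it.
    redundant = per_injection * (iterations - 1)
    return total, redundant


total, redundant = memory_token_cost(memories=3, tokens_per_memory=200, iterations=14)
print(total, redundant)  # 8400 7800
```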
Proposed Fix (Three Layers)
1. Only recall at turn boundaries, not tool loop iterations
FireLlmCall() currently calls ResolveRecallBundle() on every invocation.
It should only recall when:
- A new user message starts a turn (the Ready → Processing transition)
- A buffered user message is drained mid-loop (new user input = new context)
Tool loop follow-ups (FireLlmCall() after ToolExecutionCompleted) should
reuse the recall from the start of the turn since the query hasn't changed.
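A minimal sketch of the turn-boundary rule, written in Python for brevity rather than in the project's own language. `CallReason` and `TurnRecallCache` are hypothetical names, and the `recall` callable stands in for whatever `ResolveRecallBundle()` does; the only logic shown is "recall on new user input, reuse on tool follow-ups":

```python
from enum import Enum, auto
from typing import Callable, Optional


class CallReason(Enum):
    """Hypothetical labels for why FireLlmCall() is being invoked."""
    NEW_USER_TURN = auto()          # Ready -> Processing
    BUFFERED_USER_MESSAGE = auto()  # buffered user message drained mid-loop
    TOOL_FOLLOW_UP = auto()         # call after ToolExecutionCompleted


class TurnRecallCache:
    """Recalls at turn boundaries; tool follow-ups reuse the turn's bundle."""

    def __init__(self, recall: Callable[[str], list[str]]) -> None:
        self._recall = recall  # stand-in for ResolveRecallBundle()
        self._bundle: Optional[list[str]] = None
        self.recall_count = 0

    def bundle_for(self, reason: CallReason, query: str) -> list[str]:
        if reason is CallReason.TOOL_FOLLOW_UP and self._bundle is not None:
            return self._bundle  # query unchanged since the turn started
        # New user input (or no cached bundle yet): run a fresh recall.
        self.recall_count += 1
        self._bundle = self._recall(query)
        return self._bundle


cache = TurnRecallCache(lambda query: ["A", "B", "C"])
cache.bundle_for(CallReason.NEW_USER_TURN, "deploy the app")
for _ in range(13):  # 13 tool follow-ups in a 14-call turn
    cache.bundle_for(CallReason.TOOL_FOLLOW_UP, "deploy the app")
print(cache.recall_count)  # 1
```

If a tool follow-up arrives with no cached bundle (for example, the first call after a restart), the sketch falls back to recalling rather than injecting nothing.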
2. Exclusion-based progressive recall across turns
Maintain a set of memory doc IDs that have already been injected in this
session. Pass this set as an exclusion filter to the recall query so
previously-seen memories don't keep winning the top-N ranking.
This turns recall into a progressive exploration of the memory space:
Turn 1: recall → docs A, B, C → inject, add to seen set
Turn 2: recall with exclude={A,B,C} → docs D, E, F → inject new ones
Turn 3: recall with exclude={A,B,C,D,E,F} → docs G, H → inject new ones
Over the course of a session, the bot draws from a wider pool of its
memory rather than fixating on the same top-3 docs. Each turn surfaces
memories that were previously crowded out by the dominant matches.
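The progression can be sketched as a top-N filter over a ranked candidate list. This Python sketch assumes, for simplicity, that the ranking is identical on every turn (in practice each turn's query would re-rank the candidates); `recall` and `ranking` are hypothetical:

```python
def recall(ranked_ids: list[str], exclude: set[str], top_n: int = 3) -> list[str]:
    """Return the top_n highest-ranked IDs that are not in the exclusion set."""
    return [doc_id for doc_id in ranked_ids if doc_id not in exclude][:top_n]


# Stand-in ranking: the query ranks documents in this fixed order every turn.
ranking = ["A", "B", "C", "D", "E", "F", "G", "H"]
seen: set[str] = set()
for turn in range(1, 4):
    docs = recall(ranking, exclude=seen)
    seen.update(docs)  # injected docs are excluded from later turns
    print(f"Turn {turn}: {docs}")
# Turn 1: ['A', 'B', 'C']
# Turn 2: ['D', 'E', 'F']
# Turn 3: ['G', 'H']
```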
When to reset the exclusion set:
- On compaction — earlier injections are gone from context, so previously
seen memories may need to be re-surfaced
- If the exclusion set grows large enough that recall returns zero results,
clear it and start fresh
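One way to combine the two reset rules, sketched in Python with a hypothetical `ExclusionSet` class. It assumes recall is a top-N filter over an already-ranked list, and it retries once without exclusions when the exclusions leave no results:

```python
class ExclusionSet:
    """Hypothetical session-level set of already-injected memory IDs."""

    def __init__(self) -> None:
        self.ids: set[str] = set()

    def on_compaction(self) -> None:
        # Earlier injections are gone from context, so allow them to resurface.
        self.ids.clear()

    def recall(self, ranked_ids: list[str], top_n: int = 3) -> list[str]:
        results = [d for d in ranked_ids if d not in self.ids][:top_n]
        if not results and self.ids:
            # Exclusions have exhausted the candidate pool: start fresh.
            self.ids.clear()
            results = ranked_ids[:top_n]
        self.ids.update(results)
        return results


ex = ExclusionSet()
pool = ["A", "B", "C", "D"]
print(ex.recall(pool))  # ['A', 'B', 'C']
print(ex.recall(pool))  # ['D']
print(ex.recall(pool))  # ['A', 'B', 'C']  (pool exhausted, set reset)
```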
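When to refresh recall mid-turn:
- When a buffered user message is drained mid-loop: new user input changes the
recall query, so a fresh recall with the updated query (still applying exclusions) is appropriate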
3. Track injection state per session (like skills)
Similar to how skills track _loadedSkillNames and only inject new ones,
maintain _injectedMemoryIds as session-level state:
- Add IDs when memories are injected into the context
- Skip re-injection if the recalled set is a subset of already-injected IDs
- Reset on compaction (since compaction may discard the earlier injections)
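A sketch of the per-session injection tracking, in Python with hypothetical names (`MemoryInjectionTracker`, `select_for_injection`) standing in for `_injectedMemoryIds` and the logic around it. Only IDs not already injected are returned, so a recalled set that is a subset of the injected set yields nothing to inject:

```python
class MemoryInjectionTracker:
    """Hypothetical Python analogue of a session-level _injectedMemoryIds set."""

    def __init__(self) -> None:
        self.injected_ids: set[str] = set()

    def select_for_injection(self, recalled_ids: list[str]) -> list[str]:
        """Return recalled IDs not yet injected this session, and record them."""
        # dict.fromkeys de-duplicates while keeping recall order.
        new_ids = [i for i in dict.fromkeys(recalled_ids) if i not in self.injected_ids]
        # If recalled_ids is a subset of injected_ids, new_ids is empty: skip injection.
        self.injected_ids.update(new_ids)
        return new_ids

    def on_compaction(self) -> None:
        # Compaction may have discarded earlier injections, so forget them.
        self.injected_ids.clear()


t = MemoryInjectionTracker()
print(t.select_for_injection(["A", "B", "C"]))  # ['A', 'B', 'C']
print(t.select_for_injection(["A", "B", "C"]))  # []
print(t.select_for_injection(["B", "D"]))       # ['D']
t.on_compaction()
print(t.select_for_injection(["A"]))            # ['A']
```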
Impact
Token efficiency:
- ~14x reduction in memory-related token usage for a 14-iteration turn (the factor
scales with the number of tool iterations in the turn)
- A 14-iteration turn goes from ~8,400 tokens of memory content to ~600
Context longevity:
Memory breadth:
- Progressive exclusion surfaces diverse memories across a session
- Bot develops richer awareness over multi-turn conversations instead of
fixating on the same 3 most-similar documents
No behavioral regression:
- The LLM still sees recalled memories — just not redundantly
- First injection per memory is identical to current behavior
- Subsequent turns get fresh memories that are still relevant but different
Implementation Notes
The SQLiteMemoryRecallCoordinator already accepts query parameters. Adding
an excludeIds parameter to the recall interface is straightforward:
// Current
ResolveRecallBundle(recallQuery)
// Proposed
ResolveRecallBundle(recallQuery, excludeIds: _injectedMemoryIds)
The SQLite query adds WHERE id NOT IN (...) to the candidate selection.
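A sketch of the `NOT IN` filter using Python's standard `sqlite3` module. The `memory_docs` table and its `score` column are hypothetical stand-ins for the coordinator's real schema and ranking; the point is building one `?` placeholder per excluded ID so the query stays parameterized, and omitting the clause when nothing is excluded:

```python
import sqlite3


def select_candidates(conn: sqlite3.Connection, exclude_ids: set[str], limit: int = 3) -> list[str]:
    """Return the top-ranked memory IDs, skipping any in exclude_ids."""
    sql = "SELECT id FROM memory_docs"
    params: list[object] = []
    if exclude_ids:
        # One "?" per excluded ID keeps the query parameterized.
        placeholders = ", ".join("?" for _ in exclude_ids)
        sql += f" WHERE id NOT IN ({placeholders})"
        params.extend(sorted(exclude_ids))
    sql += " ORDER BY score DESC LIMIT ?"
    params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory_docs (id TEXT PRIMARY KEY, score REAL)")
conn.executemany(
    "INSERT INTO memory_docs VALUES (?, ?)",
    [("A", 0.9), ("B", 0.8), ("C", 0.7), ("D", 0.6), ("E", 0.5)],
)
print(select_candidates(conn, set()))            # ['A', 'B', 'C']
print(select_candidates(conn, {"A", "B", "C"}))  # ['D', 'E']
```

SQLite caps the number of bound parameters per statement (`SQLITE_MAX_VARIABLE_NUMBER`, 999 by default before SQLite 3.32.0 and 32766 after), so a very large exclusion set may be better handled with a temporary table joined against instead.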
Related