perf(memory): stop re-injecting same memories on every tool loop iteration #370

@Aaronontheweb

Problem

Memory recall runs on every FireLlmCall(), including follow-up calls after
each tool result within a single turn. In a 14-iteration tool chain, the same
3 memories get injected 14 times, burning tokens without adding information.

Evidence

Session D0AC6CKBK5K/1774102235.362309:

  • 144 LLM calls, 144 memory recalls (1:1)
  • doc-04c4f3f82b244cefbf9e4cceff057f00 injected in 76 of 144 calls
  • Same 3-item recall set repeated across every tool iteration within a turn
  • The user hasn't said anything new between tool iterations — the recall
    query is identical, so the results are identical

Token cost

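At 3 memories × ~200 tokens each × 14 iterations, a multi-step turn spends
~8,400 tokens on memory content. Only the first injection (~600 tokens) adds
information; the remaining ~7,800 are redundant. Across a 144-call session
the redundancy compounds and accelerates context growth, which triggers
compaction earlier and causes context loss sooner.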
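
The estimate can be reproduced with a few lines of arithmetic. This is only a sketch: the inputs (3 memories, ~200 tokens each, 14 iterations) are the figures quoted in this issue, and the function name is made up for illustration.

```python
def memory_tokens(memories: int, tokens_per_memory: int, injections: int) -> int:
    """Tokens spent on memory content when one recall set is injected `injections` times."""
    return memories * tokens_per_memory * injections


# Current behavior: the recall set is injected on every tool loop iteration.
current = memory_tokens(memories=3, tokens_per_memory=200, injections=14)

# Proposed behavior: the recall set is injected once per turn.
proposed = memory_tokens(memories=3, tokens_per_memory=200, injections=1)

# Everything beyond the first injection carries no new information.
redundant = current - proposed
reduction_factor = current // proposed
```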

Proposed Fix (Three Layers)

1. Only recall at turn boundaries, not tool loop iterations

FireLlmCall() currently calls ResolveRecallBundle() on every invocation.
It should only recall when:

  • A new user message starts a turn (from ReadyProcessing)
  • A buffered user message is drained mid-loop (new user input = new context)

Tool loop follow-ups (FireLlmCall() after ToolExecutionCompleted) should
reuse the recall from the start of the turn since the query hasn't changed.
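
A minimal sketch of the turn-boundary rule, written in Python for brevity rather than in the project's own language. `TurnRecallCache`, `on_user_message`, and `on_tool_follow_up` are hypothetical names, and `resolve` is only a stand-in for `ResolveRecallBundle()`; the real actor wiring will look different.

```python
from typing import Callable, List, Optional


class TurnRecallCache:
    """Runs recall once per turn and reuses the result for tool loop follow-ups."""

    def __init__(self, resolve: Callable[[str], List[str]]) -> None:
        self._resolve = resolve                   # stand-in for ResolveRecallBundle
        self._cached: Optional[List[str]] = None  # recall from the start of the turn
        self.recall_count = 0                     # how many real recalls were made

    def on_user_message(self, query: str) -> List[str]:
        """Turn boundary (new or drained buffered user message): recall afresh."""
        self._cached = self._resolve(query)
        self.recall_count += 1
        return self._cached

    def on_tool_follow_up(self) -> List[str]:
        """Follow-up after a tool result: the query is unchanged, so reuse the cache."""
        if self._cached is None:
            raise RuntimeError("no recall cached: a turn must start with a user message")
        return self._cached


# One user message followed by 13 tool follow-ups = 14 LLM calls, 1 recall.
cache = TurnRecallCache(resolve=lambda query: ["A", "B", "C"])
first = cache.on_user_message("how do I deploy?")
follow_ups = [cache.on_tool_follow_up() for _ in range(13)]
```

The sketch only covers when the recall query runs. Whether the cached bundle is injected again is a separate question, handled by the injection tracking in layer 3.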

2. Exclusion-based progressive recall across turns

Maintain a set of memory doc IDs that have already been injected in this
session. Pass this set as an exclusion filter to the recall query so
previously-seen memories don't keep winning the top-N ranking.

This turns recall into a progressive exploration of the memory space:

Turn 1: recall → docs A, B, C → inject, add to seen set
Turn 2: recall with exclude={A,B,C} → docs D, E, F → inject new ones
Turn 3: recall with exclude={A,B,C,D,E,F} → docs G, H → inject new ones

Over the course of a session, the bot draws from a wider pool of its
memory rather than fixating on the same top-3 docs. Each turn surfaces
memories that were previously crowded out by the dominant matches.

When to reset the exclusion set:

  • On compaction — earlier injections are gone from context, so previously
    seen memories may need to be re-surfaced
  • If the exclusion set grows large enough that recall returns zero results,
    clear it and start fresh
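
The progression and both reset rules can be sketched as follows. This is an illustration in Python under stated assumptions, not the proposed implementation: `ProgressiveRecall`, `search`, and the toy `top3` ranking are hypothetical, and real recall would rank by similarity to the query.

```python
from typing import Callable, FrozenSet, List, Set


class ProgressiveRecall:
    """Excludes already-seen doc IDs from recall so each turn surfaces new memories."""

    def __init__(self, search: Callable[[str, FrozenSet[str]], List[str]]) -> None:
        self._search = search         # hypothetical: (query, exclude_ids) -> doc IDs
        self._seen: Set[str] = set()  # IDs injected so far in this session

    def recall(self, query: str) -> List[str]:
        docs = self._search(query, frozenset(self._seen))
        if not docs and self._seen:
            # The exclusion set has crowded out every candidate: clear and retry.
            self._seen.clear()
            docs = self._search(query, frozenset())
        self._seen.update(docs)
        return docs

    def on_compaction(self) -> None:
        # Earlier injections are gone from context, so allow them to resurface.
        self._seen.clear()


# Toy ranking: a fixed relevance order, top 3 after exclusion.
RANKED = ["A", "B", "C", "D", "E", "F", "G", "H"]


def top3(query: str, exclude: FrozenSet[str]) -> List[str]:
    return [doc for doc in RANKED if doc not in exclude][:3]


recall = ProgressiveRecall(top3)
turn1 = recall.recall("q")  # nothing excluded yet
turn2 = recall.recall("q")  # A, B, C excluded
turn3 = recall.recall("q")  # A through F excluded
turn4 = recall.recall("q")  # everything excluded, so the set resets
```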

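When to refresh recall mid-turn:

  • When a buffered user message is drained mid-loop (see layer 1), since
    new user input changes the recall query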

3. Track injection state per session (like skills)

Similar to how skills track _loadedSkillNames and only inject new ones,
maintain _injectedMemoryIds as session-level state:

  • Add IDs when memories are injected into the context
  • Skip re-injection if the recalled set is a subset of already-injected IDs
  • Reset on compaction (since compaction may discard the earlier injections)
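
One way the three rules could fit together, sketched in Python. `InjectionTracker` and its method names are hypothetical; only `_injectedMemoryIds` (spelled `_injected_memory_ids` here) comes from this issue.

```python
from typing import Iterable, List, Set


class InjectionTracker:
    """Session-level record of which memory IDs have already been injected."""

    def __init__(self) -> None:
        self._injected_memory_ids: Set[str] = set()

    def select_for_injection(self, recalled_ids: Iterable[str]) -> List[str]:
        """Return the recalled IDs not yet injected, in order, and record them.

        An empty result means the recalled set is a subset of what was already
        injected, so the caller can skip injection entirely.
        """
        # dict.fromkeys de-duplicates while preserving recall order.
        new_ids = [
            memory_id
            for memory_id in dict.fromkeys(recalled_ids)
            if memory_id not in self._injected_memory_ids
        ]
        self._injected_memory_ids.update(new_ids)
        return new_ids

    def on_compaction(self) -> None:
        """Compaction may have discarded earlier injections, so start over."""
        self._injected_memory_ids.clear()


tracker = InjectionTracker()
first = tracker.select_for_injection(["A", "B", "C"])   # all new
repeat = tracker.select_for_injection(["A", "B", "C"])  # subset: nothing to inject
partial = tracker.select_for_injection(["C", "D"])      # only D is new
tracker.on_compaction()
after_reset = tracker.select_for_injection(["A"])       # A may be injected again
```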

Impact

Token efficiency:

  • Up to 14x reduction in memory-related token usage per multi-step turn
    (one injection per turn instead of one per tool iteration)
  • A 14-iteration turn goes from ~8,400 tokens of memory content to ~600

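Context longevity:

  • Less memory content per turn slows context growth, which delays
    compaction and the context loss that comes with it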

Memory breadth:

  • Progressive exclusion surfaces diverse memories across a session
  • Bot develops richer awareness over multi-turn conversations instead of
    fixating on the same 3 most-similar documents

No behavioral regression:

  • The LLM still sees recalled memories — just not redundantly
  • First injection per memory is identical to current behavior
  • Subsequent turns get fresh memories that are still relevant but different

Implementation Notes

The SQLiteMemoryRecallCoordinator already accepts query parameters. Adding
an excludeIds parameter to the recall interface is straightforward:

// Current
ResolveRecallBundle(recallQuery)

// Proposed
ResolveRecallBundle(recallQuery, excludeIds: _injectedMemoryIds)

The SQLite query adds WHERE id NOT IN (...) to the candidate selection.
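
A runnable sketch of the exclusion filter using Python's built-in `sqlite3` module. The `memories(id, score)` table, the `recall_ids` helper, and ranking by a stored `score` column are assumptions made for illustration; the actual schema and ranking in SQLiteMemoryRecallCoordinator are not shown in this issue.

```python
import sqlite3
from typing import Iterable, List


def recall_ids(conn: sqlite3.Connection, limit: int,
               exclude_ids: Iterable[str] = ()) -> List[str]:
    """Return the top `limit` memory IDs by score, skipping any excluded IDs."""
    exclude = list(exclude_ids)
    sql = "SELECT id FROM memories"
    if exclude:
        # One bound placeholder per excluded ID; never interpolate the IDs themselves.
        placeholders = ", ".join("?" for _ in exclude)
        sql += f" WHERE id NOT IN ({placeholders})"
    sql += " ORDER BY score DESC LIMIT ?"
    rows = conn.execute(sql, (*exclude, limit)).fetchall()
    return [row[0] for row in rows]


# Hypothetical schema and data, just enough to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, score REAL)")
conn.executemany(
    "INSERT INTO memories VALUES (?, ?)",
    [("A", 0.9), ("B", 0.8), ("C", 0.7), ("D", 0.6), ("E", 0.5)],
)

first = recall_ids(conn, limit=3)
second = recall_ids(conn, limit=3, exclude_ids=first)
```

SQLite caps the number of bound parameters per statement, so if the exclusion set can grow very large, joining against a temporary table may be a safer choice than a long NOT IN list.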

Labels: enhancement
