perf(memory): stop re-injecting same memories on every tool loop iteration #370

## Problem

Memory recall runs on every `FireLlmCall()`, including follow-up calls after
each tool result within a single turn. In a 14-iteration tool chain, the same
3 memories get injected 14 times, burning tokens without adding information.

### Evidence

Session `D0AC6CKBK5K/1774102235.362309`:
- 144 LLM calls, 144 memory recalls (1:1)
- `doc-04c4f3f82b244cefbf9e4cceff057f00` injected in **76 of 144** calls
- Same 3-item recall set repeated across every tool iteration within a turn
- The user hasn't said anything new between tool iterations — the recall
  query is identical, so the results are identical

### Token cost

At 3 memories × ~200 tokens each × 14 iterations, a multi-step turn carries
~8,400 tokens of memory content, of which ~7,800 (13 of the 14 injections) are
redundant. Across a 144-call session this compounds: faster context growth →
earlier compaction → context loss.

## Proposed Fix (Three Layers)

### 1. Only recall at turn boundaries, not tool loop iterations

`FireLlmCall()` currently calls `ResolveRecallBundle()` on every invocation.
It should only recall when:
- A new user message starts a turn (from `Ready` → `Processing`)
- A buffered user message is drained mid-loop (new user input = new context)

Tool loop follow-ups (`FireLlmCall()` after `ToolExecutionCompleted`) should
reuse the recall from the start of the turn since the query hasn't changed.
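
A minimal sketch of the turn-boundary rule above, not the actual implementation. `TurnRecallCache` and its method names are hypothetical, and the `Func` delegate merely stands in for `ResolveRecallBundle()`, whose real signature and return type are not shown in this issue:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: holds one recall result per turn.
public sealed class TurnRecallCache
{
    // Stand-in for ResolveRecallBundle(); the real return type is assumed.
    private readonly Func<string, IReadOnlyList<string>> _recall;
    private IReadOnlyList<string> _current = Array.Empty<string>();

    // Number of times the underlying recall was actually executed.
    public int RecallCount { get; private set; }

    public TurnRecallCache(Func<string, IReadOnlyList<string>> recall)
    {
        _recall = recall;
    }

    // Call when a new user message starts a turn, or when a buffered
    // user message is drained mid-loop: the query changed, so recall again.
    public IReadOnlyList<string> OnUserInput(string recallQuery)
    {
        _current = _recall(recallQuery);
        RecallCount++;
        return _current;
    }

    // Call from tool-loop follow-ups: reuse the turn's recall, no query.
    public IReadOnlyList<string> ForToolFollowUp()
    {
        return _current;
    }
}
```

With this shape, a 14-iteration tool chain would trigger one recall (at `OnUserInput`) and 13 cache reads.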

### 2. Exclusion-based progressive recall across turns

Maintain a set of memory doc IDs that have already been injected in this
session. Pass this set as an **exclusion filter** to the recall query so
previously-seen memories don't keep winning the top-N ranking.

This turns recall into a progressive exploration of the memory space:

```
Turn 1: recall → docs A, B, C → inject, add to seen set
Turn 2: recall with exclude={A,B,C} → docs D, E, F → inject new ones
Turn 3: recall with exclude={A,B,C,D,E,F} → docs G, H → inject new ones
```

Over the course of a session, the bot draws from a **wider pool** of its
memory rather than fixating on the same top-3 docs. Each turn surfaces
memories that were previously crowded out by the dominant matches.

**When to reset the exclusion set:**
- On compaction — earlier injections are gone from context, so previously
  seen memories may need to be re-surfaced
- If the exclusion set grows large enough that recall returns zero results,
  clear it and start fresh

**When to refresh recall mid-turn:**
- When a buffered user message is drained mid-loop (PR #351) — the new
  user input changes the recall query, so a fresh recall with the updated
  query (still applying exclusions) is appropriate
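
The exclusion and reset rules above could be sketched roughly as follows. This is an illustration under assumptions: `ProgressiveRecall` is a hypothetical name, the delegate stands in for a recall call that accepts an exclusion set, and doc IDs are assumed to be strings:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper that applies a session-level exclusion set to recall.
public sealed class ProgressiveRecall
{
    // Stand-in for a recall call taking (query, excluded IDs); assumed shape.
    private readonly Func<string, ISet<string>, IReadOnlyList<string>> _recall;
    private readonly HashSet<string> _seen = new HashSet<string>();

    public ProgressiveRecall(Func<string, ISet<string>, IReadOnlyList<string>> recall)
    {
        _recall = recall;
    }

    public IReadOnlyList<string> Recall(string query)
    {
        var docs = _recall(query, _seen);

        // Exclusions exhausted the candidate pool: clear and start fresh.
        if (docs.Count == 0 && _seen.Count > 0)
        {
            _seen.Clear();
            docs = _recall(query, _seen);
        }

        // Remember what was surfaced so later turns look further afield.
        _seen.UnionWith(docs);
        return docs;
    }

    // Earlier injections are gone from context after compaction.
    public void OnCompaction()
    {
        _seen.Clear();
    }
}
```

Against a store that ranks eight docs A through H and returns the top three, successive calls would yield A, B, C, then D, E, F, then G, H, and then wrap around to A, B, C once the pool is exhausted.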

### 3. Track injection state per session (like skills)

Similar to how skills track `_loadedSkillNames` and only inject new ones,
maintain `_injectedMemoryIds` as session-level state:
- Add IDs when memories are injected into the context
- Skip re-injection if the recalled set is a subset of already-injected IDs
- Reset on compaction (since compaction may discard the earlier injections)
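
One possible shape for this state, as a sketch only. `MemoryInjectionTracker` and `TakeNew` are hypothetical names; the sketch also assumes that on a partial overlap only the new IDs are injected, which goes slightly beyond the subset rule stated above:

```csharp
using System.Collections.Generic;

// Hypothetical session-level tracker, analogous to _loadedSkillNames.
public sealed class MemoryInjectionTracker
{
    private readonly HashSet<string> _injectedMemoryIds = new HashSet<string>();

    // Returns only the recalled IDs that have not been injected yet and
    // records them. An empty result means the recalled set was a subset
    // of what the context already holds, so injection can be skipped.
    public List<string> TakeNew(IEnumerable<string> recalledIds)
    {
        var fresh = new List<string>();
        foreach (var id in recalledIds)
        {
            // HashSet.Add returns false when the ID is already present.
            if (_injectedMemoryIds.Add(id))
            {
                fresh.Add(id);
            }
        }
        return fresh;
    }

    // Compaction may discard earlier injections, so forget them.
    public void ResetOnCompaction()
    {
        _injectedMemoryIds.Clear();
    }
}
```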

## Impact

**Token efficiency:**
- 14x reduction in memory-related token usage per multi-step turn
- A 14-iteration turn goes from ~8,400 tokens of memory content to ~600

**Context longevity:**
- Slower context growth → delays compaction → preserves more history
- Fewer compaction cycles → less context loss (#318)

**Memory breadth:**
- Progressive exclusion surfaces diverse memories across a session
- Bot develops richer awareness over multi-turn conversations instead of
  fixating on the same 3 most-similar documents

**No behavioral regression:**
- The LLM still sees recalled memories — just not redundantly
- First injection per memory is identical to current behavior
- Subsequent turns get fresh memories that are still relevant but different

## Implementation Notes

The `SQLiteMemoryRecallCoordinator` already accepts query parameters. Adding
an `excludeIds` parameter to the recall interface is straightforward:

```csharp
// Current
ResolveRecallBundle(recallQuery);

// Proposed: pass the session's injected IDs as an exclusion filter
ResolveRecallBundle(recallQuery, excludeIds: _injectedMemoryIds);
```

The SQLite query adds an `id NOT IN (...)` condition to the candidate
selection, so excluded documents are filtered out before the top-N ranking is
applied.
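
As a rough illustration of how the exclusion condition might be built with bound parameters rather than string-concatenated IDs: the table and column names (`memories`, `id`, `score`) and the `RecallSql` helper are hypothetical, and the real candidate query in `SQLiteMemoryRecallCoordinator` will differ:

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical query builder; schema names are placeholders.
public static class RecallSql
{
    public static (string Sql, Dictionary<string, object> Parameters) Build(
        IReadOnlyCollection<string> excludeIds, int limit)
    {
        var sql = new StringBuilder("SELECT id FROM memories");
        var parameters = new Dictionary<string, object>();

        // Emit the NOT IN clause only when there is something to exclude.
        if (excludeIds.Count > 0)
        {
            var names = new List<string>();
            int i = 0;
            foreach (var id in excludeIds)
            {
                var name = "$x" + i++;   // one named parameter per excluded ID
                names.Add(name);
                parameters[name] = id;
            }
            sql.Append(" WHERE id NOT IN (")
               .Append(string.Join(", ", names))
               .Append(")");
        }

        sql.Append(" ORDER BY score DESC LIMIT $limit");
        parameters["$limit"] = limit;
        return (sql.ToString(), parameters);
    }
}
```

SQLite caps the number of bound parameters per statement, so a very large exclusion set may need a different approach (for example a temporary table); resetting the set on compaction should keep it small in practice.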

## Related

- #350 / PR #351 — runaway tool loop fix (mid-loop buffer drain)
- #318 — compaction quality (slower context growth = less compaction needed)
- #355 — semantic skill discovery (similar deduplication pattern for skills)
