[Bug]: Active memory injection breaks prompt cache hit rate (99.9% → 22%)

## Phenomenon
Enabling the `active-memory` plugin causes the prompt cache hit rate to collapse
from ~99.9% (clean baseline) to ~22%, as observed on the production dashboard.

Reproduction across two Anthropic-compatible providers (independent test runs
by two coding agents) shows:
- Clean baseline (5 identical prompts, no prependContext): 99.9% warm hit
- Static 32KB prependContext (5 identical prompts, fixed content): 99.8% warm hit
- Variable 32KB prependContext (5 prompts, content changes per call): 0.0% warm hit

## Root cause
The `active-memory` plugin injects recalled memories via the `before_prompt_build`
hook, returning content through `hookResult.prependContext`. In the prompt-preparation
layer, this context is concatenated **into the user message** (not the system prompt).

For every eligible conversational reply, the plugin spawns a blocking memory
sub-agent that runs `memory_search` to recall facts. The recall query is derived
from the current user message, so different user messages → different recalled
fact lists → character-level changes in `prependContext`.

Anthropic-protocol `cache_control: {type: "ephemeral"}` markers are expected to
let the stable system-prompt block be cached independently of the variable
user-message block. On both tested Anthropic-compatible providers the boundary
did not behave that way: any character-level change in the user-message block
caused the entire prompt to miss the cache (0% hit rate), not just the variable
part. A possible explanation is that the marker is placed after the variable
content (see "cache_control placement" below), so the prefix that ends at the
marker differs on every call.
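
One way to reason about this observation is a toy model of prefix-based caching. The sketch below assumes that a cache entry is keyed on the exact content preceding each marker; that is an assumption about provider behavior, not something this issue confirms, and `ToyPrefixCache` and `request` are hypothetical names that exist only for illustration.

```javascript
// Toy model only: assumes cache entries are keyed on the exact content that
// precedes each cache_control marker. Not code from this project or a provider.
class ToyPrefixCache {
  constructor() {
    this.entries = new Set();
  }

  // blocks: [{ text, breakpoint }] in request order (system first, then user).
  // Returns the number of characters served from cache for this request.
  send(blocks) {
    let prefix = "";
    let cachedChars = 0;
    const newKeys = [];
    for (const block of blocks) {
      prefix += block.text;
      if (!block.breakpoint) continue;
      // The prefix string itself is the key; a real cache would hash it.
      if (this.entries.has(prefix)) cachedChars = prefix.length;
      else newKeys.push(prefix);
    }
    for (const key of newKeys) this.entries.add(key);
    return cachedChars;
  }
}

const SYSTEM = "You are a helpful agent. ".repeat(100);

// markSystem controls whether the system block carries its own marker.
function request(memory, markSystem) {
  return [
    { text: SYSTEM, breakpoint: markSystem },
    { text: `${memory}\n\nhi count 3`, breakpoint: true },
  ];
}

const memories = ["facts A", "facts B", "facts C"];

// Marker only after the variable user block: no call reuses anything.
const single = new ToyPrefixCache();
const singleHits = memories.map((m) => single.send(request(m, false)));

// Extra marker after the system block: calls 2 and 3 reuse the system prefix.
const dual = new ToyPrefixCache();
const dualHits = memories.map((m) => dual.send(request(m, true)));

console.log({ singleHits, dualHits });
```

In this model, a single marker after the variable block reproduces the 0% pattern, while a second marker at the end of the system block lets later calls reuse the stable part. Whether the tested endpoints behave like the model is unverified.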

## Reproduction
1. Configure any Anthropic-compatible provider
2. Enable `active-memory` with `config.agents: ["main"]`
3. Send 5 near-identical user messages (e.g. "hi count 3")
4. With a 32KB `prependContext` whose content varies per call, observe:
   - `cache_creation_input_tokens` and `cache_read_input_tokens` are both 0
   - All 5 calls rebuild the entire prompt (warm hit rate = 0%)
5. Compare with the production dashboard: ~22%, a weighted average across
   conversational turns that trigger active-memory and turns that do not
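
To compare runs, the warm hit rate can be derived from the per-call `usage` objects. The sketch below is a minimal version under stated assumptions: the field names follow the Anthropic Messages API `usage` object, the formula (cached reads divided by all input tokens, skipping the first cold call) is an assumption because this issue does not define "warm hit", and the token counts are made up for illustration rather than measured.

```javascript
// Assumed definition: cache_read tokens / all input tokens, over every call
// after the first (cold) one. The issue itself does not define the formula.
function warmHitRate(usages) {
  const warm = usages.slice(1); // drop the cold call that populates the cache
  let read = 0;
  let total = 0;
  for (const u of warm) {
    const r = u.cache_read_input_tokens ?? 0;
    const c = u.cache_creation_input_tokens ?? 0;
    const i = u.input_tokens ?? 0;
    read += r;
    total += r + c + i;
  }
  return total === 0 ? 0 : read / total;
}

// Illustrative numbers only, not measurements from the issue.
const cachedRun = [
  { input_tokens: 10, cache_creation_input_tokens: 8000, cache_read_input_tokens: 0 },
  { input_tokens: 10, cache_creation_input_tokens: 0, cache_read_input_tokens: 8000 },
  { input_tokens: 10, cache_creation_input_tokens: 0, cache_read_input_tokens: 8000 },
];
const variableRun = [
  { input_tokens: 8010, cache_creation_input_tokens: 0, cache_read_input_tokens: 0 },
  { input_tokens: 8010, cache_creation_input_tokens: 0, cache_read_input_tokens: 0 },
  { input_tokens: 8010, cache_creation_input_tokens: 0, cache_read_input_tokens: 0 },
];

console.log(warmHitRate(cachedRun), warmHitRate(variableRun));
```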

## Code-level trace

### prependContext injection site
In the prompt preparation layer, `hookResult.prependContext` is concatenated
directly before the user prompt:
```
// selection-DrXxngyT.js ~L12796
effectivePrompt = `${hookResult.prependContext}\n\n${effectivePrompt}`;
```
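
The effect of that template can be checked at string level. In the sketch below, `buildEffectivePrompt` mirrors the quoted template, while `commonPrefixLength`, the `<memory>` tag, and the recalled facts are hypothetical and exist only to show where two prompts start to diverge.

```javascript
// Length of the leading run of characters that two strings share.
function commonPrefixLength(a, b) {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i += 1;
  return i;
}

// Same template as the injection site quoted above.
function buildEffectivePrompt(prependContext, userPrompt) {
  return `${prependContext}\n\n${userPrompt}`;
}

const userPrompt = "hi count 3";
// The <memory> wrapper and the facts are made up for illustration.
const first = buildEffectivePrompt("Untrusted context... <memory>likes tea</memory>", userPrompt);
const second = buildEffectivePrompt("Untrusted context... <memory>works in Oslo</memory>", userPrompt);

const shared = commonPrefixLength(first, second);
console.log({ shared, firstLength: first.length, secondLength: second.length });
```

Because the recalled content comes first, the two prompts diverge inside the prepended block, and the identical user text at the end no longer forms part of a shared prefix.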

### active-memory hook returns prependContext
```
// active-memory/index.js ~L1743: before_prompt_build hook
// ~L1829: return value
return { prependContext: promptPrefix };
// promptPrefix = "Untrusted context..." + XML-wrapped summary
```

### cache_control placement
```
// anthropic-payload-policy-jufdNb_5.js ~L66-85
// cache_control placed on the LAST block of the LAST user message
// This should isolate the stable prefix from the variable suffix,
// but does not on the tested endpoints.
```
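
For comparison, the Anthropic Messages API also accepts `cache_control` on `system` text blocks. The sketch below shows a request body with one marker on the system block and one on the last user block. `buildPayload` is a hypothetical helper, not the project's payload policy code, the `max_tokens` value is arbitrary, and whether the tested endpoints honor the extra marker has not been verified here.

```javascript
// Hypothetical builder: places one marker at the end of the stable system
// block and one on the last user block. Not the project's actual policy code.
function buildPayload({ model, systemPrompt, effectivePrompt }) {
  return {
    model,
    max_tokens: 1024, // arbitrary value for the sketch
    system: [
      { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
    ],
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: effectivePrompt, cache_control: { type: "ephemeral" } },
        ],
      },
    ],
  };
}

const payload = buildPayload({
  model: "your-model-id", // placeholder, supply a real model id
  systemPrompt: "Stable system instructions",
  effectivePrompt: "Untrusted context...\n\nhi count 3",
});
console.log(JSON.stringify(payload, null, 2));
```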

### memory_search tool registration
```
// memory-core/index.js ~L270
api.registerTool(..., { names: ["memory_search"] });
// Semantic search tool: different queries → different results
```

## Suggested fixes
1. **Document the limitation**: At minimum, document that enabling `active-memory`
   effectively disables prompt cache for any user message that includes variable
   recalled content. This is silent and undocumented.

2. **Stabilize prependContext output**: Have the active-memory plugin produce
   deterministic output for a given session (e.g. fixed ordering, fixed truncation,
   hash-based fingerprint in place of timestamps) so character-level changes don't
   propagate to the user-message block.

3. **Skip active-memory for cache-critical prompts**: Allow callers to mark
   "cache-critical" prefixes that bypass active-memory injection.

4. **Tighten cache_control boundary**: Make the Anthropic cache boundary respect
   the prependContext vs system-prompt split so that stable system-prompt content
   is cached independently of variable user-message content.
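
Fix 2 could look roughly like the sketch below. `buildStablePrefix` is a hypothetical helper: the fact shape (`{ text }`), the limits, and the `<memories>` wrapper are assumptions, and the plugin's real `promptPrefix` format is not reproduced here.

```javascript
// Hypothetical helper: renders recalled facts so that the same set of facts
// always produces byte-identical output (no timestamps, no score ordering).
function buildStablePrefix(facts, { maxFacts = 20, maxChars = 4000 } = {}) {
  // Deduplicate on trimmed text so repeated recalls do not change the output.
  const unique = [...new Set(facts.map((f) => f.text.trim()))];
  unique.sort(); // fixed ordering by code unit, independent of retrieval order
  const body = unique.slice(0, maxFacts).join("\n").slice(0, maxChars);
  return `Untrusted context...\n<memories>\n${body}\n</memories>`;
}

const recallA = [{ text: "works in Oslo" }, { text: "likes tea " }, { text: "likes tea" }];
const recallB = [{ text: "likes tea" }, { text: "works in Oslo" }];

const prefixA = buildStablePrefix(recallA);
const prefixB = buildStablePrefix(recallB);
console.log(prefixA === prefixB);
```

This only removes incidental differences such as ordering and duplicates. When a different query recalls a different set of facts, the output still changes, so it complements the boundary change in fix 4 and does not replace it.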

## Workaround
Disable `active-memory` (set `enabled: false`) and use `memory_search` explicitly
when needed. Cache hit rate returns to ~99.9%.

## Impact
Any user with `active-memory` enabled and an Anthropic-compatible provider that
supports prompt caching will see cache hit rate collapse. This is silent (no
warning, no log entry) and undocumented. Common configuration, common failure mode.

