[Bug]: Active memory injection breaks prompt cache hit rate (99.9% → 22%) #91223

@Enominera

Phenomenon

Enabling the active-memory plugin causes prompt cache hit rate to collapse
from ~99.9% (clean baseline) to ~22% in production dashboard observations.

Reproduction across two Anthropic-compatible providers (independent test runs
by two coding agents) shows:

  • Clean baseline (5 identical prompts, no prependContext): 99.9% warm hit
  • Static 32KB prependContext (5 identical prompts, fixed content): 99.8% warm hit
  • Variable 32KB prependContext (5 prompts, content changes per call): 0.0% warm hit

Root cause

The active-memory plugin injects recalled memories via the before_prompt_build
hook, returning content through hookResult.prependContext. In the prompt-preparation
layer, this context is concatenated into the user message (not the system prompt).

For every eligible conversational reply, the plugin spawns a blocking memory
sub-agent that runs memory_search to recall facts. The recall query is derived
from the current user message, so different user messages → different recalled
fact lists → character-level changes in prependContext.

In the Anthropic protocol, cache_control: {type: "ephemeral"} markers are intended
to isolate the stable system-prompt block from the variable user-message block.
However, on both Anthropic-compatible providers tested, the cache_control boundary
does not behave that way: any character-level change in the user-message block
causes the entire prompt to miss the cache (0% hit rate), not just the variable
part.

Reproduction

  1. Configure any Anthropic-compatible provider
  2. Enable active-memory with config.agents: ["main"]
  3. Send 5 near-identical user messages (e.g. "hi count 3")
  4. With prependContext carrying 32KB of content that varies per call, observe:
    • cache_creation_input_tokens and cache_read_input_tokens are both 0
    • All 5 calls rebuild the entire prompt (warm hit rate = 0%)
  5. Compare with the production dashboard, which shows 22% (weighted average
    across triggered and non-triggered conversational turns)
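
For anyone repeating the measurement, here is a minimal sketch of how a warm hit rate could be computed from the usage object of each response. The formula (cached reads divided by all input tokens, skipping the first, cold call) is an assumption and may not match how the dashboard computes its figure. The helper name warmHitRate and the sample numbers are made up for illustration.

```javascript
// Hypothetical helper: the formula is an assumption, not the dashboard's definition.
function warmHitRate(usages) {
  // Skip the first call: it is the cold call that would populate the cache.
  const warm = usages.slice(1);
  let read = 0;
  let total = 0;
  for (const u of warm) {
    const cacheRead = u.cache_read_input_tokens ?? 0;
    const cacheWrite = u.cache_creation_input_tokens ?? 0;
    const uncached = u.input_tokens ?? 0;
    read += cacheRead;
    total += cacheRead + cacheWrite + uncached;
  }
  // No warm calls (or no tokens) means there is nothing to report.
  return total === 0 ? 0 : read / total;
}

// Made-up usage objects, for illustration only.
const cachedRun = [
  { input_tokens: 10, cache_creation_input_tokens: 9000, cache_read_input_tokens: 0 },
  { input_tokens: 10, cache_creation_input_tokens: 0, cache_read_input_tokens: 9000 },
  { input_tokens: 10, cache_creation_input_tokens: 0, cache_read_input_tokens: 9000 },
];
const missRun = [
  { input_tokens: 9010, cache_creation_input_tokens: 0, cache_read_input_tokens: 0 },
  { input_tokens: 9010, cache_creation_input_tokens: 0, cache_read_input_tokens: 0 },
];

console.log(warmHitRate(cachedRun).toFixed(3)); // 0.999
console.log(warmHitRate(missRun).toFixed(3));   // 0.000
```

A run where both cache counters stay at 0, as in step 4, yields 0 under this definition regardless of prompt size.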

Code-level trace

prependContext injection site

In the prompt preparation layer, hookResult.prependContext is concatenated
directly before the user prompt:

// selection-DrXxngyT.js ~L12796
effectivePrompt = `${hookResult.prependContext}\n\n${effectivePrompt}`;
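
To illustrate why this placement matters, the sketch below mirrors the concatenation above and measures how much leading text two effective prompts share when only the recalled facts differ. buildEffectivePrompt, commonPrefixLength, and the <fact> wrapper are hypothetical names used for illustration; the real prefix format is only known to start with "Untrusted context...".

```javascript
// Length of the shared leading substring of two strings.
function commonPrefixLength(a, b) {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n && a[i] === b[i]) i++;
  return i;
}

// Mirrors the concatenation quoted above (hypothetical wrapper function).
function buildEffectivePrompt(prependContext, userPrompt) {
  return `${prependContext}\n\n${userPrompt}`;
}

// Assumed shape: fixed header followed by recalled facts. The <fact> tag is invented.
const header = "Untrusted context...";
const a = buildEffectivePrompt(`${header}<fact>likes tea</fact>`, "hi count 3");
const b = buildEffectivePrompt(`${header}<fact>owns a cat</fact>`, "hi count 3");

const shared = commonPrefixLength(a, b);
// The two prompts diverge inside prependContext, before the identical user text.
console.log(shared, a.length, b.length);
```

Because the variable text comes first, the identical user prompt at the end never falls inside a shared prefix of the user message.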

active-memory hook returns prependContext

// active-memory/index.js ~L1743: before_prompt_build hook
// ~L1829: return value
return { prependContext: promptPrefix };
// promptPrefix = "Untrusted context..." + XML-wrapped summary

cache_control placement

// anthropic-payload-policy-jufdNb_5.js ~L66-85
// cache_control placed on the LAST block of the LAST user message
// This should isolate the stable prefix from the variable suffix,
// but does not on the tested endpoints.
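
For comparison, here is a sketch of a payload in which the system prompt carries its own cache_control marker and prependContext sits in a separate content block. It follows the content-block layout of the Anthropic Messages API, but it is not what anthropic-payload-policy currently emits, and whether the tested Anthropic-compatible providers honor it is untested. buildPayload and the model name are hypothetical.

```javascript
// Hypothetical builder, not the project's payload policy.
function buildPayload({ model, systemPrompt, prependContext, userPrompt }) {
  return {
    model,
    max_tokens: 1024,
    // A breakpoint on the system block marks the stable prefix as cacheable
    // independently of anything that follows in the messages array.
    system: [
      { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
    ],
    messages: [
      {
        role: "user",
        content: [
          // Variable recalled context stays after the breakpoint, unmarked.
          { type: "text", text: prependContext },
          { type: "text", text: userPrompt },
        ],
      },
    ],
  };
}

const p1 = buildPayload({
  model: "example-model", // placeholder, not a real model id
  systemPrompt: "You are a helpful assistant.",
  prependContext: "Untrusted context... facts A",
  userPrompt: "hi count 3",
});
const p2 = buildPayload({
  model: "example-model",
  systemPrompt: "You are a helpful assistant.",
  prependContext: "Untrusted context... facts B",
  userPrompt: "hi count 3",
});

// Everything up to and including the system breakpoint is byte-identical.
console.log(JSON.stringify(p1.system) === JSON.stringify(p2.system)); // true
```

Providers may also enforce a minimum cacheable prefix length, so a short system prompt like the one in this example would likely not be cached in practice.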

memory_search tool registration

// memory-core/index.js ~L270
api.registerTool(..., { names: ["memory_search"] });
// Semantic search tool: different queries → different results

Suggested fixes

  1. Document the limitation: At minimum, document that enabling active-memory
    effectively disables prompt caching for any user message that includes variable
    recalled content. The behavior is currently silent and undocumented.

  2. Stabilize prependContext output: Have the active-memory plugin produce
    deterministic output for a given session (e.g. fixed ordering, fixed truncation,
    hash-based fingerprint in place of timestamps) so character-level changes don't
    propagate to the user-message block.

  3. Skip active-memory for cache-critical prompts: Allow callers to mark
    "cache-critical" prefixes that bypass active-memory injection.

  4. Tighten cache_control boundary: Make the Anthropic cache boundary respect
    the prependContext vs system-prompt split so that stable system-prompt content
    is cached independently of variable user-message content.
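
As an illustration of fix 2, here is a minimal sketch of deterministic rendering: normalize whitespace, deduplicate, sort, and truncate with fixed limits, so the same set of recalled facts always serializes to the same string. renderStableContext and its options are hypothetical, and the plugin's actual fact format is not known from this trace.

```javascript
// Hypothetical renderer; fact strings and limits are assumptions.
function renderStableContext(facts, { maxFacts = 20, maxChars = 200 } = {}) {
  const normalized = facts
    // Collapse whitespace and apply a fixed per-fact truncation.
    .map((f) => f.trim().replace(/\s+/g, " ").slice(0, maxChars))
    .filter((f) => f.length > 0);
  // Deduplicate, then sort so recall order does not affect the output.
  const unique = [...new Set(normalized)].sort();
  return unique.slice(0, maxFacts).map((f) => `- ${f}`).join("\n");
}

const first = renderStableContext(["b  fact", " a fact "]);
const second = renderStableContext(["a fact", "b fact", "a fact"]);
console.log(first === second); // true
```

This only removes noise from ordering, duplication, and whitespace. If memory_search returns a different set of facts for a different query, the output still changes, so it would need to be combined with one of the other fixes.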

Workaround

Disable active-memory (set enabled: false) and use memory_search explicitly
when needed. Cache hit rate returns to ~99.9%.

Impact

Any user with active-memory enabled and an Anthropic-compatible provider that
supports prompt caching will see the cache hit rate collapse. The failure is silent
(no warning, no log entry) and undocumented. Common configuration, common failure mode.

Labels

  • P2 - Normal backlog priority with limited blast radius.
  • clawsweeper:needs-live-repro - ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
  • clawsweeper:needs-maintainer-review - ClawSweeper marked this issue as needing maintainer review before automation.
  • clawsweeper:needs-product-decision - ClawSweeper marked this issue as needing a product or behavior decision.
  • clawsweeper:no-new-fix-pr - ClawSweeper does not recommend queueing a new automated fix PR for this issue.
  • impact:other - This issue has meaningful maintainer-visible impact outside the owned taxonomy.
  • issue-rating: 🐚 platinum hermit - Good issue quality with a plausible reproduction path needing some confirmation.
