System prompt cache busting: reorder dynamic context layers + move memory recall out of system role #608

@Aaronontheweb

Summary

The current system prompt assembly order causes prompt cache misses on every LLM turn, even for content that is static for the entire session lifetime. This may be contributing to observed latency and token cost issues.

Root cause

Two problems compound:

1. Memory recall is a System-role message that changes every turn

SessionRecallManager.InjectIntoMessages (src/Netclaw.Actors/Sessions/Pipelines/SessionRecallManager.cs:117-145) injects a [memory-recall] block as ChatRole.System, inserted after the last existing system message. Because recalled items depend on the current user query, this content changes every turn.

Prompt caching works by prefix matching: a request can only reuse cached work for the leading run of tokens that is identical to an earlier request. A volatile System message early in the prompt therefore busts the cache for everything after it.
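A minimal sketch of the prefix-matching effect, assuming the prompt is reduced to a list of coarse segments (real providers match on tokens and their own block boundaries, so this only approximates their accounting). The segment names are illustrative, not identifiers from the codebase.

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count the leading segments two prompts share.

    Only this identical leading run can be served from cache; everything
    from the first mismatch onward is reprocessed.
    """
    shared = 0
    for prev_seg, curr_seg in zip(previous, current):
        if prev_seg != curr_seg:
            break
        shared += 1
    return shared


static_system = ["SOUL", "AGENTS", "TOOLING"]
static_layers = ["skill-index", "subagent-catalog", "session-id"]

# Current layout: the per-turn recall block sits right after the system prompt.
turn1 = static_system + ["recall:A"] + static_layers + ["user:q1"]
turn2 = static_system + ["recall:B"] + static_layers + ["user:q2"]
print(cached_prefix_len(turn1, turn2), "of", len(turn2))  # 3 of 8

# Recall moved next to the user message: the static layers join the prefix.
moved1 = static_system + static_layers + ["recall:A", "user:q1"]
moved2 = static_system + static_layers + ["recall:B", "user:q2"]
print(cached_prefix_len(moved1, moved2), "of", len(moved2))  # 6 of 8
```

In this toy model a single changed segment at position 4 discards the three static layers behind it, which is the mechanism described above.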

2. Static-per-session content is positioned AFTER volatile content

InjectDynamicContextLayers (LlmSessionActor.cs:2166) runs after memory recall injection. This means static content (skill index, subagent catalog, session ID block, attachment hint) sits after the volatile memory recall fence and never gets cached, even though it's identical across all turns in the session.

Current effective order:

System[0]: Persisted prompt (SOUL/AGENTS/TOOLING)     ← CACHED ✓
System[1]: [memory-recall] (volatile per turn)         ← CACHE FENCE — everything after is MISS
           OnceAtStart layers (skill/subagent index)   ← static but MISS ✗
           Current time                                ← volatile, MISS ✗
           [session] block                             ← static but MISS ✗
           [working-context]                           ← volatile, MISS ✗
           [attachments] hint                          ← static but MISS ✗
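The HIT/MISS labels in the diagram can be reproduced with a toy model that tags each layer as static or volatile, assuming a volatile layer differs on every turn and a static one never does. The layer names are shorthand for the blocks above, not identifiers from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    volatile: bool  # True if the content can change between turns


def predict_cache_status(layers: list[Layer]) -> list[tuple[str, str]]:
    """Label each layer HIT or MISS under prefix caching.

    Everything before the first volatile layer can be reused; the first
    volatile layer and everything after it must be reprocessed.
    """
    statuses = []
    fence_reached = False
    for layer in layers:
        if layer.volatile:
            fence_reached = True
        statuses.append((layer.name, "MISS" if fence_reached else "HIT"))
    return statuses


current_order = [
    Layer("persisted-prompt", volatile=False),
    Layer("memory-recall", volatile=True),
    Layer("once-at-start", volatile=False),
    Layer("current-time", volatile=True),
    Layer("session", volatile=False),
    Layer("working-context", volatile=True),
    Layer("attachments", volatile=False),
]

for name, status in predict_cache_status(current_order):
    print(f"{name:18} {status}")
```

Only the persisted prompt comes out as a HIT; the three static layers are labeled MISS purely because of where they sit.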

Proposed fix

Two changes, independent but complementary:

A. Move memory recall from System-role to User-role

Memory recall is per-turn context ("here's relevant background for this user message"), not behavioral instructions. Injecting it as a User-role message right before the current user message keeps the entire system prompt prefix stable and cacheable while still making the recalled items visible to the model.
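A minimal sketch of fix A, assuming a simplified message model: the `Message` type and role strings below are hypothetical stand-ins for the C# `ChatMessage`/`ChatRole` types, and `inject_recall_as_user` is an illustrative name, not the existing `InjectIntoMessages` signature. The recall block is inserted immediately before the final user message, and no system message is touched.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "system", "user", or "assistant" (hypothetical model)
    text: str


def inject_recall_as_user(messages: list[Message], recall: str) -> list[Message]:
    """Return a copy of `messages` with the recall block inserted as a
    user-role message directly before the final user message.

    System messages are left untouched, so the system prefix stays
    identical across turns.
    """
    if not recall:
        return list(messages)
    last_user = max(
        (i for i, m in enumerate(messages) if m.role == "user"),
        default=None,
    )
    block = Message("user", f"[memory-recall]\n{recall}")
    result = list(messages)
    if last_user is None:
        result.append(block)
    else:
        result.insert(last_user, block)
    return result


history = [
    Message("system", "SOUL / AGENTS / TOOLING"),
    Message("user", "Which retry policy did we settle on?"),
]
turn = inject_recall_as_user(history, "- user prefers exponential backoff")
print([m.role for m in turn])  # ['system', 'user', 'user']
```

If a provider rejects two consecutive user-role messages, the same idea works by prepending the `[memory-recall]` text to the current user message instead of inserting a separate one.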

B. Reorder dynamic context layers: stable first, volatile last

System[0]: Persisted prompt                            ← CACHED ✓
System[1]: OnceAtStart layers (skill/subagent index)   ← now CACHED ✓
           [session] block                             ← now CACHED ✓
           [attachments] hint                          ← now CACHED ✓
           Current time                                ← volatile, cache fence here
           [working-context]                           ← volatile

This moves an estimated ~500-1000 tokens of static-per-session content into the cached prefix window.
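A minimal sketch of fix B, assuming each dynamic layer can be tagged with a hypothetical `per_turn` flag (in the real code this classification would come from each layer's lifecycle, such as OnceAtStart). A stable sort on that flag yields the proposed order without disturbing the relative order inside each group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextLayer:
    name: str
    per_turn: bool  # True if the rendered text can differ between turns


def order_for_caching(layers: list[ContextLayer]) -> list[ContextLayer]:
    """Static-per-session layers first, per-turn layers last.

    sorted() is stable, so each group keeps its original relative order;
    False sorts before True, which puts static layers first.
    """
    return sorted(layers, key=lambda layer: layer.per_turn)


layers = [
    ContextLayer("once-at-start", per_turn=False),
    ContextLayer("current-time", per_turn=True),
    ContextLayer("session", per_turn=False),
    ContextLayer("working-context", per_turn=True),
    ContextLayer("attachments", per_turn=False),
]
print([layer.name for layer in order_for_caching(layers)])
# ['once-at-start', 'session', 'attachments', 'current-time', 'working-context']
```

A layer should only be flagged static if its rendered text is byte-identical on every turn; a single changing value inside it (a timestamp, a counter) turns it into a cache fence.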

Measurement

Before implementing, we should establish a baseline via the eval suite:

  • Tokens per second (input processing rate)
  • Time-to-first-token (TTFT) per eval case
  • Cached vs uncached input token counts (if the provider reports them)

The dockerized eval infrastructure from PR #603 makes isolated measurement feasible. Compare baseline → fix A alone → fix A+B to quantify the impact.
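A sketch of how the baseline and post-fix runs could be compared, assuming each eval case yields a TTFT and a cached/uncached input-token split. The `EvalRun` fields are hypothetical; for Anthropic responses the cached count would come from `usage.cache_read_input_tokens`. The input-processing rate here is a rough approximation (input tokens divided by time to first token).

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class EvalRun:
    case: str
    ttft_ms: float              # time to first token
    cached_input_tokens: int    # input tokens served from the provider cache
    uncached_input_tokens: int  # input tokens processed from scratch


def summarize(runs: list[EvalRun]) -> dict[str, float]:
    """Aggregate one eval pass (baseline, fix A, or fix A+B).

    Raises statistics.StatisticsError if `runs` is empty.
    """
    cached = sum(r.cached_input_tokens for r in runs)
    total = cached + sum(r.uncached_input_tokens for r in runs)
    total_ttft_s = sum(r.ttft_ms for r in runs) / 1000
    return {
        "median_ttft_ms": median(r.ttft_ms for r in runs),
        "cache_hit_ratio": cached / total if total else 0.0,
        # Rough input-processing rate: input tokens over time to first token.
        "input_tokens_per_s": total / total_ttft_s if total_ttft_s else 0.0,
    }


def compare(baseline: list[EvalRun], variant: list[EvalRun]) -> dict[str, float]:
    """Per-metric difference, variant minus baseline."""
    before, after = summarize(baseline), summarize(variant)
    return {metric: after[metric] - before[metric] for metric in before}
```

Running `compare` once for baseline vs. fix A and once for baseline vs. fix A+B gives the two deltas the measurement plan asks for.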

Source locations

  • src/Netclaw.Actors/Sessions/Pipelines/SessionRecallManager.cs:117-145 — InjectIntoMessages
  • src/Netclaw.Actors/Sessions/LlmSessionActor.cs:2162-2166 — injection order (recall before dynamic layers)
  • src/Netclaw.Actors/Sessions/LlmSessionActor.cs — InjectDynamicContextLayers method
  • No cache_control directives anywhere in the codebase (Anthropic ephemeral markers not used)
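Anthropic's prompt caching is opt-in through `cache_control` markers on content blocks: the prefix up to and including a marked block becomes cacheable. A sketch of where a breakpoint could go under the proposed order, assuming requests can carry a raw Messages API `system` array (whether the project's client layer passes `cache_control` through is not established here). `build_system_blocks` is a hypothetical helper; it only builds the payload and makes no API call.

```python
import json


def build_system_blocks(stable: list[str], volatile: list[str]) -> list[dict]:
    """Build a Messages API `system` array with one cache breakpoint on
    the last stable block.

    Blocks up to and including the marked one form the cacheable prefix;
    volatile blocks follow it and are reprocessed every turn.
    """
    blocks = [{"type": "text", "text": t} for t in stable]
    if blocks:
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    blocks += [{"type": "text", "text": t} for t in volatile]
    return blocks


system = build_system_blocks(
    stable=["<persisted prompt>", "<skill/subagent index>",
            "<session block>", "<attachments hint>"],
    volatile=["<current time>", "<working context>"],
)
print(json.dumps(system, indent=2))
```

Anthropic also documents a minimum cacheable prompt length, so the stable prefix has to be long enough before the breakpoint has any effect.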

Acceptance criteria

  • Memory recall injected as User-role (or at minimum, after all static system content)
  • Static-per-session dynamic layers ordered before volatile-per-turn layers
  • Eval suite baseline + post-fix comparison showing measurable improvement (or confirming no regression if the provider's cache behavior differs from expectation)
  • Existing memory recall tests updated for new injection point
  • No behavioral change to the model's use of recalled memories (eval suite validates)