System prompt cache busting: reorder dynamic context layers + move memory recall out of system role

## Summary

The current system prompt assembly order causes prompt cache misses on every LLM turn, even for content that is static for the entire session lifetime. This may be contributing to observed latency and token cost issues.

## Root cause

Two problems compound:

### 1. Memory recall is a System-role message that changes every turn

`SessionRecallManager.InjectIntoMessages` (`src/Netclaw.Actors/Sessions/Pipelines/SessionRecallManager.cs:117-145`) injects a `[memory-recall]` block as `ChatRole.System`, inserted after the last existing system message. Because recalled items depend on the current user query, this content changes every turn.

Prompt caching works on prefix matching — identical prefix tokens hit cache. A volatile System message early in the prompt busts the cache for everything after it.

### 2. Static-per-session content is positioned AFTER volatile content

`InjectDynamicContextLayers` (`LlmSessionActor.cs:2166`) runs after memory recall injection. This means static content (skill index, subagent catalog, session ID block, attachment hint) sits after the volatile memory recall fence and never gets cached, even though it's identical across all turns in the session.

Current effective order:
```
System[0]: Persisted prompt (SOUL/AGENTS/TOOLING)     ← CACHED ✓
System[1]: [memory-recall] (volatile per turn)         ← CACHE FENCE — everything after is MISS
           OnceAtStart layers (skill/subagent index)   ← static but MISS ✗
           Current time                                ← volatile, MISS ✗
           [session] block                             ← static but MISS ✗
           [working-context]                           ← volatile, MISS ✗
           [attachments] hint                          ← static but MISS ✗
```

## Proposed fix

Two changes, independent but complementary:

### A. Move memory recall from System-role to User-role

Memory recall is per-turn context ("here's relevant background for this user message"), not behavioral instructions. Injecting it as a User-role message right before the current user message keeps the entire system prompt prefix stable and cacheable while still making the recalled items visible to the model.

### B. Reorder dynamic context layers: stable first, volatile last

```
System[0]: Persisted prompt                            ← CACHED ✓
System[1]: OnceAtStart layers (skill/subagent index)   ← now CACHED ✓
           [session] block                             ← now CACHED ✓
           [attachments] hint                          ← now CACHED ✓
           Current time                                ← volatile, cache fence here
           [working-context]                           ← volatile
```

This moves ~500-1000 tokens of static content into the cached prefix window.

## Measurement

Before implementing, we should establish a baseline via the eval suite:
- Tokens per second (input processing rate)
- Time-to-first-token (TTFT) per eval case
- Cached vs uncached input token counts (if the provider reports them)

The dockerized eval infrastructure from PR #603 makes isolated measurement feasible. Compare baseline → fix A alone → fix A+B to quantify the impact.

## Source locations

- `src/Netclaw.Actors/Sessions/Pipelines/SessionRecallManager.cs:117-145` — InjectIntoMessages
- `src/Netclaw.Actors/Sessions/LlmSessionActor.cs:2162-2166` — injection order (recall before dynamic layers)
- `src/Netclaw.Actors/Sessions/LlmSessionActor.cs` — InjectDynamicContextLayers method
- No `cache_control` directives anywhere in the codebase (Anthropic ephemeral markers not used)

## Acceptance criteria

- Memory recall injected as User-role (or at minimum, after all static system content)
- Static-per-session dynamic layers ordered before volatile-per-turn layers
- Eval suite baseline + post-fix comparison showing measurable improvement (or confirming no regression if the provider's cache behavior differs from expectation)
- Existing memory recall tests updated for new injection point
- No behavioral change to the model's use of recalled memories (eval suite validates)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

System prompt cache busting: reorder dynamic context layers + move memory recall out of system role #608

Summary

Root cause

1. Memory recall is a System-role message that changes every turn

2. Static-per-session content is positioned AFTER volatile content

Proposed fix

A. Move memory recall from System-role to User-role

B. Reorder dynamic context layers: stable first, volatile last

Measurement

Source locations

Acceptance criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

System prompt cache busting: reorder dynamic context layers + move memory recall out of system role #608

Description

Summary

Root cause

1. Memory recall is a System-role message that changes every turn

2. Static-per-session content is positioned AFTER volatile content

Proposed fix

A. Move memory recall from System-role to User-role

B. Reorder dynamic context layers: stable first, volatile last

Measurement

Source locations

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions