Skip to content

[Feature]: Multi-Block Prompt Caching for Anthropic Models #50511

@SWELLWakesurf

Description

@SWELLWakesurf

Summary

Add support for multiple cacheable prompt blocks in Anthropic API calls to reduce cache invalidation when mixing stable system context (SOUL.md, AGENTS.md) with dynamic session context (daily notes, conversation history).

Problem to solve

Current behavior:
OpenClaw loads all workspace context files (SOUL.md, AGENTS.md, TOOLS.md, USER.md, MEMORY.md, HEARTBEAT.md, daily notes) into a single cacheable prompt block.

The issue:

  • Daily notes change every day (new file path: memory/2026-03-19.md)
  • MEMORY.md changes occasionally when updated
  • Even small changes invalidate the entire cache block
  • This causes excessive cache WRITES (expensive, 25% more than regular input tokens)

Real-world impact:
In our production deployment, we're seeing:

  • 71% of costs are cache writes (~$675/month of $950 total)
  • 21% are cache reads (~$200/month)
  • Only 8% are output tokens (~$75/month)

This indicates poor cache efficiency — we're paying to rebuild the cache on nearly every request instead of benefiting from cache hits.

Root Cause

Anthropic's prompt caching works by marking entire blocks as cacheable. When ANY content in that block changes, the entire cache is invalidated and must be rewritten.

OpenClaw currently treats all workspace context as a single cache block — when daily notes rotate, the entire cache (including stable SOUL.md and AGENTS.md) is invalidated.

Proposed solution

Proposed Solution

Support multiple sequential cache blocks in the prompt assembly, leveraging Anthropic's multi-block caching capability:

  • Block 1 (Stable): SOUL.md, AGENTS.md, TOOLS.md, USER.md → cached for days
  • Block 2 (Semi-stable): MEMORY.md, HEARTBEAT.md → cached when unchanged
  • Block 3 (Dynamic): Daily notes, conversation history → not cached

Implementation Approach

Option A: Configuration-Based (Preferred)

{
  agents: {
    defaults: {
      workspace: {
        contextBlocks: [
          { name: "stable", cache: true, files: ["SOUL.md", "AGENTS.md", "TOOLS.md", "USER.md"] },
          { name: "semiStable", cache: true, files: ["MEMORY.md", "HEARTBEAT.md"] },
          { name: "dynamic", cache: false, files: ["memory/YYYY-MM-DD.md"] }
        ]
      }
    }
  }
}

### Alternatives considered

_No response_

### Impact

In our production deployment, we're seeing:
- **71% of costs are cache writes** (~$675/month of $950 total)
- **21% are cache reads** (~$200/month)
- Only **8% are output tokens** (~$75/month)
- This indicates poor cache efficiency  we're paying to rebuild the cache on nearly every request instead of benefiting from cache hits.

### Evidence/examples

_No response_

### Additional information

_No response_

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions