Summary
Anthropic prompt caching is exact-prefix and cache-invalidating on prompt-prefix rewrites, so local prompt-mutating compaction can burn a freshly written cache even when `lossless-claw` is no longer stalling the foreground turn. OpenClaw already has a provider-native compaction precedent for OpenAI Responses; Anthropic should get the same first-class support, plus mixed prompt-cache TTLs that keep the expensive stable prefix warm without paying 1-hour write costs for high-churn conversation content.
Problem
- Anthropic-heavy coding sessions pay large prompt-cache write costs on Opus/Claude when long retention is applied too broadly.
- Local prompt rewrites after a turn can invalidate a still-hot conversation cache, reducing the value of `lossless-claw` compaction even after the no-stall/background-maintenance work lands.
- Right now OpenClaw does not expose Anthropic native compaction or a mixed TTL policy that matches Anthropic's cache model.
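Because Anthropic matches cached prompts on an exact serialized prefix, the reuse a follow-up request gets is bounded by the longest run of leading blocks that are byte-identical to the previous request. A minimal TypeScript sketch (illustrative types, not OpenClaw code) of why appending keeps the cache warm while a local compaction rewrite burns it:

```typescript
// Illustrative sketch: reuse under exact-prefix caching is the length of the
// shared leading span of blocks between the cached prompt and the next one.
type Block = { role: string; text: string };

// Count how many leading blocks two serialized prompts share verbatim.
function sharedPrefixBlocks(prev: Block[], next: Block[]): number {
  let n = 0;
  while (
    n < prev.length &&
    n < next.length &&
    prev[n].role === next[n].role &&
    prev[n].text === next[n].text
  ) {
    n++;
  }
  return n;
}

const cached: Block[] = [
  { role: "system", text: "workspace rules" },
  { role: "user", text: "turn 1" },
  { role: "assistant", text: "reply 1" },
];

// Appending a new turn preserves the entire cached prefix...
const appended: Block[] = [...cached, { role: "user", text: "turn 2" }];
// ...but a local compaction that rewrites an early block loses almost all of it.
const compacted: Block[] = [
  { role: "system", text: "workspace rules" },
  { role: "user", text: "summary of turns 1-2" },
];

console.log(sharedPrefixBlocks(cached, appended)); // 3: full prefix reused
console.log(sharedPrefixBlocks(cached, compacted)); // 1: only the system block survives
```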
Design Goal
- Use Anthropic native active-context compaction when the provider supports it.
- Preserve stable system/tool/workspace prefixes with `1h` cache retention.
- Keep high-churn conversation/trailing user prefixes at `5m` retention by default.
- Round-trip Anthropic compaction blocks through the transport so follow-up requests can reuse the provider-native compacted prefix.
- Keep this separate from `lossless-claw`, which should stay the lossless searchable sidecar rather than the primary live prompt shaper.
Proposed Changes
- Add Anthropic request params:
- `anthropicServerCompaction: boolean`
- `anthropicCompactThreshold: number`
- `anthropicCompactPauseAfter?: boolean`
- `anthropicCompactInstructions?: string`
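The four params above could be typed as one options bag; the interface name and doc comments below are a sketch, only the field names and types come from this proposal:

```typescript
// Proposed per-run Anthropic compaction params (interface name is hypothetical).
interface AnthropicCompactionParams {
  /** Opt into provider-native active-context compaction. */
  anthropicServerCompaction: boolean;
  /** Input-token level at which the provider should compact. */
  anthropicCompactThreshold: number;
  /** Pause the run after a compaction instead of continuing the turn. */
  anthropicCompactPauseAfter?: boolean;
  /** Optional instructions steering what the compaction keeps. */
  anthropicCompactInstructions?: string;
}

// Only the first two fields are required; the rest default to provider behavior.
const example: AnthropicCompactionParams = {
  anthropicServerCompaction: true,
  anthropicCompactThreshold: 150_000,
};
console.log(example.anthropicServerCompaction); // true
```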
- In the Anthropic transport path:
- inject `context_management.edits` with `compact_20260112` when enabled
- add the required Anthropic beta features
- preserve compaction blocks in assistant history and streamed content
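The transport-side injection could look like the sketch below. The request-body shape and trigger fields are assumptions modeled on Anthropic's existing context-management edits, and the beta flag string is a deliberate placeholder, not a real feature name:

```typescript
// Sketch only: body/edit shapes are illustrative, beta name is a placeholder.
type AnthropicBody = Record<string, unknown> & { betas?: string[] };

interface CompactionOpts {
  anthropicServerCompaction: boolean;
  anthropicCompactThreshold: number;
  anthropicCompactInstructions?: string;
}

function applyServerCompaction(body: AnthropicBody, opts: CompactionOpts): AnthropicBody {
  // Off by default: existing Anthropic behavior is untouched unless opted in.
  if (!opts.anthropicServerCompaction) return body;

  const edit: Record<string, unknown> = {
    type: "compact_20260112",
    trigger: { type: "input_tokens", value: opts.anthropicCompactThreshold },
  };
  if (opts.anthropicCompactInstructions) {
    edit.instructions = opts.anthropicCompactInstructions;
  }

  return {
    ...body,
    context_management: { edits: [edit] },
    // Send the beta feature only when compaction is actually enabled.
    betas: [...(body.betas ?? []), "context-management-placeholder"],
  };
}
```

A no-op call with `anthropicServerCompaction: false` returns the body unchanged, which keeps the "unchanged unless enabled" acceptance criterion mechanical rather than conventional.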
- In the prompt-cache policy:
- keep system/tool/workspace prefix at `1h`
- keep trailing conversation / compaction / user prefix at `5m`
- avoid rewriting a still-hot cached conversation prefix unless needed for correctness
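The TTL split above can be expressed as a small policy function. The block kinds are illustrative; the `"5m"`/`"1h"` values match Anthropic's documented `cache_control` TTL options (`5m` default, `1h` via the extended-TTL beta):

```typescript
// Sketch of the mixed-TTL policy: long retention only for the stable prefix,
// short retention for high-churn trailing content. Block kinds are illustrative.
type Ttl = "5m" | "1h";

interface PromptBlock {
  kind: "system" | "tools" | "workspace" | "conversation" | "compaction";
  text: string;
}

function cacheControlFor(block: PromptBlock): { type: "ephemeral"; ttl: Ttl } {
  const stable =
    block.kind === "system" || block.kind === "tools" || block.kind === "workspace";
  // Stable prefix: worth the 1h write premium. Everything trailing: 5m.
  return { type: "ephemeral", ttl: stable ? "1h" : "5m" };
}

console.log(cacheControlFor({ kind: "system", text: "rules" }).ttl); // "1h"
console.log(cacheControlFor({ kind: "conversation", text: "turn" }).ttl); // "5m"
```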
Why This Matters
- It reduces wasted cache-write spend on volatile conversation content.
- It lets provider-native compaction manage the active Claude prompt while `lossless-claw` continues to provide searchable durable memory.
- It preserves cache value for long coding sessions where follow-up turns are frequent.
Acceptance Criteria
- Anthropic requests can opt into native compaction via model/run params.
- Required beta headers are sent only when needed.
- Compaction blocks round-trip through request history and streaming output.
- Long Anthropic cache retention applies `1h` only to the stable prefix and keeps trailing conversational content at `5m`.
- Existing Anthropic behavior remains unchanged unless the new params are enabled.
Related Work