Add Anthropic native compaction and mixed prompt-cache TTL support #65287

@100yenadmin

Description

Summary

Anthropic prompt caching is exact-prefix: any rewrite of the cached prompt prefix invalidates it. As a result, local prompt-mutating compaction can burn a freshly written cache even when lossless-claw is no longer stalling the foreground turn. OpenClaw already has a provider-native compaction precedent for OpenAI Responses; Anthropic should get the same first-class support, plus mixed prompt-cache TTLs that keep the expensive stable prefix warm without paying 1-hour write costs on high-churn conversation content.

Problem

  • Anthropic-heavy coding sessions on Opus-class Claude models pay large prompt-cache write costs when long retention is applied too broadly.
  • Local prompt rewrites after a turn can invalidate a still-hot conversation cache, reducing the value of lossless-claw compaction even after the no-stall/background-maintenance work lands.
  • Right now OpenClaw does not expose Anthropic native compaction or a mixed TTL policy that matches Anthropic's cache model.

Design Goal

  • Use Anthropic native active-context compaction when the provider supports it.
  • Preserve stable system/tool/workspace prefixes with 1h cache retention.
  • Keep high-churn conversation/trailing user prefixes at 5m retention by default.
  • Round-trip Anthropic compaction blocks through the transport so follow-up requests can reuse the provider-native compacted prefix.
  • Keep this separate from lossless-claw, which should stay the lossless searchable sidecar rather than the primary live prompt shaper.
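The mixed-TTL goal above can be sketched as a small helper that places cache breakpoints on a request's content blocks. The `cache_control: { type: "ephemeral", ttl: ... }` shape follows Anthropic's extended-cache-TTL beta; the `applyMixedTtls` name and the "last block of the stable prefix / last block overall" breakpoint heuristic are assumptions for illustration, not the proposed implementation.

```typescript
// Hypothetical sketch: assign mixed cache TTLs to an Anthropic request.
// Assumes blocks[0..stablePrefixLen-1] are the stable system/tool/workspace
// prefix and everything after is high-churn conversation content.

type CacheControl = { type: "ephemeral"; ttl: "1h" | "5m" };
type Block = { role: string; content: string; cache_control?: CacheControl };

function applyMixedTtls(blocks: Block[], stablePrefixLen: number): Block[] {
  return blocks.map((b, i) => {
    // Last block of the stable prefix gets a 1h breakpoint: expensive to
    // write once, but it stays warm across long coding sessions.
    if (i === stablePrefixLen - 1) {
      return { ...b, cache_control: { type: "ephemeral", ttl: "1h" } };
    }
    // Last block overall gets a 5m breakpoint: keeps the conversation tail
    // warm between frequent follow-up turns without paying 1h write costs.
    if (i === blocks.length - 1) {
      return { ...b, cache_control: { type: "ephemeral", ttl: "5m" } };
    }
    return b;
  });
}
```

With this split, a prompt rewrite in the trailing conversation only forfeits a cheap 5m cache entry; the 1h prefix entry survives untouched.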

Proposed Changes

  • Add Anthropic request params:
    • anthropicServerCompaction: boolean
    • anthropicCompactThreshold: number
    • anthropicCompactPauseAfter?: boolean
    • anthropicCompactInstructions?: string
  • In the Anthropic transport path:
    • inject context_management.edits with compact_20260112 when enabled
    • add the required Anthropic beta features
    • preserve compaction blocks in assistant history and streamed content
  • In the prompt-cache policy:
    • keep system/tool/workspace prefix at 1h
    • keep trailing conversation / compaction / user prefix at 5m
    • avoid rewriting a still-hot cached conversation prefix unless needed for correctness
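The transport-side changes could look roughly like the sketch below. The param names come from this issue; the exact `context_management.edits` request shape, the `compact_20260112` fields, and the beta header value are assumptions that need to be checked against Anthropic's current docs before implementation.

```typescript
// Hypothetical sketch of the proposed Anthropic transport change.
// Field names inside `edit` and the beta string are placeholders.

interface AnthropicRunParams {
  anthropicServerCompaction?: boolean;
  anthropicCompactThreshold?: number;
  anthropicCompactPauseAfter?: boolean;
  anthropicCompactInstructions?: string;
}

function buildCompactionFields(params: AnthropicRunParams): {
  body: Record<string, unknown>;
  betas: string[];
} {
  // Opt-in only: existing Anthropic behavior is unchanged unless enabled.
  if (!params.anthropicServerCompaction) return { body: {}, betas: [] };

  const edit: Record<string, unknown> = { type: "compact_20260112" };
  if (params.anthropicCompactThreshold !== undefined) {
    edit.trigger = { type: "input_tokens", value: params.anthropicCompactThreshold };
  }
  if (params.anthropicCompactPauseAfter !== undefined) {
    edit.pause_after_compaction = params.anthropicCompactPauseAfter;
  }
  if (params.anthropicCompactInstructions) {
    edit.instructions = params.anthropicCompactInstructions;
  }

  return {
    body: { context_management: { edits: [edit] } },
    // Beta header(s) sent only when compaction is enabled; placeholder name.
    betas: ["context-management-2025-06-27"],
  };
}
```

Merging `body` into the outgoing request and appending `betas` to the `anthropic-beta` header only on this path satisfies the "beta headers sent only when needed" acceptance criterion.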

Why This Matters

  • It reduces wasted cache-write spend on volatile conversation content.
  • It lets provider-native compaction manage the active Claude prompt while lossless-claw continues to provide searchable durable memory.
  • It preserves cache value for long coding sessions where follow-up turns are frequent.

Acceptance Criteria

  • Anthropic requests can opt into native compaction via model/run params.
  • Required beta headers are sent only when needed.
  • Compaction blocks round-trip through request history and streaming output.
  • Long Anthropic cache retention applies 1h only to the stable prefix and keeps trailing conversational content at 5m.
  • Existing Anthropic behavior remains unchanged unless the new params are enabled.
