Summary
Anthropic prompt caching is exact-prefix and cache-invalidating on prompt-prefix rewrites, so local prompt-mutating compaction can burn a freshly written cache even when `lossless-claw` is no longer stalling the foreground turn. OpenClaw already has a provider-native compaction precedent for OpenAI Responses; Anthropic should get the same first-class support, plus mixed prompt-cache TTLs that keep the expensive stable prefix warm without paying 1-hour write costs for high-churn conversation content.
Problem
- Anthropic-heavy coding sessions pay large prompt-cache write costs on Opus/Claude when long retention is applied too broadly.
- Local prompt rewrites after a turn can invalidate a still-hot conversation cache, reducing the value of `lossless-claw` compaction even after the no-stall/background-maintenance work lands.
- Right now OpenClaw does not expose Anthropic native compaction or a mixed TTL policy that matches Anthropic's cache model.
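Because Anthropic matches cached prompts on an exact serialized prefix, the reuse a follow-up request gets is bounded by the longest run of leading blocks that are byte-identical to the previous request. A minimal TypeScript sketch (illustrative types, not OpenClaw code) of why appending keeps the cache warm while a local compaction rewrite burns it:

```typescript
// Illustrative sketch: reuse under exact-prefix caching is the length of the
// shared leading span of blocks between the cached prompt and the next one.
type Block = { role: string; text: string };

// Count how many leading blocks two serialized prompts share verbatim.
function sharedPrefixBlocks(prev: Block[], next: Block[]): number {
  let n = 0;
  while (
    n < prev.length &&
    n < next.length &&
    prev[n].role === next[n].role &&
    prev[n].text === next[n].text
  ) {
    n++;
  }
  return n;
}

const cached: Block[] = [
  { role: "system", text: "workspace rules" },
  { role: "user", text: "turn 1" },
  { role: "assistant", text: "reply 1" },
];

// Appending a new turn preserves the entire cached prefix...
const appended: Block[] = [...cached, { role: "user", text: "turn 2" }];
// ...but a local compaction that rewrites an early block loses almost all of it.
const compacted: Block[] = [
  { role: "system", text: "workspace rules" },
  { role: "user", text: "summary of turns 1-2" },
];

console.log(sharedPrefixBlocks(cached, appended)); // 3: full prefix reused
console.log(sharedPrefixBlocks(cached, compacted)); // 1: only the system block survives
```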
Design Goal
- Use Anthropic native active-context compaction when the provider supports it.
- Preserve stable system/tool/workspace prefixes with `1h` cache retention.
- Keep high-churn conversation/trailing user prefixes at `5m` retention by default.
- Round-trip Anthropic compaction blocks through the transport so follow-up requests can reuse the provider-native compacted prefix.
- Keep this separate from `lossless-claw`, which should stay the lossless searchable sidecar rather than the primary live prompt shaper.
Proposed Changes
- Add Anthropic request params:
- `anthropicServerCompaction: boolean`
- `anthropicCompactThreshold: number`
- `anthropicCompactPauseAfter?: boolean`
- `anthropicCompactInstructions?: string`
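The four params above could be typed as one options bag; the interface name and doc comments below are a sketch, only the field names and types come from this proposal:

```typescript
// Proposed per-run Anthropic compaction params (interface name is hypothetical).
interface AnthropicCompactionParams {
  /** Opt into provider-native active-context compaction. */
  anthropicServerCompaction: boolean;
  /** Input-token level at which the provider should compact. */
  anthropicCompactThreshold: number;
  /** Pause the run after a compaction instead of continuing the turn. */
  anthropicCompactPauseAfter?: boolean;
  /** Optional instructions steering what the compaction keeps. */
  anthropicCompactInstructions?: string;
}

// Only the first two fields are required; the rest default to provider behavior.
const example: AnthropicCompactionParams = {
  anthropicServerCompaction: true,
  anthropicCompactThreshold: 150_000,
};
console.log(example.anthropicServerCompaction); // true
```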
- In the Anthropic transport path:
- inject `context_management.edits` with `compact_20260112` when enabled
- add the required Anthropic beta features
- preserve compaction blocks in assistant history and streamed content
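The transport-side injection could look like the sketch below. The request-body shape and trigger fields are assumptions modeled on Anthropic's existing context-management edits, and the beta flag string is a deliberate placeholder, not a real feature name:

```typescript
// Sketch only: body/edit shapes are illustrative, beta name is a placeholder.
type AnthropicBody = Record<string, unknown> & { betas?: string[] };

interface CompactionOpts {
  anthropicServerCompaction: boolean;
  anthropicCompactThreshold: number;
  anthropicCompactInstructions?: string;
}

function applyServerCompaction(body: AnthropicBody, opts: CompactionOpts): AnthropicBody {
  // Off by default: existing Anthropic behavior is untouched unless opted in.
  if (!opts.anthropicServerCompaction) return body;

  const edit: Record<string, unknown> = {
    type: "compact_20260112",
    trigger: { type: "input_tokens", value: opts.anthropicCompactThreshold },
  };
  if (opts.anthropicCompactInstructions) {
    edit.instructions = opts.anthropicCompactInstructions;
  }

  return {
    ...body,
    context_management: { edits: [edit] },
    // Send the beta feature only when compaction is actually enabled.
    betas: [...(body.betas ?? []), "context-management-placeholder"],
  };
}
```

A no-op call with `anthropicServerCompaction: false` returns the body unchanged, which keeps the "unchanged unless enabled" acceptance criterion mechanical rather than conventional.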
- In the prompt-cache policy:
- keep system/tool/workspace prefix at `1h`
- keep trailing conversation / compaction / user prefix at `5m`
- avoid rewriting a still-hot cached conversation prefix unless needed for correctness
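The TTL split above can be expressed as a small policy function. The block kinds are illustrative; the `"5m"`/`"1h"` values match Anthropic's documented `cache_control` TTL options (`5m` default, `1h` via the extended-TTL beta):

```typescript
// Sketch of the mixed-TTL policy: long retention only for the stable prefix,
// short retention for high-churn trailing content. Block kinds are illustrative.
type Ttl = "5m" | "1h";

interface PromptBlock {
  kind: "system" | "tools" | "workspace" | "conversation" | "compaction";
  text: string;
}

function cacheControlFor(block: PromptBlock): { type: "ephemeral"; ttl: Ttl } {
  const stable =
    block.kind === "system" || block.kind === "tools" || block.kind === "workspace";
  // Stable prefix: worth the 1h write premium. Everything trailing: 5m.
  return { type: "ephemeral", ttl: stable ? "1h" : "5m" };
}

console.log(cacheControlFor({ kind: "system", text: "rules" }).ttl); // "1h"
console.log(cacheControlFor({ kind: "conversation", text: "turn" }).ttl); // "5m"
```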
Why This Matters
- It reduces wasted cache-write spend on volatile conversation content.
- It lets provider-native compaction manage the active Claude prompt while `lossless-claw` continues to provide searchable durable memory.
- It preserves cache value for long coding sessions where follow-up turns are frequent.
Acceptance Criteria
- Anthropic requests can opt into native compaction via model/run params.
- Required beta headers are sent only when needed.
- Compaction blocks round-trip through request history and streaming output.
- Long Anthropic cache retention applies `1h` only to the stable prefix and keeps trailing conversational content at `5m`.
- Existing Anthropic behavior remains unchanged unless the new params are enabled.
Related Work