Feature Request: Add Gemini Context Caching Support (cachedContents API)

**Problem:**

The OpenClaw framework supports `cacheRetention` for Anthropic models, but does not implement context caching for Google Gemini models. This results in significant cost overhead for agents with large system instructions.

**Current Situation:**

- Gemini models support context caching via the `cachedContents` API
- OpenClaw's `cacheRetention` config only works for Anthropic
- Each request to Gemini re-sends the entire system instruction (~35,000 tokens for some agents)
- This costs ~$0.0026 per request for the system instruction alone

**Example Cost Impact:**

For an agent like Felipe (WhatsApp triage agent):
- System instruction: ~35,000 tokens
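- Without caching: $0.075 per 1M input tokens (~$0.0026 per request)
- With caching: $0.01875 per 1M cached tokens (75% lower, ~$0.00066 per request)

For 100 conversations/day with 10 turns each (1,000 requests/day):
- Without caching: ~$2.63/day
- With caching: ~$0.66/day
- **Monthly savings: ~$59/month** (before cache storage fees, which Google bills separately)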
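
The arithmetic can be checked with a short script. This is only a sketch: it uses the token count, rates, and request volume quoted above, and it ignores cache storage fees and any tokens outside the system instruction.

```python
# Figures quoted in this issue; cache storage fees are not modeled.
SYSTEM_TOKENS = 35_000
INPUT_RATE = 0.075 / 1_000_000     # USD per uncached input token
CACHED_RATE = 0.01875 / 1_000_000  # USD per cached input token
REQUESTS_PER_DAY = 100 * 10        # conversations/day x turns per conversation


def daily_cost(rate_per_token: float) -> float:
    """Daily cost of sending the system instruction on every request."""
    return SYSTEM_TOKENS * rate_per_token * REQUESTS_PER_DAY


uncached = daily_cost(INPUT_RATE)
cached = daily_cost(CACHED_RATE)
monthly_savings = (uncached - cached) * 30

print(f"without caching: ${uncached:.3f}/day")
print(f"with caching:    ${cached:.3f}/day")
print(f"monthly savings: ${monthly_savings:.2f}")
```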

**Google's Context Caching API:**

```python
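from google import genai
from google.genai.types import CreateCachedContentConfig, GenerateContentConfig

client = genai.Client()  # picks up credentials from the environment

# `contents` and `system_instruction` are assumed to be defined by the caller.
# Create the cache once
content_cache = client.caches.create(
    model="gemini-2.0-flash-001",
    config=CreateCachedContentConfig(
        contents=contents,
        system_instruction=system_instruction,
        display_name="agent-cache",
        ttl="86400s",  # 24 hours
    ),
)

# Reference the cache in later requests; the cache name goes in the request config
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="user question",
    config=GenerateContentConfig(cached_content=content_cache.name),
)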
```

Documentation: https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-create

**Proposed Implementation:**

1. Add support for `cacheRetention` config for Gemini models
2. Automatically create and manage `cachedContents` for system instructions
3. Cache system instruction on first request
4. Reuse cache for subsequent requests within TTL
5. Auto-refresh cache before expiration
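
Steps 3 to 5 could be sketched roughly as below. Everything here is hypothetical and not existing OpenClaw code: `SystemInstructionCache`, `create_cache`, and the default TTL and refresh margin are illustrative names and values. The provider call is injected as a callable so the bookkeeping can be shown without network access, and "refresh" is done lazily on access by creating a new cache rather than by a background timer.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CacheEntry:
    name: str          # cache resource name returned by the provider
    expires_at: float  # expiry time on the injected clock, in seconds


class SystemInstructionCache:
    """Hypothetical helper: one provider-side cache per (model, system instruction)."""

    def __init__(
        self,
        create_cache: Callable[[str, str, int], str],
        ttl_seconds: int = 3600,            # assumed default
        refresh_margin_seconds: int = 300,  # assumed default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._create_cache = create_cache
        self._ttl = ttl_seconds
        self._margin = refresh_margin_seconds
        self._clock = clock
        self._entries: Dict[str, CacheEntry] = {}

    @staticmethod
    def _key(model: str, system_instruction: str) -> str:
        # Hash the instruction so an edited prompt never reuses a stale cache.
        digest = hashlib.sha256(system_instruction.encode("utf-8")).hexdigest()
        return f"{model}:{digest}"

    def get(self, model: str, system_instruction: str) -> str:
        """Return a cache name, creating it on first use or near expiry."""
        key = self._key(model, system_instruction)
        entry = self._entries.get(key)
        now = self._clock()
        if entry is None or now >= entry.expires_at - self._margin:
            name = self._create_cache(model, system_instruction, self._ttl)
            entry = CacheEntry(name=name, expires_at=now + self._ttl)
            self._entries[key] = entry
        return entry.name


# Demo with a fake provider call and a controllable clock.
calls = []


def fake_create(model: str, system_instruction: str, ttl_seconds: int) -> str:
    calls.append(model)
    return f"cachedContents/fake-{len(calls)}"


fake_now = [0.0]
cache = SystemInstructionCache(fake_create, clock=lambda: fake_now[0])
first = cache.get("gemini-2.0-flash-001", "You are a triage agent.")
second = cache.get("gemini-2.0-flash-001", "You are a triage agent.")  # reused
fake_now[0] = 3400.0  # inside the 300 s refresh margin
third = cache.get("gemini-2.0-flash-001", "You are a triage agent.")   # re-created
```

A real implementation might extend the TTL of the existing cache instead of creating a new one, and would need to handle a cache that was deleted or expired on the provider side.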

**Config Example:**

```json
{
  "agents": {
    "defaults": {
      "models": {
        "google/gemini-3.1-flash-lite": {
          "params": {
            "cacheRetention": "long"
          }
        }
      }
    }
  }
}
```
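
One open question is how a `cacheRetention` value would translate into the TTL string that the caching API expects (for example `"86400s"`). The sketch below shows one possible mapping. The `"short"` and `"none"` values and both durations are assumptions made for illustration; this issue only shows `"long"` and does not define any durations.

```python
from typing import Optional

# Hypothetical durations; not defined anywhere in this issue.
RETENTION_TTL_SECONDS = {
    "short": 300,   # assumed: 5 minutes
    "long": 3600,   # assumed: 1 hour
}


def gemini_ttl(cache_retention: Optional[str]) -> Optional[str]:
    """Return a TTL string such as "3600s", or None when caching is disabled."""
    if cache_retention is None or cache_retention == "none":
        return None
    if cache_retention not in RETENTION_TTL_SECONDS:
        raise ValueError(f"unsupported cacheRetention: {cache_retention!r}")
    return f"{RETENTION_TTL_SECONDS[cache_retention]}s"


ttl_for_long = gemini_ttl("long")
ttl_for_none = gemini_ttl("none")
```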

**Benefits:**

- Up to 75% lower cost for the cached portion of Gemini input tokens (cache storage is billed separately)
- Potentially faster response times, since the cached system instruction does not need to be re-processed
- Better user experience for agents with large system prompts
- Feature parity with Anthropic support

**Priority:** High (cost impact for production agents)


Feature Request: Add Gemini Context Caching Support (cachedContents API) #51372
