
Feature Request: Add Gemini Context Caching Support (cachedContents API) #51372

@rafaelmariano-glitch

Problem:

The OpenClaw framework supports cacheRetention for Anthropic models, but does not implement context caching for Google Gemini models. This results in significant cost overhead for agents with large system instructions.

Current Situation:

  • Gemini models support context caching via the cachedContents API
  • OpenClaw's cacheRetention config only works for Anthropic
  • Each request to Gemini re-sends the entire system instruction (~35,000 tokens for some agents)
  • This results in ~$0.0027 per request just for system instruction overhead

Example Cost Impact:

For an agent like Felipe (WhatsApp triage agent):

  • System instruction: ~35,000 tokens
  • Without caching: $0.075 per 1M input tokens, billed in full on every request
  • With caching: $0.01875 per 1M cached tokens read (75% savings)

For 100 conversations/day with 10 turns each (1,000 requests/day), counting the system instruction only:

  • Without caching: ~$2.70/day
  • With caching: ~$0.68/day
  • Monthly savings: ~$60/month
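
The figures above can be reproduced with a few lines of arithmetic. This is a rough check using only the numbers quoted in this issue, assuming a 30-day month and ignoring cache storage charges and all non-cached tokens. With exactly 35,000 tokens it comes out slightly below the rounded figures above (roughly $2.6 and $0.66 per day, about $59 per month).

```python
# Worked check of the cost figures quoted in this issue.
# Assumptions: 30-day month, system-instruction tokens only,
# no cache storage charges included.
SYSTEM_TOKENS = 35_000          # system instruction size
INPUT_PRICE_PER_M = 0.075       # USD per 1M regular input tokens
CACHED_PRICE_PER_M = 0.01875    # USD per 1M cached tokens read
REQUESTS_PER_DAY = 100 * 10     # 100 conversations x 10 turns


def daily_cost(price_per_million: float) -> float:
    """Daily cost of sending the system instruction at the given price."""
    return SYSTEM_TOKENS / 1_000_000 * price_per_million * REQUESTS_PER_DAY


uncached = daily_cost(INPUT_PRICE_PER_M)
cached = daily_cost(CACHED_PRICE_PER_M)
monthly_savings = (uncached - cached) * 30

print(f"uncached/day: ${uncached:.3f}")
print(f"cached/day:   ${cached:.3f}")
print(f"saved/month:  ${monthly_savings:.2f}")
```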

Google's Context Caching API:

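from google import genai
from google.genai.types import CreateCachedContentConfig, GenerateContentConfig

client = genai.Client()

# Create a cache holding the system instruction and any shared context.
# `contents` and `system_instruction` are defined by the caller.
content_cache = client.caches.create(
    model="gemini-2.0-flash-001",
    config=CreateCachedContentConfig(
        contents=contents,
        system_instruction=system_instruction,
        display_name="agent-cache",
        ttl="86400s",  # 24 hours
    ),
)

# Reference the cache in later requests. cached_content is passed through
# the request config, not as a direct keyword argument.
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="user question",
    config=GenerateContentConfig(cached_content=content_cache.name),
)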

Documentation: https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-create

Proposed Implementation:

  1. Add support for cacheRetention config for Gemini models
  2. Automatically create and manage cachedContents for system instructions
  3. Cache system instruction on first request
  4. Reuse cache for subsequent requests within TTL
  5. Auto-refresh cache before expiration
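
One way steps 2 to 5 could fit together is sketched below. This is not OpenClaw code: `SystemInstructionCache`, `create_cache`, and `refresh_margin` are hypothetical names, and the provider call is injected as a plain callable so the sketch runs without network access. "Refresh" here means creating a new cache shortly before the old one expires; a real implementation might instead extend the TTL of the existing cache.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CacheEntry:
    name: str          # cache resource name returned by the provider
    expires_at: float  # clock time at which the provider-side cache expires


class SystemInstructionCache:
    """Tracks one provider-side cache per (model, system instruction) pair.

    Hypothetical helper: `create_cache(model, system_instruction, ttl_seconds)`
    is assumed to create the remote cache and return its resource name.
    """

    def __init__(
        self,
        create_cache: Callable[[str, str, int], str],
        ttl_seconds: int = 86_400,
        refresh_margin: int = 300,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._create_cache = create_cache
        self._ttl = ttl_seconds
        self._margin = refresh_margin
        self._clock = clock
        self._entries: Dict[str, CacheEntry] = {}

    def get(self, model: str, system_instruction: str) -> str:
        """Return a usable cache name, creating or refreshing it if needed."""
        raw = f"{model}\x00{system_instruction}".encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        now = self._clock()
        entry = self._entries.get(key)
        # Create on first use, or re-create once inside the refresh margin.
        if entry is None or now >= entry.expires_at - self._margin:
            name = self._create_cache(model, system_instruction, self._ttl)
            entry = CacheEntry(name=name, expires_at=now + self._ttl)
            self._entries[key] = entry
        return entry.name
```

In practice `create_cache` would wrap a call such as `client.caches.create(...)` and return the `name` of the result, which is then passed as `cached_content` on each request.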

Config Example:

{
  "agents": {
    "defaults": {
      "models": {
        "google/gemini-3.1-flash-lite": {
          "params": {
            "cacheRetention": "long"
          }
        }
      }
    }
  }
}
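
As a rough illustration of how such a config might be turned into a cache TTL, the sketch below reads `cacheRetention` from the nested structure shown above. The function `resolve_cache_ttl`, the values `"none"` and `"short"`, and the TTLs attached to each value are all assumptions made for illustration; only `"long"` appears in this issue, and it is mapped to 24 hours to match the `ttl="86400s"` example.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping from cacheRetention values to a Gemini cache TTL.
# Only "long" appears in this issue; the other keys and all TTLs are assumed.
RETENTION_TTL_SECONDS: Dict[str, Optional[int]] = {
    "none": None,     # caching disabled
    "short": 300,     # assumed: 5 minutes
    "long": 86_400,   # assumed: 24 hours, matching the ttl="86400s" example
}


def resolve_cache_ttl(config: Dict[str, Any], model_id: str) -> Optional[int]:
    """Return the cache TTL in seconds for a model, or None if caching is off."""
    models = config.get("agents", {}).get("defaults", {}).get("models", {})
    params = models.get(model_id, {}).get("params", {})
    retention = params.get("cacheRetention", "none")
    if retention not in RETENTION_TTL_SECONDS:
        raise ValueError(f"unknown cacheRetention value: {retention!r}")
    return RETENTION_TTL_SECONDS[retention]


example_config = {
    "agents": {
        "defaults": {
            "models": {
                "google/gemini-3.1-flash-lite": {
                    "params": {"cacheRetention": "long"}
                }
            }
        }
    }
}

ttl = resolve_cache_ttl(example_config, "google/gemini-3.1-flash-lite")
```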

Benefits:

  • 75% cost reduction on cached Gemini input tokens (the system instruction)
  • Potentially faster response times (the cached system instruction does not need to be re-processed)
  • Better user experience for agents with large system prompts
  • Feature parity with Anthropic support

Priority: High (cost impact for production agents)
