Problem:
The OpenClaw framework supports cacheRetention for Anthropic models, but does not implement context caching for Google Gemini models. This results in significant cost overhead for agents with large system instructions.
Current Situation:
- Gemini models support context caching via the cachedContents API
- OpenClaw's cacheRetention config only works for Anthropic
- Each request to Gemini re-sends the entire system instruction (~35,000 tokens for some agents)
- At $0.075 per 1M input tokens, that is ~$0.0027 per request for the system instruction alone
Example Cost Impact:
For an agent like Felipe (WhatsApp triage agent):
- System instruction: ~35,000 tokens
- Without caching: $0.075 per 1M input tokens, charged on every request
- With caching: $0.01875 per 1M cached tokens read (75% savings)
For 100 conversations/day with 10 turns each:
- Without caching: ~$2.70/day
- With caching: ~$0.68/day
- Monthly savings: ~$60/month
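The figures above can be reproduced with a short calculation. This is a rough sketch that uses only the prices and volumes quoted in this issue. It ignores cache storage fees and the uncached first request that creates the cache, and the constant and function names are made up for illustration.

```python
SYSTEM_TOKENS = 35_000         # system instruction size quoted in this issue
PRICE_INPUT_PER_M = 0.075      # USD per 1M regular input tokens
PRICE_CACHED_PER_M = 0.01875   # USD per 1M cached input tokens
REQUESTS_PER_DAY = 100 * 10    # 100 conversations x 10 turns each


def daily_cost(tokens: int, price_per_million: float, requests: int) -> float:
    """USD cost of sending `tokens` input tokens on each of `requests` requests."""
    return tokens / 1_000_000 * price_per_million * requests


uncached = daily_cost(SYSTEM_TOKENS, PRICE_INPUT_PER_M, REQUESTS_PER_DAY)
cached = daily_cost(SYSTEM_TOKENS, PRICE_CACHED_PER_M, REQUESTS_PER_DAY)
monthly_savings = (uncached - cached) * 30

print(f"without caching: ${uncached:.3f}/day")
print(f"with caching:    ${cached:.3f}/day")
print(f"monthly savings: ${monthly_savings:.2f}")
```

With exactly 35,000 tokens the results land slightly below the rounded figures quoted above: roughly $2.6/day without caching, $0.66/day with caching, and about $59/month saved.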
Google's Context Caching API:
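    from google import genai
    from google.genai.types import CreateCachedContentConfig, GenerateContentConfig

    client = genai.Client()  # reads API key / project settings from the environment

    # Create cache (contents and system_instruction are supplied by the caller)
    content_cache = client.caches.create(
        model="gemini-2.0-flash-001",
        config=CreateCachedContentConfig(
            contents=contents,
            system_instruction=system_instruction,
            display_name="agent-cache",
            ttl="86400s",  # 24 hours
        ),
    )

    # Use cache in requests: the cache name is passed through the request config
    response = client.models.generate_content(
        model="gemini-2.0-flash-001",
        contents="user question",
        config=GenerateContentConfig(cached_content=content_cache.name),
    )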
Documentation: https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-create
Proposed Implementation:
- Add support for cacheRetention config for Gemini models
- Automatically create and manage cachedContents for system instructions
- Cache system instruction on first request
- Reuse cache for subsequent requests within TTL
- Auto-refresh cache before expiration
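The create / reuse / refresh lifecycle listed above could look roughly like the sketch below. It is not OpenClaw code: the class, the `create_cache` hook, and the refresh margin are hypothetical names chosen for illustration, and the provider call is injected so the logic can be exercised without touching the Gemini API.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CacheEntry:
    name: str          # cache resource name returned by the provider
    expires_at: float  # absolute expiry time, in seconds


class SystemInstructionCache:
    """Create a cache on first use, reuse it within the TTL, replace it near expiry."""

    def __init__(
        self,
        create_cache: Callable[[str, str, int], str],  # hypothetical hook: (model, instruction, ttl) -> name
        ttl_seconds: int = 86_400,
        refresh_margin_seconds: int = 300,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._create_cache = create_cache
        self._ttl = ttl_seconds
        self._margin = refresh_margin_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], CacheEntry] = {}

    def get(self, model: str, system_instruction: str) -> str:
        # Key on a hash of the instruction so an edited prompt gets a new cache.
        digest = hashlib.sha256(system_instruction.encode("utf-8")).hexdigest()
        key = (model, digest)
        now = self._clock()
        entry = self._entries.get(key)
        # Missing, or inside the refresh margin before expiry: create a fresh cache.
        if entry is None or now >= entry.expires_at - self._margin:
            name = self._create_cache(model, system_instruction, self._ttl)
            entry = CacheEntry(name=name, expires_at=now + self._ttl)
            self._entries[key] = entry
        return entry.name


# Demo with a fake provider call and a controllable clock.
created = []


def fake_create(model: str, instruction: str, ttl: int) -> str:
    created.append(model)
    return f"cachedContents/{len(created)}"


now = [0.0]
cache = SystemInstructionCache(
    fake_create, ttl_seconds=3600, refresh_margin_seconds=300, clock=lambda: now[0]
)
prompt = "You are a triage agent."
first = cache.get("gemini-2.0-flash-001", prompt)   # first request: cache is created
now[0] = 1000.0
second = cache.get("gemini-2.0-flash-001", prompt)  # within TTL: cache is reused
now[0] = 3400.0
third = cache.get("gemini-2.0-flash-001", prompt)   # inside refresh margin: cache is replaced
```

In this sketch "refresh" means creating a replacement cache shortly before the old one expires. Extending the TTL of the existing cache on the provider side would be an alternative design.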
Config Example:
    {
      "agents": {
        "defaults": {
          "models": {
            "google/gemini-3.1-flash-lite": {
              "params": {
                "cacheRetention": "long"
              }
            }
          }
        }
      }
    }
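One way the config value could be turned into a Gemini cache TTL is sketched below. Only "long" appears in this issue. The other retention values, all of the durations, and both helper names are placeholder assumptions, not OpenClaw or Google defaults.

```python
from typing import Optional

# Placeholder durations: illustrative assumptions only.
RETENTION_TTL_SECONDS = {
    "none": None,   # caching disabled
    "short": 300,   # 5 minutes (assumed)
    "long": 3600,   # 1 hour (assumed)
}


def retention_to_ttl(retention: str) -> Optional[str]:
    """Return a TTL string such as "3600s", or None when caching is disabled."""
    if retention not in RETENTION_TTL_SECONDS:
        raise ValueError(f"unknown cacheRetention value: {retention!r}")
    seconds = RETENTION_TTL_SECONDS[retention]
    return None if seconds is None else f"{seconds}s"


def ttl_for_model(config: dict, model: str) -> Optional[str]:
    """Look up cacheRetention for `model` in a config shaped like the example above."""
    models = config.get("agents", {}).get("defaults", {}).get("models", {})
    params = models.get(model, {}).get("params", {})
    return retention_to_ttl(params.get("cacheRetention", "none"))


example_config = {
    "agents": {
        "defaults": {
            "models": {
                "google/gemini-3.1-flash-lite": {
                    "params": {"cacheRetention": "long"}
                }
            }
        }
    }
}

ttl = ttl_for_model(example_config, "google/gemini-3.1-flash-lite")
```

A model with no cacheRetention entry falls back to "none" here, so caching stays opt-in.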
Benefits:
- 75% cost reduction on cached Gemini input tokens (the system instruction)
- Faster response times (cached system instruction doesn't need re-processing)
- Better user experience for agents with large system prompts
- Feature parity with Anthropic support
Priority: High (cost impact for production agents)