Description
OpenRouter supports server‑side prompt caching via the cache_control parameter (docs: https://openrouter.ai/docs/guides/best-practices/prompt-caching).
This can significantly reduce token costs by caching the static prefix (system prompt + injected workspace files) across requests.
Currently, OpenClaw already implements this for direct Anthropic calls via cacheRetention, but OpenRouter requests for some providers don't include cache_control, missing potential savings of ~10‑12k tokens per turn. I recognise that some providers cache automatically, but others don't.
Proposed solution:
- Add cache_control parameter to OpenRouter provider configuration
- Compute hash of static prompt prefix (system prompt + injected files)
- Insert cache_control breakpoint after static prefix
- Track and reuse cache IDs across requests with same prefix
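On the wire, OpenRouter follows the Anthropic-style convention of attaching `cache_control` to an individual content part (see the prompt-caching docs linked above); the breakpoint marks the end of the cached prefix. A hedged sketch of what step 3 could produce (model slug and prompt text are placeholders, not actual OpenClaw values):

```typescript
// Sketch of an OpenRouter chat request after inserting a cache_control
// breakpoint: everything up to and including the marked part is cached.
const request = {
  model: "anthropic/claude-sonnet-4", // placeholder model slug
  messages: [
    {
      role: "system",
      content: [
        { type: "text", text: "You are OpenClaw..." },
        {
          type: "text",
          text: "<injected workspace files>",
          // Breakpoint after the static prefix (system prompt + files)
          cache_control: { type: "ephemeral" },
        },
      ],
    },
    { role: "user", content: [{ type: "text", text: "Refactor foo()" }] },
  ],
};
```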
Configuration example:

```jsonc
{
  agents: {
    defaults: {
      models: {
        "openrouter/openai/chatgpt-4o": {
          params: {
            cache_control: { type: "ephemeral", ttl: "1h" }
          }
        }
      }
    }
  }
}
```
Benefits:
- Reduces token burn for identical prefixes across turns
- Works across sessions if the prefix is unchanged
- Compatible with all OpenRouter‑supported providers (Anthropic, OpenAI, Gemini, DeepSeek, etc.)
- Follows the existing pattern from the Anthropic cacheRetention implementation
Implementation complexity:
Low‑medium (~100‑200 LoC)
2. Code Changes Sketch
Let me create a minimal patch for packages/gateway/src/providers/openrouter.ts (based on inferred structure):
```typescript
// Hypothetical implementation - needs actual code inspection
import { createHash } from 'node:crypto';

interface OpenRouterCacheControl {
  type: 'ephemeral';
  ttl?: '1h';
}

interface OpenRouterRequest {
  messages: Array<{
    role: string;
    content: Array<{
      type: string;
      text: string;
      cache_control?: OpenRouterCacheControl;
    }>;
  }>;
  cache_control?: OpenRouterCacheControl; // Top-level for some providers
  cache_id?: string;
}

class OpenRouterProvider {
  private prefixCache = new Map<string, string>(); // prefix hash -> cache_id

  // Existing transport method, elided in this sketch
  private sendToOpenRouter!: (request: OpenRouterRequest) => Promise<{ cache_id?: string }>;

  async createChatCompletion(
    request: OpenRouterRequest,
    config: { params?: { cache_control?: OpenRouterCacheControl } },
  ) {
    const cache_control = config.params?.cache_control;
    if (cache_control) {
      // 1. Compute hash of static prefix (system prompt + injected files)
      const prefixHash = this.computePrefixHash(request.messages);
      // 2. Check for an existing cache ID
      const cacheId = this.prefixCache.get(prefixHash);
      if (cacheId) {
        request.cache_id = cacheId;
      } else {
        // 3. Insert cache_control breakpoint after the static prefix
        this.insertCacheControl(request.messages, cache_control);
      }
    }
    const response = await this.sendToOpenRouter(request);
    // 4. Store the new cache ID from the response
    if (cache_control && response.cache_id && !request.cache_id) {
      const prefixHash = this.computePrefixHash(request.messages);
      this.prefixCache.set(prefixHash, response.cache_id);
    }
    return response;
  }

  private computePrefixHash(messages: OpenRouterRequest['messages']): string {
    // Concatenate the static parts (system messages, which carry the
    // injected file content) and hash them for change detection
    const staticText = messages
      .filter(msg => msg.role === 'system')
      .map(msg => msg.content.map(c => c.text).join(''))
      .join('');
    return createHash('sha256').update(staticText).digest('hex');
  }

  private insertCacheControl(
    messages: OpenRouterRequest['messages'],
    cache_control: OpenRouterCacheControl,
  ): void {
    // Mark the last text part of the first system message as the breakpoint
    for (const msg of messages) {
      if (msg.role === 'system' && msg.content?.length) {
        const lastContent = msg.content[msg.content.length - 1];
        if (lastContent.type === 'text') {
          lastContent.cache_control = cache_control;
          break;
        }
      }
    }
  }
}
```
Additional considerations:
- Need to check minimum token requirements per provider
- Handle multiple cache_control breakpoints for Anthropic (max 4)
- Clear cache when workspace files change
- Add metrics to track cache hits/savings
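The "clear cache when workspace files change" point falls out naturally of keying the cache on a prefix hash: a changed file yields a new hash, so a stale cache_id is simply never looked up again (old entries still need eviction, e.g. a size-bounded map). A small illustration, with hypothetical names:

```typescript
import { createHash } from "node:crypto";

// Hash the static prefix; any change in the injected files changes the key.
// (Illustrative helper, not an actual OpenClaw function.)
function prefixHash(systemPrompt: string, injectedFiles: string[]): string {
  return createHash("sha256")
    .update(systemPrompt)
    .update(injectedFiles.join("\n"))
    .digest("hex");
}

const before = prefixHash("You are OpenClaw.", ["// foo.ts v1"]);
const after = prefixHash("You are OpenClaw.", ["// foo.ts v2"]);
// before !== after: the edited file produces a different key, so the
// stale cache_id associated with `before` is never reused.
```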