Add OpenRouter cache_control support for provider-side prompt caching #9600

@edutj-t

Description

OpenRouter supports server‑side prompt caching via the cache_control parameter (docs: https://openrouter.ai/docs/guides/best-practices/prompt-caching).

This can significantly reduce token costs by caching the static prefix (system prompt + injected workspace files) across requests.

Currently, OpenClaw already implements this for the direct Anthropic provider via cacheRetention, but OpenRouter requests don't include cache_control for some providers, missing potential savings of ~10–12k tokens per turn. I recognise that some providers cache automatically, but others don't.
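For context, OpenRouter forwards Anthropic-style cache_control markers on multipart message content: a breakpoint on a content part asks the provider to cache everything up to and including that part. A rough sketch of what such a request body could look like (model slug and message text are placeholders, not values from this repo):

```typescript
// Illustrative shape of an OpenRouter chat request with a cache breakpoint.
interface ContentPart {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral'; ttl?: string };
}

const systemParts: ContentPart[] = [
  { type: 'text', text: 'You are a coding assistant.' },
  {
    // Large static prefix (injected workspace files); the marker on this
    // part requests caching of everything up to and including it.
    type: 'text',
    text: '<injected workspace files>',
    cache_control: { type: 'ephemeral' },
  },
];

const body = {
  model: 'anthropic/claude-3.5-sonnet', // placeholder slug
  messages: [
    { role: 'system', content: systemParts },
    { role: 'user', content: 'Refactor src/index.ts' }, // dynamic suffix
  ],
};

console.log(JSON.stringify(body));
```

Only the dynamic user suffix is billed at full input rates on a cache hit; the marked prefix is served from the provider's cache.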

Proposed solution:

  1. Add cache_control parameter to OpenRouter provider configuration
  2. Compute hash of static prompt prefix (system prompt + injected files)
  3. Insert cache_control breakpoint after static prefix
  4. Track and reuse cache IDs across requests with same prefix
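Step 2 above needs a stable hash so that a changed workspace invalidates the cached prefix. A minimal standalone sketch using Node's built-in crypto (the helper name is hypothetical):

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: hash the static prefix (system prompt + injected
// file contents) for change detection across requests.
function prefixHash(staticParts: string[]): string {
  const h = createHash('sha256');
  for (const part of staticParts) {
    // Length-prefix each part so ['ab', 'c'] and ['a', 'bc'] hash differently.
    h.update(String(part.length)).update('\0').update(part);
  }
  return h.digest('hex');
}

console.log(prefixHash(['system prompt', 'file A contents']));
```

The length-prefixing guards against boundary ambiguity when concatenating parts; a plain join would let different part splits collide.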

Configuration example:
{
  agents: {
    defaults: {
      models: {
        "openrouter/openai/chatgpt-4o": {
          params: {
            cache_control: { type: "ephemeral", ttl: "1h" }
          }
        }
      }
    }
  }
}

Benefits:

  • Reduces token burn for identical prefix across turns

  • Works across sessions if prefix unchanged

  • Compatible with all OpenRouter‑supported providers (Anthropic, OpenAI, Gemini, DeepSeek, etc.)

  • Follows existing pattern from Anthropic cacheRetention implementation

Implementation complexity:
Low‑medium (~100‑200 LoC)
Code changes sketch:
A minimal patch for packages/gateway/src/providers/openrouter.ts (based on inferred structure; the actual code needs inspection):

// Hypothetical implementation - needs actual code inspection
import { createHash } from 'node:crypto';

interface OpenRouterCacheControl {
  type: 'ephemeral';
  ttl?: '1h';
}

interface OpenRouterRequest {
  messages: Array<{
    role: string;
    content: Array<{
      type: string;
      text: string;
      cache_control?: OpenRouterCacheControl;
    }>;
  }>;
  cache_control?: OpenRouterCacheControl; // Top-level for some providers
  cache_id?: string; // Hypothetical: cache identifier echoed back by OpenRouter
}

class OpenRouterProvider {
  private prefixCache = new Map<string, string>(); // hash -> cache_id
  
  async createChatCompletion(
    request: OpenRouterRequest & { cache_id?: string },
    config: { params?: { cache_control?: OpenRouterCacheControl } }
  ) {
    const { cache_control } = config.params || {};
    
    if (cache_control) {
      // 1. Compute hash of static prefix (system prompt + injected files)
      const prefixHash = this.computePrefixHash(request.messages);
      
      // 2. Check for existing cache ID
      const cacheId = this.prefixCache.get(prefixHash);
      if (cacheId) {
        request.cache_id = cacheId;
      } else {
        // 3. Insert cache_control breakpoint after static prefix
        this.insertCacheControl(request.messages, cache_control);
      }
    }
    
    const response = await this.sendToOpenRouter(request);
    
    // 4. Store new cache ID from response
    if (cache_control && response.cache_id && !request.cache_id) {
      const prefixHash = this.computePrefixHash(request.messages);
      this.prefixCache.set(prefixHash, response.cache_id);
    }
    
    return response;
  }
  
  private computePrefixHash(messages: OpenRouterRequest['messages']): string {
    // Identify static parts: system messages + injected file content
    // Hash them for change detection
    const staticText = messages
      .filter(msg => msg.role === 'system')
      .map(msg => msg.content.map(c => c.text).join(''))
      .join('');
    return createHash('sha256').update(staticText).digest('hex');
  }
  
  private insertCacheControl(
    messages: OpenRouterRequest['messages'],
    cache_control: OpenRouterCacheControl
  ): void {
    // Find the last system message or first user message
    // Insert cache_control in the appropriate text part
    for (const msg of messages) {
      if (msg.role === 'system' && msg.content?.length) {
        const lastContent = msg.content[msg.content.length - 1];
        if (lastContent.type === 'text') {
          lastContent.cache_control = cache_control;
          break;
        }
      }
    }
  }
}

Additional considerations:

  • Need to check minimum token requirements per provider
  • Handle multiple cache_control breakpoints for Anthropic (max 4)
  • Clear cache when workspace files change
  • Add metrics to track cache hits/savings
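For the Anthropic breakpoint limit, a small guard could cap how many parts get marked; this helper is hypothetical and assumes the same content-part shape as the sketch above:

```typescript
interface Part {
  type: string;
  text: string;
  cache_control?: { type: 'ephemeral' };
}

// Hypothetical helper: mark at most `max` breakpoints (Anthropic allows 4),
// preferring the latest candidates so the longest prefix stays cacheable.
function applyBreakpoints(parts: Part[], candidates: number[], max = 4): void {
  for (const idx of candidates.slice(-max)) {
    const part = parts[idx];
    if (part && part.type === 'text') {
      part.cache_control = { type: 'ephemeral' };
    }
  }
}

const parts: Part[] = Array.from({ length: 6 }, (_, i) => ({
  type: 'text',
  text: `segment ${i}`,
}));
applyBreakpoints(parts, [0, 1, 2, 3, 4, 5]);
console.log(parts.filter((p) => p.cache_control).length); // 4
```

Keeping the latest candidates matters because a breakpoint caches everything before it, so dropping the earliest markers loses the least.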
