OpenRouter Grok prompt caching likely misses xAI server-affinity header #22705

@mjh-sakh

Summary

When using Grok models through OpenRouter, Hermes appears to miss xAI's cache-affinity requirement, which likely causes poor prompt-cache hit rates and higher token costs.

Why this matters

xAI prompt caching is unusually sensitive to server affinity. For repeated requests to hit the same cache, xAI expects a stable conversation identifier on every request: typically the x-grok-conv-id header for chat-completions-style calls, and prompt_cache_key for Responses-style flows.

Without that affinity signal, requests can be routed to different backend servers, causing frequent cache misses even when the prompt prefix is stable.
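To make the two affinity signals concrete, here is a minimal sketch of how a client might choose between them. The function name and the api_style labels are hypothetical and not taken from Hermes; only the x-grok-conv-id header and the prompt_cache_key field come from the description above.

```python
from typing import Any, Dict


def affinity_kwargs(api_style: str, conversation_id: str) -> Dict[str, Any]:
    """Return request kwargs carrying the cache-affinity signal.

    api_style is a hypothetical label: "chat_completions" or "responses".
    """
    if api_style == "chat_completions":
        # Chat-completions-style calls: affinity travels as an HTTP header.
        return {"extra_headers": {"x-grok-conv-id": conversation_id}}
    if api_style == "responses":
        # Responses-style flows: affinity travels as a request field.
        return {"prompt_cache_key": conversation_id}
    raise ValueError(f"unknown api_style: {api_style!r}")
```

The important property is that the same conversation_id is reused for every request in a session, so the backend can keep routing to the server that holds the cached prefix.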

Current Hermes behavior

From the current call path:

  • run_agent.py
  • agent/transports/chat_completions.py
  • plugins/model-providers/openrouter/__init__.py

Hermes already has a stable session_id, but on the OpenRouter chat completions path it does not appear to be used for Grok cache affinity.

More specifically:

  1. agent/transports/chat_completions.py calls profile.build_api_kwargs_extras(...)
  2. that path does not appear to propagate session_id into the provider extras context
  3. plugins/model-providers/openrouter/__init__.py therefore has no way to derive and attach an OpenRouter/xAI-specific affinity header for Grok models

There is related logic in the Responses/Codex path (prompt_cache_key = session_id), but that does not help the OpenRouter chat-completions path used for x-ai/grok-* models.

Suspected result

For OpenRouter Grok usage, Hermes likely sends repeated requests without x-grok-conv-id, so xAI server affinity is lost and prompt caching underperforms.

Suggested fix

1) Pass session_id through the OpenRouter chat-completions provider hook

In agent/transports/chat_completions.py, include session_id in the context passed to:

  • profile.build_api_kwargs_extras(...)
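A rough sketch of what threading session_id through the hook could look like. I have not reproduced the real signature of build_api_kwargs_extras here; the context dict, its keys, and StubProfile are assumptions for illustration only.

```python
from typing import Any, Dict, Optional


class StubProfile:
    """Hypothetical stand-in for a provider profile."""

    def build_api_kwargs_extras(self, context: Dict[str, Any]) -> Dict[str, Any]:
        # Echo back what the hook can see, to show that session_id arrives.
        return {"seen_session_id": context.get("session_id")}


def build_extras(profile: StubProfile, model: str, session_id: Optional[str]) -> Dict[str, Any]:
    # Assumption: the transport passes a plain dict as context.
    # The point is only that session_id is included alongside the model.
    context = {"model": model, "session_id": session_id}
    return profile.build_api_kwargs_extras(context)
```

If the actual hook takes keyword arguments instead of a context dict, the same idea applies: add session_id as one more argument the profile can read.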

2) Add Grok-specific affinity logic in the OpenRouter provider profile

In plugins/model-providers/openrouter/__init__.py, when:

  • provider is OpenRouter
  • model is x-ai/grok-* (and possibly xai/grok-*)
  • session_id is present

attach:

extra_headers = {"x-grok-conv-id": session_id}

as a top-level request kwarg.
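The conditions above could be sketched roughly like this. The helper name and the prefix tuple are hypothetical, and the check that the provider is OpenRouter is assumed to be implied by this code living in the OpenRouter profile.

```python
from typing import Any, Dict, Optional

# Model-ID prefixes treated as Grok. "xai/grok-" is the "possibly" case
# mentioned above and may not be needed.
GROK_PREFIXES = ("x-ai/grok-", "xai/grok-")


def grok_affinity_extras(model: str, session_id: Optional[str]) -> Dict[str, Any]:
    """Return extra request kwargs for Grok cache affinity, or {} if not applicable."""
    if not session_id:
        # No stable identifier available, so there is nothing useful to send.
        return {}
    if not model.lower().startswith(GROK_PREFIXES):
        # Leave non-Grok models untouched.
        return {}
    return {"extra_headers": {"x-grok-conv-id": session_id}}
```

Returning an empty dict for the non-matching cases keeps the hook safe to call for every OpenRouter model.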

3) Preserve provider-added extra_headers when request overrides are present

After digging further, there appears to be a second issue in the same path:

  • even if the OpenRouter profile returns extra_headers, the final request assembly can still lose them if request_overrides.extra_headers is applied with last-write-wins semantics

So this likely needs a small merge fix in agent/transports/chat_completions.py:

  • merge provider-generated extra_headers with user-supplied request_overrides.extra_headers
  • do not clobber the provider header when overrides are present

Without this second fix, adding x-grok-conv-id in the provider profile may still not survive into the final request kwargs.
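A minimal sketch of the merge, assuming both the provider extras and request_overrides are plain dicts at the point where the final kwargs are assembled. The function name is hypothetical. One judgment call is labeled in the code: when both sides set the same header name, the user-supplied value wins, while provider headers with different names survive.

```python
from typing import Any, Dict, Mapping


def merge_request_kwargs(
    provider_kwargs: Mapping[str, Any],
    request_overrides: Mapping[str, Any],
) -> Dict[str, Any]:
    """Combine provider extras with user overrides without dropping headers."""
    # Top-level keys keep last-write-wins semantics (overrides win).
    merged: Dict[str, Any] = {**provider_kwargs, **request_overrides}

    provider_headers = provider_kwargs.get("extra_headers") or {}
    override_headers = request_overrides.get("extra_headers") or {}
    if provider_headers or override_headers:
        # Headers are merged key by key instead of replaced wholesale.
        # Assumption: on an exact name clash the user-supplied value wins.
        merged["extra_headers"] = {**provider_headers, **override_headers}
    return merged
```

This sketch compares header names exactly. HTTP header names are case-insensitive, so a real fix may want to normalize case before merging.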

Labels

  • P3: Low (cosmetic, nice to have)
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • provider/openrouter: OpenRouter aggregator
  • provider/xai: xAI (Grok)
  • type/perf: Performance improvement or optimization
