Summary
When using Grok models through OpenRouter, Hermes appears to miss xAI's cache-affinity requirement, which likely causes poor prompt-cache hit rates and higher token costs.
Why this matters
xAI prompt caching is unusually sensitive to server affinity. For repeated requests to hit the same cache, xAI expects a stable conversation identifier to be sent (for chat-completions-style calls, this is typically the x-grok-conv-id header; for Responses-style flows, prompt_cache_key is used).
Without that affinity signal, requests can be routed to different backend servers, causing frequent cache misses even when the prompt prefix is stable.
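A minimal sketch of the two affinity signals described above. It assumes a hypothetical helper, affinity_kwargs, and an api_style switch that are not part of Hermes or any SDK. Only the x-grok-conv-id header name and the prompt_cache_key field come from this issue.

```python
from typing import Any, Dict, Optional


def affinity_kwargs(api_style: str, session_id: Optional[str]) -> Dict[str, Any]:
    """Hypothetical helper: return request kwargs that carry xAI cache affinity."""
    if not session_id:
        # No stable conversation id, so there is nothing to pin requests to.
        return {}
    if api_style == "chat_completions":
        # Chat-completions-style calls: affinity travels as an HTTP header.
        return {"extra_headers": {"x-grok-conv-id": session_id}}
    if api_style == "responses":
        # Responses-style flows: affinity travels as a request field.
        return {"prompt_cache_key": session_id}
    # Unknown API style: add nothing rather than guess.
    return {}
```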
Current Hermes behavior
From the current call path:
- run_agent.py
- agent/transports/chat_completions.py
- plugins/model-providers/openrouter/__init__.py
Hermes already has a stable session_id, but on the OpenRouter chat completions path it does not appear to be used for Grok cache affinity.
More specifically:
- agent/transports/chat_completions.py calls profile.build_api_kwargs_extras(...), but that path does not appear to propagate session_id into the provider extras context.
- plugins/model-providers/openrouter/__init__.py therefore has no way to derive and attach an OpenRouter/xAI-specific affinity header for Grok models.
There is related logic in the Responses/Codex path (prompt_cache_key = session_id), but that does not help the OpenRouter chat-completions path used for x-ai/grok-* models.
Suspected result
For OpenRouter Grok usage, Hermes likely sends repeated requests without x-grok-conv-id, so xAI server affinity is lost and prompt caching underperforms.
Suggested fix
1) Pass session_id through the OpenRouter chat-completions provider hook
In agent/transports/chat_completions.py, include session_id in the context passed to profile.build_api_kwargs_extras(...).
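A sketch of step 1. It assumes the hook takes a single context dict. The real build_api_kwargs_extras signature is not shown in this issue, so ProviderProfile and build_request_kwargs are hypothetical stand-ins for the Hermes profile class and the transport code.

```python
from typing import Any, Dict, List, Optional


class ProviderProfile:
    """Hypothetical stand-in for a Hermes provider profile (illustrative only)."""

    def build_api_kwargs_extras(self, context: Dict[str, Any]) -> Dict[str, Any]:
        # The default profile adds nothing; provider plugins would override this.
        return {}


def build_request_kwargs(
    profile: ProviderProfile,
    model: str,
    messages: List[Dict[str, Any]],
    session_id: Optional[str],
) -> Dict[str, Any]:
    """Hypothetical transport step: assemble kwargs, then ask the profile for extras."""
    context: Dict[str, Any] = {
        "model": model,
        # The fix: expose the stable session id to the provider hook.
        "session_id": session_id,
    }
    kwargs: Dict[str, Any] = {"model": model, "messages": messages}
    kwargs.update(profile.build_api_kwargs_extras(context))
    return kwargs
```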
2) Add Grok-specific affinity logic in the OpenRouter provider profile
In plugins/model-providers/openrouter/__init__.py, when:
- the provider is OpenRouter,
- the model is x-ai/grok-* (and possibly xai/grok-*), and
- session_id is present,
attach:
extra_headers = {"x-grok-conv-id": session_id}
as a top-level request kwarg.
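A sketch of the step 2 condition. It is written as a standalone hypothetical function, openrouter_grok_affinity_extras, rather than as the profile's actual method. It assumes the context carries model and session_id keys. The "provider is OpenRouter" condition is implied because the logic would live in the OpenRouter profile.

```python
from typing import Any, Dict

# Model prefixes treated as Grok; "xai/grok-" is the "possibly" case from the issue.
GROK_PREFIXES = ("x-ai/grok-", "xai/grok-")


def openrouter_grok_affinity_extras(context: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical OpenRouter-profile logic: pin Grok requests to one xAI server."""
    model = str(context.get("model") or "").lower()
    session_id = context.get("session_id")
    if not session_id or not model.startswith(GROK_PREFIXES):
        # Non-Grok model or no stable id: leave the request untouched.
        return {}
    # extra_headers is returned as a top-level request kwarg.
    return {"extra_headers": {"x-grok-conv-id": str(session_id)}}
```

If the transport ultimately calls the OpenAI Python SDK, the returned entry can be passed through unchanged, because client.chat.completions.create(...) accepts extra_headers as a per-request option.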
3) Preserve provider-added extra_headers when request overrides are present
After digging further, there appears to be a second issue in the same path:
- even if the OpenRouter profile returns extra_headers, the final request assembly can still lose them if request_overrides.extra_headers is applied with last-write-wins semantics (the override dict replaces the provider's dict wholesale)
So this likely needs a small merge fix in agent/transports/chat_completions.py:
- merge provider-generated extra_headers with user-supplied request_overrides.extra_headers
- do not clobber the provider header when overrides are present
Without this second fix, adding x-grok-conv-id in the provider profile may still not survive into the final request kwargs.
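A sketch that contrasts the suspected last-write-wins assembly with a header-aware merge. Both functions are hypothetical, and the actual assembly code in agent/transports/chat_completions.py may be structured differently.

```python
from typing import Any, Dict, Optional


def apply_overrides_last_write_wins(
    kwargs: Dict[str, Any], overrides: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Suspected current behavior (hypothetical): a shallow update."""
    merged = dict(kwargs)
    # An overriding extra_headers dict replaces the provider's dict entirely.
    merged.update(overrides or {})
    return merged


def apply_overrides_merging_headers(
    kwargs: Dict[str, Any], overrides: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Proposed behavior (hypothetical): merge extra_headers instead of replacing."""
    overrides = dict(overrides or {})  # copy so the caller's dict is not mutated
    merged = dict(kwargs)
    provider_headers = dict(merged.get("extra_headers") or {})
    override_headers = overrides.pop("extra_headers", None) or {}
    merged.update(overrides)  # all other override keys keep last-write-wins
    if provider_headers or override_headers:
        # Union of both header sets; on a key collision the explicit override wins.
        merged["extra_headers"] = {**provider_headers, **override_headers}
    return merged
```

On a key collision, this sketch lets the explicit override win, so a user can still replace x-grok-conv-id deliberately. Preferring the provider value would be an equally defensible choice.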