OpenRouter Grok prompt caching likely misses xAI server-affinity header

## Summary

When using Grok models through OpenRouter, Hermes appears to miss xAI's cache-affinity requirement, which likely causes poor prompt-cache hit rates and higher token costs.

## Why this matters

xAI prompt caching is unusually sensitive to server affinity. For repeated requests to hit the same cache, xAI expects a stable conversation identifier to be sent (for chat-completions-style calls, this is typically the `x-grok-conv-id` header; for Responses-style flows, `prompt_cache_key` is used).

Without that affinity signal, requests can be routed to different backend servers, causing frequent cache misses even when the prompt prefix is stable.
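To make the two affinity forms concrete, here is a minimal sketch of the per-request field each API style would carry. It assumes only the two field names mentioned above (`x-grok-conv-id` and `prompt_cache_key`). The helper name `affinity_kwargs` and the example id are hypothetical, not Hermes or xAI code.

```python
from typing import Any


def affinity_kwargs(api_style: str, conversation_id: str) -> dict[str, Any]:
    """Return the cache-affinity field for one xAI request.

    Hypothetical helper: the field names come from this issue; the
    function itself is illustrative only.
    """
    if api_style == "chat_completions":
        # Chat-completions style: affinity travels as an HTTP header.
        return {"extra_headers": {"x-grok-conv-id": conversation_id}}
    if api_style == "responses":
        # Responses style: affinity travels as a request parameter.
        return {"prompt_cache_key": conversation_id}
    raise ValueError(f"unknown api_style: {api_style!r}")


# Every turn of one conversation reuses the same id, so each request
# carries an identical affinity signal.
session_id = "session-1234"  # hypothetical stable session id
turns = [affinity_kwargs("chat_completions", session_id) for _ in range(3)]
print(turns[0])
```

The important property is stability: if the id changed per request, the header would be present but would provide no affinity.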

## Current Hermes behavior

Relevant files on the current call path:

- `run_agent.py`
- `agent/transports/chat_completions.py`
- `plugins/model-providers/openrouter/__init__.py`

Hermes already has a stable `session_id`, but on the OpenRouter **chat completions** path it does not appear to be used for Grok cache affinity.

More specifically:

1. `agent/transports/chat_completions.py` calls `profile.build_api_kwargs_extras(...)`
2. that path does not appear to propagate `session_id` into the provider extras context
3. `plugins/model-providers/openrouter/__init__.py` therefore has no way to derive and attach an OpenRouter/xAI-specific affinity header for Grok models

There is related logic in the Responses/Codex path (`prompt_cache_key = session_id`), but that does not help the OpenRouter chat-completions path used for `x-ai/grok-*` models.

## Suspected result

For OpenRouter Grok usage, Hermes likely sends repeated requests without `x-grok-conv-id`, so xAI server affinity is lost and prompt caching underperforms.

## Suggested fix

### 1) Pass `session_id` through the OpenRouter chat-completions provider hook

In `agent/transports/chat_completions.py`, include `session_id` in the context passed to `profile.build_api_kwargs_extras(...)`.

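A rough sketch of the proposed change follows. The real signature of `build_api_kwargs_extras(...)` is not shown in this issue, so the sketch assumes it accepts keyword context. `ProviderProfile`, `RecordingProfile`, and `build_extras` are hypothetical stand-ins used only to show `session_id` reaching the provider hook.

```python
from typing import Any, Optional


class ProviderProfile:
    """Hypothetical stand-in for a Hermes provider profile.

    Assumes build_api_kwargs_extras takes keyword context; the real
    signature may differ.
    """

    def build_api_kwargs_extras(self, **context: Any) -> dict[str, Any]:
        return {}


class RecordingProfile(ProviderProfile):
    """Records the context it receives, to show session_id arrives."""

    def __init__(self) -> None:
        self.seen: dict[str, Any] = {}

    def build_api_kwargs_extras(self, **context: Any) -> dict[str, Any]:
        self.seen = dict(context)
        return {}


def build_extras(profile: ProviderProfile, model: str,
                 session_id: Optional[str]) -> dict[str, Any]:
    # Proposed change: forward session_id alongside the existing context
    # so provider profiles can derive per-conversation settings from it.
    context = {"model": model, "session_id": session_id}
    return profile.build_api_kwargs_extras(**context)


profile = RecordingProfile()
build_extras(profile, "x-ai/grok-4", "session-1234")  # example model id
print(profile.seen["session_id"])  # session-1234
```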
### 2) Add Grok-specific affinity logic in the OpenRouter provider profile

In `plugins/model-providers/openrouter/__init__.py`, when:

- provider is OpenRouter
- model is `x-ai/grok-*` (and possibly `xai/grok-*`)
- `session_id` is present

attach:

```python
extra_headers = {"x-grok-conv-id": session_id}
```

as a top-level request kwarg.
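The three conditions above could look roughly like this inside the OpenRouter profile's extras hook. This is a sketch under assumptions: `openrouter_grok_extras`, `is_grok_model`, and the `"openrouter"` provider string are hypothetical names, and `xai/grok-` is included only because the issue lists it as a possible prefix.

```python
from typing import Any, Optional

# Prefixes from the issue; "xai/grok-" is marked "possibly" there.
GROK_PREFIXES = ("x-ai/grok-", "xai/grok-")


def is_grok_model(model: str) -> bool:
    # str.startswith accepts a tuple; lower() guards against odd casing.
    return model.lower().startswith(GROK_PREFIXES)


def openrouter_grok_extras(provider: str, model: str,
                           session_id: Optional[str]) -> dict[str, Any]:
    """Hypothetical body for the OpenRouter profile's extras hook."""
    if provider != "openrouter" or not session_id or not is_grok_model(model):
        # Non-Grok models and sessionless calls are left unchanged.
        return {}
    return {"extra_headers": {"x-grok-conv-id": session_id}}


print(openrouter_grok_extras("openrouter", "x-ai/grok-4", "s1"))
# {'extra_headers': {'x-grok-conv-id': 's1'}}
print(openrouter_grok_extras("openrouter", "openai/gpt-4o", "s1"))
# {}
```

Returning an empty dict for every non-matching case keeps the change scoped to Grok on OpenRouter, so other models see no new headers.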

### 3) Preserve provider-added `extra_headers` when request overrides are present

After digging further, there appears to be a second issue in the same path:

- even if the OpenRouter profile returns `extra_headers`, the final request assembly can still lose them if `request_overrides.extra_headers` is applied with last-write-wins semantics

So this likely needs a small merge fix in `agent/transports/chat_completions.py`:

- merge provider-generated `extra_headers` with user-supplied `request_overrides.extra_headers`
- do not clobber the provider header when overrides are present

Without this second fix, adding `x-grok-conv-id` in the provider profile may still not survive into the final request kwargs.
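A minimal sketch of the merge, assuming provider extras and `request_overrides` are both plain kwargs dicts. `merge_request_kwargs` is a hypothetical helper, not the existing Hermes function. All keys keep last-write-wins semantics except `extra_headers`, which is merged per header.

```python
from typing import Any, Mapping


def merge_request_kwargs(provider_kwargs: Mapping[str, Any],
                         overrides: Mapping[str, Any]) -> dict[str, Any]:
    """Apply request overrides without clobbering provider extra_headers.

    Hypothetical helper: for a header set on both sides the override
    wins; provider-only headers such as x-grok-conv-id survive.
    """
    merged = {**provider_kwargs, **overrides}
    provider_headers = provider_kwargs.get("extra_headers") or {}
    override_headers = overrides.get("extra_headers") or {}
    if provider_headers or override_headers:
        merged["extra_headers"] = {**provider_headers, **override_headers}
    return merged


provider = {"extra_headers": {"x-grok-conv-id": "s1"}, "temperature": 0.2}
overrides = {"extra_headers": {"X-Title": "my-app"}, "temperature": 0.7}
print(merge_request_kwargs(provider, overrides))
# {'extra_headers': {'x-grok-conv-id': 's1', 'X-Title': 'my-app'}, 'temperature': 0.7}
```

Letting an explicit user override win for the same header name is a judgment call; the alternative is to let the provider value win so affinity can never be disabled by accident. HTTP header names are case-insensitive, so a production version might also normalize key case before merging.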