Skip to content

feat(agent): opt-in OpenRouter response caching via env vars#18921

Closed
patp wants to merge 1 commit into
NousResearch:mainfrom
patp:feat/openrouter-response-caching
Closed

feat(agent): opt-in OpenRouter response caching via env vars#18921
patp wants to merge 1 commit into
NousResearch:mainfrom
patp:feat/openrouter-response-caching

Conversation

@patp

@patp patp commented May 2, 2026

Copy link
Copy Markdown

What

Wires OpenRouter's response caching feature through Hermes via two new env vars (off by default):

Variable Effect
HERMES_OPENROUTER_CACHE Truthy (1/true/yes/on, case-insensitive) adds X-OpenRouter-Cache: true to every OpenRouter request. Identical request bodies are served from edge cache at zero token cost.
HERMES_OPENROUTER_CACHE_TTL Integer seconds (1..86400). Emitted as X-OpenRouter-Cache-TTL; ignored when caching is disabled. Out-of-range / non-integer values are dropped silently — OpenRouter's 5-minute default then applies.

Why

OpenRouter response caching is a pure Pareto improvement for hermes deployments that send any kind of repeated traffic — bot title-generation on similar inputs, deterministic skills, system-prompt-only requests, retried calls. Cache hits return in 80–300ms with zero token charges; misses behave exactly as today. Today there's no in-tree way to opt into it, so users either fork hermes or run a header-injecting proxy.

Designed as opt-in (not default-on) on the principle of least surprise, and as env vars rather than a config.yaml schema change to match the existing pattern (OPENROUTER_API_KEY, OPENROUTER_BASE_URL, HERMES_QWEN_BASE_URL, etc.). Settable per-profile via ~/.hermes/.env. Happy to follow up with a config.yaml field if the maintainers prefer that surface.

Implementation

Two new helpers in agent/auxiliary_client.py:

  • openrouter_feature_headers() — env-driven feature headers (cache + TTL today, easy extension point for future OpenRouter features)
  • openrouter_default_headers()_OR_HEADERSopenrouter_feature_headers(); the single API every OpenRouter client construction site should use

The four OpenRouter-targeting client construction sites now route through openrouter_default_headers():

  • agent/auxiliary_client.py::_try_openrouter (sync, both pool + direct branches)
  • agent/auxiliary_client.py::_to_async_client (async conversion)
  • run_agent.py::AIAgent.__init__ (primary client, the inline duplicate at L1424)
  • run_agent.py::AIAgent._apply_client_headers_for_base_url (credential rotation path)

Drive-by cleanup: the inline OpenRouter header dict that was hard-copied at run_agent.py:1424 now uses the same helper as everywhere else, so future header additions only need to touch one place.

How to test

echo 'HERMES_OPENROUTER_CACHE=true' >> ~/.hermes/.env
hermes chat -q "ping"   # First call:  x-openrouter-cache-status: MISS (paid)
hermes chat -q "ping"   # Second call: x-openrouter-cache-status: HIT  (free)

Or with explicit TTL:

echo 'HERMES_OPENROUTER_CACHE=true' >> ~/.hermes/.env
echo 'HERMES_OPENROUTER_CACHE_TTL=3600' >> ~/.hermes/.env

Tests

  • tests/agent/test_openrouter_response_caching.py — 24 new cases covering truthy/falsy parsing, TTL boundary values (1 / 300 / 86400), invalid TTLs (0, 86401, abc, -1, 12.5), and the cache-off-but-TTL-set edge case. Verified locally:
    ============================== 24 passed in 0.06s ==============================
    
  • tests/run_agent/test_provider_attribution_headers.py — extended with two new tests verifying the env var threads through _apply_client_headers_for_base_url and that attribution headers are preserved alongside. Existing tests still pass:
    ======================== 6 passed, 2 warnings in 1.88s =========================
    

Platforms tested

  • Linux (Ubuntu 24.04, agents-vm + oscar) — end-to-end MISS→HIT verified via curl
  • Linux aarch64 (DGX Spark)

Docs

website/docs/reference/environment-variables.md — both vars listed under "LLM Providers" right after OPENROUTER_BASE_URL.

Notes

  • Default off. No behavior change for users who don't set the env var.
  • Cache key includes the API key, so cache entries don't leak between users.
  • Refactor opportunity (not done in this PR to keep scope tight): a generic extra_headers config field would let users add arbitrary headers to any provider — happy to do that as a follow-up if there's interest.

Wire up OpenRouter's response caching feature
(https://openrouter.ai/announcements/response-caching) through a
single helper so it applies uniformly to the main agent loop and
every auxiliary task that calls OpenRouter.

Two new env vars (off by default):

- HERMES_OPENROUTER_CACHE — truthy ("1"/"true"/"yes"/"on") adds
  X-OpenRouter-Cache: true to OpenRouter requests. Identical request
  bodies are then served from edge cache at zero token cost.
- HERMES_OPENROUTER_CACHE_TTL — integer seconds (1..86400), emitted
  as X-OpenRouter-Cache-TTL when caching is enabled. Out-of-range or
  non-integer values are dropped silently and OpenRouter's 5-minute
  default applies.

Implementation introduces openrouter_feature_headers() and
openrouter_default_headers() helpers in agent/auxiliary_client.py
and updates the four OpenRouter-targeting client construction sites
(two in auxiliary_client.py, two in run_agent.py) to use the helper.
Drive-by: collapse the duplicated inline OpenRouter header dict at
run_agent.py:1424 onto the same helper so future header additions
need only one edit.

Tests:
- tests/agent/test_openrouter_response_caching.py — 24 cases covering
  truthy/falsy parsing, TTL boundaries (1 / 300 / 86400), invalid TTL
  handling (0, 86401, "abc", "-1", "12.5"), and the
  cache-off-but-TTL-set edge case.
- tests/run_agent/test_provider_attribution_headers.py — extended to
  verify the env vars thread through _apply_client_headers_for_base_url
  on AIAgent and that attribution headers are preserved alongside.

Docs: HERMES_OPENROUTER_CACHE and HERMES_OPENROUTER_CACHE_TTL listed in
website/docs/reference/environment-variables.md under "LLM Providers".

How to test:
  echo 'HERMES_OPENROUTER_CACHE=true' >> ~/.hermes/.env
  hermes chat -q "ping"   # first call: x-openrouter-cache-status: MISS
  hermes chat -q "ping"   # second:    x-openrouter-cache-status: HIT
@alt-glitch alt-glitch added type/feature New feature or request comp/agent Core agent loop, run_agent.py, prompt builder provider/openrouter OpenRouter aggregator P3 Low — cosmetic, nice to have labels May 2, 2026
@kshitijk4poor

Copy link
Copy Markdown
Collaborator

Merged via PR #19132. Your env-var-driven approach (truthy parsing, HERMES_OPENROUTER_CACHE / HERMES_OPENROUTER_CACHE_TTL) was incorporated into the final implementation as env var overrides on top of config.yaml, with your parametrized test patterns for boundary values adopted as well. Thanks for the thorough work @patp!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P3 Low — cosmetic, nice to have provider/openrouter OpenRouter aggregator type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants