feat(agent): opt-in OpenRouter response caching via env vars #18921
Closed
patp wants to merge 1 commit into
Wire up OpenRouter's response caching feature (https://openrouter.ai/announcements/response-caching) through a single helper so it applies uniformly to the main agent loop and every auxiliary task that calls OpenRouter.

Two new env vars (off by default):
- HERMES_OPENROUTER_CACHE: truthy ("1"/"true"/"yes"/"on") adds X-OpenRouter-Cache: true to OpenRouter requests. Identical request bodies are then served from edge cache at zero token cost.
- HERMES_OPENROUTER_CACHE_TTL: integer seconds (1..86400), emitted as X-OpenRouter-Cache-TTL when caching is enabled. Out-of-range or non-integer values are dropped silently and OpenRouter's 5-minute default applies.

Implementation introduces openrouter_feature_headers() and openrouter_default_headers() helpers in agent/auxiliary_client.py and updates the four OpenRouter-targeting client construction sites (two in auxiliary_client.py, two in run_agent.py) to use the helper.

Drive-by: collapse the duplicated inline OpenRouter header dict at run_agent.py:1424 onto the same helper so future header additions need only one edit.

Tests:
- tests/agent/test_openrouter_response_caching.py: 24 cases covering truthy/falsy parsing, TTL boundaries (1 / 300 / 86400), invalid TTL handling (0, 86401, "abc", "-1", "12.5"), and the cache-off-but-TTL-set edge case.
- tests/run_agent/test_provider_attribution_headers.py: extended to verify the env vars thread through _apply_client_headers_for_base_url on AIAgent and that attribution headers are preserved alongside.

Docs: HERMES_OPENROUTER_CACHE and HERMES_OPENROUTER_CACHE_TTL listed in website/docs/reference/environment-variables.md under "LLM Providers".

How to test:
    echo 'HERMES_OPENROUTER_CACHE=true' >> ~/.hermes/.env
    hermes chat -q "ping"   # first call: x-openrouter-cache-status: MISS
    hermes chat -q "ping"   # second: x-openrouter-cache-status: HIT
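The parsing rules described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the function names `feature_headers_sketch` and `default_headers_sketch`, the `env` parameter, and the placeholder base headers are assumptions made for the example, and the real `_OR_HEADERS` values are not shown in this description.

```python
import os
from typing import Dict, Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}
_TTL_MIN, _TTL_MAX = 1, 86400  # range stated in the PR description

# Hypothetical stand-in for the repo's attribution headers (_OR_HEADERS).
_BASE_HEADERS_PLACEHOLDER = {"X-Title": "example"}


def feature_headers_sketch(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the opt-in cache headers implied by the environment."""
    env = os.environ if env is None else env

    # Caching is off unless the flag is explicitly truthy (case-insensitive).
    flag = env.get("HERMES_OPENROUTER_CACHE", "").strip().lower()
    if flag not in _TRUTHY:
        return {}

    headers = {"X-OpenRouter-Cache": "true"}

    # The TTL is only considered once caching is enabled.
    raw_ttl = env.get("HERMES_OPENROUTER_CACHE_TTL", "").strip()
    try:
        ttl = int(raw_ttl)
    except ValueError:
        # Non-integer (or unset) TTL: drop it silently so the server default applies.
        return headers

    if _TTL_MIN <= ttl <= _TTL_MAX:
        headers["X-OpenRouter-Cache-TTL"] = str(ttl)
    return headers


def default_headers_sketch(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Merge the base headers with the env-driven feature headers."""
    return {**_BASE_HEADERS_PLACEHOLDER, **feature_headers_sketch(env)}
```

Passing a plain dict as `env` keeps the sketch easy to exercise without mutating the process environment; whether the actual helpers accept such a parameter is not stated in the PR.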
This was referenced May 3, 2026
Collaborator
Merged via PR #19132. Your env-var-driven approach (truthy parsing,
What
Wires OpenRouter's response caching feature through Hermes via two new env vars (off by default):
- HERMES_OPENROUTER_CACHE: a truthy value (1/true/yes/on, case-insensitive) adds X-OpenRouter-Cache: true to every OpenRouter request. Identical request bodies are served from edge cache at zero token cost.
- HERMES_OPENROUTER_CACHE_TTL: integer seconds (1..86400), sent as X-OpenRouter-Cache-TTL; ignored when caching is disabled. Out-of-range or non-integer values are dropped silently, and OpenRouter's 5-minute default then applies.

Why
OpenRouter response caching is a pure Pareto improvement for hermes deployments that send any kind of repeated traffic — bot title-generation on similar inputs, deterministic skills, system-prompt-only requests, retried calls. Cache hits return in 80–300ms with zero token charges; misses behave exactly as today. Today there's no in-tree way to opt into it, so users either fork hermes or run a header-injecting proxy.
Designed as opt-in (not default-on) on the principle of least surprise, and as env vars rather than a config.yaml schema change to match the existing pattern (OPENROUTER_API_KEY, OPENROUTER_BASE_URL, HERMES_QWEN_BASE_URL, etc.). Settable per-profile via ~/.hermes/.env. Happy to follow up with a config.yaml field if the maintainers prefer that surface.

Implementation
Two new helpers in agent/auxiliary_client.py:
- openrouter_feature_headers(): env-driven feature headers (cache + TTL today, easy extension point for future OpenRouter features)
- openrouter_default_headers(): _OR_HEADERS ⊕ openrouter_feature_headers(); the single API every OpenRouter client construction site should use

The four OpenRouter-targeting client construction sites now route through openrouter_default_headers():
- agent/auxiliary_client.py::_try_openrouter (sync, both pool + direct branches)
- agent/auxiliary_client.py::_to_async_client (async conversion)
- run_agent.py::AIAgent.__init__ (primary client, the inline duplicate at L1424)
- run_agent.py::AIAgent._apply_client_headers_for_base_url (credential rotation path)

Drive-by cleanup: the inline OpenRouter header dict that was hard-copied at run_agent.py:1424 now uses the same helper as everywhere else, so future header additions only need to touch one place.

How to test
Or with explicit TTL:
Tests
- tests/agent/test_openrouter_response_caching.py: 24 new cases covering truthy/falsy parsing, TTL boundary values (1 / 300 / 86400), invalid TTLs (0, 86401, abc, -1, 12.5), and the cache-off-but-TTL-set edge case. Verified locally:
- tests/run_agent/test_provider_attribution_headers.py: extended with two new tests verifying the env var threads through _apply_client_headers_for_base_url and that attribution headers are preserved alongside. Existing tests still pass:

Platforms tested
curl

Docs
website/docs/reference/environment-variables.md: both vars listed under "LLM Providers" right after OPENROUTER_BASE_URL.

Notes
extra_headers config field would let users add arbitrary headers to any provider; happy to do that as a follow-up if there's interest.