fix: codex_responses prompt caching — session routing headers + cache_write_tokens field by zicochaos · Pull Request #10006 · NousResearch/hermes-agent

zicochaos · 2026-04-15T02:11:57Z

Problem

Prompt caching does not work when using codex_responses API mode with OpenAI-compatible providers (e.g. theclawbay). Every request is a cache miss despite prompt_cache_key being set in the request body.

Root cause

Two issues:

1. Missing session routing headers (`run_agent.py`)

The OpenAI client is initialized without session_id or x-client-request-id headers for codex providers. These headers are required for server-side cache routing — they tell the backend to route requests to the same server that holds the cached prompt prefix.

The official Codex CLI sends these unconditionally. Hermes sets default_headers for OpenRouter, GitHub Copilot, Kimi, and Qwen — but never for Codex/theclawbay.

2. Wrong field name for cache_write_tokens (`agent/usage_pricing.py`)

The codex_responses branch reads cache_creation_tokens (Anthropic naming convention) instead of cache_write_tokens (OpenAI Responses API naming). This means cache write tokens are always reported as 0.

Fix

Patch 1: Session routing headers

After session_id is assigned during __init__, inject session_id and x-client-request-id into default_headers for codex_responses mode. Also applied in _apply_client_headers_for_base_url() so headers survive /model switches.

Patch 2: cache_write_tokens field

Read cache_write_tokens first (OpenAI naming), fall back to cache_creation_tokens for backward compatibility.

Tests

test_codex_responses_reads_cache_write_tokens_field — verifies correct field is read
test_codex_responses_falls_back_to_cache_creation_tokens — backward compat
test_codex_responses_injects_session_routing_headers — verifies headers are set

Affected files

File	Change
`run_agent.py`	Inject session routing headers for `codex_responses` mode (+22 lines)
`agent/usage_pricing.py`	Read `cache_write_tokens` before `cache_creation_tokens` (+3/-1 lines)
`tests/agent/test_usage_pricing.py`	2 new tests
`tests/run_agent/test_run_agent.py`	1 new test

…_write_tokens field

Marcuss2 · 2026-04-16T11:51:08Z

This will likely fix the issue I am encountering, I seem to go trough Codex limits much faster with Hermes than with Kilo.

markojak · 2026-04-29T14:10:01Z

Linking this into the #17459 cache/time-awareness cluster.

This remains relevant for Codex/Responses caching, but should align with the umbrella direction:

stable cached prompt/cacheable prefix;
volatile current time outside the cached system prompt;
cache telemetry/routing should make it easy to verify that fix: prevent stale timestamp perception by injecting current time per-turn #15872/Consolidate live-time PRs around one ephemeral runtime context path #17476 does not regress cache hits.

Related: #16235, #15866, #17335, #17459, #17476.

fix: codex_responses prompt caching — session routing headers + cache…

26c2a01

…_write_tokens field

looplj mentioned this pull request Apr 20, 2026

Draft: Add OpenAI cache identity support looplj/axonhub#1429

Open

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder provider/openai OpenAI / Codex Responses API labels Apr 26, 2026

markojak mentioned this pull request Apr 29, 2026

Rework quiet-hours/time awareness: surface time to agent/tools, don't enforce in control plane #17459

Open

7 tasks

markojak mentioned this pull request Apr 29, 2026

feat(prompt-caching): wire cache_control breakpoints into codex_responses adapter #16235

Closed

JackALaing mentioned this pull request May 15, 2026

fix(codex): bound prompt cache keys #26468

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: codex_responses prompt caching — session routing headers + cache_write_tokens field#10006

fix: codex_responses prompt caching — session routing headers + cache_write_tokens field#10006
zicochaos wants to merge 1 commit into
NousResearch:mainfrom
zicochaos:fix/codex-prompt-caching

zicochaos commented Apr 15, 2026

Uh oh!

Marcuss2 commented Apr 16, 2026

Uh oh!

markojak commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

zicochaos commented Apr 15, 2026

Problem

Root cause

1. Missing session routing headers (run_agent.py)

2. Wrong field name for cache_write_tokens (agent/usage_pricing.py)

Fix

Patch 1: Session routing headers

Patch 2: cache_write_tokens field

Tests

Affected files

Uh oh!

Marcuss2 commented Apr 16, 2026

Uh oh!

markojak commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

1. Missing session routing headers (`run_agent.py`)

2. Wrong field name for cache_write_tokens (`agent/usage_pricing.py`)