Skip to content

fix: codex_responses prompt caching — session routing headers + cache_write_tokens field#10006

Open
zicochaos wants to merge 1 commit into
NousResearch:mainfrom
zicochaos:fix/codex-prompt-caching
Open

fix: codex_responses prompt caching — session routing headers + cache_write_tokens field#10006
zicochaos wants to merge 1 commit into
NousResearch:mainfrom
zicochaos:fix/codex-prompt-caching

Conversation

@zicochaos

Copy link
Copy Markdown
Contributor

Problem

Prompt caching does not work when using codex_responses API mode with OpenAI-compatible providers (e.g. theclawbay). Every request is a cache miss despite prompt_cache_key being set in the request body.

Root cause

Two issues:

1. Missing session routing headers (run_agent.py)

The OpenAI client is initialized without session_id or x-client-request-id headers for codex providers. These headers are required for server-side cache routing — they tell the backend to route requests to the same server that holds the cached prompt prefix.

The official Codex CLI sends these unconditionally. Hermes sets default_headers for OpenRouter, GitHub Copilot, Kimi, and Qwen — but never for Codex/theclawbay.

2. Wrong field name for cache_write_tokens (agent/usage_pricing.py)

The codex_responses branch reads cache_creation_tokens (Anthropic naming convention) instead of cache_write_tokens (OpenAI Responses API naming). This means cache write tokens are always reported as 0.

Fix

Patch 1: Session routing headers

After session_id is assigned during __init__, inject session_id and x-client-request-id into default_headers for codex_responses mode. Also applied in _apply_client_headers_for_base_url() so headers survive /model switches.

Patch 2: cache_write_tokens field

Read cache_write_tokens first (OpenAI naming), fall back to cache_creation_tokens for backward compatibility.

Tests

  • test_codex_responses_reads_cache_write_tokens_field — verifies correct field is read
  • test_codex_responses_falls_back_to_cache_creation_tokens — backward compat
  • test_codex_responses_injects_session_routing_headers — verifies headers are set

Affected files

File Change
run_agent.py Inject session routing headers for codex_responses mode (+22 lines)
agent/usage_pricing.py Read cache_write_tokens before cache_creation_tokens (+3/-1 lines)
tests/agent/test_usage_pricing.py 2 new tests
tests/run_agent/test_run_agent.py 1 new test

@Marcuss2

Copy link
Copy Markdown

This will likely fix the issue I am encountering, I seem to go trough Codex limits much faster with Hermes than with Kilo.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder provider/openai OpenAI / Codex Responses API labels Apr 26, 2026
@markojak

Copy link
Copy Markdown

Linking this into the #17459 cache/time-awareness cluster.

This remains relevant for Codex/Responses caching, but should align with the umbrella direction:

Related: #16235, #15866, #17335, #17459, #17476.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists provider/openai OpenAI / Codex Responses API type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants