fix: codex_responses prompt caching — session routing headers + cache_write_tokens field#10006
zicochaos wants to merge 1 commit into
This will likely fix the issue I am encountering: I seem to go through Codex limits much faster with Hermes than with Kilo.
Linking this into the #17459 cache/time-awareness cluster. This remains relevant for Codex/Responses caching, but should align with the umbrella direction:
Problem
Prompt caching does not work when using codex_responses API mode with OpenAI-compatible providers (e.g. theclawbay). Every request is a cache miss, even though prompt_cache_key is set in the request body.

Root cause
Two issues:
1. Missing session routing headers (run_agent.py)
The OpenAI client is initialized without session_id or x-client-request-id headers for Codex providers. These headers are required for server-side cache routing: they tell the backend to send requests to the same server that holds the cached prompt prefix. The official Codex CLI sends them unconditionally. Hermes sets default_headers for OpenRouter, GitHub Copilot, Kimi, and Qwen, but never for Codex/theclawbay.

2. Wrong field name for cache_write_tokens (agent/usage_pricing.py)
The codex_responses branch reads cache_creation_tokens (Anthropic naming convention) instead of cache_write_tokens (OpenAI Responses API naming), so cache write tokens are always reported as 0.

Fix
Patch 1: Session routing headers
After session_id is assigned during __init__, inject session_id and x-client-request-id into default_headers for codex_responses mode. The same injection is applied in _apply_client_headers_for_base_url() so the headers survive /model switches.

Patch 2: cache_write_tokens field
Read cache_write_tokens first (OpenAI naming) and fall back to cache_creation_tokens for backward compatibility.

Tests
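The Patch 2 fallback, and what the first two tests listed here check, can be illustrated with a minimal sketch. It assumes the usage payload is a plain dict; the function name read_cache_write_tokens is hypothetical and not taken from agent/usage_pricing.py.

```python
from typing import Any, Mapping


def read_cache_write_tokens(usage: Mapping[str, Any]) -> int:
    """Hypothetical helper: cache write tokens from a codex_responses usage dict.

    Prefers the OpenAI-style "cache_write_tokens" key and falls back to the
    Anthropic-style "cache_creation_tokens" key. Returns 0 if neither is set.
    """
    value = usage.get("cache_write_tokens")
    if value is None:
        # Backward compatibility: older payloads use the Anthropic name.
        value = usage.get("cache_creation_tokens")
    return int(value or 0)


if __name__ == "__main__":
    print(read_cache_write_tokens({"cache_write_tokens": 128}))
    print(read_cache_write_tokens({"cache_creation_tokens": 64}))
    print(read_cache_write_tokens({}))
```

The sketch uses an `is None` check rather than `or` so that an explicit cache_write_tokens of 0 is not overridden by a cache_creation_tokens value. The PR does not say which behavior the actual patch has.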
- test_codex_responses_reads_cache_write_tokens_field: verifies that the correct field is read
- test_codex_responses_falls_back_to_cache_creation_tokens: verifies the backward-compatible fallback
- test_codex_responses_injects_session_routing_headers: verifies that the headers are set

Affected files
- run_agent.py: session routing headers for codex_responses mode (+22 lines)
- agent/usage_pricing.py: read cache_write_tokens before cache_creation_tokens (+3/-1 lines)
- tests/agent/test_usage_pricing.py
- tests/run_agent/test_run_agent.py
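Patch 1 can be sketched as a small pure function that both __init__ and _apply_client_headers_for_base_url() could call, so the routing headers are re-added whenever client headers are rebuilt. The function name with_session_routing_headers is hypothetical. So is the choice of sending the session id as the value of both headers, since the PR does not state what x-client-request-id carries.

```python
import uuid
from typing import Dict, Optional

CODEX_API_MODE = "codex_responses"


def with_session_routing_headers(
    headers: Optional[Dict[str, str]],
    api_mode: str,
    session_id: str,
) -> Dict[str, str]:
    """Hypothetical helper: copy headers, adding Codex session routing headers.

    Only codex_responses mode gets the extra headers. Other modes get an
    unmodified copy. The input dict is never mutated.
    """
    merged = dict(headers or {})
    if api_mode == CODEX_API_MODE:
        # Assumption: both headers carry the agent's session id so that the
        # backend can route every request in a session to the same server.
        merged["session_id"] = session_id
        merged["x-client-request-id"] = session_id
    return merged


if __name__ == "__main__":
    sid = str(uuid.uuid4())  # example value only
    # Re-running the helper after a /model switch rebuilds provider headers
    # without dropping the routing headers.
    print(with_session_routing_headers({}, CODEX_API_MODE, sid))
```

With the official openai Python SDK, the resulting dict could be passed as OpenAI(..., default_headers=headers). Keeping the merge in one pure function makes it straightforward to cover in a test like test_codex_responses_injects_session_routing_headers.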