fix(usage): forward cached_tokens via prompt_tokens_details on chat-completions #82062
…ompletions toOpenAiChatCompletionsUsage was burying cacheRead into prompt_tokens without emitting the canonical OpenAI prompt_tokens_details.cached_tokens field. Any chat-completions client (e.g. an iOS app showing per-reply cost) consequently saw cached=0 on every turn, even though the upstream OpenAI prompt cache had an 80%+ hit rate. This led to a ~10x over-statement of cost on cached turns. Adds the missing prompt_tokens_details when cacheRead > 0, and omits it when zero so requests with no cache contribution keep the lean shape they have today. Field name and shape match OpenAI's documented chat-completions usage breakdown: https://platform.openai.com/docs/guides/prompt-caching
Codex review: needs maintainer review before merge. Reviewed May 27, 2026, 5:46 AM ET / 09:46 UTC.
Summary: PR surface: Source +2, Tests +37. Total +39 across 5 files.
Reproducibility: yes. Source inspection shows current main folds …
Review metrics: 1 noteworthy metric.
Merge readiness: overall readiness follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Best possible solution: Land the focused additive usage-shape fix once required checks pass. Keep cache-zero responses on the existing lean shape unless maintainers choose a broader OpenAI-shape change later.
Do we have a high-confidence way to reproduce the issue? Yes. Source inspection shows current main folds …
Is this the best way to solve the issue? Yes. The PR fixes the centralized usage mapper used by both streaming and non-streaming chat-completions responses, rather than duplicating serialization logic at individual call sites.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against c89298f9f800.
What the crustacean ranks mean: Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
…ens_details field
Three call sites in src/gateway/openai-http.test.ts and one in
src/gateway/openai-http.usage.test.ts asserted the pre-patch
usage-frame shape (3 fields). Now that toOpenAiChatCompletionsUsage
emits prompt_tokens_details: { cached_tokens } when cacheRead > 0,
those assertions are extended to include the new field.
Only assertions where the test input had a non-zero cacheRead were
touched; cache-zero assertions stay on the lean 3-field shape and
continue to pass unchanged.
Resolves the src/agents/usage.ts conflict with main's new
completion_tokens_details (reasoning_tokens) addition. The two changes
touched the same toOpenAiChatCompletionsUsage return shape; combined so
both detail fields are emitted independently — prompt_tokens_details
{ cached_tokens } when cacheRead > 0, completion_tokens_details
{ reasoning_tokens } when present — using main's spread idiom. Both test
suites' expectations are satisfied.
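The combined return shape described above can be sketched as follows. This is a minimal illustration, not the actual code in `src/agents/usage.ts`. Several details are hypothetical: the `NormalizedUsage` input type and its field names, the function name `toChatCompletionsUsageSketch`, and the assumption that `prompt_tokens` is the sum of uncached input and `cacheRead`. What the sketch does show is the spread idiom, which emits each detail object independently and only when it has data.

```typescript
// Hypothetical internal usage record; the real type in src/agents/usage.ts may differ.
interface NormalizedUsage {
  input: number; // uncached prompt tokens
  output: number; // completion tokens (assumed to already include any reasoning tokens)
  cacheRead: number; // prompt tokens served from the provider's prompt cache
  reasoning?: number; // reasoning tokens, when the provider reports them
}

// OpenAI chat-completions usage shape with both optional detail objects.
interface ChatCompletionsUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens: number };
  completion_tokens_details?: { reasoning_tokens: number };
}

function toChatCompletionsUsageSketch(u: NormalizedUsage): ChatCompletionsUsage {
  // Assumption: cached tokens stay folded into prompt_tokens, as in the pre-patch shape.
  const promptTokens = u.input + u.cacheRead;
  return {
    prompt_tokens: promptTokens,
    completion_tokens: u.output,
    total_tokens: promptTokens + u.output,
    // Spreading {} adds no keys, so cache-zero turns keep the lean three-field shape.
    ...(u.cacheRead > 0 ? { prompt_tokens_details: { cached_tokens: u.cacheRead } } : {}),
    // Emitted independently of the cache field, only when reasoning tokens are reported.
    ...(u.reasoning !== undefined
      ? { completion_tokens_details: { reasoning_tokens: u.reasoning } }
      : {}),
  };
}
```

With this sketch, an input with `cacheRead: 0` and no `reasoning` yields only `prompt_tokens`, `completion_tokens`, and `total_tokens`. A non-zero `cacheRead` adds `prompt_tokens_details` without touching the other keys.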
Landing proof for the maintainer fixup on this PR.
Behavior addressed: forwards provider cache-read token counts through the OpenAI-compatible chat-completions usage shape as prompt_tokens_details.cached_tokens, while keeping completion and total token counts unchanged.
Real environment tested: local OpenClaw checkout on Node/pnpm; Blacksmith Testbox via check:changed; live OpenAI cache behavior using a 1Password-injected OpenAI key; GitHub Actions CI for PR head b511271.
Exact steps or command run after this patch:
pnpm test src/agents/usage.test.ts src/gateway/openai-http.usage.test.ts src/gateway/openai-http.test.ts -- --reporter=verbose
pnpm check:changed
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_CACHE_TEST=1 OPENCLAW_LIVE_TEST_QUIET=0 node scripts/run-vitest.mjs run --config test/vitest/vitest.live.config.ts --configLoader runner src/agents/pi-embedded-runner.cache.live.test.ts -t "hits the expected OpenAI cache plateau" --reporter=verbose
gh run rerun 26503290013 --failed --repo openclaw/openclaw
gh pr checks 82062 --repo openclaw/openclaw
Evidence after fix:
- Focused Vitest passed 56 tests.
- check:changed passed in Testbox tbx_01ksmcbxrytbsmwscr121raa4t.
- The live OpenAI cache test passed with cacheRead=4864, input=143, rate=0.971 on the cache-hit turn.
- Rerun CI passed, including build-artifacts job 78053475728.
Observed result after fix: OpenAI-compatible gateway usage can expose cached prompt tokens via prompt_tokens_details.cached_tokens when cacheRead is present.
What was not tested: a full packaged-install live HTTP gateway run against OpenAI. The PR's Real behavior proof covers the patched gateway path, and the focused gateway tests cover the wire-shape mapping.
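The `rate=0.971` reported by the live cache test matches the other two numbers if it is the cached share of prompt tokens. The check below assumes `rate = cacheRead / (cacheRead + input)`; the live test's exact formula is not shown here. `cacheHitRate` is a hypothetical helper:

```typescript
// Assumption: "rate" is the fraction of prompt tokens served from cache.
// The live test in pi-embedded-runner.cache.live.test.ts may compute it differently.
function cacheHitRate(cacheRead: number, input: number): number {
  const promptTokens = cacheRead + input;
  // Guard against an empty prompt instead of returning NaN.
  return promptTokens === 0 ? 0 : cacheRead / promptTokens;
}

// Figures from the cache-hit turn in the landing proof.
console.log(cacheHitRate(4864, 143).toFixed(3)); // prints 0.971
```

The same formula applied to the PR description's 30,848 cached of 31,442 prompt tokens gives about 0.981, in line with the ~98% figure quoted there.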
…026.5.27) (#698)
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.26` → `2026.5.27` |

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.5.27`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#2026527)

[Compare Source](openclaw/openclaw@v2026.5.26...v2026.5.27)

##### Highlights

- Safer local/runtime boundaries: OpenClaw now rejects unsafe command wrappers, malformed CLI numeric options, unsafe Node runtime env overrides, no-auth Tailscale exposure, and non-admin device-role pairing approvals before they can affect live runs. ([#87308](openclaw/openclaw#87308), [#87305](openclaw/openclaw#87305), [#87292](openclaw/openclaw#87292), [#87146](openclaw/openclaw#87146))
- Matrix and auto-reply delivery are steadier: mention previews stay inert, final mention replies deliver normally, shared-DM notices are awaited, MXID parsing ignores filenames, and reasoning-prefixed `NO_REPLY` responses stay suppressed.
- Provider and agent reliability improved across OpenAI-compatible embeddings, cached token usage, Anthropic/Codex/Claude runtime state, unsupported tool-schema quarantine, heartbeat templates, and session fallback errors. ([#85269](openclaw/openclaw#85269), [#82062](openclaw/openclaw#82062), [#85416](openclaw/openclaw#85416), [#86855](openclaw/openclaw#86855))
- Plugin and package release paths got tighter: Pixverse ships as an external video plugin with region selection, package exclusions and shrinkwrap inventory match the published npm shape, and release/package smoke commands fail bounded instead of hanging.
- Gateway hot paths do less rediscovery by reusing current plugin metadata fingerprints, stable plugin index fingerprints, read-only session metadata, active working stores, status fast paths, and auth/env snapshots. ([#86439](openclaw/openclaw#86439))

##### Changes

- Memory: add a core OpenAI-compatible embedding provider for local and hosted OpenAI-style endpoints, with config, doctor, and docs support. ([#85269](openclaw/openclaw#85269)) Thanks [@dutifulbob](https://github.com/dutifulbob).
- Plugin SDK: mark memory-specific embedding provider registration as deprecated compatibility and surface non-bundled usage in plugin compatibility diagnostics. ([#85072](openclaw/openclaw#85072)) Thanks [@mbelinky](https://github.com/mbelinky).
- Pixverse: add video generation provider support, API region selection, and external plugin publishing.
- Plugins: expose approval action metadata for plugin-driven approval surfaces.

##### Fixes

- Security/CLI/runtime: harden hostname normalization for repeated trailing dots, block side-effecting command wrappers, reject unsafe Node runtime env overrides, reject loose numeric CLI and gateway options, require admin approval for node device-role pairing, and reject no-auth Tailscale exposure. ([#87305](openclaw/openclaw#87305), [#87292](openclaw/openclaw#87292), [#87308](openclaw/openclaw#87308), [#87146](openclaw/openclaw#87146)) Thanks [@pgondhi987](https://github.com/pgondhi987).
- Doctor: validate runtime tool schemas for every configured embedded agent while skipping ACP-only profiles, so bad non-default plugin or MCP tools are reported before assistant turns.
- Telegram: route `sendMessage` action replies through durable outbound delivery so completed agent responses remain retryable when the gateway send path times out. ([#87261](openclaw/openclaw#87261)) Thanks [@mbelinky](https://github.com/mbelinky).
- Matrix/auto-reply: keep draft previews mention-inert, preserve final mention delivery, send mention finals normally, await shared DM notices, ignore filename-embedded MXIDs, and suppress reasoning-prefixed `NO_REPLY` responses.
- Agents/providers: add OpenAI-compatible cache retention, forward cached token usage in chat completions, preserve runtime context before active user turns, strip stale Anthropic thinking, load Claude CLI OAuth for Pi auth profiles, avoid false Codex runtime live switches, and quarantine unsupported tool schemas. ([#82062](openclaw/openclaw#82062), [#87167](openclaw/openclaw#87167), [#86855](openclaw/openclaw#86855))
- Gateway/performance: cache plugin metadata fingerprints and stable plugin index fingerprints, borrow read-only session metadata safely, keep the active session working store hot, keep status on a bounded fast path, and preserve model auth profile suffixes. ([#86439](openclaw/openclaw#86439))
- Package/install/release: align npm package exclusions and inventory, omit unpacked test helpers, skip Homebrew until macOS packages need it, cap tsdown heap in containers, bound install/release smoke waits, and harden post-publish verification.
- Codex/Auth: bound ChatGPT OAuth token exchange and refresh requests, and honor cancellation across Codex and Anthropic OAuth login flows.
- QA/E2E/CI: bound Telegram, kitchen-sink, Open WebUI, ClawHub, MCP, Discord, realtime, labeler, and GitHub API waits; fail empty explicit test, live-media, gateway CPU, startup benchmark, plugin gauntlet, and beta-smoke runs instead of false-greening.
- Agents/Codex: keep spawned agent bootstrap files rooted in the agent workspace while running task commands, transcripts, and compaction from the requested cwd. ([#87218](openclaw/openclaw#87218)) Thanks [@mbelinky](https://github.com/mbelinky).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/698
Summary
`toOpenAiChatCompletionsUsage` was burying `cacheRead` into `prompt_tokens` without emitting the canonical OpenAI `prompt_tokens_details.cached_tokens` field. Any chat-completions client (e.g. an iOS app showing per-reply cost) saw `cached=0` on every turn, even though the upstream OpenAI prompt cache had an 80%+ hit rate. As a result, per-turn cost reporting was over-stated by ~10x on cached turns.
This PR adds the missing field when `cacheRead > 0` and omits it otherwise, keeping the existing wire shape for non-cached responses.
Field name and shape match OpenAI's documented chat-completions
usage breakdown:
https://platform.openai.com/docs/guides/prompt-caching
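In that documented shape, `prompt_tokens` is the total and `prompt_tokens_details.cached_tokens` is the cached subset, so a client can split the two. Here is a minimal client-side sketch. The `UsageFrame` type and the `splitPromptTokens` helper are hypothetical names. A missing details object (the lean shape kept for cache-zero turns) is treated as zero cached tokens:

```typescript
// Subset of the chat-completions usage object relevant to prompt caching.
interface UsageFrame {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Hypothetical helper: separate cached from uncached prompt tokens.
function splitPromptTokens(usage: UsageFrame): { cached: number; uncached: number } {
  // Lean shape (no details object) means no prompt tokens came from cache.
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  return { cached, uncached: usage.prompt_tokens - cached };
}
```

For example, 31,442 prompt tokens with `cached_tokens: 30848` splits into 30,848 cached and 594 uncached. A cache-blind client sees all 31,442 as uncached.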
Real behavior proof
Behavior or issue addressed: When OpenAI's prompt cache hit on a chat-completions request through OpenClaw, `cacheRead` was folded into `prompt_tokens`, and the canonical `prompt_tokens_details.cached_tokens` field was NOT emitted on the SSE wire. iOS and dashboard clients that compute per-reply cost from the OpenAI-documented usage shape therefore treated all of `prompt_tokens` as uncached. That over-stated cost by ~10x on a cache-heavy chat session. The OpenAI dashboard billed correctly (cache hit); the OpenClaw chat-completions wire shape simply dropped the breakdown.
Real environment tested: OpenClaw gateway `openclaw@2026.5.12` running on macOS Sequoia (real install at `~/Library/pnpm/global/...`). Upstream model `openai/gpt-5.4` via the `main` agent. The client is a custom iOS chat client posting `POST /v1/chat/completions` with `stream_options: { include_usage: true }` and consuming the SSE stream, including the final usage chunk. Single-user setup, no fixtures.
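In OpenAI's documented behavior, `stream_options: { include_usage: true }` makes the stream send one extra chunk before `data: [DONE]`. That chunk has an empty `choices` array and a `usage` field holding the totals, while earlier chunks carry `usage: null`. The minimal sketch below shows how a client could pull that final usage object out of an already-buffered SSE body. It is not the iOS client's code, `lastUsageFromSse` is a hypothetical name, and a real client would parse the stream incrementally:

```typescript
// Usage object as sent in the final chunk of an OpenAI-style chat-completions stream.
interface StreamUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

interface StreamChunk {
  choices?: unknown[];
  usage?: StreamUsage | null; // null on ordinary content chunks
}

// Hypothetical helper: scan a buffered SSE body and return the last non-null usage.
function lastUsageFromSse(body: string): StreamUsage | undefined {
  let usage: StreamUsage | undefined;
  for (const line of body.split(/\r?\n/)) {
    // Only "data:" lines carry payloads; blank separators and other fields are skipped.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "") continue;
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as StreamChunk;
    if (chunk.usage) usage = chunk.usage;
  }
  return usage;
}
```

The returned object keeps whatever keys the gateway sent. On a cache-zero turn, `prompt_tokens_details` is simply absent, so callers should read it with optional chaining.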
Exact steps or command run after this patch:
(1) Patched the compiled `usage-*.js` in the locally installed `openclaw@2026.5.12` package with the same edit this PR makes to `src/agents/usage.ts`.
(2) Ran `openclaw gateway restart` to load the patched module.
(3) From the iOS client, sent two consecutive casual chat turns ~9 seconds apart against the `main` agent. The first turn establishes the OpenAI prompt-cache prefix; the second turn hits it.
(4) Captured the raw SSE `usage` frame from the iOS client's debug log.
Evidence after fix: Raw SSE `usage` frame as observed at the iOS client (warm-cache turn, copied verbatim from the live debug log on the patched gateway):
For comparison, the same client on the unpatched gateway
emitted (same chat, prior turn that warmed the cache):
The OpenAI billing dashboard for the same period independently
confirms ~98% of input tokens were cache reads, matching the
30,848 / 31,442 figure now reaching the client.
Observed result after fix: The chat-completions SSE final chunk now includes `prompt_tokens_details: { cached_tokens: N }` when the gateway has a non-zero `cacheRead` for the turn. Per-reply cost calculation on the iOS client dropped from the inflated cache-blind figure ($0.08 USD) to the cache-aware blended figure ($0.01 USD), matching the amount the OpenAI dashboard actually billed. Non-cache-hit responses (e.g. the first turn after a gap of more than 5 minutes) continue to emit the existing lean shape with no `prompt_tokens_details` key. This was verified by inspecting the cold-cache turn in the same session.
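The size of the cost gap follows from how cached prompt tokens are priced. The sketch below runs both the cache-blind and cache-aware calculations on the reported 31,442 prompt / 30,848 cached token counts. The prices are placeholders, not gpt-5.4's actual rates, and assume cached input is billed at 10% of the uncached input rate. Completion tokens are set to zero to isolate the prompt side. `replyCostUsd` is a hypothetical helper:

```typescript
// Placeholder USD prices per million tokens; NOT actual gpt-5.4 pricing.
interface Prices {
  input: number;
  cachedInput: number;
  output: number;
}

// Hypothetical per-reply cost: cached prompt tokens are billed at the cached rate,
// and the remainder of prompt_tokens at the full input rate.
function replyCostUsd(promptTokens: number, cachedTokens: number, completionTokens: number, p: Prices): number {
  const uncached = promptTokens - cachedTokens;
  return (uncached * p.input + cachedTokens * p.cachedInput + completionTokens * p.output) / 1_000_000;
}

const prices: Prices = { input: 2.5, cachedInput: 0.25, output: 10 };

// Cache-blind: cached_tokens never arrived, so every prompt token is billed as uncached.
const blind = replyCostUsd(31442, 0, 0, prices);
// Cache-aware: uses prompt_tokens_details.cached_tokens.
const aware = replyCostUsd(31442, 30848, 0, prices);
console.log((blind / aware).toFixed(1)); // prints 8.5
```

Under these assumed prices, the cache-blind estimate is about 8.5x the cache-aware one on the prompt side alone. The real ratio depends on the actual price schedule and on completion tokens, which are billed the same way in both calculations.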
What was not tested: Anthropic / Vertex provider paths. Their usage shapes go through `normalizeUsage` first and then the same `toOpenAiChatCompletionsUsage`, so per the source the fix applies uniformly to them. However, I did not verify those paths end to end in a real OpenClaw setup. The non-stream chat-completions code path was also not verified end to end (the request in question used `stream: true`). Per the source, the same function is reached via the non-stream path.
AI-assisted
This patch was prepared with the assistance of an AI coding
agent (Claude Code), driven by a real cost-reporting
investigation on my OpenClaw instance. The "Evidence after fix"
and "Observed result after fix" blocks above are copied
verbatim from my own iOS client's live debug log on the
patched gateway, not generated.