feat: track image_generation tokens separately + real-upstream stress test #422
Merged
icebear0828 merged 2 commits into dev on Apr 27, 2026
… test
Upstream's response.completed carries `tool_usage.image_gen.{input_tokens,
output_tokens}` distinct from host-model `usage`, but the proxy was dropping
it on the floor. Plumb it through the full pipeline so image-gen cost no
longer pollutes host-model token charts:
- parse `tool_usage.image_gen` in CodexResponseData / UsageInfo and the
/v1/responses route's extractResponseUsage path
- accumulate `image_input_tokens` / `image_output_tokens` (lifetime + window)
on AccountUsage; surface in /admin/usage-stats/summary as
`total_image_input_tokens` / `total_image_output_tokens` and persist in
UsageSnapshot/Baseline (old usage-history.json reads as 0, forward-compat)
- Dashboard: new "Image Tokens (in/out)" Summary card; AccountCard adds a
window-image-tokens row when the account has image consumption
Plus a real-upstream stress test for image generation:
- tests/real/image-generation.test.ts — matrix of {gpt-5.4-mini, gpt-5.5} ×
{1024x1024, 3840x2160}, 2 concurrent × 2 rounds per combo, asserts SSE
lifecycle + image base64 size + tool_usage.image_gen.output_tokens > 0,
ends with /admin/usage-stats/summary delta check
- tests/bench/image-gen-bench.ts — same matrix, prints p50/p95/min/max +
token averages for human comparison
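
A minimal sketch of how a defensive `tool_usage.image_gen` extractor could look. The name `extractImageGenUsageSketch`, the helper functions, and the handling of partial counts (a missing count defaults to 0, a block with no usable number yields `undefined`) are assumptions for illustration; the PR's actual `extractImageGenUsage` may behave differently.

```typescript
interface ImageGenUsage {
  image_input_tokens: number;
  image_output_tokens: number;
}

// Narrow an unknown value to a plain indexable object.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Return a finite, non-negative number, or undefined for anything else.
function asCount(value: unknown): number | undefined {
  return typeof value === "number" && Number.isFinite(value) && value >= 0
    ? value
    : undefined;
}

// Hypothetical sketch: read tool_usage.image_gen from the `response` object
// of a response.completed event. Assumed policy: undefined when the block is
// absent or carries no usable number; a single missing count defaults to 0.
function extractImageGenUsageSketch(response: unknown): ImageGenUsage | undefined {
  if (!isRecord(response)) return undefined;
  const toolUsage = response["tool_usage"];
  if (!isRecord(toolUsage)) return undefined;
  const imageGen = toolUsage["image_gen"];
  if (!isRecord(imageGen)) return undefined;

  const input = asCount(imageGen["input_tokens"]);
  const output = asCount(imageGen["output_tokens"]);
  if (input === undefined && output === undefined) return undefined;

  return {
    image_input_tokens: input ?? 0,
    image_output_tokens: output ?? 0,
  };
}
```

Returning `undefined` rather than zeros when the block is absent lets a caller distinguish "no image generation happened" from "image generation reported zero tokens", which is one possible design choice among several.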
Self-review followups on PR #422:
- Add unit tests for the two new parser paths (parseResponseData's tool_usage.image_gen branch in codex-events.ts, and the new extractImageGenUsage helper in routes/responses.ts), covering the happy path, zero counts, missing tool_usage, missing image_gen sub-block, non-numeric values, and the partial-counts case
- Real-test correctness: replace the size-based heuristic with a file-magic check (PNG / JPEG / WebP / GIF) so an account that silently downgrades to an SVG-text fallback is caught regardless of payload size; relax the size threshold to a more conservative floor (50 KB / 500 KB), since flat-color PNGs can compress to less than the previously assumed 200 KB / 1.5 MB
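
The file-magic check mentioned above can be sketched as follows. This is an illustration under assumptions, not the test's actual code: `detectImageFormat` and `hasBytesAt` are hypothetical names, and the function takes already-decoded bytes so it does not depend on any particular base64 decoder. The signatures themselves are the standard ones for PNG, JPEG, GIF, and WebP.

```typescript
type ImageFormat = "png" | "jpeg" | "webp" | "gif";

// True when `bytes` contains `signature` starting at `offset`.
function hasBytesAt(bytes: Uint8Array, offset: number, signature: readonly number[]): boolean {
  if (bytes.length < offset + signature.length) return false;
  return signature.every((b, i) => bytes[offset + i] === b);
}

const PNG_SIG = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_SIG = [0xff, 0xd8, 0xff];
const GIF_PREFIX = [0x47, 0x49, 0x46, 0x38]; // "GIF8"
const RIFF_SIG = [0x52, 0x49, 0x46, 0x46]; // "RIFF"
const WEBP_SIG = [0x57, 0x45, 0x42, 0x50]; // "WEBP"

// Hypothetical helper: classify decoded image bytes by their leading magic
// bytes. Returns null for anything else, e.g. SVG or other text payloads.
function detectImageFormat(bytes: Uint8Array): ImageFormat | null {
  if (hasBytesAt(bytes, 0, PNG_SIG)) return "png";
  if (hasBytesAt(bytes, 0, JPEG_SIG)) return "jpeg";
  // WebP is a RIFF container: "RIFF" <4-byte size> "WEBP".
  if (hasBytesAt(bytes, 0, RIFF_SIG) && hasBytesAt(bytes, 8, WEBP_SIG)) return "webp";
  // GIF87a or GIF89a.
  if (
    hasBytesAt(bytes, 0, GIF_PREFIX) &&
    (bytes[4] === 0x37 || bytes[4] === 0x39) &&
    bytes[5] === 0x61
  ) {
    return "gif";
  }
  return null;
}
```

In a Node test, the bytes could come from `Buffer.from(base64String, "base64")`, since a `Buffer` is a `Uint8Array`; a `null` result would then fail the assertion even when the payload is large enough to pass a size floor.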
icebear0828 added a commit that referenced this pull request on Apr 27, 2026
… auto-precision display (#423)

* fix: cache_tokens extraction in non-codex upstreams + image_generation request counters

Three orthogonal dashboard accuracy issues, fixed together:

1) **Cache hit rate stuck at 0%**: OpenAI / Anthropic / Gemini upstream adapters were synthesizing `response.completed` with hardcoded `input_tokens_details: {}`, dropping the cache hit info that the native APIs do return. Now extract `prompt_tokens_details.cached_tokens` (OpenAI), `cache_read_input_tokens` (Anthropic message_start + message_delta), and `cachedContentTokenCount` (Gemini explicit caches), and surface them under the standard Codex shape so the existing parsers pick them up.

2) **No image generation request counter**: PR #422 added image token tracking but no count of image requests (success vs failure). Now detect `tools[].type === "image_generation"` at request parse time, propagate `expectsImageGen` through proxy-handler, and on every release call site (success / EmptyResponse / upstream errors) classify as success (image_output_tokens > 0) or failed. Adds:
- AccountUsage: image_request_count, image_request_failed_count (+ window mirrors)
- UsageSnapshot/Baseline/DataPoint/Summary: same
- /admin/usage-stats/summary: total_image_request_count, total_image_request_failed_count
- Dashboard "Image Requests" card showing N ok · M failed
- AccountCard window image requests row when activity present

3) **Hit rate display rounded sub-0.05% values to 0.0%**: formatHitRate now uses auto precision: ≥1% one decimal, 0.01-1% two decimals, >0 but <0.01% shows "<0.01%", =0 shows "0%".

Backward compat: old usage-history.json snapshots without the new image_request fields read as 0 via `?? 0`.

Tests:
- 7 new unit tests for upstream cache extraction (OpenAI/Anthropic/Gemini)
- 5 new unit tests for image_request counter logic in recordUsage
- 2 new unit tests for usage-stats image_request aggregation
- tests/real/image-generation.test.ts e2e block now asserts that total_image_request_count increases by 1 after a single successful gen

* fix: harden cache-token extraction + count compact-route image_gen attempts

Self-review followups on PR #423:
- Anthropic message_delta: take Math.max(start, delta) for cache_read_input_tokens so a future API change emitting 0 in delta can't clobber a real hit reported in message_start. Adds a defensive unit test covering the regression.
- /v1/responses/compact (handleCompact): detect the image_generation tool at request time and synthesize an `image_request_attempted=true, image_request_succeeded=false` usage on every release site. Compact doesn't surface tool_usage.image_gen, so any image_generation tool forwarded here is always classified as failed; at least the dashboard now catches accidental misuse rather than silently dropping the signal.

---------

Co-authored-by: icebear0828 <icebear0828@users.noreply.github.com>
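
The auto-precision rule for the hit-rate display in the commit message above can be sketched like this. It assumes the input is already a percentage in the range 0 to 100 (the commit message does not say whether the real `formatHitRate` takes a percentage or a fraction), and treating negative or non-finite input as 0 is likewise an assumption; `formatHitRateSketch` is a hypothetical name.

```typescript
// Hypothetical sketch of the stated rule:
//   = 0            -> "0%"
//   > 0, < 0.01    -> "<0.01%"
//   0.01 up to 1   -> two decimals
//   >= 1           -> one decimal
function formatHitRateSketch(percent: number): string {
  // Assumption: negative or non-finite input is displayed as 0%.
  if (!Number.isFinite(percent) || percent <= 0) return "0%";
  if (percent < 0.01) return "<0.01%";
  if (percent < 1) return `${percent.toFixed(2)}%`;
  return `${percent.toFixed(1)}%`;
}

console.log(formatHitRateSketch(0));     // "0%"
console.log(formatHitRateSketch(0.004)); // "<0.01%"
console.log(formatHitRateSketch(0.04));  // "0.04%"
console.log(formatHitRateSketch(37.5));  // "37.5%"
```

With a fixed one-decimal format, a rate of 0.04% would be shown as 0.0%, which is the display problem the commit describes.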
Summary
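- `response.completed.response.tool_usage.image_gen.{input_tokens, output_tokens}` was previously dropped by the proxy. This PR wires image token counting through the full pipeline (parser → AccountUsage → persistence → admin API → dashboard), fully separate from the existing host-model tokens, so the existing charts are not polluted
- Real-upstream stress test over {gpt-5.4-mini, gpt-5.5} × {1024×1024, 3840×2160}, 2 concurrent × 2 rounds = 16 images; asserts the complete SSE event chain + a base64 length threshold + image_gen output tokens > 0, and ends by checking the image token delta in `/admin/usage-stats/summary`
- `tests/bench/image-gen-bench.ts` prints a markdown table of p50/p95/min/max plus host-model / image token averages

Changes

Parsing and types
- `src/types/codex-events.ts`: `parseResponseData` extracts `tool_usage.image_gen`; `CodexResponseData.usage` gains `image_input_tokens` / `image_output_tokens`
- `src/translation/codex-event-extractor.ts`: `UsageInfo` updated to match
- `src/routes/responses.ts`: adds `extractImageGenUsage`; both the stream and collect paths of the `/v1/responses` route merge host + image usage

Accumulation and persistence
- `src/auth/types.ts`: `AccountUsage` gains image token fields (lifetime + window)
- `src/auth/account-registry.ts` / `account-lifecycle.ts` / `account-pool.ts`: `recordUsage` / `release` parameters extended
- `src/auth/usage-stats.ts`: `UsageSnapshot` / `Baseline` / `DataPoint` / `Summary` all gain the image dimension; `poolTotals`, `recoverBaseline`, the baseline drop-detection in `recordSnapshot`, and `bucketize` are updated to match; an old `usage-history.json` that lacks the new fields reads as 0 (backward compatible)
- `src/routes/shared/proxy-handler.ts`: usage types extended on both the streaming and collect paths; the end-of-request log additionally prints `image=<in>/<out>`

Frontend
- `shared/hooks/use-usage-stats.ts` + `shared/types.ts`: TS mirror types updated to match
- `shared/i18n/translations.ts`: adds `imageTokens` / `imageTokensHint` / `windowImageTokens` (Chinese and English)
- `web/src/pages/UsageStats.tsx`: Summary grid goes from 5 to 6 columns; adds an "Image Tokens (in/out)" card
- `web/src/components/AccountCard.tsx`: when the account has image consumption, shows an extra window image-token row and appends `· N img` to the cumulative row

Client protocol
- `/v1/responses` already passes SSE through, so clients can already see `tool_usage.image_gen` directly; the OpenAI / Anthropic / Gemini translation layers have no corresponding field and are left unchanged

Docs and tests
- `API.md` / `API_CN.md`: the image_generation section now documents image token billing
- `CHANGELOG.md`: two entries under Added
- `tests/real/image-generation.test.ts`: matrix of 4 combos × 2 concurrent × 2 rounds + end-to-end summary delta check
- `tests/bench/image-gen-bench.ts`: matrix results printed as a markdown table
- `tests/unit/auth/usage-stats.test.ts`: existing assertions extended with the image dimension

Test plan
- `npx tsc --noEmit`: 0 errors
- `npm test`: 1618 passed / 1 skipped
- `npm run dev` to start the local proxy → `npm run test:real -- image-generation`: all 16 images pass and the summary delta check passes
- `npx tsx tests/bench/image-gen-bench.ts 2 1`: prints the latency + token table as expected
- `curl /admin/usage-stats/summary`: shows non-zero `total_image_input_tokens` / `total_image_output_tokens`, listed alongside the host-model token fields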
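
The bench described above reports p50/p95/min/max latencies. One common way to compute such a summary is the nearest-rank method, sketched below; the PR does not state which percentile method `tests/bench/image-gen-bench.ts` uses, and `summarizeLatencies`, `nearestRank`, and `LatencySummary` are hypothetical names.

```typescript
interface LatencySummary {
  p50: number;
  p95: number;
  min: number;
  max: number;
}

// Nearest-rank percentile over an ascending, non-empty array:
// rank = ceil(p / 100 * n), clamped to [1, n].
function nearestRank(sortedAsc: readonly number[], p: number): number {
  const n = sortedAsc.length;
  const rank = Math.min(n, Math.max(1, Math.ceil((p / 100) * n)));
  const value = sortedAsc[rank - 1];
  if (value === undefined) {
    throw new Error("percentile requires at least one sample");
  }
  return value;
}

// Summarize latency samples (in milliseconds) without mutating the input.
function summarizeLatencies(samplesMs: readonly number[]): LatencySummary {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  return {
    p50: nearestRank(sorted, 50),
    p95: nearestRank(sorted, 95),
    min: nearestRank(sorted, 0),
    max: nearestRank(sorted, 100),
  };
}
```

With only a handful of samples per matrix cell, p95 under nearest-rank often coincides with the maximum, so the min/max columns remain useful context when reading the table.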