feat: track image_generation tokens separately + real-upstream stress test by icebear0828 · Pull Request #422 · icebear0828/codex-proxy

icebear0828 · 2026-04-27T18:58:10Z

Summary

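Upstream response.completed.response.tool_usage.image_gen.{input_tokens, output_tokens} was previously dropped by the proxy. This PR wires image token counting through the full pipeline (parser → AccountUsage → persistence → admin API → dashboard), kept fully separate from the existing host-model tokens so the existing charts are not polluted
Adds a real-upstream stress test matrix {gpt-5.4-mini, gpt-5.5} × {1024×1024, 3840×2160}, 2 concurrent × 2 rounds = 16 images. It asserts the complete SSE event chain, a base64 length threshold, and image_gen output tokens > 0, then verifies the image token delta in /admin/usage-stats/summary at the end
Companion bench script tests/bench/image-gen-bench.ts prints a markdown table of p50/p95/min/max plus mean host-model and image tokens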

Changes

Parsing and types

src/types/codex-events.ts: parseResponseData extracts tool_usage.image_gen; CodexResponseData.usage gains image_input_tokens / image_output_tokens
src/translation/codex-event-extractor.ts: UsageInfo updated to match
src/routes/responses.ts: adds extractImageGenUsage; both the stream and collect paths of the /v1/responses route merge host and image usage
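As a rough illustration of the parsing step, the sketch below reads `tool_usage.image_gen` defensively from an untyped response object. Only the `tool_usage.image_gen.{input_tokens, output_tokens}` path and the `image_input_tokens` / `image_output_tokens` names come from this PR; the function name `extractImageGenUsageSketch`, the helper names, and the choice to return `undefined` when the block is absent are assumptions, not the actual implementation.

```typescript
// Hypothetical result shape; the two field names mirror those named in the PR.
interface ImageGenUsage {
  image_input_tokens: number;
  image_output_tokens: number;
}

// Coerce anything that is not a finite, non-negative number to 0 (assumed policy).
function toCount(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) && value >= 0 ? value : 0;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Returns undefined when there is no image_gen block at all, so a caller can
// tell "no image tool ran" apart from "ran and reported zero tokens".
function extractImageGenUsageSketch(response: unknown): ImageGenUsage | undefined {
  if (!isRecord(response)) return undefined;
  const toolUsage = response["tool_usage"];
  if (!isRecord(toolUsage)) return undefined;
  const imageGen = toolUsage["image_gen"];
  if (!isRecord(imageGen)) return undefined;
  return {
    image_input_tokens: toCount(imageGen["input_tokens"]),
    image_output_tokens: toCount(imageGen["output_tokens"]),
  };
}
```

A partial block such as `{ input_tokens: 5 }` yields `{ image_input_tokens: 5, image_output_tokens: 0 }` under these assumptions.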

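Accumulation and persistence

src/auth/types.ts: AccountUsage gains image token fields (lifetime + window)
src/auth/account-registry.ts / account-lifecycle.ts / account-pool.ts: recordUsage / release parameters extended
src/auth/usage-stats.ts: UsageSnapshot/Baseline/DataPoint/Summary all gain the image dimension; poolTotals, recoverBaseline, and recordSnapshot's baseline drop-detection and bucketize logic are updated to match; old usage-history.json files lacking the new fields read as 0, so the change is backward compatible
src/routes/shared/proxy-handler.ts: usage types extended on both the streaming and collect paths; the end-of-request log additionally prints image=<in>/<out>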
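The accumulation and backward-compatible read described above might look roughly like the following. This is a minimal sketch: the lifetime field names `image_input_tokens` / `image_output_tokens` appear in the PR, while the `window_*` field names and both function names are hypothetical.

```typescript
// Illustrative subset of AccountUsage. The window_* names are assumptions.
interface ImageUsageFields {
  image_input_tokens: number;
  image_output_tokens: number;
  window_image_input_tokens: number;
  window_image_output_tokens: number;
}

// Records persisted before this change lack every image field, so each one
// falls back to 0 on read.
function hydrateImageUsage(raw: Partial<ImageUsageFields>): ImageUsageFields {
  return {
    image_input_tokens: raw.image_input_tokens ?? 0,
    image_output_tokens: raw.image_output_tokens ?? 0,
    window_image_input_tokens: raw.window_image_input_tokens ?? 0,
    window_image_output_tokens: raw.window_image_output_tokens ?? 0,
  };
}

// Adds one request's image tokens to both the lifetime and window counters,
// returning a new object rather than mutating the input.
function addImageUsage(
  usage: ImageUsageFields,
  delta: { input: number; output: number },
): ImageUsageFields {
  return {
    image_input_tokens: usage.image_input_tokens + delta.input,
    image_output_tokens: usage.image_output_tokens + delta.output,
    window_image_input_tokens: usage.window_image_input_tokens + delta.input,
    window_image_output_tokens: usage.window_image_output_tokens + delta.output,
  };
}
```

Keeping these counters in their own fields, rather than folding them into the host-model totals, is what lets the existing charts stay unchanged.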

Frontend

shared/hooks/use-usage-stats.ts + shared/types.ts: TS mirror types updated to match
shared/i18n/translations.ts: adds imageTokens / imageTokensHint / windowImageTokens (Chinese and English)
web/src/pages/UsageStats.tsx: Summary grid goes from 5 columns to 6, with a new "Image Tokens (in/out)" card
web/src/components/AccountCard.tsx: when the account has image consumption, shows an extra window image-token row and appends · N img to the cumulative row

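Client protocol

Unchanged: /v1/responses already passes SSE through as-is, so clients can already see tool_usage.image_gen directly; the OpenAI/Anthropic/Gemini translation layers have no corresponding field and are left as they are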

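Docs and tests

API.md / API_CN.md: the image_generation section gains a note on image token billing
CHANGELOG.md: two entries under Added
tests/real/image-generation.test.ts: 4 combinations × 2 concurrent × 2 rounds matrix, plus an end-to-end check of the summary delta
tests/bench/image-gen-bench.ts: prints the matrix as a markdown table
tests/unit/auth/usage-stats.test.ts: existing assertions extended with the image dimension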
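For readers unfamiliar with the p50/p95 columns the bench table reports, here is one way such statistics could be computed and rendered as a markdown row. The PR does not say which percentile method the script uses; this sketch assumes the nearest-rank method, and every name in it is hypothetical.

```typescript
interface LatencyStats {
  min: number;
  p50: number;
  p95: number;
  max: number;
}

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of empty sample set");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const clamped = Math.min(sorted.length, Math.max(1, rank));
  return sorted[clamped - 1];
}

function summarize(samples: number[]): LatencyStats {
  return {
    min: Math.min(...samples),
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    max: Math.max(...samples),
  };
}

// Renders one table row in the column order p50 / p95 / min / max.
function toMarkdownRow(label: string, stats: LatencyStats): string {
  return `| ${label} | ${stats.p50} | ${stats.p95} | ${stats.min} | ${stats.max} |`;
}
```

With only a handful of samples per combination, nearest-rank p95 coincides with the maximum, so min and max remain useful columns alongside the percentiles.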

Test plan

npx tsc --noEmit: 0 errors
npm test: 1618 passed / 1 skipped
npm run dev to start the local proxy → npm run test:real -- image-generation: all 16 images pass and the summary delta check passes
- gpt-5.4-mini × 1024×1024: 4/4 ok, 21459/22341/24563 ms (min/avg/max), image tokens avg in/out = 31/196
- gpt-5.4-mini × 3840×2160: 4/4 ok, 33006/36742/45462 ms, 32/371
- gpt-5.5 × 1024×1024: 4/4 ok, 22643/27368/30208 ms, 31/196
- gpt-5.5 × 3840×2160: 4/4 ok, 32781/45610/53062 ms, 31/1112
npx tsx tests/bench/image-gen-bench.ts 2 1 prints the latency and token table as expected
A direct curl of /admin/usage-stats/summary shows total_image_input_tokens / total_image_output_tokens non-zero and listed alongside the host-model token fields
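To make the 16-image count concrete, the test matrix can be expanded as 2 models × 2 sizes × 2 concurrent × 2 rounds. The sketch below is only an illustration of that arithmetic and of running each round's requests concurrently; the `Job` shape, `buildMatrix`, `runCombo`, and the `generate` callback are all hypothetical and not taken from the test file.

```typescript
const MODELS = ["gpt-5.4-mini", "gpt-5.5"] as const;
const SIZES = ["1024x1024", "3840x2160"] as const;

interface Job {
  model: string;
  size: string;
  round: number; // rounds run one after another
  slot: number; // slots within a round run concurrently
}

// Expands every model/size combination into rounds × concurrency jobs.
function buildMatrix(concurrency: number, rounds: number): Job[] {
  const jobs: Job[] = [];
  for (const model of MODELS) {
    for (const size of SIZES) {
      for (let round = 0; round < rounds; round++) {
        for (let slot = 0; slot < concurrency; slot++) {
          jobs.push({ model, size, round, slot });
        }
      }
    }
  }
  return jobs;
}

// Runs one combination's jobs: each round fires its slots in parallel and
// waits for all of them before starting the next round.
async function runCombo<T>(
  jobs: Job[],
  rounds: number,
  generate: (job: Job) => Promise<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (let round = 0; round < rounds; round++) {
    const batch = jobs.filter((job) => job.round === round);
    results.push(...(await Promise.all(batch.map(generate))));
  }
  return results;
}
```

`buildMatrix(2, 2)` yields 16 jobs, 4 per model/size combination, matching the "4/4 ok" lines in the results above.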

… test

Upstream's response.completed carries `tool_usage.image_gen.{input_tokens, output_tokens}` distinct from host-model `usage`, but the proxy was dropping it on the floor. Plumb it through the full pipeline so image-gen cost no longer pollutes host-model token charts:

- parse `tool_usage.image_gen` in CodexResponseData / UsageInfo and the /v1/responses route's extractResponseUsage path
- accumulate `image_input_tokens` / `image_output_tokens` (lifetime + window) on AccountUsage; surface in /admin/usage-stats/summary as `total_image_input_tokens` / `total_image_output_tokens` and persist in UsageSnapshot/Baseline (old usage-history.json reads as 0, backward-compat)
- Dashboard: new "Image Tokens (in/out)" Summary card; AccountCard adds a window-image-tokens row when the account has image consumption

Plus a real-upstream stress test for image generation:

- tests/real/image-generation.test.ts: matrix of {gpt-5.4-mini, gpt-5.5} × {1024x1024, 3840x2160}, 2 concurrent × 2 rounds per combo, asserts SSE lifecycle + image base64 size + tool_usage.image_gen.output_tokens > 0, ends with /admin/usage-stats/summary delta check
- tests/bench/image-gen-bench.ts: same matrix, prints p50/p95/min/max + token averages for human comparison

Self-review followups on PR #422:

- Add unit tests for the two new parser paths (parseResponseData's tool_usage.image_gen branch in codex-events.ts, and the new extractImageGenUsage helper in routes/responses.ts). They cover the happy path, zero counts, missing tool_usage, missing image_gen sub-block, non-numeric values, and the partial-counts case
- Real-test correctness: replace the size-based heuristic with a file-magic check (PNG / JPEG / WebP / GIF) so an account that silently downgrades to an SVG-text fallback is caught regardless of payload size; relax the size threshold to a more conservative floor (50 KB / 500 KB), since flat-color PNGs can compress smaller than the previously assumed 200 KB / 1.5 MB
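A file-magic check of the kind described here can be sketched as follows. The byte signatures are the standard ones for PNG, JPEG, WebP (a RIFF container with `WEBP` at offset 8), and GIF; the function and type names are hypothetical and the test's actual helper may differ.

```typescript
type ImageKind = "png" | "jpeg" | "webp" | "gif";

// True when `data` contains `signature` starting at `offset`.
function hasBytesAt(data: Uint8Array, signature: number[], offset = 0): boolean {
  if (data.length < offset + signature.length) return false;
  return signature.every((byte, i) => data[offset + i] === byte);
}

// ASCII string to byte values, for signatures that are plain text.
function ascii(text: string): number[] {
  return Array.from(text, (ch) => ch.charCodeAt(0));
}

// Returns the detected raster format, or undefined for anything else
// (for example an SVG or plain-text fallback).
function detectImageKind(data: Uint8Array): ImageKind | undefined {
  if (hasBytesAt(data, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (hasBytesAt(data, [0xff, 0xd8, 0xff])) return "jpeg";
  if (hasBytesAt(data, ascii("RIFF")) && hasBytesAt(data, ascii("WEBP"), 8)) return "webp";
  if (hasBytesAt(data, ascii("GIF87a")) || hasBytesAt(data, ascii("GIF89a"))) return "gif";
  return undefined;
}
```

In a Node test, the base64 payload from the SSE stream could be decoded with `Buffer.from(b64, "base64")` and passed straight in, since `Buffer` is a `Uint8Array`. Unlike a size threshold, this rejects a large SVG and accepts a small flat-color PNG.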

… auto-precision display (#423)

* fix: cache_tokens extraction in non-codex upstreams + image_generation request counters

Three orthogonal dashboard accuracy issues, fixed together:

1) **Cache hit rate stuck at 0%**: OpenAI / Anthropic / Gemini upstream adapters were synthesizing `response.completed` with hardcoded `input_tokens_details: {}`, dropping the cache hit info that the native APIs do return. Now extract `prompt_tokens_details.cached_tokens` (OpenAI), `cache_read_input_tokens` (Anthropic message_start + message_delta), and `cachedContentTokenCount` (Gemini explicit caches), and surface them under the standard Codex shape so the existing parsers pick them up.

2) **No image generation request counter**: PR #422 added image token tracking but no count of image requests (success vs failure). Now detect `tools[].type === "image_generation"` at request parse time, propagate `expectsImageGen` through proxy-handler, and on every release call site (success / EmptyResponse / upstream errors) classify as success (image_output_tokens > 0) or failed. Adds:
- AccountUsage: image_request_count, image_request_failed_count (+ window mirrors)
- UsageSnapshot/Baseline/DataPoint/Summary: same
- /admin/usage-stats/summary: total_image_request_count, total_image_request_failed_count
- Dashboard "Image Requests" card showing N ok · M failed
- AccountCard window image requests row when activity present

3) **Hit rate display rounded sub-0.05% values to 0.0%**: formatHitRate now uses auto precision: ≥1% one decimal, 0.01-1% two decimals, >0 but <0.01% shows "<0.01%", =0 shows "0%".

Backward compat: old usage-history.json snapshots without the new image_request fields read as 0 via `?? 0`.

Tests:
- 7 new unit tests for upstream cache extraction (OpenAI/Anthropic/Gemini)
- 5 new unit tests for image_request counter logic in recordUsage
- 2 new unit tests for usage-stats image_request aggregation
- tests/real/image-generation.test.ts e2e block now asserts total_image_request_count == +1 after a single successful gen

* fix: harden cache-token extraction + count compact-route image_gen attempts

Self-review followups on PR #423:
- Anthropic message_delta: take Math.max(start, delta) for cache_read_input_tokens so a future API change emitting 0 in delta can't clobber a real hit reported in message_start. Adds defensive unit test covering the regression.
- /v1/responses/compact (handleCompact): detect image_generation tool at request time and synthesize an `image_request_attempted=true, image_request_succeeded=false` usage on every release site. Compact doesn't surface tool_usage.image_gen, so any image_generation tool forwarded here is always classified as failed. At least the dashboard now catches accidental misuse rather than silently dropping the signal.

---------

Co-authored-by: icebear0828 <icebear0828@users.noreply.github.com>
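The auto-precision rule stated in item 3 can be written down directly. The sketch below assumes the input is already a percentage (0 to 100) rather than a 0 to 1 ratio, and that non-finite or negative inputs display as "0%"; the commit message confirms neither, and the name `formatHitRateSketch` is used to avoid implying this is the real `formatHitRate`.

```typescript
// Formats a cache hit rate given as a percentage (assumed 0..100).
//   = 0          -> "0%"
//   > 0, < 0.01  -> "<0.01%"
//   0.01 .. < 1  -> two decimals
//   >= 1         -> one decimal
function formatHitRateSketch(percent: number): string {
  if (!Number.isFinite(percent) || percent <= 0) return "0%";
  if (percent < 0.01) return "<0.01%";
  if (percent < 1) return `${percent.toFixed(2)}%`;
  return `${percent.toFixed(1)}%`;
}
```

The point of the extra tiers is that a small but real hit rate such as 0.04% no longer rounds to the same string as an exact zero.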

icebear0828 added 2 commits April 27, 2026 11:57

icebear0828 merged commit 14b9fe4 into dev Apr 27, 2026
1 check passed

icebear0828 deleted the feat/image-gen-token-counter-and-stress-test branch April 27, 2026 19:24

icebear0828 mentioned this pull request Apr 27, 2026

fix: cache hit rate stuck at 0% + image_generation request counters + auto-precision display #423

Merged

5 tasks

icebear0828 merged 2 commits into dev from feat/image-gen-token-counter-and-stress-test
