feat: track image_generation tokens separately + real-upstream stress test#422

Merged
icebear0828 merged 2 commits into dev from feat/image-gen-token-counter-and-stress-test on Apr 27, 2026

Conversation

icebear0828 (Owner)

Summary

  • The upstream response.completed.response.tool_usage.image_gen.{input_tokens, output_tokens} fields were previously dropped by the proxy. This PR wires image token counting through the full pipeline (parser → AccountUsage → persistence → admin API → dashboard), kept fully separate from the existing host-model tokens so the existing charts are not polluted
  • Adds a real-upstream stress test matrix: {gpt-5.4-mini, gpt-5.5} × {1024×1024, 3840×2160}, 2 concurrent × 2 rounds = 16 images, asserting the complete SSE event chain, a base64 length threshold, and image_gen output tokens > 0, ending with a check of the image token delta in /admin/usage-stats/summary
  • Adds a companion bench script, tests/bench/image-gen-bench.ts, that prints a markdown table of p50/p95/min/max latency plus average host-model and image tokens
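To make the separation concrete, here is a minimal sketch of the split this PR performs. The field names come from the PR description; the surrounding type and the `splitUsage` function are illustrative assumptions, not the project's actual code:

```typescript
// Illustrative only: field names follow the PR description, but the
// envelope type and splitUsage() itself are assumptions.
interface ImageGenUsage {
  input_tokens: number;
  output_tokens: number;
}

interface CompletedUsage {
  input_tokens: number; // host-model tokens (already tracked)
  output_tokens: number;
  tool_usage?: { image_gen?: ImageGenUsage }; // previously dropped
}

// Keep host and image counters separate so image-gen cost never
// inflates the host-model charts.
function splitUsage(u: CompletedUsage) {
  return {
    host: { in: u.input_tokens, out: u.output_tokens },
    image: {
      in: u.tool_usage?.image_gen?.input_tokens ?? 0,
      out: u.tool_usage?.image_gen?.output_tokens ?? 0,
    },
  };
}
```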

Changes

Parsing & types

  • src/types/codex-events.ts — parseResponseData now extracts tool_usage.image_gen; CodexResponseData.usage gains image_input_tokens / image_output_tokens
  • src/translation/codex-event-extractor.ts — UsageInfo updated to match
  • src/routes/responses.ts — new extractImageGenUsage helper; both the stream and collect paths of the /v1/responses route merge host + image usage
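A minimal sketch of what the extraction step might look like. The real helper in src/routes/responses.ts is not shown in this PR, so `toCount` and the return shape are hypothetical; the sketch illustrates the defensive cases the self-review follow-up lists for unit tests (missing tool_usage, missing image_gen sub-block, non-numeric values):

```typescript
// Hypothetical sketch; names and shape are assumptions, not the actual
// helper in src/routes/responses.ts.
function toCount(v: unknown): number {
  // Anything that is not a finite, non-negative number reads as 0.
  return typeof v === "number" && Number.isFinite(v) && v >= 0 ? v : 0;
}

function extractImageGenUsage(response: unknown): { imageIn: number; imageOut: number } {
  const img = (response as { tool_usage?: { image_gen?: Record<string, unknown> } } | null)
    ?.tool_usage?.image_gen;
  return {
    imageIn: toCount(img?.input_tokens),
    imageOut: toCount(img?.output_tokens),
  };
}
```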

Accumulation & persistence

  • src/auth/types.ts — AccountUsage gains image token fields (lifetime + window)
  • src/auth/account-registry.ts / account-lifecycle.ts / account-pool.ts — recordUsage / release parameters extended
  • src/auth/usage-stats.ts — UsageSnapshot/Baseline/DataPoint/Summary all gain the image dimension; poolTotals / recoverBaseline / recordSnapshot's baseline drop-detection / bucketize updated to match; usage-history.json files missing the new fields read as 0, staying backward compatible
  • src/routes/shared/proxy-handler.ts — usage types extended on both the streaming and collect paths; the end-of-request log line now also prints image=<in>/<out>
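The backward-compat read described above can be sketched as follows. The snapshot type is trimmed to the relevant fields and the function name is an assumption; the `?? 0` defaulting is what the PR describes:

```typescript
// Old usage-history.json snapshots predate the image fields, so missing
// values default to 0. Type and function name are illustrative only.
interface PersistedSnapshot {
  input_tokens: number;
  output_tokens: number;
  image_input_tokens?: number;  // absent in pre-PR files
  image_output_tokens?: number; // absent in pre-PR files
}

function hydrateSnapshot(raw: PersistedSnapshot) {
  return {
    ...raw,
    image_input_tokens: raw.image_input_tokens ?? 0,
    image_output_tokens: raw.image_output_tokens ?? 0,
  };
}
```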

Frontend

  • shared/hooks/use-usage-stats.ts + shared/types.ts — TS mirrors updated to match
  • shared/i18n/translations.ts — new imageTokens / imageTokensHint / windowImageTokens keys (English + Chinese)
  • web/src/pages/UsageStats.tsx — Summary grid grows from 5 to 6 columns with a new "Image Tokens (in/out)" card
  • web/src/components/AccountCard.tsx — when an account has image consumption, shows an extra window image-token row and appends · N img to the lifetime row

Client protocols

  • Unchanged: /v1/responses already passes SSE straight through, so clients can see tool_usage.image_gen directly; the OpenAI/Anthropic/Gemini translation layers have no corresponding field, so they are left as-is

Docs & tests

  • API.md / API_CN.md — image_generation section now documents image token billing
  • CHANGELOG.md — two Added entries
  • tests/real/image-generation.test.ts — 4-combination × 2-concurrent × 2-round matrix plus end-to-end summary-delta verification
  • tests/bench/image-gen-bench.ts — matrix markdown table output
  • tests/unit/auth/usage-stats.test.ts — existing assertions extended with the image dimension

Test plan

  • npx tsc --noEmit — 0 errors
  • npm test — 1618 passed / 1 skipped
  • npm run dev to start the local proxy → npm run test:real -- image-generation: all 16 images pass and the summary delta check succeeds
    • gpt-5.4-mini × 1024×1024: 4/4 ok, 21459/22341/24563 ms (min/avg/max), image tokens avg in/out = 31/196
    • gpt-5.4-mini × 3840×2160: 4/4 ok, 33006/36742/45462 ms, 32/371
    • gpt-5.5 × 1024×1024: 4/4 ok, 22643/27368/30208 ms, 31/196
    • gpt-5.5 × 3840×2160: 4/4 ok, 32781/45610/53062 ms, 31/1112
  • npx tsx tests/bench/image-gen-bench.ts 2 1 — prints the latency + token tables as expected
  • curl /admin/usage-stats/summary directly — total_image_input_tokens / total_image_output_tokens are non-zero and listed alongside the host-model token fields

… test

Upstream's response.completed carries `tool_usage.image_gen.{input_tokens,
output_tokens}` distinct from host-model `usage`, but the proxy was dropping
it on the floor. Plumb it through the full pipeline so image-gen cost no
longer pollutes host-model token charts:

- parse `tool_usage.image_gen` in CodexResponseData / UsageInfo and the
  /v1/responses route's extractResponseUsage path
- accumulate `image_input_tokens` / `image_output_tokens` (lifetime + window)
  on AccountUsage; surface in /admin/usage-stats/summary as
  `total_image_input_tokens` / `total_image_output_tokens` and persist in
  UsageSnapshot/Baseline (old usage-history.json reads as 0, forward-compat)
- Dashboard: new "Image Tokens (in/out)" Summary card; AccountCard adds a
  window-image-tokens row when the account has image consumption

Plus a real-upstream stress test for image generation:
- tests/real/image-generation.test.ts — matrix of {gpt-5.4-mini, gpt-5.5} ×
  {1024x1024, 3840x2160}, 2 concurrent × 2 rounds per combo, asserts SSE
  lifecycle + image base64 size + tool_usage.image_gen.output_tokens > 0,
  ends with /admin/usage-stats/summary delta check
- tests/bench/image-gen-bench.ts — same matrix, prints p50/p95/min/max +
  token averages for human comparison

Self-review followups on PR #422:

- Add unit tests for the two new parser paths (parseResponseData's
  tool_usage.image_gen branch in codex-events.ts, and the new
  extractImageGenUsage helper in routes/responses.ts) — covers happy path,
  zero-counts, missing tool_usage, missing image_gen sub-block, non-numeric
  values, and the partial-counts case
- Real-test correctness: replace the size-based heuristic with a file-magic
  check (PNG / JPEG / WebP / GIF) so an account that silently downgrades to
  an SVG-text fallback is caught regardless of payload size; relax the size
  threshold to a more conservative floor (50 KB / 500 KB), since flat-color
  PNGs can compress below the previously assumed 200 KB / 1.5 MB
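A sketch of that file-magic check. The magic numbers are the standard PNG/JPEG/WebP/GIF signatures; the function name and exact return shape are assumptions, not the test's actual code:

```typescript
// Identify the decoded image payload by its leading bytes instead of size,
// so an SVG-text fallback is rejected regardless of payload size.
// Illustrative sketch; not the real test helper.
function sniffImageFormat(buf: Uint8Array): "png" | "jpeg" | "webp" | "gif" | null {
  const startsWith = (sig: number[], offset = 0) =>
    sig.every((b, i) => buf[offset + i] === b);
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (startsWith([0xff, 0xd8, 0xff])) return "jpeg";
  // WebP: "RIFF" <4-byte size> "WEBP"
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) return "webp";
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "gif"; // "GIF8"
  return null; // anything else (e.g. SVG text) fails the assertion
}
```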
@icebear0828 icebear0828 merged commit 14b9fe4 into dev Apr 27, 2026
1 check passed
@icebear0828 icebear0828 deleted the feat/image-gen-token-counter-and-stress-test branch April 27, 2026 19:24
icebear0828 added a commit that referenced this pull request Apr 27, 2026
… auto-precision display (#423)

* fix: cache_tokens extraction in non-codex upstreams + image_generation request counters

Three orthogonal dashboard accuracy issues, fixed together:

1) **Cache hit rate stuck at 0%** — OpenAI / Anthropic / Gemini upstream
   adapters were synthesizing `response.completed` with hardcoded
   `input_tokens_details: {}`, dropping the cache hit info that the native
   APIs do return. Now extract `prompt_tokens_details.cached_tokens`
   (OpenAI), `cache_read_input_tokens` (Anthropic message_start +
   message_delta), and `cachedContentTokenCount` (Gemini explicit caches),
   and surface them under the standard Codex shape so the existing parsers
   pick them up.

2) **No image generation request counter** — PR #422 added image token
   tracking but no count of image requests (success vs failure). Now
   detect `tools[].type === "image_generation"` at request parse time,
   propagate `expectsImageGen` through proxy-handler, and on every
   release call site (success / EmptyResponse / upstream errors)
   classify as success (image_output_tokens > 0) or failed. Adds:
   - AccountUsage: image_request_count, image_request_failed_count
     (+ window mirrors)
   - UsageSnapshot/Baseline/DataPoint/Summary: same
   - /admin/usage-stats/summary: total_image_request_count,
     total_image_request_failed_count
   - Dashboard "Image Requests" card showing N ok · M failed
   - AccountCard window image requests row when activity present

3) **Hit rate display rounded sub-0.05% values to 0.0%** — formatHitRate
   now uses auto precision: ≥1% one decimal, 0.01-1% two decimals,
   >0 but <0.01% shows "<0.01%", =0 shows "0%".
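The auto-precision rule in point 3 can be sketched as follows. This is an illustration of the stated rule, not the dashboard's actual formatHitRate; it assumes the input is already a percentage value:

```typescript
// Auto-precision per the rule above: >=1% one decimal, 0.01-1% two
// decimals, >0 but <0.01% shows "<0.01%", exactly 0 shows "0%".
// Assumes pct is a percentage (e.g. 12.3 for 12.3%), which is an assumption.
function formatHitRate(pct: number): string {
  if (pct === 0) return "0%";
  if (pct < 0.01) return "<0.01%";
  if (pct < 1) return `${pct.toFixed(2)}%`;
  return `${pct.toFixed(1)}%`;
}
```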

Backward compat: old usage-history.json snapshots without the new
image_request fields read as 0 via `?? 0`. Tests:
- 7 new unit tests for upstream cache extraction (OpenAI/Anthropic/Gemini)
- 5 new unit tests for image_request counter logic in recordUsage
- 2 new unit tests for usage-stats image_request aggregation
- tests/real/image-generation.test.ts e2e block now asserts
  total_image_request_count == +1 after a single successful gen
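The success/failure classification from point 2 boils down to one predicate, sketched here. `expectsImageGen` is named in the commit text, but the release-site payload shape is an assumption:

```typescript
// A request that declared an image_generation tool counts as a successful
// image request only if it actually produced image output tokens.
// Illustrative; the real logic is spread across the release call sites.
interface ReleaseUsage {
  expectsImageGen: boolean;
  image_output_tokens: number;
}

function classifyImageRequest(u: ReleaseUsage): "ok" | "failed" | "n/a" {
  if (!u.expectsImageGen) return "n/a"; // not an image request
  return u.image_output_tokens > 0 ? "ok" : "failed";
}
```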

* fix: harden cache-token extraction + count compact-route image_gen attempts

Self-review followups on PR #423:

- Anthropic message_delta: take Math.max(start, delta) for
  cache_read_input_tokens so a future API change emitting 0 in delta
  can't clobber a real hit reported in message_start. Adds defensive
  unit test covering the regression.
- /v1/responses/compact (handleCompact): detect the image_generation tool
  at request time and synthesize an `image_request_attempted=true,
  image_request_succeeded=false` usage record at every release site. Compact
  doesn't surface tool_usage.image_gen, so any image_generation tool
  forwarded here is always classified as failed; at least the dashboard
  now catches accidental misuse rather than silently dropping the signal.
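The Math.max hardening above amounts to a one-line merge, sketched here with the event shapes trimmed to the one relevant field:

```typescript
// Anthropic reports cache_read_input_tokens in both message_start and
// message_delta; taking the max means a 0 in the delta can never clobber
// a real cache hit reported in message_start. Illustrative sketch only.
function mergeCacheReadTokens(
  fromStart: number | undefined,
  fromDelta: number | undefined,
): number {
  return Math.max(fromStart ?? 0, fromDelta ?? 0);
}
```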

---------

Co-authored-by: icebear0828 <icebear0828@users.noreply.github.com>