
feat(dashboard): range hit-rate card + chart panel + 5min granularity, proxy diag logs #439

Merged
icebear0828 merged 2 commits into dev from
feat/dashboard-hit-rate-and-cache-diagnostics
May 4, 2026
Conversation

@icebear0828
Owner

Summary

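  • Range Hit Rate card: adds a card to the Dashboard usage page that aggregates cached_tokens / input_tokens over the currently selected time window. It sits next to the existing all-time cumulative "Cache Hit Rate" card, so the recent window can be compared against the historical hit rate.
  • Hit Rate Over Time panel: UsageChart gains a third, independent panel that renders the hit rate as a line with a dot marker for each bucket. Hovering shows cached / input. Buckets with input=0 are skipped (no fake 0% hit rate is rendered), and a single data point is still visible.
  • 5-minute granularity + shorter time windows: the backend adds a five_min granularity (aligned with the default 5min snapshot); the frontend adds a "5 min" option and "Last 1h / 6h" time windows. Switching granularity automatically converges to a compatible window.
  • proxy-handler diagnostic logs: the entry line gains rid / conv / key / prev=<src>:<tail8> / tools=N, and the Usage line gains rid and hit=X.X%. These help diagnose a low prompt-cache hit rate (whether the same conversation lands on the same cache key, whether implicit resume is reached, and so on).
  • Shared pure functions + unit tests: formatHitRate / sumWindow / formatUsageNumber are extracted into shared/utils/usage-stats.ts, with vitest unit tests covering the edge cases (input=0 → "—", <0.01% truncation, windowed sums, empty arrays).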
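
As a rough illustration of the edge cases listed in the last bullet, here is a minimal sketch of what `sumWindow` and `formatHitRate` could look like. The PR does not show the real signatures, so the `UsageBucket` shape, the parameter lists, the two-decimal formatting, and the `"<0.01%"` rendering of tiny rates are all assumptions made for this example.

```typescript
// Hypothetical bucket shape; the real type in shared/utils/usage-stats.ts is not shown in the PR.
interface UsageBucket {
  timestamp: number; // bucket start, in ms since epoch
  inputTokens: number;
  cachedTokens: number;
}

interface WindowTotals {
  inputTokens: number;
  cachedTokens: number;
}

// Sum cached/input tokens over the buckets whose start falls in [fromMs, toMs).
// An empty array (or a window with no buckets) yields zero totals.
function sumWindow(buckets: UsageBucket[], fromMs: number, toMs: number): WindowTotals {
  const totals: WindowTotals = { inputTokens: 0, cachedTokens: 0 };
  for (const b of buckets) {
    if (b.timestamp >= fromMs && b.timestamp < toMs) {
      totals.inputTokens += b.inputTokens;
      totals.cachedTokens += b.cachedTokens;
    }
  }
  return totals;
}

// Format cached/input as a percentage string.
// Assumptions: no input renders an em dash placeholder, and a non-zero rate
// below 0.01% renders as "<0.01%" instead of rounding down to 0.
function formatHitRate(cachedTokens: number, inputTokens: number): string {
  if (inputTokens <= 0) return "\u2014";
  const pct = (cachedTokens / inputTokens) * 100;
  if (pct > 0 && pct < 0.01) return "<0.01%";
  return `${pct.toFixed(2)}%`;
}
```

Under these assumptions, the Range Hit Rate card would call `sumWindow` over the selected window and pass the totals to `formatHitRate`, while the all-time card keeps using the existing global aggregate.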

Test Plan

  • npm test — 1663 passed (+1 skipped)
  • npx tsc --noEmit — clean
  • npm run build — vite + tsc OK
  • Manual browser test: the dashboard shows the new card and the third panel, the 5 min / 1h / 6h options can be switched, and the proxy logs show the rid/conv/key/prev/hit fields

Notes

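  • The existing hourly/daily bucket sizes are untouched and the global cacheHitRate card is not broken; existing callers need no changes.
  • web/src/components/UsageChart.tsx: the formatNumber re-export now points to formatUsageNumber, so the public signature stays compatible.
  • The diff does not include the existing tests/unit/auth/usage-stats.test.ts: store behavior is unchanged, and only a five_min case was added at the routes layer.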

…, proxy diag logs

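- UsageStats adds a "range hit rate" card (aggregates cached/input over the current time window)
- UsageChart adds a "Hit Rate Over Time" panel: line + per-point dots, input=0 skipped automatically
- Backend + frontend support the `five_min` granularity (5-minute buckets, matching the default 5min snapshot)
- Add Last 1h / Last 6h time windows; switching granularity converges to a compatible window
- shared/utils/usage-stats.ts extracts formatHitRate / sumWindow / formatUsageNumber,
  with unit tests covering edge cases (input=0, <0.01% truncation, windowed sums)
- proxy-handler entry/Usage logs gain diagnostic fields: rid / conv / key / prev=<src>:tail8 /
  tools=N / hit=X.X%, to help diagnose why the prompt-cache hit rate is low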
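
The bucketing and window-convergence behavior described above could be sketched roughly as follows. The granularity names come from the PR; `BUCKET_MS`, `bucketStart`, `COMPATIBLE_WINDOWS_H`, `convergeWindow`, and the specific compatibility table are hypothetical names and values chosen for illustration, not the project's actual code.

```typescript
type Granularity = "five_min" | "hourly" | "daily";

// Bucket sizes in milliseconds.
const BUCKET_MS: Record<Granularity, number> = {
  five_min: 5 * 60 * 1000,
  hourly: 60 * 60 * 1000,
  daily: 24 * 60 * 60 * 1000,
};

// Floor a timestamp to the start of its bucket (aligned to the Unix epoch, i.e. UTC).
function bucketStart(tsMs: number, g: Granularity): number {
  const size = BUCKET_MS[g];
  return Math.floor(tsMs / size) * size;
}

// Hypothetical table of time windows (in hours) that each granularity may be paired with.
const COMPATIBLE_WINDOWS_H: Record<Granularity, number[]> = {
  five_min: [1, 6, 24],
  hourly: [6, 24, 168],
  daily: [168, 720],
};

// Keep the current window if it is compatible with the new granularity;
// otherwise fall back to the closest compatible window.
function convergeWindow(currentHours: number, g: Granularity): number {
  const allowed = COMPATIBLE_WINDOWS_H[g];
  if (allowed.indexOf(currentHours) !== -1) return currentHours;
  return allowed.reduce((best, h) =>
    Math.abs(h - currentHours) < Math.abs(best - currentHours) ? h : best
  );
}
```

Flooring to a multiple of the bucket size keeps five_min buckets aligned with 5-minute snapshots taken on the same clock, and picking the nearest compatible window is one simple way to avoid requesting, say, 30 days of 5-minute points.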
…e routing

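- Upstream requests now carry an `x-codex-installation-id` header + `client_metadata` in the body,
  matching the real codex CLI (core/src/client.rs:874). Prefers reusing `~/.codex/installation_id`;
  otherwise persists a generated UUID in `data/installation_id`. Sent on all three paths: HTTP / WS / compact.
- proxy-handler entry log gains a `resume=on|off:<reason>` field, distinguishing implicit resume
  actually activating from it being rejected for instr_diff / acct_mismatch / missing_tool_calls and similar reasons.
  `evaluateImplicitResume()` replaces `shouldActivateImplicitResume()` as the internal decision,
  with the old API kept as a thin wrapper.
- Measured: installation_id alone does not fix the cache hit-rate jitter (see the follow-up WS pool fix PR).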
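
To make the relationship between `evaluateImplicitResume()`, the thin `shouldActivateImplicitResume()` wrapper, and the `resume=` log field concrete, here is a minimal sketch. Only the function names and the three rejection reasons come from the commit message; the `ResumeContext` input shape, the order of the checks, the `formatResumeField` helper, and the choice to log a bare `resume=on` without a reason are assumptions.

```typescript
// Rejection reasons named in the commit message (the real list may be longer).
type ResumeRejectReason = "instr_diff" | "acct_mismatch" | "missing_tool_calls";

type ResumeDecision =
  | { active: true }
  | { active: false; reason: ResumeRejectReason };

// Hypothetical input shape: one precomputed flag per check.
interface ResumeContext {
  instructionsMatch: boolean;
  accountMatches: boolean;
  toolCallsComplete: boolean;
}

// Return a structured decision so callers can log why resume was rejected.
function evaluateImplicitResume(ctx: ResumeContext): ResumeDecision {
  if (!ctx.instructionsMatch) return { active: false, reason: "instr_diff" };
  if (!ctx.accountMatches) return { active: false, reason: "acct_mismatch" };
  if (!ctx.toolCallsComplete) return { active: false, reason: "missing_tool_calls" };
  return { active: true };
}

// Old boolean API kept as a thin wrapper over the new decision function.
function shouldActivateImplicitResume(ctx: ResumeContext): boolean {
  return evaluateImplicitResume(ctx).active;
}

// Hypothetical helper that renders the log field.
function formatResumeField(decision: ResumeDecision): string {
  return decision.active ? "resume=on" : `resume=off:${decision.reason}`;
}
```

Returning a tagged decision rather than a boolean lets the entry log carry the reason, while existing callers of the boolean API keep working unchanged.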
@icebear0828 icebear0828 merged commit f62a4ac into dev May 4, 2026
1 check passed
@icebear0828 icebear0828 deleted the feat/dashboard-hit-rate-and-cache-diagnostics branch May 4, 2026 02:05
icebear0828 added a commit that referenced this pull request May 5, 2026
The soak check measures `now - dev_HEAD_timestamp >= 24h`, which means
every new merge into dev resets the clock. Under any non-trivial merge
cadence, dev never satisfies the soak gate and master stagnates: PRs
#437/#438/#439/#440/#442 all stacked on dev for a week with no
promotion.

Add a `force_skip_soak` boolean input to workflow_dispatch (default
false). Schedule cron remains untouched and continues to enforce the
24h rule. Only manual triggers can bypass, and only when the operator
explicitly sets the input to true — intended for sync-back / merge
commits whose content has actually been on dev long enough but whose
HEAD timestamp is misleadingly fresh.

Test plan: yaml syntax verified via js-yaml. Functional verification
will be the next manual workflow_dispatch run with the input set.

Co-authored-by: icebear0828 <icebear0828@users.noreply.github.com>
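
The gate described in this commit message can be modeled as a small decision function. This is a sketch of the logic only, not the workflow file itself: `shouldPromote`, `TriggerEvent`, and the parameter names are hypothetical, and the real check lives in the workflow YAML, which is not shown here.

```typescript
type TriggerEvent = "schedule" | "workflow_dispatch";

const SOAK_MS = 24 * 60 * 60 * 1000; // the 24h soak rule

// Decide whether dev may be promoted.
// nowMs / devHeadMs are epoch milliseconds; forceSkipSoak models the
// `force_skip_soak` workflow_dispatch input (default false).
function shouldPromote(
  nowMs: number,
  devHeadMs: number,
  event: TriggerEvent,
  forceSkipSoak: boolean = false
): boolean {
  // Normal path: the dev HEAD commit is at least 24h old.
  if (nowMs - devHeadMs >= SOAK_MS) return true;
  // Bypass path: only a manual trigger with the input explicitly set to true.
  return event === "workflow_dispatch" && forceSkipSoak;
}
```

In this model a scheduled run never bypasses the soak rule, even if the flag were somehow set, which matches the stated intent that the cron path is unchanged.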
icebear0828 added a commit that referenced this pull request May 5, 2026
…, proxy diag logs (#439)

* feat(dashboard): range hit-rate card + chart panel + 5min granularity, proxy diag logs

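- UsageStats adds a "range hit rate" card (aggregates cached/input over the current time window)
- UsageChart adds a "Hit Rate Over Time" panel: line + per-point dots, input=0 skipped automatically
- Backend + frontend support the `five_min` granularity (5-minute buckets, matching the default 5min snapshot)
- Add Last 1h / Last 6h time windows; switching granularity converges to a compatible window
- shared/utils/usage-stats.ts extracts formatHitRate / sumWindow / formatUsageNumber,
  with unit tests covering edge cases (input=0, <0.01% truncation, windowed sums)
- proxy-handler entry/Usage logs gain diagnostic fields: rid / conv / key / prev=<src>:tail8 /
  tools=N / hit=X.X%, to help diagnose why the prompt-cache hit rate is low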

* fix(proxy): send x-codex-installation-id + log resume reason for cache routing

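- Upstream requests now carry an `x-codex-installation-id` header + `client_metadata` in the body,
  matching the real codex CLI (core/src/client.rs:874). Prefers reusing `~/.codex/installation_id`;
  otherwise persists a generated UUID in `data/installation_id`. Sent on all three paths: HTTP / WS / compact.
- proxy-handler entry log gains a `resume=on|off:<reason>` field, distinguishing implicit resume
  actually activating from it being rejected for instr_diff / acct_mismatch / missing_tool_calls and similar reasons.
  `evaluateImplicitResume()` replaces `shouldActivateImplicitResume()` as the internal decision,
  with the old API kept as a thin wrapper.
- Measured: installation_id alone does not fix the cache hit-rate jitter (see the follow-up WS pool fix PR).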

---------

Co-authored-by: icebear0828 <icebear0828@users.noreply.github.com>