Skip to content

fix(stats): dedup usage records by sessionId and skip in-progress writes#4995

Merged
wenshao merged 2 commits into
QwenLM:mainfrom
BenGuanRan:fix/stats-session-double-count
Jun 12, 2026
Merged

fix(stats): dedup usage records by sessionId and skip in-progress writes#4995
wenshao merged 2 commits into
QwenLM:mainfrom
BenGuanRan:fix/stats-session-double-count

Conversation

@BenGuanRan

Copy link
Copy Markdown
Contributor

Summary

Closes #4994.

Opening /stats during the first-ever turn followed by /clear (or process exit) used to write the same sessionId twice into ~/.qwen/usage_record.jsonl, permanently inflating every /stats aggregate (sessions / tokens / durations / tool counts / heatmap / projects) for that session. The mechanism, deterministic repro, and code-level walkthrough are in #4994 (the bug is real on main @ ac040d0a6 — repro script in that issue).

This is also the same aggregateUsage-dedup Critical raised by @wenshao in the review of #4779.

Fix — defense in depth

  1. Read side (usageHistoryService.tsloadUsageHistory): records are deduped by sessionId with last-wins before being returned. Users whose usage_record.jsonl already contains duplicates from this bug stop seeing inflated aggregates immediately, without any file rewrite.
  2. Write side (usageHistoryService.tsrebuildFromSessionJsonl(currentSessionId?)): when called with the in-progress session's id, that session is skipped during the file write. persistSessionUsage() will write its authoritative record at /clear / exit, so the in-progress snapshot would only ever be a transient duplicate. statsDataService.ts threads currentSession?.sessionId through.

Both fixes are needed: read-side alone leaves the file growing duplicates forever; write-side alone doesn't help users whose files are already corrupted by the bug.

Test plan

Notes

  • loadUsageHistory() is now loadUsageHistory(currentSessionId?: string). The parameter is optional and defaults to undefined, so the only behavior change for callers that don't pass it is the dedup pass (which is what we want).
  • The read-side dedup also covers the aggregateUsage Critical raised on feat(stats): add interactive /stats dashboard with cross-session tracking #4779 by @wenshao without requiring any change to aggregateUsage itself.
中文小结

修复:PR #4779 引入的 /stats 会话双计 bug(issue #4994)。在首次 turn 期间打开过一次 /stats/clear/退出,该会话会被永久双计。

两侧防御

  1. 读侧 —— loadUsageHistorysessionId 做 last-wins 去重,立即修复磁盘已脏的存量用户。
  2. 写侧 —— rebuildFromSessionJsonl(currentSessionId?) 跳过进行中的会话,避免新增重复;statsDataService 把当前 sessionId 透传过来。

覆盖:新增 3 个回归测试还原 issue #4994 复现路径,全部相关测试 45/45 通过,typecheck/lint/format 均干净,standalone repro 脚本验证从 🐛 sessions=2, tokens=3200 变为 ✅ no double count

…tes (QwenLM#4994)

Opening /stats during the first-ever turn followed by /clear (or exit) used
to write the same sessionId twice into ~/.qwen/usage_record.jsonl, permanently
inflating every aggregate (sessions / tokens / durations / tools / heatmap /
projects) 2x for that session. Closes QwenLM#4994.

Defense in depth:
- Read side: loadUsageHistory dedups records by sessionId (last-wins), so any
  duplicates already on disk from this bug stop inflating future aggregates.
- Write side: rebuildFromSessionJsonl skips the in-progress session when its
  sessionId is passed in; statsDataService threads currentSession.sessionId
  through. New duplicates are no longer created at the source.

Regression coverage in usageHistoryService.test.ts mirrors the exact bug
sequence (open /stats during first turn -> /clear -> re-open /stats) and
asserts sessionCount=1, totalTokens=1600 for a single ~1.6k-token session.
@BenGuanRan

Copy link
Copy Markdown
Contributor Author

@wenshao @DragonnZhang help cr please

}

export async function loadUsageHistory(): Promise<UsageSummaryRecord[]> {
function dedupBySessionId(records: UsageSummaryRecord[]): UsageSummaryRecord[] {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] dedupBySessionId silently discards duplicate sessionId records with no diagnostic trace. When duplicates are present (evidence of the #4994 bug having affected a user's data), there is no way to measure how many users were affected or detect recurrence.

Suggested change
function dedupBySessionId(records: UsageSummaryRecord[]): UsageSummaryRecord[] {
function dedupBySessionId(records: UsageSummaryRecord[]): UsageSummaryRecord[] {
const map = new Map<string, UsageSummaryRecord>();
for (const r of records) map.set(r.sessionId, r);
if (map.size < records.length) {
debugLogger.debug(
`dedupBySessionId: removed ${records.length - map.size} duplicate record(s)`,
);
}
return [...map.values()];
}

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopted in a7c7206 — added a debugLogger.debug line that fires only when duplicates were actually removed:

if (map.size < records.length) {
  debugLogger.debug(
    `dedupBySessionId: removed ${records.length - map.size} duplicate record(s)`,
  );
}

This gives us a way to measure #4994's blast radius on real user data without changing aggregate behavior. Thanks for the catch.

return [...map.values()];
}

export async function loadUsageHistory(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The new currentSessionId parameter in loadUsageHistory is silently ignored on the hot read path — it only takes effect in the rebuildFromSessionJsonl fallback (when usage_record.jsonl doesn't exist or has no v1 records). On the main path, the parameter is passed but the current session is still returned. loadStatsData handles its own filtering (line 251), so correctness is fine, but the API is misleading: a future caller passing currentSessionId would reasonably expect the session to be excluded from results regardless of file state.

Consider either: (a) applying the skip consistently on both paths inside loadUsageHistory and removing the redundant filter from loadStatsData, or (b) renaming the parameter to clarify its limited scope (e.g., skipSessionInRebuild).

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point — the parameter was misleading. Went with renaming over consolidating, in commit a7c7206:

  • currentSessionIdskipSessionInRebuild in both loadUsageHistory and rebuildFromSessionJsonl
  • Zero behavior change, zero test-assertion change

I considered option (a) — apply the skip on both paths and drop the filter in loadStatsData — but kept (b) because:

  1. loadStatsData does defense-in-depth: it strips the persisted current-session record and pushes the live in-memory currentSession (with up-to-date metrics) back. Even if loadUsageHistory filtered the current session out on the read path, the caller would still need to push the live record. So option (a) wouldn't actually let us delete the filter, only relocate it.
  2. The existing read-side dedupBySessionId already protects against a stale persisted current-session record being double-counted (last-wins). So the file-path skip in option (a) would be redundant with dedup.
  3. Option (b) is a pure rename with no test-assertion diff and no public-API surface change.

Let me know if you'd still prefer (a) — happy to follow up.

…ip param

Why:
- dedupBySessionId silently dropped duplicate records, making it
  impossible to observe how many users were affected by the QwenLM#4994 bug.
  Now logs the removed count at debug level.
- The new loadUsageHistory parameter was named currentSessionId but only
  controls the write-side skip during rebuildFromSessionJsonl — the read
  path ignores it. Renaming to skipSessionInRebuild makes the limited
  scope explicit; statsDataService still strips/re-pushes the live
  current session for defense-in-depth.

PR review feedback on QwenLM#4995.
@wenshao

wenshao commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Real-Run Verification Report (PR #4995, head a7c7206)

Verified on macOS (Darwin 25.5.0, Node v22.22.2) at three levels: the standalone repro from issue #4994 against built core dists (before/after, three caller modes), a real-TUI end-to-end in tmux against a deterministic mock model server (fixed usage numbers, fully isolated via QWEN_HOME), and an existing-victim scenario with a genuinely corrupted usage_record.jsonl produced by the live bug. The head includes this morning's a7c7206 (review-response commit); both points from my 2026-06-11 review are implemented exactly as described in the replies — the dedup diagnostic (debugLogger.debug fires only when duplicates were removed) and the currentSessionIdskipSessionInRebuild rename in both signatures.

1. Standalone repro (issue #4994 steps, built packages/core/dist)

Same script, fresh QWEN_HOME per run; plants one in-progress chat JSONL (1600 tokens), then: ① loadUsageHistory (first /stats), ② persistSessionUsage same id (/clear/exit), ③ loadUsageHistory + aggregateUsage('all').

Build Caller File after ① File after ② ③ Aggregate
origin/main @ 9b4ba60e7 legacy (no param) 1 line 2 lines, same sessionId 🐛 sessions=2, tokens=3200
PR new statsDataService (passes session id) 0 lines (write-side skip) 1 line ✅ sessions=1, tokens=1600
PR legacy (no param) 1 line 2 lines (dup written) ✅ sessions=1, tokens=1600 (read-side last-wins dedup)

The bug reproduces on main exactly as #4994 documents, and each defense layer holds independently: the write-side skip keeps the file clean for the new caller; the read-side dedup still yields correct aggregates even when a legacy caller lets the duplicate reach the file.

2. Real TUI E2E (tmux, mock model, fresh QWEN_HOME, identical flow per build)

Flow: one chat turn → /stats (first open → rebuild path) → Esc → /clear/stats → Activity tab.

Checkpoint before (origin/main) after (PR)
usage_record.jsonl after first /stats 1 line (in-progress session written by rebuild) 0 lines
after /clear 2 lines, byte-identical sessionId 1 line
/stats → Activity tab Sessions 2 … Tokens 6.4k Sessions 1 … Tokens 3.2k

Same prompt, same mock usage — the dashboard itself shows the 2× inflation on main and the correct numbers on the PR build. (Per-session totals are 3.2k rather than 1.6k because each turn issues two completions against the mock — the chat response plus the auto-title call — each reporting 1600; the before/after relationship is the controlled variable.)

3. Existing-victim scenario (read-side fix for already-corrupted files)

Planted the real corrupted file from the before run (2 duplicate lines) into a fresh QWEN_HOME and opened /stats on the PR build: Activity shows Sessions 1 / Tokens 3.2k immediately, and the file is left untouched (still 2 lines — no rewrite), matching the PR's stated behavior for affected users.

Static checks

Notes (non-blocking)

  • In rebuildFromSessionJsonl the skip applies only to the file write; the in-progress record is still returned to the caller, and loadStatsData continues to filter the live session for display before re-adding its live snapshot — the layering is consistent.
  • The read-side dedup also de-duplicates inside the rebuild path itself (dedupBySessionId wraps both returns), so even pathological chat-dir layouts that produce same-id records across project dirs aggregate correctly.

Verdict

Verified — recommend merge. The main bug reproduces deterministically and is fixed at both layers; the real dashboard goes from Sessions 2 / 6.4k to Sessions 1 / 3.2k on an identical flow; already-corrupted user files aggregate correctly with no rewrite; both review points from 06-11 are addressed on the head; tests, lint, typecheck, and full CI are green. No blockers from my side.

Repro
gh pr checkout 4995 && npm install && npm run build
git worktree add /tmp/before origin/main --detach   # + npm install && npm run build

# standalone (issue #4994 steps): QWEN_HOME=<fresh> node repro.mjs <core-pkg> <main|skip|noskip>
#   plants one in-progress chat JSONL (qwen-code.api_response uiEvent, 1600 tokens),
#   then loadUsageHistory -> persistSessionUsage(same id) -> loadUsageHistory+aggregateUsage

# TUI E2E (tmux): mock OpenAI server with fixed usage {1000,600,1600};
#   QWEN_HOME=<fresh> OPENAI_API_KEY/BASE_URL/MODEL -> one turn -> /stats -> /clear -> /stats(Tab)
#   inspect $QWEN_HOME/usage_record.jsonl line count + sessionIds at each step

# victim: cp the before-run's corrupted usage_record.jsonl into a fresh QWEN_HOME,
#   open /stats on the PR build -> Sessions 1, file still 2 lines
中文版(完整翻译)

真实运行验证报告(PR #4995,head a7c7206

在 macOS(Darwin 25.5.0,Node v22.22.2)上做了三层验证:针对构建产物 core dist 的 issue #4994 standalone 复现(before/after、三种调用方式)、tmux 中对确定性 mock 模型服务器的真实 TUI 端到端(固定 usage 数字,QWEN_HOME 完全隔离)、以及用真实 bug 产生的脏 usage_record.jsonl 做的存量受害用户场景。head 包含今早的 a7c7206(评审回应提交);我 2026-06-11 review 的两点都按回复所述落实——去重诊断日志(仅在确实移除重复时输出 debugLogger.debug)和两处签名的 currentSessionIdskipSessionInRebuild 改名。

1. Standalone 复现(issue #4994 步骤,构建产物 packages/core/dist

同一脚本、每轮全新 QWEN_HOME;植入一个进行中的 chat JSONL(1600 token),然后:① loadUsageHistory(首次 /stats)、② 同一 id 的 persistSessionUsage/clear/退出)、③ loadUsageHistory + aggregateUsage('all')

构建 调用方式 ① 后文件 ② 后文件 ③ 聚合
origin/main @ 9b4ba60e7 旧调用(无参数) 1 行 2 行、同一 sessionId 🐛 sessions=2, tokens=3200
PR statsDataService(传入 session id) 0 行(写侧跳过) 1 行 ✅ sessions=1, tokens=1600
PR 旧调用(无参数) 1 行 2 行(写入了重复) ✅ sessions=1, tokens=1600(读侧 last-wins 去重

bug 在 main 上按 #4994 的描述确定性复现;两层防御各自独立成立:写侧跳过让新调用方的文件保持干净;读侧去重保证即使旧调用方让重复进了文件,聚合仍然正确。

2. 真实 TUI 端到端(tmux、mock 模型、全新 QWEN_HOME、两构建同一流程)

流程:一轮对话 → /stats(首次打开 → rebuild 路径)→ Esc → /clear/stats → Activity 标签页。

检查点 beforeorigin/main after(PR)
首次 /stats 后的 usage_record.jsonl 1 行(rebuild 写入进行中会话) 0 行
/clear 2 行、sessionId 完全相同 1 行
/stats → Activity 页 Sessions 2 … Tokens 6.4k Sessions 1 … Tokens 3.2k

同样的 prompt、同样的 mock usage——main 上仪表盘本身显示 2× 膨胀,PR 构建数字正确。(单会话总量是 3.2k 而非 1.6k,是因为每轮对 mock 发起两次补全——对话响应 + 自动标题调用——各报 1600;before/after 的相对关系才是受控变量。)

3. 存量受害用户场景(读侧修复已脏文件)

before 运行产生的真实脏文件(2 行重复)植入全新 QWEN_HOME,用 PR 构建打开 /stats:Activity 立即显示 Sessions 1 / Tokens 3.2k,且文件保持原样(仍 2 行——不重写),与 PR 对受影响用户的声明一致。

静态检查

备注(非阻塞)

  • rebuildFromSessionJsonl 的跳过只作用于文件写入;进行中的记录仍会返回给调用方,loadStatsData 继续在展示前过滤活动会话并单独加入其实时快照——分层自洽。
  • 读侧去重同样包住了 rebuild 路径的返回值(dedupBySessionId 包裹两个 return),即使病态的 chats 目录布局在不同项目目录下产生同 id 记录,聚合也正确。

结论

验证通过——建议合并。 main 上的 bug 可确定性复现且两层修复都有效;同一流程下真实仪表盘从 Sessions 2 / 6.4k 变为 Sessions 1 / 3.2k;已脏的用户文件无需重写即可正确聚合;我 06-11 的两条评审意见已在 head 上落实;测试、lint、typecheck 与完整 CI 全绿。我这边没有阻塞项。

@BenGuanRan

Copy link
Copy Markdown
Contributor Author

@wenshao @DragonnZhang request re-review please

@qqqys qqqys left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical-only review: no critical issues found. The session usage dedup/write-skip fix is scoped to stats history handling, prior review feedback was addressed on a7c7206, and CI is green across lint, CodeQL, and all OS test jobs.

@wenshao wenshao merged commit e07d069 into QwenLM:main Jun 12, 2026
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

/stats permanently double-counts a session if /stats is opened during the first-ever turn (introduced by #4779)

3 participants