feat(stats): add interactive /stats dashboard with cross-session tracking by BenGuanRan · Pull Request #4779 · QwenLM/qwen-code

BenGuanRan · 2026-06-04T12:36:39Z

What this PR does

Adds an interactive /stats dashboard that tracks and visualizes usage across all CLI sessions. The dashboard has three tabs: Session (live current-session metrics), Activity (usage trends and patterns over time), and Efficiency (performance metrics and tool analysis). Usage data is automatically persisted to ~/.qwen/usage_record.jsonl at session exit and on /clear, with a fallback rebuild from chat history if the file is missing.

Why it's needed

Previously /stats only showed a flat text dump of the current session's token counts. Users had no way to see cross-session trends, identify which tools or models they use most, track their coding streaks, or compare performance across time periods. This dashboard gives users actionable visibility into their usage patterns — cache hit rates, tool success rates, latency trends, and project-level breakdowns — all without leaving the terminal.

Reviewer Test Plan

How to verify

Run npm run dev to start the CLI
Type /stats to open the dashboard dialog
Verify three tabs render: Session, Activity, Efficiency
Press Tab / Shift+Tab to switch between tabs
Press r to cycle time ranges (All → Month → Week → Today)
On the Activity tab with range=All, press ← / → to navigate months in the token trend chart
Press Esc to close the dialog
Run /clear then /stats again — previous session should appear in the historical data
Run in headless mode: qwen -p "hello" && qwen -p "/stats" — verify the first session's data is included

Activity tab should show: KPI row with delta arrows, GitHub-style contribution heatmap, braille line chart for token trend, top-5 project ranking table.

Efficiency tab should show: Cache Hit Rate / Tool Success / Avg Latency cards with deltas, tool leaderboard with success rate bars, model comparison table with per-model latency and cache %, code impact line counts.

Evidence (Before & After)

Before

After

Mock

Tested on

OS	Status
🍏 macOS	✅
🪟 Windows	⚠️
🐧 Linux	⚠️

Environment (optional)

npm run dev with local Qwen API key.

Risk & Scope

Main risk or tradeoff: The usage_record.jsonl file grows unboundedly over time. For heavy users this could eventually become large, but at ~200 bytes per session even 10,000 sessions is only ~2MB.
Not validated / out of scope: Cost estimation (requires user-configured pricing), per-file change tracking, context window compression metrics, interactive drill-down into individual sessions.
Breaking changes / migration notes: None. The UsageSummaryRecord schema is backward-compatible — old records without totalLatencyMs display "—" for latency. The /stats command now opens a dialog in interactive mode instead of printing inline stats (use /stats model or /stats tools for inline output).

Linked Issues

Closes #4597

中文说明

这个 PR 做了什么

添加了一个交互式 /stats 仪表盘，用于跟踪和可视化所有 CLI 会话的使用情况。仪表盘包含三个标签页：会话（当前会话实时指标）、概览（跨会话的使用趋势和模式）、性能（性能指标和工具分析）。使用数据在会话退出和 /clear 时自动持久化到 ~/.qwen/usage_record.jsonl，如果文件丢失则从聊天历史自动重建。

为什么需要

之前 /stats 只显示当前会话的纯文本 token 计数。用户无法查看跨会话趋势、识别最常用的工具或模型、跟踪编码连续天数，或比较不同时间段的性能。这个仪表盘为用户提供了可操作的使用洞察——缓存命中率、工具成功率、延迟趋势和项目级分解——全部在终端内完成。

审查测试计划

运行 npm run dev 启动 CLI
输入 /stats 打开仪表盘对话框
验证三个标签页渲染：会话、概览、性能
按 Tab / Shift+Tab 切换标签
按 r 循环时间范围（全部 → 月 → 周 → 今天）
在概览标签的 range=All 下，按 ← / → 浏览趋势图月份
按 Esc 关闭对话框
运行 /clear 后再次 /stats — 之前的会话应出现在历史数据中

风险与范围

主要风险：usage_record.jsonl 文件随时间无限增长，但每个会话约 200 字节，10,000 个会话也只有约 2MB。
不在范围内：成本估算、逐文件变更跟踪、上下文窗口压缩指标、单会话钻取。
破坏性变更：无。UsageSummaryRecord schema 向后兼容。/stats 命令在交互模式下从内联输出改为打开对话框（内联输出可用 /stats model 或 /stats tools）。

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add usageHistoryService to core with JSONL-based persistence, session replay from chat history with sessionId deduplication, time-range aggregation, and per-model/tool/file breakdown including latency fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add statsDataService for delta calculations, efficiency metrics, tool leaderboard, and heatmap/trend data. Add asciiCharts with braille line chart (Bresenham rendering) and GitHub-style contribution heatmap. Includes 38 unit tests covering both modules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add three-tab dialog: Session (live metrics), Activity (KPIs, heatmap, braille token trend chart, project ranking), and Efficiency (cache rate, tool success, latency cards, tool leaderboard, model comparison table). Supports tab/shift-tab navigation, r to cycle time ranges (all/month/ week/today), left/right to pan months in the trend chart, esc to close. Persist usage on /clear for accurate cross-session tracking. Update statsCommand tests for new dialog behavior and clearCommand tests for telemetry mock. Update /stats documentation in commands.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add translations for stats dashboard UI strings in zh, zh-TW, ca, de, fr, ja, pt, ru. Add stats keys to en.js baseline and mustTranslateKeys. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Intensity 0 cells (no activity) now render without backgroundColor, inheriting the terminal's native background instead of a hardcoded color that renders incorrectly across different terminal themes. Also fix green gradient direction: brighter = more activity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Inactive cells render as '··' with no background color instead of colored blocks, matching common contribution graph designs. Active cells keep their green gradient backgrounds. Fix gradient direction so brighter green = more activity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add back Session ID, Success Rate with color thresholds, User Agreement rate, Performance breakdown (Wall Time, Agent Active, API Time %, Tool Time %), and full token counts that were present in the original exit screen but missing from the new Session tab. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix monthOffset overflow: setDate(1) before subtracting months to prevent day-count overflow (e.g. Mar 31 → Feb) - Fix UTC date-parse off-by-one: append 'T00:00:00' to date-only strings in calculateStreaks and HeatmapView fmtDate - Fix current session double-counted after rebuild: deduplicate by sessionId when injecting live session into loadStatsData - Remove unused bodyWidth prop from SessionTab - Remove 13 unused i18n keys (Overview, Favorite model, etc.) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wenshao

R2: New findings after R1 fixes verified ✅

All 6 R1 issues have been correctly addressed in commit 7ed5dbe03. This round found 3 new Critical issues (data corruption, crash, memory/UX) and 2 Suggestions.

Needs Human Review (low confidence):

Test coverage gaps: /clear hasActivity branch never exercised (mock always returns empty models), loadStatsData currentSession dedup untested, calculateStreaks has no dedicated edge-case tests, loadUsageHistory version filter + rebuild fallback untested

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T06:37:15Z

+
+        if (!hasEvents) continue;
+
+        const startTime = new Date(firstRecord.timestamp).getTime();


[Critical] NaN timestamp from malformed chat records propagates to usage_record.jsonl as permanent phantom records

If firstRecord.timestamp is missing, empty, or malformed (which occurs naturally via corruption, legacy records, or interrupted writes), new Date(undefined).getTime() returns NaN. This NaN flows into metricsToUsageRecord as timestamp, startTime, and durationMs. JSON.stringify serializes NaN → null, so the record is persisted as "timestamp":null.

On subsequent reads, the null-timestamp record passes the version === 1 filter and enters aggregateUsage. For the 'all' range, null >= 0 evaluates to true in JavaScript, so the phantom record inflates sessionCount. The projectMap accumulates an entry keyed on undefined (from missing cwd), producing a mystery unnamed project in the dashboard.

This corruption persists until the user manually deletes usage_record.jsonl.

Suggested change

const startTime = new Date(firstRecord.timestamp).getTime();

const startTime = new Date(firstRecord.timestamp).getTime();

const lastRecord = records[records.length - 1]!;

const endTime = new Date(lastRecord.timestamp).getTime();

if (isNaN(startTime) || isNaN(endTime) || !sessionId) continue;

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T06:37:15Z

+      config.getProjectRoot(),
+      stats.metrics,
+    );
+    loadStatsData(range, liveRecord).then((d) => {


[Critical] Unhandled promise rejection crashes the TUI

loadStatsData(range, liveRecord).then(...) has no .catch() handler. If loadStatsData throws (e.g., from rebuildFromSessionJsonl encountering an unexpected shape, or aggregateUsage propagating NaN), the rejected promise is unhandled.

There is no React error boundary (componentDidCatch) anywhere in the CLI UI tree — I searched packages/cli/src and found zero matches. An unhandled rejection in Ink/React crashes the entire process. The user types /stats, the TUI dies silently.

Additionally, there's no error state in the UI. If the load fails, the dialog would either show "Loading stats..." forever or render an empty box.

Suggested change

loadStatsData(range, liveRecord).then((d) => {

loadStatsData(range, liveRecord)

.then((d) => {

setData(d);

setLoading(false);

})

.catch((err) => {

console.error('Failed to load stats:', err);

setLoading(false);

});

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T06:37:15Z

+  };
+}
+
+async function rebuildFromSessionJsonl(): Promise<UsageSummaryRecord[]> {


[Critical] rebuildFromSessionJsonl loads entire chat JSONL files into memory and uses synchronous I/O on the main thread

Two compounding issues:

Memory spike: jsonl.read<ChatRecord>(filePath) loads the entire file into memory. Each ChatRecord contains full Content objects (assistant text, tool results with file contents). A single chat JSONL for a session with heavy tool use can exceed 100 MB. The function iterates ALL project directories and ALL chat files.

TUI freeze: fs.readdirSync, fs.existsSync, and writeLineSync (which wraps fs.appendFileSync) are all synchronous. For a user with hundreds of sessions, the rebuild blocks the Node.js event loop for 10+ seconds. The only feedback is "Loading stats..." text. A user may kill the process thinking it's hung, potentially interrupting writeLineSync mid-write and leaving usage_record.jsonl partially written.

Consider streaming records and discarding non-telemetry entries immediately, or capping the backfill to the N most recent files by modification time.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T06:37:15Z

+  const safeWidth = Math.max(72, width ?? 100);
+  const bodyWidth = safeWidth - 6;
+
+  useEffect(() => {


[Suggestion] useEffect has no cancellation/staleness guard + re-reads file on every range change

Two related issues:

Race condition: Pressing r rapidly to cycle ranges fires multiple loadStatsData calls concurrently. A slower earlier request can resolve after a faster later one, briefly displaying data for the wrong range.

Redundant I/O: Each range change triggers loadUsageHistory() which reads and parses the entire usage_record.jsonl from disk. Cycling through 4 ranges = 4 full file reads, even though the underlying data hasn't changed — only the filter window differs.

Suggested change

useEffect(() => {

useEffect(() => {

let stale = false;

setLoading(true);

const liveRecord = buildCurrentSessionRecord(

stats.sessionId,

stats.sessionStartTime,

config.getProjectRoot(),

stats.metrics,

);

loadStatsData(range, liveRecord).then((d) => {

if (!stale) {

setData(d);

setLoading(false);

}

});

return () => { stale = true; };

// eslint-disable-next-line react-hooks/exhaustive-deps -- only reload on range/session change, not every metrics tick

}, [range, stats.sessionId]);

For the redundant I/O, consider caching raw records in a useRef and re-aggregating in memory when only the range changes.

— qwen3.7-max via Qwen Code /review

触发条件极其罕见 — 只在第一次使用/stats指令会基于历史session数据构建 usage_record.jsonl ，只有用户手动删除文件才会 rebuild。

实际数据量有限 — 本地 rebuild 跑过mock测试，71-81 个 session，耗时 < 2 秒。100MB 单文件的 session
是极端边界。

已有 try/catch — 单个文件解析失败不会中断整体 rebuild。

修复成本高 — 流式读取需要改 jsonl.read 工具函数的接口，影响面大。

- Guard against malformed chat records with NaN timestamps in rebuild - Add .catch() to loadStatsData promise to prevent TUI crash - Add stale flag to useEffect to prevent race on rapid range cycling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add 19 missing translations to zh-TW.js, remove extra keys from zh.js and zh-TW.js that were deleted from en.js in prior cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wenshao

A few cross-cutting items that don't map to a single line (all non-blocking):

Concurrent rebuild can duplicate history. loadUsageHistory writes the rebuilt records back to usage_record.jsonl on the read path whenever the file is empty/missing, with no lock. Two CLI instances opening /stats (or exiting) before either has written will each append the full reconstructed history → permanently doubled totals; large records can also tear under concurrent appendFileSync. Consider not persisting on the read path, or writing rebuilt data atomically under a lock.
Silent blank dashboard on load failure. StatsDialog's loadStatsData(...).catch(() => setLoading(false)) swallows the error with no log and leaves data null, so the Activity/Efficiency body renders empty with no message — indistinguishable from "no data" and undiagnosable. Surface an error state and log via the debug logger.
Render-path recomputation. buildBrailleLineChart / buildHeatmapData and the per-day aggregation re-run on every dialog re-render/keypress (no useMemo); memoizing on their inputs removes the redundant work.
Heatmap month coupling. StatsActivityTab passes the token-chart's data-month index (clampedOffset) straight to HeatmapView as a calendar-month offset, and it isn't reset when leaving range='all'; gating it (range === 'all' ? clampedOffset : 0) avoids the heatmap desyncing from the rest of the tab.
CI Lint is red (the check-i18n step). It looks largely pre-existing (most of the 203 unused keys are unrelated auth strings), but the new locale changes add to it — worth running npm run check-i18n locally to confirm.
Committed design docs. .qwen/plans/…md (1337 lines) and .qwen/specs/…md (270 lines) appear force-added past .gitignore (.qwen/*); confirm that's intentional vs. local-only scratch.

— claude-opus-4-8 via Qwen Code /qreview

wenshao · 2026-06-05T07:35:10Z

+                  : theme.status.error
+              }
+            >
+              {data.delta.avgLatency <= 0 ? '\u25B2' : '\u25BC'}{' '}


[Suggestion] Avg Latency delta arrow points the wrong way

The other five KPI deltas render the arrow from the sign of the change (delta >= 0 ? '▲' : '▼'), but Avg Latency uses avgLatency <= 0 ? '▲' : '▼'. Since computeDelta defines avgLatency = current − previous, a negative delta means latency dropped — yet it renders ▲. So a falling latency (good) shows an up-arrow and a rising latency shows a down-arrow, contradicting the number and every other card. The color is already correct; only the glyph is inverted.

Suggested change

{data.delta.avgLatency <= 0 ? '\u25B2' : '\u25BC'}{' '}

{data.delta.avgLatency <= 0 ? '\u25BC' : '\u25B2'}{' '}

Keep the inverted color test (<= 0 ? success : error); just flip the glyph so it tracks direction.

— claude-opus-4-8 via Qwen Code /qreview

wenshao · 2026-06-05T07:35:10Z

+              key={tool.name}
+              cells={[
+                {
+                  text: '  ' + tool.name.slice(0, 14),


[Suggestion] Untrusted model/tool/project names are rendered to the terminal unsanitized

tool.name here (and the model name at line 193, the model name in StatsSessionTab, and the project path in StatsActivityTab) comes verbatim from usage_record.jsonl / the rebuilt chat logs and is rendered via ink <Text>. .slice(0, 14) truncates by code-unit count but does not strip control bytes, so an embedded ESC (\x1b…) or newline in a name survives into the terminal — enabling cursor/ANSI escapes or broken table alignment when /stats is opened on a crafted or corrupted record. Defense-in-depth: strip control chars before rendering (e.g. name.replace(/[\u0000-\u001f\u007f-\u009f]/g, '')) at the point these fields enter the render layer.

— claude-opus-4-8 via Qwen Code /qreview

wenshao · 2026-06-05T07:35:10Z

+}
+
+export function fmtSuccessBar(rate: number): string {
+  const filled = Math.round(rate / 10);


[Suggestion] fmtSuccessBar can throw RangeError and crash the CLI on malformed data

'░'.repeat(10 - filled) throws when filled > 10 (or '█'.repeat(filled) when filled < 0). Normal telemetry keeps success ≤ count so rate ∈ [0,100] and this is safe today — but successRate ultimately comes from usage_record.jsonl, which loadUsageHistory accepts on a version === 1 check alone (no field validation). A single corrupt/hand-edited line with success > count makes rate > 100, and since the dialog has no React error boundary, the throw during render takes down the whole TUI. Cheap guard:

Suggested change

const filled = Math.round(rate / 10);

const filled = Math.max(0, Math.min(10, Math.round(rate / 10)));

— claude-opus-4-8 via Qwen Code /qreview

wenshao · 2026-06-05T07:35:10Z

+  };
+}
+
+describe('aggregateUsage', () => {


[Suggestion] The rebuild / load / streak paths — where the prior review found several bugs — have no test coverage

This suite exercises metricsToUsageRecord and aggregateUsage only. rebuildFromSessionJsonl + loadUsageHistory (the cold-start path) and the streak / currentSession-merge logic in statsDataService are never invoked, so the recently-added guards can regress silently. Worth covering: rebuild dedup via seenSessionIds, the isNaN(timestamp) skip, the hasEvents === false skip, missing/empty projects/ and chats/ dirs → [], the write-back to usage_record.jsonl, loadUsageHistory falling through to rebuild when the file is absent, calculateStreaks (consecutive / gap / last-activity-stale / single-day), and loadStatsData dedup when a persisted record shares the live sessionId.

— claude-opus-4-8 via Qwen Code /qreview

wenshao

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T07:39:27Z

+        }
+      })
+      .catch(() => {
+        if (!stale) setLoading(false);


[Suggestion] .catch() silently swallows all errors — activity/efficiency tabs render a blank content area on failure

The new handler clears loading but sets no error state. The render gate at lines 205 / 211 requires both !loading && data, so when loadStatsData rejects the activity and efficiency tabs render nothing — no error message, no retry affordance. The user sees a seemingly healthy empty box, indistinguishable from "no data in this range." Combined with the silent continues in rebuildFromSessionJsonl and aggregateUsage, a corrupted ~/.qwen/usage_record.jsonl now bricks /stats invisibly.

Suggested change

if (!stale) setLoading(false);

const [error, setError] = useState(false);

// ...

loadStatsData(range, liveRecord)

.then((d) => {

if (!stale) {

setData(d);

setLoading(false);

setError(false);

}

})

.catch((err) => {

console.error('[stats] loadStatsData failed:', err);

if (!stale) {

setError(true);

setLoading(false);

}

});

Then render a fallback when !loading && error && activeTab !== 'session' (e.g., <Text color={theme.text.error}>{t('Failed to load stats. Press r to retry.')}</Text>).

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T07:39:27Z

+    totalDurationMs += r.durationMs;
+    totalLatencyMs += r.totalLatencyMs ?? 0;
+    totalCalls += r.tools.totalCalls;
+    totalSuccess += r.tools.totalSuccess;


[Critical] aggregateUsage unconditionally deep-accesses record fields — a single malformed record in usage_record.jsonl crashes /stats permanently

totalCalls += r.tools.totalCalls (line 338), r.tools.totalSuccess, r.tools.totalFail, r.files.linesAdded, Object.entries(r.models), and Object.entries(r.tools.byName) all throw TypeError if any persisted UsageSummaryRecord is missing one of those fields (truncated write, older schema, concurrent-write race, disk full mid-append). The throw propagates through loadStatsData into the .catch() above, leaving the dashboard blank with no indication of which record is poisoned.

The worst failure mode: one corrupted line in ~/.qwen/usage_record.jsonl bricks /stats permanently for that user. The user can still use the rest of the CLI, so they report "the stats command is buggy" with no useful diagnostic.

Suggested change

totalSuccess += r.tools.totalSuccess;

for (const r of filtered) {

try {

if (!r.tools?.byName || !r.files || !r.models) continue;

totalDurationMs += r.durationMs;

// ... rest of loop body

} catch (e) {

console.warn(`[stats] aggregateUsage: skipping malformed record sessionId=${r.sessionId}:`, e);

continue;

}

}

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T07:39:27Z

+
+  const models: AggregatedReport['models'] = {};
+  let totalCalls = 0;
+  let totalSuccess = 0;


[Suggestion] models: {} is a plain object accumulating model stats from JSON-parsed JSONL — prototype pollution via __proto__ key

const models: AggregatedReport['models'] = {}; is iterated by Object.entries(r.models) where r.models comes from usage_record.jsonl. When a user-edited JSONL contains "__proto__" as a model name, Object.entries returns it as an own property (because JSON.parse creates own __proto__), and then const existing = models[name] resolves models["__proto__"] to Object.prototype (truthy). The merge branch executes existing.requests += m.requests, which writes requests: NaN onto Object.prototype — polluting every new {} in the process for the remainder of the CLI lifetime.

The same file already demonstrates awareness of this pattern: toolCounts correctly uses Map (line 321), and uiTelemetry.ts uses Object.create(null) for user-controlled keys with an explicit comment.

Suggested change

let totalSuccess = 0;

const models: AggregatedReport['models'] = Object.create(null);

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T07:39:27Z

+    let files: string[];
+    try {
+      files = fs.readdirSync(chatsDir).filter((f) => f.endsWith('.jsonl'));
+    } catch {


[Suggestion] rebuildFromSessionJsonl silently swallows every per-file error — "my stats are missing sessions" is undebuggable

Both try { files = fs.readdirSync(chatsDir) ... } catch { continue; } (around line 196) and the inner try { ...jsonl.read... } catch { continue; } (around line 243) discard every error: permission denied, ENOENT mid-run, JSON parse errors, truncated JSONL, concurrent-write locks. The newly added NaN guard (line 232) also silently continues. There is no skippedCount, no logWarn, and the function returns only successes. After a rebuild that skips 50 out of 200 sessions, the cache file is written with 150 records and the next loadUsageHistory returns those 150 with no indication anything was lost.

A user who notices "I had a session on project X yesterday but it's not in /stats" has no diagnostic path — qwen --debug emits nothing, and the only workaround is to manually diff ls ~/.qwen/projects/*/chats/*.jsonl against the contents of usage_record.jsonl.

Suggested change

} catch {

} catch (e) {

console.warn(`[stats] rebuildFromSessionJsonl: skipping ${filePath}:`, e);

continue;

}

Consider also returning { results, skippedCount } or surfacing a "N sessions could not be parsed" count in the /stats footer so the user has a breadcrumb.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T07:39:27Z

+  }
+
+  // Shift endDate back by monthOffset months
+  if (monthOffset > 0) {


[Suggestion] buildHeatmapData monthOffset edge case (month-end into short month) has no test coverage

The three-step fix — setDate(1) → setMonth(month - offset) → setMonth(month+1, 0) — was specifically motivated by the March 31 → February overflow bug flagged in R1. But the existing test 'supports monthOffset to shift view backward' in asciiCharts.test.ts uses vi.setSystemTime(new Date('2025-06-04T12:00:00Z')), where June 4 is nowhere near a month end, so it never exercises the overflow path. A regression to the naive single-setMonth approach would still pass this test.

If someone simplifies the three-step pattern back to a single setMonth, the heatmap would silently show the wrong month range when the user scrolls back from March, May, July, August, October, or December (all months with 31 days preceding a shorter month).

Suggested test:

vi.setSystemTime(new Date('2025-03-31T12:00:00Z')); const result = buildHeatmapData(records, 'all', 1, 10); // Assert endDate === new Date('2025-02-28') (Feb 28, not Mar 3 overflow) vi.setSystemTime(new Date('2025-01-31T12:00:00Z')); // Assert endDate === new Date('2024-12-31') (prior year wrap)

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T08:30:56Z

Verification Report

Branch: feat/stats-dashboard
Base: main
Environment: macOS Darwin 25.4.0, Node.js v22.17.0

Test Results

Check	Command	Result
usageHistoryService Tests	`vitest run usageHistoryService.test.ts`	✅ 10 tests passed (4ms)
asciiCharts Tests	`vitest run asciiCharts.test.ts`	✅ 21 tests passed (18ms)
statsDataService Tests	`vitest run statsDataService.test.ts`	✅ 17 tests passed (6ms)
statsCommand Tests	`vitest run statsCommand.test.ts`	✅ 13 tests passed (11ms)
clearCommand Tests	`vitest run clearCommand.test.ts`	✅ 13 tests passed (18ms)
Total	5 test files	74 tests passed
Core Type Check	`npm run typecheck --workspace=packages/core`	✅ Clean (0 errors)
CLI Type Check	`npm run typecheck --workspace=packages/cli`	✅ Clean (0 errors, after fresh core build)
ESLint	`eslint` on 11 changed source files	✅ No errors
Core Build	`npm run build --workspace=packages/core`	✅ Success
CLI Build	`npm run build --workspace=packages/cli`	✅ Success
Whitespace	`git diff --check`	✅ Clean

Test Coverage Summary

Test Suite	Coverage
`usageHistoryService.test.ts` (10)	JSONL persistence (write/read/append), `aggregateUsage` with time range filtering, `getTimeRangeBounds`, `metricsToUsageRecord` conversion, empty/missing file handling
`asciiCharts.test.ts` (21)	Braille line chart rendering, heatmap grid generation, bar chart formatting, edge cases (empty data, single point, overflow), width/height constraints
`statsDataService.test.ts` (17)	`computeDashboardData` aggregation, time range filtering (today/week/month/all), project ranking, model breakdown, tool success rates, delta calculations between periods
`statsCommand.test.ts` (13)	`/stats` command parsing (dialog mode, inline `model`/`tools` subcommands), argument validation
`clearCommand.test.ts` (13)	`/clear` persists session usage via `persistSessionUsage` before clearing

Code Review Notes

Data flow is clean: usageHistoryService.ts (core) handles JSONL persistence and aggregation; statsDataService.ts (cli) transforms aggregated data into dashboard-ready structures; UI tabs render the data
Cross-session persistence via ~/.qwen/usage_record.jsonl with append-only writes — safe for concurrent sessions (each appends one record at exit)
Time range filtering correctly uses inclusive bounds from getTimeRangeBounds() against record timestamps
Backward compatibility: Old records without totalLatencyMs display "—" for latency; /stats model and /stats tools inline subcommands still work
No unbounded growth risk for typical use: ~200 bytes per session record, so 10K sessions = ~2MB
Tab navigation via Tab/Shift+Tab, time range cycling via r, month navigation via ←/→ on Activity tab — standard UX patterns
ASCII charts (braille line chart, contribution heatmap) are well-tested with edge cases

Note

Initial CLI typecheck run showed 50 errors due to a stale tsbuildinfo cache from a previous branch. After a clean core rebuild, all errors resolved to 0. This is an environment artifact, not a PR issue.

Verdict

✅ Ready to merge — all 74 tests pass across 5 test suites, both packages typecheck cleanly, builds succeed. The implementation is well-structured with clear separation between core persistence, data service, and UI layers.

Verified by wenshao

The test compared new Date() in the assertion against new Date() inside the function, which could differ by 1ms across a millisecond boundary. Pin system time to prevent flaky CI failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

These keys are still referenced by t('Session') in stats-helpers.tsx and t('Success') in StatsEfficiencyTab.tsx. Add to en.js baseline and restore zh/zh-TW translations.

…ror state - Skip malformed records in aggregateUsage (missing tools/files/models) - Use Object.create(null) to prevent prototype pollution on model names - Fix Avg Latency delta arrow direction (▼ for decrease, ▲ for increase) - Clamp fmtSuccessBar to prevent RangeError on corrupt data - Add error state UI when loadStatsData fails

wenshao

R5 — incremental review on d2fcba6 (since R4 at 39edd6fac)

R4 fixes verified ✅: aggregateUsage null crash (guard added), fmtSuccessBar RangeError (clamp), silent .catch() (error state), prototype pollution (Object.create(null)), i18n Session/Success keys (added to en.js). All resolved.

This round: 1 Critical + 3 Suggestions in new/modified code.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T11:38:06Z

+      if (hasActivity) {
+        try {
+          persistSessionUsage({
+            sessionId: config.getSessionId(),


[Critical] persistSessionUsage write path has no test coverage

The test mock at clearCommand.test.ts:24 returns getMetrics: vi.fn(() => ({ models: {} })), which makes Object.values(metrics.models) an empty array, so hasActivity is always false. No test exercises the new persistSessionUsage call introduced by this PR.

This is the primary write-path for cross-session stats. If it breaks, the /stats dashboard silently shows stale/empty data for sessions ended via /clear, and the bug would go undetected.

Suggested change

sessionId: config.getSessionId(),

// Add tests:

// 1. Mock getMetrics() to return models with totalRequests > 0,

// assert persistSessionUsage was called with correct shape

// 2. Make persistSessionUsage throw, assert /clear still completes

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T11:38:06Z

+): { start: Date; end: Date } | null {
+  if (range === 'all') return null;
+
+  const now = new Date();


[Suggestion] DST bug: getPreviousRangeBounds('today') uses fixed 24h subtraction

todayStart.getTime() - 24 * 60 * 60 * 1000 subtracts a fixed 24 hours. On DST spring-forward days (23-hour day), this lands at 23:00 two days ago instead of 00:00 yesterday. On fall-back days (25-hour day), it lands at 01:00 yesterday.

Suggested change

const now = new Date();

const yesterday = new Date(todayStart);

yesterday.setDate(yesterday.getDate() - 1);

return { start: yesterday.getTime(), end: todayStart.getTime() };

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T11:38:06Z

+    26,
+    Math.max(8, Math.floor((bodyWidth - 4) / 2)),
+  );
+  const col1Width = Math.floor(bodyWidth / 3);


[Suggestion] Token-counting inconsistency: KPI row vs project ranking use different definitions

The KPI row sums m.totalTokens (which includes thought tokens per the provider-reported total_token_count), while the project ranking table in aggregateUsage sums m.inputTokens + m.outputTokens (strictly prompt + candidates, excluding thoughts). Users see contradictory token numbers within the same dashboard view.

Pick one consistent definition and use it everywhere. Either always use inputTokens + outputTokens (simplest), or always use totalTokens.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T11:38:06Z

+  range: TimeRange,
+  currentSession?: UsageSummaryRecord,
+): Promise<StatsData> {
+  const persisted = await loadUsageHistory();


[Suggestion] "today" delta compares partial current day against full 24h previous day

When range is 'today', the current period covers [start_of_today, now] (partial day), but getPreviousRangeBounds('today') returns a full 24-hour yesterday. At 9am, users see "Sessions ▼67%" not because they're less productive, but because only 9/24 hours have elapsed. This erodes trust in the delta indicators.

Fix: for the 'today' previous period, compare against the same time window yesterday:

const windowMs = now - todayStart.getTime(); return { start: yesterdayStart.getTime(), end: yesterdayStart.getTime() + windowMs, };

Or simply hide delta for the 'today' range since partial-day comparison is inherently noisy.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T12:05:38Z

+  const pixelH = chartHeight * 4;
+
+  const sorted = [...data].sort((a, b) => a.date.localeCompare(b.date));
+  if (sorted.length === 0) return null;


[Suggestion] Dead code: if (sorted.length === 0) return null is unreachable

sorted is [...data].sort(...), and data.length === 0 already returned null six lines earlier (line 259). Sorting a non-empty array always produces a non-empty array, so this check can never trigger.

Remove the line to avoid misleading future readers.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T12:05:38Z

+          {t('Less')}{' '}
+        </Text>
+        {([0, 1, 2, 3, 4] as const).map((level) => (
+          <Text


[Suggestion] Heatmap legend/cell visual mismatch for empty cells

The legend renders ' ' (two spaces) for intensity level 0, but actual heatmap cells use HEATMAP_CHARS[0] which is '··' (two middle dots) for empty days. Users see dots in the grid but a blank square in the legend for "Less".

Either use '··' in the legend to match the cells, or change HEATMAP_CHARS[0] to ' ' (spaces) to match the legend.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-05T12:05:38Z

+            )}
+            {activeTab !== 'session' && !loading && error && (
+              <Text color={theme.status.error}>
+                {t('Failed to load stats. Press r to retry.')}


[Suggestion] i18n key 'Failed to load stats. Press r to retry.' used via t() but not defined in en.js

This key is absent from packages/cli/src/i18n/locales/en.js. While i18next falls back to the key string (so English works), the key is invisible to the translation pipeline — it will silently remain untranslated in all 7 non-English locales.

Add it to en.js and all locale files.

— qwen3.7-max via Qwen Code /review

…eanup - Fix DST bug in getPreviousRangeBounds('today') using setDate - Unify token counting: project ranking uses totalTokens (same as KPI) - Add clearCommand tests for persistSessionUsage with/without activity - Remove dead code (unreachable sorted.length check) - Fix heatmap legend to use dot markers matching grid cells - Add 'Failed to load stats' i18n key to en/zh/zh-TW

BenGuanRan · 2026-06-09T11:09:14Z

@BenGuanRan The text here is weird, but it's correct when switching to a different time range. This bug needs to be fixed.

已复现，当终端高度过窄时会出现ink 的box块被压缩正在处理中

Add flexShrink={0} to the outer Box of StatsDialog so that Yoga's default flex-shrink behavior does not compress KPI cards, charts, and tables when the dialog exceeds the available terminal height. The parent container in DefaultAppLayout already applies height + overflow="hidden" to clip overflow — this fix ensures content retains its natural size instead of being squeezed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

BenGuanRan · 2026-06-09T11:35:30Z

已修复 ✅

根因: Ink 底层使用 Yoga 布局引擎，其 flexShrink 默认值为 1。当终端高度较窄时，Yoga 会先压缩子元素内容（导致 KPI 卡片文字重叠），而非让外层 overflow: hidden 裁剪溢出部分。

解决方案: 在 StatsDialog 外层 Box 添加 flexShrink={0}，阻止 Yoga 压缩对话框内容。父容器 DefaultAppLayout 已有 height={dialogHeight} + overflow="hidden" 负责裁剪超出部分，因此内容会按原始尺寸渲染，超出终端高度的部分被裁剪而非压缩变形。

改动量：仅 1 行 prop 变更。

commit: 1b12750

BenGuanRan · 2026-06-09T11:37:28Z

@DragonnZhang Please re-re-re-re-re view again 🍺

DragonnZhang

Qwen Code Review -- R8 incremental pass

Scope: 2 commits since R7 (361b94c..1b1275022), 8 files changed (+10/-328).

Changes reviewed

f4509388c -- Revert "feat(input): move physical cursor to visual cursor for IME input (#4652)"
Cleanly reverts unrelated IME cursor positioning code that was on this branch. Removes the patch-package dependency, the patches/ink+7.0.3.patch file, all addLayoutListener/CursorContext usage from BaseTextInput.tsx, the prefixWidth prop from the component interface and both call sites (InputPrompt.tsx, AgentComposer.tsx), and relaxes the ink version from exact 7.0.3 to ^7.0.3. No residual references remain.
1b1275022 -- fix(stats): prevent Yoga layout from compressing StatsDialog content
Adds flexShrink={0} to the outer Box in StatsDialog.tsx to prevent Ink's Yoga layout engine from compressing dialog content when the terminal height is narrow. This fixes the KPI card text overlap issue identified in the PR thread. Well-targeted fix.

Verdict

No issues found in this incremental diff. The revert is clean and the layout fix is correct and minimal.

-- qwen3-coder via Qwen Code /review

wenshao

R9 — incremental review since R8 (8680bdc8..1b127502)

3 PR-specific commits reviewed: fix(stats): prevent Yoga layout from compressing StatsDialog content, Revert IME cursor positioning (#4652), merge: resolve conflict with upstream clearCommand.ts. Plus upstream main merges.

Deterministic: tsc 0 new (8 pre-existing on main), eslint 0. Build ✅, 76/76 tests pass.

4 Critical + 6 Suggestion new findings below (non-overlapping with existing 20 inline comments).

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-09T14:17:51Z

    "glob": "^10.5.0",
    "highlight.js": "^11.11.1",
-    "ink": "7.0.3",
+    "ink": "^7.0.3",


[Critical] IME physical cursor positioning feature silently reverted + ink unpinned

This change reverts PR #4652's IME cursor positioning (~75 lines from BaseTextInput.tsx, the ink+7.0.3.patch, and patch-package), and unpins ink from "7.0.3" to "^7.0.3". CJK/IME users will lose physical cursor positioning during composition.

The revert itself appears intentional (fixing a UI layout bug noted by @DragonnZhang), but:

The PR description does not mention this feature regression.

Unpinning ink to ^7.0.3 risks pulling newer versions with layout regressions in a TUI-critical app.

Suggested change

"ink": "^7.0.3",

"ink": "7.0.3",

— qwen3.7-max via Qwen Code /review

IME cursor 功能(#4652)不属于本 PR 范围，是上游合并时意外带入的。该 commit 引入了 ink/dom 的未导出路径导致运行时崩溃(ERR_PACKAGE_PATH_NOT_EXPORTED)，必须 revert。ink 版本改为 ^7.0.3 是标准 semver 范围，不会引入 breaking change。

已确认 main 分支上 ink 版本也是 ^7.0.3，当前分支与 main 保持一致，无需修改。

wenshao · 2026-06-09T14:17:52Z

+  return { start, end };
+}
+
+export function aggregateUsage(


[Critical] aggregateUsage sums all records without sessionId deduplication — duplicate records silently inflate all metrics

aggregateUsage iterates filtered records and unconditionally sums every counter. No Set or dedup by sessionId exists. If duplicate records appear in usage_record.jsonl (from concurrent rebuilds, concurrent persistSessionUsage calls, or manual file edits), all stats views are silently inflated — session count, token totals, streaks, leaderboard data.

Suggested change

export function aggregateUsage(

export function aggregateUsage(

records: UsageSummaryRecord[],

range: TimeRange,

): AggregatedReport {

const { start, end } = getTimeRangeBounds(range);

const filtered = records.filter((r) => {

const ts = r.timestamp;

return ts >= start.getTime() && ts <= end.getTime();

});

// Deduplicate by sessionId — the JSONL file may contain duplicates

// from concurrent rebuilds or persist calls

const seen = new Set<string>();

const deduped = filtered.filter((r) => {

if (seen.has(r.sessionId)) return false;

seen.add(r.sessionId);

return true;

});

— qwen3.7-max via Qwen Code /review

每个 session 仅在 /clear 或进程退出时 persist 一次(append-only)，正常使用不会产生重复记录。rebuildFromSessionJsonl 已有 seenSessionIds 去重。如果手动编辑文件导致重复，属于用户操作问题，不需要在热路径上增加额外开销。

wenshao · 2026-06-09T14:17:52Z

+
+export async function loadUsageHistory(): Promise<UsageSummaryRecord[]> {
+  try {
+    const records = await jsonl.read<UsageSummaryRecord>(getUsageHistoryPath());


[Critical] loadUsageHistory reads usage_record.jsonl with no file-size guard — a large file causes OOM on /stats

jsonl.read() accumulates all parsed records into an unbounded array. If the file grows unexpectedly (bug, concurrent writers, or external modification), the next /stats invocation loads everything into memory.

Suggested change

const records = await jsonl.read<UsageSummaryRecord>(getUsageHistoryPath());

try {

const usagePath = getUsageHistoryPath();

const MAX_USAGE_FILE_SIZE = 50 * 1024 * 1024; // 50 MB

try {

const stat = fs.statSync(usagePath);

if (stat.size > MAX_USAGE_FILE_SIZE) {

debugLogger.warn(`usage_record.jsonl is ${stat.size} bytes, exceeding ${MAX_USAGE_FILE_SIZE} limit — truncating read`);

}

} catch { /* file doesn't exist, first run */ }

const records = await jsonl.read<UsageSummaryRecord>(usagePath);

— qwen3.7-max via Qwen Code /review

每条记录约 200 bytes，10K session ≈ 2MB，正常使用多年也不会接近 OOM 阈值。文件大小保护属于防御性过度设计，如果后续确实出现问题再加也不迟。

wenshao · 2026-06-09T14:17:52Z

+    const records = await jsonl.read<UsageSummaryRecord>(getUsageHistoryPath());
+    const filtered = records.filter((r) => r.version === 1);
+    if (filtered.length > 0) return filtered;
+  } catch {


[Critical] Zero observability in all catch blocks — persistence failures are completely invisible

All 5 catch blocks in this module silently swallow errors. There are zero debugLogger calls anywhere in the file. A full disk, permission error, concurrent writer, or malformed data produces no observable signal.

When /stats shows empty data, there is literally nothing in any log to indicate why. This is the hardest failure to diagnose at 3 AM.

Suggested change

} catch {

} catch (e) {

// File doesn't exist or is unreadable — fall through to rebuild

debugLogger.debug(`loadUsageHistory: read failed: ${e instanceof Error ? e.message : e}`);

}

Add similar debugLogger.debug calls in all other catch blocks (lines 177, 186, 195, 242).

— qwen3.7-max via Qwen Code /review

已修复，所有 catch 块已添加 debugLogger 输出。commit: d0024bb

wenshao · 2026-06-09T14:17:52Z

+      cachedTokens: m.tokens.cached,
+      thoughtsTokens: m.tokens.thoughts,
+      totalTokens:
+        m.tokens.total ||


[Suggestion] totalTokens fallback formula duplicated in 3 locations without parentheses

m.tokens.total || m.tokens.prompt + m.tokens.candidates + m.tokens.thoughts appears here, and the consumer-side equivalent m.totalTokens || m.inputTokens + m.outputTokens + m.thoughtsTokens appears twice in statsDataService.ts (lines 96, 117). While || correctly binds lower than +, the lack of parentheses makes it easy to misread.

Extract a shared helper or add explicit parentheses:

Suggested change

m.tokens.total ||

totalTokens:

m.tokens.total ||

(m.tokens.prompt + m.tokens.candidates + m.tokens.thoughts),

— qwen3.7-max via Qwen Code /review

|| 优先级低于 + 是 JS 基础知识，加括号属于风格偏好。当前写法语义清晰，不改。

wenshao · 2026-06-09T14:17:52Z

+    if (r.timestamp < start.getTime() || r.timestamp > end.getTime()) continue;
+    if (!r.models) continue;
+    const ts = new Date(r.timestamp);
+    const key = `${ts.getFullYear()}-${String(ts.getMonth() + 1).padStart(2, '0')}-${String(ts.getDate()).padStart(2, '0')}`;


[Suggestion] Date-key formatting duplicated 3 times across 2 files

The ${ts.getFullYear()}-${String(ts.getMonth()+1).padStart(2,'0')}-${String(ts.getDate()).padStart(2,'0')} expression appears here (line 93), again at line 114, and asciiCharts.ts:80 already has a private formatDateKey function doing the same thing. If the date format ever changes (e.g., UTC methods to avoid DST), all three sites must be updated.

Export formatDateKey from asciiCharts.ts and reuse it.

— qwen3.7-max via Qwen Code /review

statsDataService 是 CLI 层，asciiCharts 也是 CLI 层，两者都不在 core 中。3 处重复暂可接受，如果后续需要改日期格式再统一提取。

wenshao · 2026-06-09T14:17:53Z

+        r.timestamp < prevBounds.end.getTime(),
+    );
+    if (prevFiltered.length > 0) {
+      const previousReport = aggregateUsage(prevFiltered, 'all');


[Suggestion] Delta calculation double-filtering creates hidden coupling with getTimeRangeBounds('all')

Records are pre-filtered by prevBounds here, then passed to aggregateUsage(prevFiltered, 'all'). Inside aggregateUsage, they're filtered again by getTimeRangeBounds('all') which returns [epoch, now] — currently a no-op superset. But if 'all' bounds ever change, delta silently drops records.

Consider passing pre-filtered records to a variant that skips internal time filtering, or document the 'all' contract explicitly.

— qwen3.7-max via Qwen Code /review

'all' 的语义是 [epoch, now]，这是稳定的契约。prevBounds 预过滤是为了减少 aggregateUsage 的输入量，内部再过滤是其自身逻辑的一部分，不算隐式耦合。

wenshao · 2026-06-09T14:17:53Z

      expect(result.type).toBe('message');
-      expect(result.messageType).toBe('error');
+      expect(result.messageType).toBe('info');
+      expect(result.content).toContain('Session duration');


[Suggestion] Weak assertion: toContain('Session duration') passes even if the fallback duration is wrong

This test verifies the sessionStartTime undefined fallback, but the assertion only checks the string contains 'Session duration' — not the actual zero value. NaN or any incorrect duration would still pass.

Suggested change

expect(result.content).toContain('Session duration');

expect(result.content).toContain('Session duration: 0s');

— qwen3.7-max via Qwen Code /review

已修复，断言改为 toContain('Session duration: 0s')。commit: d0024bb

wenshao · 2026-06-09T14:17:53Z

    });
  }

+  getSessionStartTime(): Date {


[Suggestion] getSessionStartTime() and reset() update of #sessionStartTime have zero test coverage

These two additions are consumed by the gemini.tsx cleanup handler (process-exit persist path). If reset() fails to update the timestamp, post-/clear records carry the pre-clear start time, inflating duration. This is untestable from integration tests alone.

Add tests: (1) getSessionStartTime() returns a Date near construction time, (2) after reset(), getSessionStartTime() returns a new Date near the reset time.

— qwen3.7-max via Qwen Code /review

getSessionStartTime() 和 reset() 是简单的 getter/setter，行为直接可见，不需要专门的单测。

wenshao · 2026-06-09T14:17:53Z

+        try {
+          persistSessionUsage({
+            sessionId: config.getSessionId(),
+            startTime: context.session.stats.sessionStartTime ?? new Date(),


[Suggestion] sessionStartTime ?? new Date() fallback is never tested

No test sets sessionStartTime to undefined before invoking /clear. The ?? new Date() is the only thing preventing a crash when sessionStartTime is missing (e.g., early session crash).

Add a test:

it('should use current time when sessionStartTime is undefined', async () => { mockContext.session.stats.sessionStartTime = undefined; await clearCommand.action!(mockContext, ''); expect(persistSessionUsage).toHaveBeenCalledWith( expect.objectContaining({ startTime: expect.any(Date), }), ); });

— qwen3.7-max via Qwen Code /review

?? new Date() 是标准的空值兜底，crash 风险不存在。clearCommand 测试已覆盖 persist 调用，fallback 路径属于防御性代码无需专门测试。

…ertion - Add debugLogger to all catch blocks in usageHistoryService.ts for observability when file I/O or parsing fails - Strengthen test assertion from toContain('Session duration') to toContain('Session duration: 0s') to verify the zero-duration fallback

BenGuanRan · 2026-06-10T03:10:03Z

@wenshao @DragonnZhang R9 反馈已处理完毕，请 re-review：

Critical 4（catch 块无日志）：已修复，所有 catch 块添加 debugLogger — commit d0024bb
Suggestion 4（断言太弱）：已修复，改为 toContain('Session duration: 0s')
Critical 1（ink 版本）：main 上也是 ^7.0.3，保持一致
Critical 2/3, Suggestion 1/2/3/5/6：已逐条回复解释理由

所有测试通过（76/76），lint/typecheck 均 clean。

wenshao

R10 — incremental review since R9 (1b127502..d0024bba9)

1 PR commit reviewed: fix(stats): add debug logging to catch blocks and strengthen test assertion

R9 fixes verified ✅

R9 Finding	Status
Critical: Zero observability in all catch blocks	✅ Fixed — all 5 catch blocks now use `debugLogger.debug(...)` with descriptive messages
Suggestion: Weak assertion `toContain('Session duration')`	✅ Fixed — strengthened to `toContain('Session duration: 0s')`

Deterministic checks

Check	Result
tsc (core + cli)	0 errors
eslint (changed files)	0 errors
usageHistoryService tests	10/10 pass
statsCommand tests	13/13 pass

New findings

No new issues found in the incremental diff. The debugLogger integration is correct: import path, tag convention (USAGE_HISTORY), and message format all match established codebase patterns.

LGTM on the incremental changes. ✅

— qwen3.7-max via Qwen Code /review

BenGuanRan · 2026-06-10T06:51:27Z

@DragonnZhang R9/R10 fixes landed, wenshao already approved. Could you re-approve? 🙏

DragonnZhang · 2026-06-10T06:53:51Z

@BenGuanRan Done.

BenGuanRan · 2026-06-10T06:56:49Z

@DragonnZhang I do not have the correct permissions to execute MergePullRequest

wenshao · 2026-06-10T07:07:43Z

Runtime verification — `/stats` dashboard (head `d0024bba9`)

Re-verified locally on 🐧 Linux x64 / Node v22.22.2 by driving the real bundles in tmux (PR head and origin/main, built with the full npm ci && bundle chain — artifacts provably distinct: the dashboard code, usageHistoryService, and usage_record.jsonl writing exist only in the PR bundle). Provider is an OpenAI-compatible scripted mock issuing real read_file→write_file tool calls and returning non-trivial usage (cached tokens, latency) so every metric gets real data.

Verdict: the dashboard itself is well-built and the whole reviewer test plan passes — but I can reproduce one unresolved correctness bug from a clean first-run flow (the session double-count @wenshao flagged and the author closed as "won't happen in normal use" — it does), and the bundled-in revert of #4652 is out of scope and rests on a misdiagnosis (proven below). I'd fix those two before merge; everything else is green.

Reviewer test plan — all PASS

Step	Result
`/stats` opens 3-tab dialog (Session / Activity / Efficiency)	✅ renders, rich real data
`Tab` / `Shift+Tab` switch tabs	✅ (first keypress after open sometimes needs a repeat — minor)
`r` cycles ranges All→Month→Week→Today	✅ highlight moves through all 4 (verified via ANSI)
`Esc` closes	✅
`/clear` then `/stats` shows prior session in history	✅ (but inflated — see bug)
`/stats model` / `/stats tools` inline	✅ original flat output preserved
Headless `-p "/stats"`	✅ prints flat summary, no dialog
Session-tab numbers vs mock data	✅ tokens, cache % (54.9%), tool success (3/4=75%), code changes (+2/−1) all correct
Cold-start rebuild when `usage_record.jsonl` missing	✅ rebuilds from chat JSONL
Malformed records (garbage line, NaN timestamp, `__proto__` key)	✅ no crash, dashboard still renders
A/B vs `origin/main`	✅ main shows the old flat text summary; PR shows the dashboard — delta is exactly this PR

←/→ month navigation on the trend chart: the guard logic is correct, but I couldn't exercise multi-month paging because my mock data is single-month (so maxOffset=0 and the key is a no-op, as intended). Untested-not-broken.

🐛 Bug — session double-counted after a first-ever `/stats` + `/clear` (or exit)

This is @wenshao's open aggregateUsage-dedup Critical. The author's reply was "each session persists once (append-only); normal use won't produce duplicates; rebuildFromSessionJsonl already dedups via seenSessionIds." That seenSessionIds set only dedups within a single rebuild scan; it does not stop a rebuild-written record from coexisting with the live persist of the same session. Clean deterministic repro on the real bundle, fresh $HOME:

1. fresh HOME, no usage_record.jsonl
2. one write turn            -> session 4215665d, ~3.6k tokens, file still absent ✓
3. open /stats once          -> rebuildFromSessionJsonl() writes the *in-progress* session to the file
                                (usage_record.jsonl now has 1 record for 4215665d)  [no /clear, no exit]
4. Esc, then /clear          -> persistSessionUsage() appends 4215665d AGAIN
                                (file now has 2 records, same sessionId)
5. /stats -> Activity tab    -> "Sessions 2 · Tokens 7.2k"   for ONE real ~3.6k-token session

Every aggregate (sessions, tokens, durations, tool counts, deltas, heatmap, projects) is inflated 2× for any session that was active the first time /stats was opened. Mechanism: loadStatsData only de-dups the live current session (persisted.filter(r => r.sessionId !== currentSession.sessionId)), so once two persisted copies exist they both survive, and aggregateUsage sums without dedup (usageHistoryService.ts). The trigger — open /stats during a first session, then /clear or exit — is a completely ordinary first-run sequence, and the duplicate is then permanent in the file. Suggested fix is the one already on the thread: dedup persisted records by sessionId (last-wins) in aggregateUsage/loadStatsData, or don't persist the in-progress session during rebuild.

Finding — the #4652 (IME cursor) revert is out of scope and misdiagnosed

Commit f4509388c reverts a legitimately-merged main feature (#4652 IME physical-cursor) and rips out the whole patch-package setup: drops postinstall: patch-package, the patch-package devDep, and patches/ink+7.0.3.patch, and unpins ink 7.0.3→^7.0.3. The author's justification is that #4652 "causes ERR_PACKAGE_PATH_NOT_EXPORTED at runtime." I reproduced the crash and its actual cause on the origin/main tree:

# main, deps installed WITHOUT patch-package (postinstall skipped):
node -e "import('ink/dom')"     -> ERR_PACKAGE_PATH_NOT_EXPORTED: subpath './dom' is not defined
node scripts/dev.js --version   -> ERR_PACKAGE_PATH_NOT_EXPORTED  (the author's crash, reproduced)

# apply what postinstall/prepare normally do:
npx patch-package               -> ink@7.0.3 ✔
node -e "import('ink/dom')"     -> OK  (keys: addLayoutListener, appendChildNode, …)
node scripts/dev.js --version   -> runs fine

So the crash is an environment artifact — patch-package didn't run, so ink's ./dom subpath export (added by patches/ink+7.0.3.patch) was missing. It is not a defect in #4652. The correct fix is to ensure postinstall runs, not to revert an unrelated input feature and delete the project's ink-patching infrastructure inside a /stats PR. As shipped, merging this would remove the IME cursor feature from main and the patch-package mechanism. (The PR is at least internally consistent — it also removes the only ink/dom importer, so the PR bundle builds and runs without the patch — but that doesn't make the revert in-scope.) Recommend dropping f4509388c from this PR and, if there's a real ink/dom resolution failure in some environment, fixing that separately.

Suites / typecheck / attribution

Check	Result
New `usageHistoryService.test.ts`	10/10
New `statsDataService` + `asciiCharts` + `statsCommand` + `clearCommand` tests	66/66
`packages/core` full suite	10406 / 10406
`packages/cli` full suite	5 failed / rest pass — all 5 reproduce identically on `origin/main` (root-user + headless env: `workspaceAgents`/`workspaceMemory`/`cleanup` unlink-expectations, `clipboardUtils` wl-paste) → 0 PR-attributable
`tsc --noEmit` core + cli	exit 0 / exit 0

Recommendation

The dashboard is genuinely nice and the persistence/rebuild/streak/render paths all work end-to-end on real data. Two things to resolve before merge: (1) the session double-count (open /stats → /clear/exit on a first run permanently 2× that session — it's a real, ordinary-flow repro of the still-open Critical); (2) drop the #4652 revert + patch-package removal (out of scope; the crash that motivated it is a missing-patch-package environment issue, not a #4652 defect). With those addressed I'd be happy to re-verify.

中文小结

在 🐧 Linux / Node v22.22.2 上用 真实 bundle（PR 与 origin/main 各自完整 npm ci && bundle，产物确证不同）在 tmux 中驱动，mock provider 走真实 read_file→write_file 工具调用并返回带缓存/延迟的真实用量。

结论：仪表盘本身做得不错、Reviewer 测试计划全部通过，但我从全新首次运行流程复现了一个尚未解决的正确性 bug（@wenshao 提的会话双计，被作者以"正常使用不会发生"关闭——实测会发生），并且 PR 里夹带的 #4652 revert 属于范围外且基于误判。建议合并前处理这两点。

通过项：/stats 三标签（会话/概览/性能）渲染、Tab/Shift+Tab 切换、r 循环四个时间范围（ANSI 高亮验证）、Esc 关闭、/stats model|tools 内联保留、headless -p 纯文本、Session 指标与 mock 数据一致（缓存 54.9%、工具成功 3/4、代码 +2/−1）、缺文件冷启动 rebuild、畸形记录（乱码行/NaN 时间戳/__proto__）不崩、与 main 的 A/B（main 旧纯文本、PR 新仪表盘）。←/→ 翻月因 mock 数据仅单月（maxOffset=0 故为 no-op，符合预期）未能实测多月，逻辑读代码正确。

🐛 双计 bug（干净复现）：全新 HOME → 1 个写入轮（会话 ~3.6k tokens，文件尚未生成）→ 仅打开一次 /stats 即触发 rebuildFromSessionJsonl 把进行中的会话写入文件（1 条）→ Esc 后 /clear（或退出）再 persistSessionUsage 写一次（同 sessionId 2 条）→ 重开 /stats 概览显示 Sessions 2 · Tokens 7.2k（真实仅一个 ~3.6k 会话）。loadStatsData 只对当前 live 会话去重，两条已持久化副本都会保留，aggregateUsage 不去重 → 所有聚合指标 2×，且重复记录永久留存。作者所指 seenSessionIds 仅在单次 rebuild 扫描内去重，挡不住 rebuild 记录与 live 持久化记录共存。建议按线程上的方案：在 aggregateUsage/loadStatsData 按 sessionId 去重（last-wins），或 rebuild 时跳过进行中的会话。

#4652 revert 误判：f4509388c 在一个 /stats PR 里 revert 了已正式合入 main 的 IME 光标特性，并删除整套 patch-package（postinstall、devDep、patches/ink+7.0.3.patch），把 ink 从 7.0.3 放宽为 ^7.0.3。作者理由是 #4652 触发 ERR_PACKAGE_PATH_NOT_EXPORTED。我在 main 上复现：未跑 patch-package 时 import('ink/dom') 与 node scripts/dev.js 均抛该错；npx patch-package 后两者都正常。故崩溃是环境问题（postinstall 未执行、缺 ink ./dom 子路径导出），并非 #4652 缺陷。正确做法是保证 postinstall 执行，而非在本 PR 里 revert 无关特性并删掉 ink 打补丁机制。建议从本 PR 移除 f4509388c。

测试：新增 usageHistoryService 10/10；statsDataService/asciiCharts/statsCommand/clearCommand 66/66；core 全量 10406/10406；cli 全量 5 个失败在 main 上逐一复现（root/无剪贴板环境）→ 0 PR 归因；core/cli tsc 均 exit 0。

建议：处理双计与 #4652 revert 两点后我可复验。

zzhenyao · 2026-06-11T03:56:36Z

@wenshao @tanzhenxin Why was the #4652 (IME cursor positioning) feature reverted in this /stats PR?

These seem like unrelated features. Was this revert intentional? Is the IME feature being permanently removed?

BenGuanRan · 2026-06-11T07:01:29Z

Thanks for the thorough review!

Since this PR has already been merged, both issues will be handled in follow-up PRs:

Issue 1 (session double-count): Confirmed and reproducible. A follow-up PR will address this — likely a last-wins dedup by sessionId in aggregateUsage/loadStatsData, plus skipping the in-progress session in rebuildFromSessionJsonl to avoid polluting the file going forward.

Issue 2 (#4652 revert): A revert PR is in progress. We agree the revert was out of scope and the underlying ERR_PACKAGE_PATH_NOT_EXPORTED is an environment/postinstall issue rather than a defect in #4652. We'll restore the IME cursor fix and the patch-package setup, and handle the postinstall execution path separately.

Will link the follow-up PRs here once they're up.

BenGuanRan · 2026-06-11T07:24:41Z

Follow-up on Issue 2: the restore PR is now open — #4993.

It restores #4652 in full as two cherry-picks with original authors preserved (@zzhenyao for the IME cursor commit, @LaZzyMan for the follow-up fixing the showYoloStyling → approvalModePromptStyle rename from #4600). Apologies again for the disruption — the revert was my mistake while resolving a merge conflict in this PR.

Tracking issue: #4987.

The session double-count fix (Issue 1) will follow in a separate PR.

BenGuanRan · 2026-06-11T07:53:41Z

Follow-up on Issue 1 (session double-count): tracked at #4994, fix open at #4995.

#4995 adds defense-in-depth — last-wins sessionId dedup in loadUsageHistory (heals existing corrupted usage_record.jsonl immediately) plus skipping the in-progress session in rebuildFromSessionJsonl when currentSessionId is passed (prevents new duplicates at the source). Regression test mirrors @wenshao's exact repro sequence.

a.ran and others added 5 commits June 4, 2026 19:49

docs(stats): add dashboard design spec and implementation plan

2c79b1f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(i18n): add stats dashboard translations for all locales

56be774

Add translations for stats dashboard UI strings in zh, zh-TW, ca, de, fr, ja, pt, ru. Add stats keys to en.js baseline and mustTranslateKeys. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

qqqys requested a review from wenshao June 4, 2026 12:51

BenGuanRan force-pushed the feat/stats-dashboard branch from 5351c91 to 9095aa7 Compare June 4, 2026 12:52

a.ran and others added 2 commits June 4, 2026 21:01

BenGuanRan mentioned this pull request Jun 4, 2026

feat(stats): 增强stats能力，支持跨session的全局用量统计，参考Claude Code #4597

Closed

wenshao requested changes Jun 4, 2026

View reviewed changes

github-actions Bot mentioned this pull request Jun 5, 2026

📊 AI CLI 工具社区动态日报 2026-06-05 jasonalang/big_model_radar#41

Open

BenGuanRan requested a review from wenshao June 5, 2026 03:08

github-actions Bot mentioned this pull request Jun 5, 2026

📊 AI CLI 工具社区动态日报 2026-06-05 litang9/big_model_radar#16

Open

wenshao requested changes Jun 5, 2026

View reviewed changes

a.ran and others added 2 commits June 5, 2026 15:08

fix(i18n): sync locale files with en.js baseline for CI check

39edd6f

Add 19 missing translations to zh-TW.js, remove extra keys from zh.js and zh-TW.js that were deleted from en.js in prior cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wenshao reviewed Jun 5, 2026

View reviewed changes

wenshao requested changes Jun 5, 2026

View reviewed changes

wenshao reviewed Jun 5, 2026

View reviewed changes

Comment thread packages/cli/src/i18n/locales/zh.js

Comment thread packages/cli/src/i18n/locales/zh.js

a.ran and others added 3 commits June 5, 2026 17:04

fix(i18n): restore Session and Success keys removed in error

3e6b708

These keys are still referenced by t('Session') in stats-helpers.tsx and t('Success') in StatsEfficiencyTab.tsx. Add to en.js baseline and restore zh/zh-TW translations.

BenGuanRan requested a review from wenshao June 5, 2026 09:41

wenshao requested changes Jun 5, 2026

View reviewed changes

wenshao reviewed Jun 5, 2026

View reviewed changes

BenGuanRan dismissed DragonnZhang’s stale review via 1b12750 June 9, 2026 11:35

BenGuanRan requested a review from DragonnZhang June 9, 2026 11:36

DragonnZhang previously approved these changes Jun 9, 2026

View reviewed changes

wenshao requested changes Jun 9, 2026

View reviewed changes

github-actions Bot mentioned this pull request Jun 9, 2026

📊 AI CLI 工具社区动态日报 2026-06-10 hehongtao88/big_model_radar#1

Open

BenGuanRan dismissed DragonnZhang’s stale review via d0024bb June 10, 2026 02:50

wenshao reviewed Jun 10, 2026

View reviewed changes

github-actions Bot mentioned this pull request Jun 10, 2026

📊 AI CLI 工具社区动态日报 2026-06-10 litang9/big_model_radar#41

Open

wenshao approved these changes Jun 10, 2026

View reviewed changes

DragonnZhang approved these changes Jun 10, 2026

View reviewed changes

DragonnZhang merged commit 5adc4fe into QwenLM:main Jun 10, 2026
83 checks passed

tanzhenxin mentioned this pull request Jun 11, 2026

Fix release workspace test failures #4980

Merged

qwen-code-ci-bot mentioned this pull request Jun 11, 2026

PR #4779 silently reverted #4652 which was already merged to main #4987

Open

BenGuanRan mentioned this pull request Jun 11, 2026

fix(input): restore IME cursor positioning reverted in #4779 #4993

Open

2 tasks

This was referenced Jun 11, 2026

/stats permanently double-counts a session if /stats is opened during the first-ever turn (introduced by #4779) #4994

Closed

fix(stats): dedup usage records by sessionId and skip in-progress writes #4995

Merged


		if (!hasEvents) continue;

		const startTime = new Date(firstRecord.timestamp).getTime();

-        const startTime = new Date(firstRecord.timestamp).getTime();
+        const startTime = new Date(firstRecord.timestamp).getTime();
+        const lastRecord = records[records.length - 1]!;
+        const endTime = new Date(lastRecord.timestamp).getTime();
+        if (isNaN(startTime) || isNaN(endTime) || !sessionId) continue;

-  useEffect(() => {
+  useEffect(() => {
+    let stale = false;
+    setLoading(true);
+    const liveRecord = buildCurrentSessionRecord(
+      stats.sessionId,
+      stats.sessionStartTime,
+      config.getProjectRoot(),
+      stats.metrics,
+    );
+    loadStatsData(range, liveRecord).then((d) => {
+      if (!stale) {
+        setData(d);
+        setLoading(false);
+      }
+    });
+    return () => { stale = true; };
+    // eslint-disable-next-line react-hooks/exhaustive-deps -- only reload on range/session change, not every metrics tick
+  }, [range, stats.sessionId]);

	{data.delta.avgLatency <= 0 ? '\u25B2' : '\u25BC'}{' '}
	{data.delta.avgLatency <= 0 ? '\u25BC' : '\u25B2'}{' '}

	const filled = Math.round(rate / 10);
	const filled = Math.max(0, Math.min(10, Math.round(rate / 10)));

-    totalSuccess += r.tools.totalSuccess;
+  for (const r of filtered) {
+    try {
+      if (!r.tools?.byName || !r.files || !r.models) continue;
+      totalDurationMs += r.durationMs;
+      // ... rest of loop body
+    } catch (e) {
+      console.warn(`[stats] aggregateUsage: skipping malformed record sessionId=${r.sessionId}:`, e);
+      continue;
+    }
+  }

	let totalSuccess = 0;
	const models: AggregatedReport['models'] = Object.create(null);

-            sessionId: config.getSessionId(),
+// Add tests:
+// 1. Mock getMetrics() to return models with totalRequests > 0,
+//    assert persistSessionUsage was called with correct shape
+// 2. Make persistSessionUsage throw, assert /clear still completes

-  const now = new Date();
+const yesterday = new Date(todayStart);
+yesterday.setDate(yesterday.getDate() - 1);
+return { start: yesterday.getTime(), end: todayStart.getTime() };

-export function aggregateUsage(
+export function aggregateUsage(
+  records: UsageSummaryRecord[],
+  range: TimeRange,
+): AggregatedReport {
+  const { start, end } = getTimeRangeBounds(range);
+  const filtered = records.filter((r) => {
+    const ts = r.timestamp;
+    return ts >= start.getTime() && ts <= end.getTime();
+  });
+  // Deduplicate by sessionId — the JSONL file may contain duplicates
+  // from concurrent rebuilds or persist calls
+  const seen = new Set<string>();
+  const deduped = filtered.filter((r) => {
+    if (seen.has(r.sessionId)) return false;
+    seen.add(r.sessionId);
+    return true;
+  });

-    const records = await jsonl.read<UsageSummaryRecord>(getUsageHistoryPath());
+  try {
+    const usagePath = getUsageHistoryPath();
+    const MAX_USAGE_FILE_SIZE = 50 * 1024 * 1024; // 50 MB
+    try {
+      const stat = fs.statSync(usagePath);
+      if (stat.size > MAX_USAGE_FILE_SIZE) {
+        debugLogger.warn(`usage_record.jsonl is ${stat.size} bytes, exceeding ${MAX_USAGE_FILE_SIZE} limit — truncating read`);
+      }
+    } catch { /* file doesn't exist, first run */ }
+    const records = await jsonl.read<UsageSummaryRecord>(usagePath);

	expect(result.content).toContain('Session duration');
	expect(result.content).toContain('Session duration: 0s');

Conversation

BenGuanRan commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

Why it's needed

Reviewer Test Plan

How to verify

Evidence (Before & After)

Tested on

Environment (optional)

Risk & Scope

Linked Issues

这个 PR 做了什么

为什么需要

审查测试计划

风险与范围

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wenshao commented Jun 5, 2026

Verification Report

Test Results

Test Coverage Summary

Code Review Notes

Note

Verdict

Uh oh!

Uh oh!

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

BenGuanRan commented Jun 4, 2026 •

edited

Loading