Skip to content

feat(stats): add interactive /stats dashboard with cross-session tracking#4779

Merged
DragonnZhang merged 21 commits into
QwenLM:mainfrom
BenGuanRan:feat/stats-dashboard
Jun 10, 2026
Merged

feat(stats): add interactive /stats dashboard with cross-session tracking#4779
DragonnZhang merged 21 commits into
QwenLM:mainfrom
BenGuanRan:feat/stats-dashboard

Conversation

@BenGuanRan

@BenGuanRan BenGuanRan commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

What this PR does

Adds an interactive /stats dashboard that tracks and visualizes usage across all CLI sessions. The dashboard has three tabs: Session (live current-session metrics), Activity (usage trends and patterns over time), and Efficiency (performance metrics and tool analysis). Usage data is automatically persisted to ~/.qwen/usage_record.jsonl at session exit and on /clear, with a fallback rebuild from chat history if the file is missing.

Why it's needed

Previously /stats only showed a flat text dump of the current session's token counts. Users had no way to see cross-session trends, identify which tools or models they use most, track their coding streaks, or compare performance across time periods. This dashboard gives users actionable visibility into their usage patterns — cache hit rates, tool success rates, latency trends, and project-level breakdowns — all without leaving the terminal.

Reviewer Test Plan

How to verify

  1. Run npm run dev to start the CLI
  2. Type /stats to open the dashboard dialog
  3. Verify three tabs render: Session, Activity, Efficiency
  4. Press Tab / Shift+Tab to switch between tabs
  5. Press r to cycle time ranges (All → Month → Week → Today)
  6. On the Activity tab with range=All, press / to navigate months in the token trend chart
  7. Press Esc to close the dialog
  8. Run /clear then /stats again — previous session should appear in the historical data
  9. Run in headless mode: qwen -p "hello" && qwen -p "/stats" — verify the first session's data is included

Activity tab should show: KPI row with delta arrows, GitHub-style contribution heatmap, braille line chart for token trend, top-5 project ranking table.

Efficiency tab should show: Cache Hit Rate / Tool Success / Avg Latency cards with deltas, tool leaderboard with success rate bars, model comparison table with per-model latency and cache %, code impact line counts.

Evidence (Before & After)

Before

image

After

Kapture 2026-06-04 at 21 12 20 image image image

Mock
image

image

Tested on

OS Status
🍏 macOS
🪟 Windows ⚠️
🐧 Linux ⚠️

Environment (optional)

npm run dev with local Qwen API key.

Risk & Scope

  • Main risk or tradeoff: The usage_record.jsonl file grows unboundedly over time. For heavy users this could eventually become large, but at ~200 bytes per session even 10,000 sessions is only ~2MB.
  • Not validated / out of scope: Cost estimation (requires user-configured pricing), per-file change tracking, context window compression metrics, interactive drill-down into individual sessions.
  • Breaking changes / migration notes: None. The UsageSummaryRecord schema is backward-compatible — old records without totalLatencyMs display "—" for latency. The /stats command now opens a dialog in interactive mode instead of printing inline stats (use /stats model or /stats tools for inline output).

Linked Issues

Closes #4597

中文说明

这个 PR 做了什么

添加了一个交互式 /stats 仪表盘,用于跟踪和可视化所有 CLI 会话的使用情况。仪表盘包含三个标签页:会话(当前会话实时指标)、概览(跨会话的使用趋势和模式)、性能(性能指标和工具分析)。使用数据在会话退出和 /clear 时自动持久化到 ~/.qwen/usage_record.jsonl,如果文件丢失则从聊天历史自动重建。

为什么需要

之前 /stats 只显示当前会话的纯文本 token 计数。用户无法查看跨会话趋势、识别最常用的工具或模型、跟踪编码连续天数,或比较不同时间段的性能。这个仪表盘为用户提供了可操作的使用洞察——缓存命中率、工具成功率、延迟趋势和项目级分解——全部在终端内完成。

审查测试计划

  1. 运行 npm run dev 启动 CLI
  2. 输入 /stats 打开仪表盘对话框
  3. 验证三个标签页渲染:会话、概览、性能
  4. Tab / Shift+Tab 切换标签
  5. r 循环时间范围(全部 → 月 → 周 → 今天)
  6. 在概览标签的 range=All 下,按 / 浏览趋势图月份
  7. Esc 关闭对话框
  8. 运行 /clear 后再次 /stats — 之前的会话应出现在历史数据中

风险与范围

  • 主要风险:usage_record.jsonl 文件随时间无限增长,但每个会话约 200 字节,10,000 个会话也只有约 2MB。
  • 不在范围内:成本估算、逐文件变更跟踪、上下文窗口压缩指标、单会话钻取。
  • 破坏性变更:无。UsageSummaryRecord schema 向后兼容。/stats 命令在交互模式下从内联输出改为打开对话框(内联输出可用 /stats model/stats tools)。

a.ran and others added 5 commits June 4, 2026 19:49
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add usageHistoryService to core with JSONL-based persistence, session
replay from chat history with sessionId deduplication, time-range
aggregation, and per-model/tool/file breakdown including latency fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add statsDataService for delta calculations, efficiency metrics, tool
leaderboard, and heatmap/trend data. Add asciiCharts with braille line
chart (Bresenham rendering) and GitHub-style contribution heatmap.
Includes 38 unit tests covering both modules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add three-tab dialog: Session (live metrics), Activity (KPIs, heatmap,
braille token trend chart, project ranking), and Efficiency (cache rate,
tool success, latency cards, tool leaderboard, model comparison table).

Supports tab/shift-tab navigation, r to cycle time ranges (all/month/
week/today), left/right to pan months in the trend chart, esc to close.

Persist usage on /clear for accurate cross-session tracking. Update
statsCommand tests for new dialog behavior and clearCommand tests for
telemetry mock. Update /stats documentation in commands.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add translations for stats dashboard UI strings in zh, zh-TW, ca, de,
fr, ja, pt, ru. Add stats keys to en.js baseline and mustTranslateKeys.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@qqqys qqqys requested a review from wenshao June 4, 2026 12:51
Intensity 0 cells (no activity) now render without backgroundColor,
inheriting the terminal's native background instead of a hardcoded
color that renders incorrectly across different terminal themes.
Also fix green gradient direction: brighter = more activity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@BenGuanRan BenGuanRan force-pushed the feat/stats-dashboard branch from 5351c91 to 9095aa7 Compare June 4, 2026 12:52
a.ran and others added 2 commits June 4, 2026 21:01
Inactive cells render as '··' with no background color instead of
colored blocks, matching common contribution graph designs. Active
cells keep their green gradient backgrounds. Fix gradient direction
so brighter green = more activity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add back Session ID, Success Rate with color thresholds, User Agreement
rate, Performance breakdown (Wall Time, Agent Active, API Time %, Tool
Time %), and full token counts that were present in the original exit
screen but missing from the new Session tab.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread packages/cli/src/ui/utils/asciiCharts.ts
Comment thread packages/cli/src/ui/utils/statsDataService.ts Outdated
Comment thread packages/cli/src/ui/components/StatsHeatmapView.tsx
Comment thread packages/core/src/services/usageHistoryService.ts
Comment thread packages/cli/src/ui/components/StatsSessionTab.tsx
Comment thread packages/cli/src/i18n/locales/en.js
- Fix monthOffset overflow: setDate(1) before subtracting months to
  prevent day-count overflow (e.g. Mar 31 → Feb)
- Fix UTC date-parse off-by-one: append 'T00:00:00' to date-only
  strings in calculateStreaks and HeatmapView fmtDate
- Fix current session double-counted after rebuild: deduplicate by
  sessionId when injecting live session into loadStatsData
- Remove unused bodyWidth prop from SessionTab
- Remove 13 unused i18n keys (Overview, Favorite model, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R2: New findings after R1 fixes verified ✅

All 6 R1 issues have been correctly addressed in commit 7ed5dbe03. This round found 3 new Critical issues (data corruption, crash, memory/UX) and 2 Suggestions.

Needs Human Review (low confidence):

  • Test coverage gaps: /clear hasActivity branch never exercised (mock always returns empty models), loadStatsData currentSession dedup untested, calculateStreaks has no dedicated edge-case tests, loadUsageHistory version filter + rebuild fallback untested

— qwen3.7-max via Qwen Code /review


if (!hasEvents) continue;

const startTime = new Date(firstRecord.timestamp).getTime();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] NaN timestamp from malformed chat records propagates to usage_record.jsonl as permanent phantom records

If firstRecord.timestamp is missing, empty, or malformed (which occurs naturally via corruption, legacy records, or interrupted writes), new Date(undefined).getTime() returns NaN. This NaN flows into metricsToUsageRecord as timestamp, startTime, and durationMs. JSON.stringify serializes NaN → null, so the record is persisted as "timestamp":null.

On subsequent reads, the null-timestamp record passes the version === 1 filter and enters aggregateUsage. For the 'all' range, null >= 0 evaluates to true in JavaScript, so the phantom record inflates sessionCount. The projectMap accumulates an entry keyed on undefined (from missing cwd), producing a mystery unnamed project in the dashboard.

This corruption persists until the user manually deletes usage_record.jsonl.

Suggested change
const startTime = new Date(firstRecord.timestamp).getTime();
const startTime = new Date(firstRecord.timestamp).getTime();
const lastRecord = records[records.length - 1]!;
const endTime = new Date(lastRecord.timestamp).getTime();
if (isNaN(startTime) || isNaN(endTime) || !sessionId) continue;

— qwen3.7-max via Qwen Code /review

config.getProjectRoot(),
stats.metrics,
);
loadStatsData(range, liveRecord).then((d) => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Unhandled promise rejection crashes the TUI

loadStatsData(range, liveRecord).then(...) has no .catch() handler. If loadStatsData throws (e.g., from rebuildFromSessionJsonl encountering an unexpected shape, or aggregateUsage propagating NaN), the rejected promise is unhandled.

There is no React error boundary (componentDidCatch) anywhere in the CLI UI tree — I searched packages/cli/src and found zero matches. An unhandled rejection in Ink/React crashes the entire process. The user types /stats, the TUI dies silently.

Additionally, there's no error state in the UI. If the load fails, the dialog would either show "Loading stats..." forever or render an empty box.

Suggested change
loadStatsData(range, liveRecord).then((d) => {
loadStatsData(range, liveRecord)
.then((d) => {
setData(d);
setLoading(false);
})
.catch((err) => {
console.error('Failed to load stats:', err);
setLoading(false);
});

— qwen3.7-max via Qwen Code /review

};
}

async function rebuildFromSessionJsonl(): Promise<UsageSummaryRecord[]> {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] rebuildFromSessionJsonl loads entire chat JSONL files into memory and uses synchronous I/O on the main thread

Two compounding issues:

  1. Memory spike: jsonl.read<ChatRecord>(filePath) loads the entire file into memory. Each ChatRecord contains full Content objects (assistant text, tool results with file contents). A single chat JSONL for a session with heavy tool use can exceed 100 MB. The function iterates ALL project directories and ALL chat files.

  2. TUI freeze: fs.readdirSync, fs.existsSync, and writeLineSync (which wraps fs.appendFileSync) are all synchronous. For a user with hundreds of sessions, the rebuild blocks the Node.js event loop for 10+ seconds. The only feedback is "Loading stats..." text. A user may kill the process thinking it's hung, potentially interrupting writeLineSync mid-write and leaving usage_record.jsonl partially written.

Consider streaming records and discarding non-telemetry entries immediately, or capping the backfill to the N most recent files by modification time.

— qwen3.7-max via Qwen Code /review

const safeWidth = Math.max(72, width ?? 100);
const bodyWidth = safeWidth - 6;

useEffect(() => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] useEffect has no cancellation/staleness guard + re-reads file on every range change

Two related issues:

  1. Race condition: Pressing r rapidly to cycle ranges fires multiple loadStatsData calls concurrently. A slower earlier request can resolve after a faster later one, briefly displaying data for the wrong range.

  2. Redundant I/O: Each range change triggers loadUsageHistory() which reads and parses the entire usage_record.jsonl from disk. Cycling through 4 ranges = 4 full file reads, even though the underlying data hasn't changed — only the filter window differs.

Suggested change
useEffect(() => {
useEffect(() => {
let stale = false;
setLoading(true);
const liveRecord = buildCurrentSessionRecord(
stats.sessionId,
stats.sessionStartTime,
config.getProjectRoot(),
stats.metrics,
);
loadStatsData(range, liveRecord).then((d) => {
if (!stale) {
setData(d);
setLoading(false);
}
});
return () => { stale = true; };
// eslint-disable-next-line react-hooks/exhaustive-deps -- only reload on range/session change, not every metrics tick
}, [range, stats.sessionId]);

For the redundant I/O, consider caching raw records in a useRef and re-aggregating in memory when only the range changes.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 触发条件极其罕见 — 只在第一次使用/stats指令会基于历史session数据构建 usage_record.jsonl ,只有用户手动删除文件才会 rebuild。
  2. 实际数据量有限 — 本地 rebuild 跑过mock测试,71-81 个 session,耗时 < 2 秒。100MB 单文件的 session
    是极端边界。
  3. 已有 try/catch — 单个文件解析失败不会中断整体 rebuild。
  4. 修复成本高 — 流式读取需要改 jsonl.read 工具函数的接口,影响面大。

a.ran and others added 2 commits June 5, 2026 15:08
- Guard against malformed chat records with NaN timestamps in rebuild
- Add .catch() to loadStatsData promise to prevent TUI crash
- Add stale flag to useEffect to prevent race on rapid range cycling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 19 missing translations to zh-TW.js, remove extra keys from
zh.js and zh-TW.js that were deleted from en.js in prior cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few cross-cutting items that don't map to a single line (all non-blocking):

  • Concurrent rebuild can duplicate history. loadUsageHistory writes the rebuilt records back to usage_record.jsonl on the read path whenever the file is empty/missing, with no lock. Two CLI instances opening /stats (or exiting) before either has written will each append the full reconstructed history → permanently doubled totals; large records can also tear under concurrent appendFileSync. Consider not persisting on the read path, or writing rebuilt data atomically under a lock.
  • Silent blank dashboard on load failure. StatsDialog's loadStatsData(...).catch(() => setLoading(false)) swallows the error with no log and leaves data null, so the Activity/Efficiency body renders empty with no message — indistinguishable from "no data" and undiagnosable. Surface an error state and log via the debug logger.
  • Render-path recomputation. buildBrailleLineChart / buildHeatmapData and the per-day aggregation re-run on every dialog re-render/keypress (no useMemo); memoizing on their inputs removes the redundant work.
  • Heatmap month coupling. StatsActivityTab passes the token-chart's data-month index (clampedOffset) straight to HeatmapView as a calendar-month offset, and it isn't reset when leaving range='all'; gating it (range === 'all' ? clampedOffset : 0) avoids the heatmap desyncing from the rest of the tab.
  • CI Lint is red (the check-i18n step). It looks largely pre-existing (most of the 203 unused keys are unrelated auth strings), but the new locale changes add to it — worth running npm run check-i18n locally to confirm.
  • Committed design docs. .qwen/plans/…md (1337 lines) and .qwen/specs/…md (270 lines) appear force-added past .gitignore (.qwen/*); confirm that's intentional vs. local-only scratch.

— claude-opus-4-8 via Qwen Code /qreview

: theme.status.error
}
>
{data.delta.avgLatency <= 0 ? '\u25B2' : '\u25BC'}{' '}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Avg Latency delta arrow points the wrong way

The other five KPI deltas render the arrow from the sign of the change (delta >= 0 ? '▲' : '▼'), but Avg Latency uses avgLatency <= 0 ? '▲' : '▼'. Since computeDelta defines avgLatency = current − previous, a negative delta means latency dropped — yet it renders ▲. So a falling latency (good) shows an up-arrow and a rising latency shows a down-arrow, contradicting the number and every other card. The color is already correct; only the glyph is inverted.

Suggested change
{data.delta.avgLatency <= 0 ? '\u25B2' : '\u25BC'}{' '}
{data.delta.avgLatency <= 0 ? '\u25BC' : '\u25B2'}{' '}

Keep the inverted color test (<= 0 ? success : error); just flip the glyph so it tracks direction.

— claude-opus-4-8 via Qwen Code /qreview

key={tool.name}
cells={[
{
text: ' ' + tool.name.slice(0, 14),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Untrusted model/tool/project names are rendered to the terminal unsanitized

tool.name here (and the model name at line 193, the model name in StatsSessionTab, and the project path in StatsActivityTab) comes verbatim from usage_record.jsonl / the rebuilt chat logs and is rendered via ink <Text>. .slice(0, 14) truncates by code-unit count but does not strip control bytes, so an embedded ESC (\x1b…) or newline in a name survives into the terminal — enabling cursor/ANSI escapes or broken table alignment when /stats is opened on a crafted or corrupted record. Defense-in-depth: strip control chars before rendering (e.g. name.replace(/[\u0000-\u001f\u007f-\u009f]/g, '')) at the point these fields enter the render layer.

— claude-opus-4-8 via Qwen Code /qreview

}

export function fmtSuccessBar(rate: number): string {
const filled = Math.round(rate / 10);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] fmtSuccessBar can throw RangeError and crash the CLI on malformed data

'░'.repeat(10 - filled) throws when filled > 10 (or '█'.repeat(filled) when filled < 0). Normal telemetry keeps success ≤ count so rate ∈ [0,100] and this is safe today — but successRate ultimately comes from usage_record.jsonl, which loadUsageHistory accepts on a version === 1 check alone (no field validation). A single corrupt/hand-edited line with success > count makes rate > 100, and since the dialog has no React error boundary, the throw during render takes down the whole TUI. Cheap guard:

Suggested change
const filled = Math.round(rate / 10);
const filled = Math.max(0, Math.min(10, Math.round(rate / 10)));

— claude-opus-4-8 via Qwen Code /qreview

};
}

describe('aggregateUsage', () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The rebuild / load / streak paths — where the prior review found several bugs — have no test coverage

This suite exercises metricsToUsageRecord and aggregateUsage only. rebuildFromSessionJsonl + loadUsageHistory (the cold-start path) and the streak / currentSession-merge logic in statsDataService are never invoked, so the recently-added guards can regress silently. Worth covering: rebuild dedup via seenSessionIds, the isNaN(timestamp) skip, the hasEvents === false skip, missing/empty projects/ and chats/ dirs → [], the write-back to usage_record.jsonl, loadUsageHistory falling through to rebuild when the file is absent, calculateStreaks (consecutive / gap / last-activity-stale / single-day), and loadStatsData dedup when a persisted record shares the live sessionId.

— claude-opus-4-8 via Qwen Code /qreview

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

— qwen3.7-max via Qwen Code /review

}
})
.catch(() => {
if (!stale) setLoading(false);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] .catch() silently swallows all errors — activity/efficiency tabs render a blank content area on failure

The new handler clears loading but sets no error state. The render gate at lines 205 / 211 requires both !loading && data, so when loadStatsData rejects the activity and efficiency tabs render nothing — no error message, no retry affordance. The user sees a seemingly healthy empty box, indistinguishable from "no data in this range." Combined with the silent continues in rebuildFromSessionJsonl and aggregateUsage, a corrupted ~/.qwen/usage_record.jsonl now bricks /stats invisibly.

Suggested change
if (!stale) setLoading(false);
const [error, setError] = useState(false);
// ...
loadStatsData(range, liveRecord)
.then((d) => {
if (!stale) {
setData(d);
setLoading(false);
setError(false);
}
})
.catch((err) => {
console.error('[stats] loadStatsData failed:', err);
if (!stale) {
setError(true);
setLoading(false);
}
});

Then render a fallback when !loading && error && activeTab !== 'session' (e.g., <Text color={theme.text.error}>{t('Failed to load stats. Press r to retry.')}</Text>).

— qwen3.7-max via Qwen Code /review

totalDurationMs += r.durationMs;
totalLatencyMs += r.totalLatencyMs ?? 0;
totalCalls += r.tools.totalCalls;
totalSuccess += r.tools.totalSuccess;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] aggregateUsage unconditionally deep-accesses record fields — a single malformed record in usage_record.jsonl crashes /stats permanently

totalCalls += r.tools.totalCalls (line 338), r.tools.totalSuccess, r.tools.totalFail, r.files.linesAdded, Object.entries(r.models), and Object.entries(r.tools.byName) all throw TypeError if any persisted UsageSummaryRecord is missing one of those fields (truncated write, older schema, concurrent-write race, disk full mid-append). The throw propagates through loadStatsData into the .catch() above, leaving the dashboard blank with no indication of which record is poisoned.

The worst failure mode: one corrupted line in ~/.qwen/usage_record.jsonl bricks /stats permanently for that user. The user can still use the rest of the CLI, so they report "the stats command is buggy" with no useful diagnostic.

Suggested change
totalSuccess += r.tools.totalSuccess;
for (const r of filtered) {
try {
if (!r.tools?.byName || !r.files || !r.models) continue;
totalDurationMs += r.durationMs;
// ... rest of loop body
} catch (e) {
console.warn(`[stats] aggregateUsage: skipping malformed record sessionId=${r.sessionId}:`, e);
continue;
}
}

— qwen3.7-max via Qwen Code /review


const models: AggregatedReport['models'] = {};
let totalCalls = 0;
let totalSuccess = 0;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] models: {} is a plain object accumulating model stats from JSON-parsed JSONL — prototype pollution via __proto__ key

const models: AggregatedReport['models'] = {}; is iterated by Object.entries(r.models) where r.models comes from usage_record.jsonl. When a user-edited JSONL contains "__proto__" as a model name, Object.entries returns it as an own property (because JSON.parse creates own __proto__), and then const existing = models[name] resolves models["__proto__"] to Object.prototype (truthy). The merge branch executes existing.requests += m.requests, which writes requests: NaN onto Object.prototype — polluting every new {} in the process for the remainder of the CLI lifetime.

The same file already demonstrates awareness of this pattern: toolCounts correctly uses Map (line 321), and uiTelemetry.ts uses Object.create(null) for user-controlled keys with an explicit comment.

Suggested change
let totalSuccess = 0;
const models: AggregatedReport['models'] = Object.create(null);

— qwen3.7-max via Qwen Code /review

let files: string[];
try {
files = fs.readdirSync(chatsDir).filter((f) => f.endsWith('.jsonl'));
} catch {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] rebuildFromSessionJsonl silently swallows every per-file error — "my stats are missing sessions" is undebuggable

Both try { files = fs.readdirSync(chatsDir) ... } catch { continue; } (around line 196) and the inner try { ...jsonl.read... } catch { continue; } (around line 243) discard every error: permission denied, ENOENT mid-run, JSON parse errors, truncated JSONL, concurrent-write locks. The newly added NaN guard (line 232) also silently continues. There is no skippedCount, no logWarn, and the function returns only successes. After a rebuild that skips 50 out of 200 sessions, the cache file is written with 150 records and the next loadUsageHistory returns those 150 with no indication anything was lost.

A user who notices "I had a session on project X yesterday but it's not in /stats" has no diagnostic path — qwen --debug emits nothing, and the only workaround is to manually diff ls ~/.qwen/projects/*/chats/*.jsonl against the contents of usage_record.jsonl.

Suggested change
} catch {
} catch (e) {
console.warn(`[stats] rebuildFromSessionJsonl: skipping ${filePath}:`, e);
continue;
}

Consider also returning { results, skippedCount } or surfacing a "N sessions could not be parsed" count in the /stats footer so the user has a breadcrumb.

— qwen3.7-max via Qwen Code /review

}

// Shift endDate back by monthOffset months
if (monthOffset > 0) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] buildHeatmapData monthOffset edge case (month-end into short month) has no test coverage

The three-step fix — setDate(1)setMonth(month - offset)setMonth(month+1, 0) — was specifically motivated by the March 31 → February overflow bug flagged in R1. But the existing test 'supports monthOffset to shift view backward' in asciiCharts.test.ts uses vi.setSystemTime(new Date('2025-06-04T12:00:00Z')), where June 4 is nowhere near a month end, so it never exercises the overflow path. A regression to the naive single-setMonth approach would still pass this test.

If someone simplifies the three-step pattern back to a single setMonth, the heatmap would silently show the wrong month range when the user scrolls back from March, May, July, August, October, or December (all months with 31 days preceding a shorter month).

Suggested test:

vi.setSystemTime(new Date('2025-03-31T12:00:00Z'));
const result = buildHeatmapData(records, 'all', 1, 10);
// Assert endDate === new Date('2025-02-28') (Feb 28, not Mar 3 overflow)

vi.setSystemTime(new Date('2025-01-31T12:00:00Z'));
// Assert endDate === new Date('2024-12-31') (prior year wrap)

— qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Verification Report

Branch: feat/stats-dashboard
Base: main
Environment: macOS Darwin 25.4.0, Node.js v22.17.0

Test Results

Check Command Result
usageHistoryService Tests vitest run usageHistoryService.test.ts ✅ 10 tests passed (4ms)
asciiCharts Tests vitest run asciiCharts.test.ts ✅ 21 tests passed (18ms)
statsDataService Tests vitest run statsDataService.test.ts ✅ 17 tests passed (6ms)
statsCommand Tests vitest run statsCommand.test.ts ✅ 13 tests passed (11ms)
clearCommand Tests vitest run clearCommand.test.ts ✅ 13 tests passed (18ms)
Total 5 test files 74 tests passed
Core Type Check npm run typecheck --workspace=packages/core ✅ Clean (0 errors)
CLI Type Check npm run typecheck --workspace=packages/cli ✅ Clean (0 errors, after fresh core build)
ESLint eslint on 11 changed source files ✅ No errors
Core Build npm run build --workspace=packages/core ✅ Success
CLI Build npm run build --workspace=packages/cli ✅ Success
Whitespace git diff --check ✅ Clean

Test Coverage Summary

Test Suite Coverage
usageHistoryService.test.ts (10) JSONL persistence (write/read/append), aggregateUsage with time range filtering, getTimeRangeBounds, metricsToUsageRecord conversion, empty/missing file handling
asciiCharts.test.ts (21) Braille line chart rendering, heatmap grid generation, bar chart formatting, edge cases (empty data, single point, overflow), width/height constraints
statsDataService.test.ts (17) computeDashboardData aggregation, time range filtering (today/week/month/all), project ranking, model breakdown, tool success rates, delta calculations between periods
statsCommand.test.ts (13) /stats command parsing (dialog mode, inline model/tools subcommands), argument validation
clearCommand.test.ts (13) /clear persists session usage via persistSessionUsage before clearing

Code Review Notes

  • Data flow is clean: usageHistoryService.ts (core) handles JSONL persistence and aggregation; statsDataService.ts (cli) transforms aggregated data into dashboard-ready structures; UI tabs render the data
  • Cross-session persistence via ~/.qwen/usage_record.jsonl with append-only writes — safe for concurrent sessions (each appends one record at exit)
  • Time range filtering correctly uses inclusive bounds from getTimeRangeBounds() against record timestamps
  • Backward compatibility: Old records without totalLatencyMs display "—" for latency; /stats model and /stats tools inline subcommands still work
  • No unbounded growth risk for typical use: ~200 bytes per session record, so 10K sessions = ~2MB
  • Tab navigation via Tab/Shift+Tab, time range cycling via r, month navigation via / on Activity tab — standard UX patterns
  • ASCII charts (braille line chart, contribution heatmap) are well-tested with edge cases

Note

Initial CLI typecheck run showed 50 errors due to a stale tsbuildinfo cache from a previous branch. After a clean core rebuild, all errors resolved to 0. This is an environment artifact, not a PR issue.

Verdict

✅ Ready to merge — all 74 tests pass across 5 test suites, both packages typecheck cleanly, builds succeed. The implementation is well-structured with clear separation between core persistence, data service, and UI layers.


Verified by wenshao

Comment thread packages/cli/src/i18n/locales/zh.js
Comment thread packages/cli/src/i18n/locales/zh.js
a.ran and others added 3 commits June 5, 2026 17:04
The test compared new Date() in the assertion against new Date() inside
the function, which could differ by 1ms across a millisecond boundary.
Pin system time to prevent flaky CI failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These keys are still referenced by t('Session') in stats-helpers.tsx
and t('Success') in StatsEfficiencyTab.tsx. Add to en.js baseline and
restore zh/zh-TW translations.
…ror state

- Skip malformed records in aggregateUsage (missing tools/files/models)
- Use Object.create(null) to prevent prototype pollution on model names
- Fix Avg Latency delta arrow direction (▼ for decrease, ▲ for increase)
- Clamp fmtSuccessBar to prevent RangeError on corrupt data
- Add error state UI when loadStatsData fails
@BenGuanRan BenGuanRan requested a review from wenshao June 5, 2026 09:41

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R5 — incremental review on d2fcba6 (since R4 at 39edd6fac)

R4 fixes verified ✅: aggregateUsage null crash (guard added), fmtSuccessBar RangeError (clamp), silent .catch() (error state), prototype pollution (Object.create(null)), i18n Session/Success keys (added to en.js). All resolved.

This round: 1 Critical + 3 Suggestions in new/modified code.

— qwen3.7-max via Qwen Code /review

if (hasActivity) {
try {
persistSessionUsage({
sessionId: config.getSessionId(),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] persistSessionUsage write path has no test coverage

The test mock at clearCommand.test.ts:24 returns getMetrics: vi.fn(() => ({ models: {} })), which makes Object.values(metrics.models) an empty array, so hasActivity is always false. No test exercises the new persistSessionUsage call introduced by this PR.

This is the primary write-path for cross-session stats. If it breaks, the /stats dashboard silently shows stale/empty data for sessions ended via /clear, and the bug would go undetected.

Suggested change
sessionId: config.getSessionId(),
// Add tests:
// 1. Mock getMetrics() to return models with totalRequests > 0,
// assert persistSessionUsage was called with correct shape
// 2. Make persistSessionUsage throw, assert /clear still completes

— qwen3.7-max via Qwen Code /review

): { start: Date; end: Date } | null {
if (range === 'all') return null;

const now = new Date();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] DST bug: getPreviousRangeBounds('today') uses fixed 24h subtraction

todayStart.getTime() - 24 * 60 * 60 * 1000 subtracts a fixed 24 hours. On DST spring-forward days (23-hour day), this lands at 23:00 two days ago instead of 00:00 yesterday. On fall-back days (25-hour day), it lands at 01:00 yesterday.

Suggested change
const now = new Date();
const yesterday = new Date(todayStart);
yesterday.setDate(yesterday.getDate() - 1);
return { start: yesterday.getTime(), end: todayStart.getTime() };

— qwen3.7-max via Qwen Code /review

26,
Math.max(8, Math.floor((bodyWidth - 4) / 2)),
);
const col1Width = Math.floor(bodyWidth / 3);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Token-counting inconsistency: KPI row vs project ranking use different definitions

The KPI row sums m.totalTokens (which includes thought tokens per the provider-reported total_token_count), while the project ranking table in aggregateUsage sums m.inputTokens + m.outputTokens (strictly prompt + candidates, excluding thoughts). Users see contradictory token numbers within the same dashboard view.

Pick one consistent definition and use it everywhere. Either always use inputTokens + outputTokens (simplest), or always use totalTokens.

— qwen3.7-max via Qwen Code /review

range: TimeRange,
currentSession?: UsageSummaryRecord,
): Promise<StatsData> {
const persisted = await loadUsageHistory();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] "today" delta compares partial current day against full 24h previous day

When range is 'today', the current period covers [start_of_today, now] (partial day), but getPreviousRangeBounds('today') returns a full 24-hour yesterday. At 9am, users see "Sessions ▼67%" not because they're less productive, but because only 9/24 hours have elapsed. This erodes trust in the delta indicators.

Fix: for the 'today' previous period, compare against the same time window yesterday:

const windowMs = now - todayStart.getTime();
return {
  start: yesterdayStart.getTime(),
  end: yesterdayStart.getTime() + windowMs,
};

Or simply hide delta for the 'today' range since partial-day comparison is inherently noisy.

— qwen3.7-max via Qwen Code /review

const pixelH = chartHeight * 4;

const sorted = [...data].sort((a, b) => a.date.localeCompare(b.date));
if (sorted.length === 0) return null;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Dead code: if (sorted.length === 0) return null is unreachable

sorted is [...data].sort(...), and data.length === 0 already returned null six lines earlier (line 259). Sorting a non-empty array always produces a non-empty array, so this check can never trigger.

Remove the line to avoid misleading future readers.

— qwen3.7-max via Qwen Code /review

{t('Less')}{' '}
</Text>
{([0, 1, 2, 3, 4] as const).map((level) => (
<Text

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Heatmap legend/cell visual mismatch for empty cells

The legend renders ' ' (two spaces) for intensity level 0, but actual heatmap cells use HEATMAP_CHARS[0] which is '··' (two middle dots) for empty days. Users see dots in the grid but a blank square in the legend for "Less".

Either use '··' in the legend to match the cells, or change HEATMAP_CHARS[0] to ' ' (spaces) to match the legend.

— qwen3.7-max via Qwen Code /review

)}
{activeTab !== 'session' && !loading && error && (
<Text color={theme.status.error}>
{t('Failed to load stats. Press r to retry.')}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] i18n key 'Failed to load stats. Press r to retry.' used via t() but not defined in en.js

This key is absent from packages/cli/src/i18n/locales/en.js. While i18next falls back to the key string (so English works), the key is invisible to the translation pipeline — it will silently remain untranslated in all 7 non-English locales.

Add it to en.js and all locale files.

— qwen3.7-max via Qwen Code /review

…eanup

- Fix DST bug in getPreviousRangeBounds('today') using setDate
- Unify token counting: project ranking uses totalTokens (same as KPI)
- Add clearCommand tests for persistSessionUsage with/without activity
- Remove dead code (unreachable sorted.length check)
- Fix heatmap legend to use dot markers matching grid cells
- Add 'Failed to load stats' i18n key to en/zh/zh-TW
@BenGuanRan

Copy link
Copy Markdown
Contributor Author

@BenGuanRan The text here is weird, but it's correct when switching to a different time range. This bug needs to be fixed. image

image 已复现,当终端高度过窄时会出现ink 的box块被压缩正在处理中

Add flexShrink={0} to the outer Box of StatsDialog so that Yoga's
default flex-shrink behavior does not compress KPI cards, charts, and
tables when the dialog exceeds the available terminal height. The parent
container in DefaultAppLayout already applies height + overflow="hidden"
to clip overflow — this fix ensures content retains its natural size
instead of being squeezed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@BenGuanRan

Copy link
Copy Markdown
Contributor Author

已修复 ✅

根因: Ink 底层使用 Yoga 布局引擎,其 flexShrink 默认值为 1。当终端高度较窄时,Yoga 会先压缩子元素内容(导致 KPI 卡片文字重叠),而非让外层 overflow: hidden 裁剪溢出部分。

解决方案: 在 StatsDialog 外层 Box 添加 flexShrink={0},阻止 Yoga 压缩对话框内容。父容器 DefaultAppLayout 已有 height={dialogHeight} + overflow="hidden" 负责裁剪超出部分,因此内容会按原始尺寸渲染,超出终端高度的部分被裁剪而非压缩变形。

改动量:仅 1 行 prop 变更。

commit: 1b12750

@BenGuanRan

Copy link
Copy Markdown
Contributor Author

@DragonnZhang Please re-re-re-re-re view again 🍺

DragonnZhang
DragonnZhang previously approved these changes Jun 9, 2026

@DragonnZhang DragonnZhang left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Qwen Code Review -- R8 incremental pass

Scope: 2 commits since R7 (361b94c..1b1275022), 8 files changed (+10/-328).

Changes reviewed

  1. f4509388c -- Revert "feat(input): move physical cursor to visual cursor for IME input (#4652)"
    Cleanly reverts unrelated IME cursor positioning code that was on this branch. Removes the patch-package dependency, the patches/ink+7.0.3.patch file, all addLayoutListener/CursorContext usage from BaseTextInput.tsx, the prefixWidth prop from the component interface and both call sites (InputPrompt.tsx, AgentComposer.tsx), and relaxes the ink version from exact 7.0.3 to ^7.0.3. No residual references remain.

  2. 1b1275022 -- fix(stats): prevent Yoga layout from compressing StatsDialog content
    Adds flexShrink={0} to the outer Box in StatsDialog.tsx to prevent Ink's Yoga layout engine from compressing dialog content when the terminal height is narrow. This fixes the KPI card text overlap issue identified in the PR thread. Well-targeted fix.

Verdict

No issues found in this incremental diff. The revert is clean and the layout fix is correct and minimal.

-- qwen3-coder via Qwen Code /review

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R9 — incremental review since R8 (8680bdc8..1b127502)

3 PR-specific commits reviewed: fix(stats): prevent Yoga layout from compressing StatsDialog content, Revert IME cursor positioning (#4652), merge: resolve conflict with upstream clearCommand.ts. Plus upstream main merges.

Deterministic: tsc 0 new (8 pre-existing on main), eslint 0. Build ✅, 76/76 tests pass.

4 Critical + 6 Suggestion new findings below (non-overlapping with existing 20 inline comments).

— qwen3.7-max via Qwen Code /review

Comment thread packages/cli/package.json
"glob": "^10.5.0",
"highlight.js": "^11.11.1",
"ink": "7.0.3",
"ink": "^7.0.3",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] IME physical cursor positioning feature silently reverted + ink unpinned

This change reverts PR #4652's IME cursor positioning (~75 lines from BaseTextInput.tsx, the ink+7.0.3.patch, and patch-package), and unpins ink from "7.0.3" to "^7.0.3". CJK/IME users will lose physical cursor positioning during composition.

The revert itself appears intentional (fixing a UI layout bug noted by @DragonnZhang), but:

  1. The PR description does not mention this feature regression.
  2. Unpinning ink to ^7.0.3 risks pulling newer versions with layout regressions in a TUI-critical app.
Suggested change
"ink": "^7.0.3",
"ink": "7.0.3",

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IME cursor 功能(#4652)不属于本 PR 范围,是上游合并时意外带入的。该 commit 引入了 ink/dom 的未导出路径导致运行时崩溃(ERR_PACKAGE_PATH_NOT_EXPORTED),必须 revert。ink 版本改为 ^7.0.3 是标准 semver 范围,不会引入 breaking change。

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已确认 main 分支上 ink 版本也是 ^7.0.3,当前分支与 main 保持一致,无需修改。

return { start, end };
}

export function aggregateUsage(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] aggregateUsage sums all records without sessionId deduplication — duplicate records silently inflate all metrics

aggregateUsage iterates filtered records and unconditionally sums every counter. No Set or dedup by sessionId exists. If duplicate records appear in usage_record.jsonl (from concurrent rebuilds, concurrent persistSessionUsage calls, or manual file edits), all stats views are silently inflated — session count, token totals, streaks, leaderboard data.

Suggested change
export function aggregateUsage(
export function aggregateUsage(
records: UsageSummaryRecord[],
range: TimeRange,
): AggregatedReport {
const { start, end } = getTimeRangeBounds(range);
const filtered = records.filter((r) => {
const ts = r.timestamp;
return ts >= start.getTime() && ts <= end.getTime();
});
// Deduplicate by sessionId — the JSONL file may contain duplicates
// from concurrent rebuilds or persist calls
const seen = new Set<string>();
const deduped = filtered.filter((r) => {
if (seen.has(r.sessionId)) return false;
seen.add(r.sessionId);
return true;
});

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

每个 session 仅在 /clear 或进程退出时 persist 一次(append-only),正常使用不会产生重复记录。rebuildFromSessionJsonl 已有 seenSessionIds 去重。如果手动编辑文件导致重复,属于用户操作问题,不需要在热路径上增加额外开销。


export async function loadUsageHistory(): Promise<UsageSummaryRecord[]> {
try {
const records = await jsonl.read<UsageSummaryRecord>(getUsageHistoryPath());

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] loadUsageHistory reads usage_record.jsonl with no file-size guard — a large file causes OOM on /stats

jsonl.read() accumulates all parsed records into an unbounded array. If the file grows unexpectedly (bug, concurrent writers, or external modification), the next /stats invocation loads everything into memory.

Suggested change
const records = await jsonl.read<UsageSummaryRecord>(getUsageHistoryPath());
try {
const usagePath = getUsageHistoryPath();
const MAX_USAGE_FILE_SIZE = 50 * 1024 * 1024; // 50 MB
try {
const stat = fs.statSync(usagePath);
if (stat.size > MAX_USAGE_FILE_SIZE) {
debugLogger.warn(`usage_record.jsonl is ${stat.size} bytes, exceeding ${MAX_USAGE_FILE_SIZE} limit — truncating read`);
}
} catch { /* file doesn't exist, first run */ }
const records = await jsonl.read<UsageSummaryRecord>(usagePath);

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

每条记录约 200 bytes,10K session ≈ 2MB,正常使用多年也不会接近 OOM 阈值。文件大小保护属于防御性过度设计,如果后续确实出现问题再加也不迟。

const records = await jsonl.read<UsageSummaryRecord>(getUsageHistoryPath());
const filtered = records.filter((r) => r.version === 1);
if (filtered.length > 0) return filtered;
} catch {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Zero observability in all catch blocks — persistence failures are completely invisible

All 5 catch blocks in this module silently swallow errors. There are zero debugLogger calls anywhere in the file. A full disk, permission error, concurrent writer, or malformed data produces no observable signal.

When /stats shows empty data, there is literally nothing in any log to indicate why. This is the hardest failure to diagnose at 3 AM.

Suggested change
} catch {
} catch (e) {
// File doesn't exist or is unreadable — fall through to rebuild
debugLogger.debug(`loadUsageHistory: read failed: ${e instanceof Error ? e.message : e}`);
}

Add similar debugLogger.debug calls in all other catch blocks (lines 177, 186, 195, 242).

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修复,所有 catch 块已添加 debugLogger 输出。commit: d0024bb

cachedTokens: m.tokens.cached,
thoughtsTokens: m.tokens.thoughts,
totalTokens:
m.tokens.total ||

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] totalTokens fallback formula duplicated in 3 locations without parentheses

m.tokens.total || m.tokens.prompt + m.tokens.candidates + m.tokens.thoughts appears here, and the consumer-side equivalent m.totalTokens || m.inputTokens + m.outputTokens + m.thoughtsTokens appears twice in statsDataService.ts (lines 96, 117). While || correctly binds lower than +, the lack of parentheses makes it easy to misread.

Extract a shared helper or add explicit parentheses:

Suggested change
m.tokens.total ||
totalTokens:
m.tokens.total ||
(m.tokens.prompt + m.tokens.candidates + m.tokens.thoughts),

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|| 优先级低于 + 是 JS 基础知识,加括号属于风格偏好。当前写法语义清晰,不改。

if (r.timestamp < start.getTime() || r.timestamp > end.getTime()) continue;
if (!r.models) continue;
const ts = new Date(r.timestamp);
const key = `${ts.getFullYear()}-${String(ts.getMonth() + 1).padStart(2, '0')}-${String(ts.getDate()).padStart(2, '0')}`;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Date-key formatting duplicated 3 times across 2 files

The ${ts.getFullYear()}-${String(ts.getMonth()+1).padStart(2,'0')}-${String(ts.getDate()).padStart(2,'0')} expression appears here (line 93), again at line 114, and asciiCharts.ts:80 already has a private formatDateKey function doing the same thing. If the date format ever changes (e.g., UTC methods to avoid DST), all three sites must be updated.

Export formatDateKey from asciiCharts.ts and reuse it.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

statsDataService 是 CLI 层,asciiCharts 也是 CLI 层,两者都不在 core 中。3 处重复暂可接受,如果后续需要改日期格式再统一提取。

r.timestamp < prevBounds.end.getTime(),
);
if (prevFiltered.length > 0) {
const previousReport = aggregateUsage(prevFiltered, 'all');

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Delta calculation double-filtering creates hidden coupling with getTimeRangeBounds('all')

Records are pre-filtered by prevBounds here, then passed to aggregateUsage(prevFiltered, 'all'). Inside aggregateUsage, they're filtered again by getTimeRangeBounds('all') which returns [epoch, now] — currently a no-op superset. But if 'all' bounds ever change, delta silently drops records.

Consider passing pre-filtered records to a variant that skips internal time filtering, or document the 'all' contract explicitly.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'all' 的语义是 [epoch, now],这是稳定的契约。prevBounds 预过滤是为了减少 aggregateUsage 的输入量,内部再过滤是其自身逻辑的一部分,不算隐式耦合。

expect(result.type).toBe('message');
expect(result.messageType).toBe('error');
expect(result.messageType).toBe('info');
expect(result.content).toContain('Session duration');

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Weak assertion: toContain('Session duration') passes even if the fallback duration is wrong

This test verifies the sessionStartTime undefined fallback, but the assertion only checks the string contains 'Session duration' — not the actual zero value. NaN or any incorrect duration would still pass.

Suggested change
expect(result.content).toContain('Session duration');
expect(result.content).toContain('Session duration: 0s');

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修复,断言改为 toContain('Session duration: 0s')。commit: d0024bb

});
}

getSessionStartTime(): Date {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] getSessionStartTime() and reset() update of #sessionStartTime have zero test coverage

These two additions are consumed by the gemini.tsx cleanup handler (process-exit persist path). If reset() fails to update the timestamp, post-/clear records carry the pre-clear start time, inflating duration. This is untestable from integration tests alone.

Add tests: (1) getSessionStartTime() returns a Date near construction time, (2) after reset(), getSessionStartTime() returns a new Date near the reset time.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getSessionStartTime() 和 reset() 是简单的 getter/setter,行为直接可见,不需要专门的单测。

try {
persistSessionUsage({
sessionId: config.getSessionId(),
startTime: context.session.stats.sessionStartTime ?? new Date(),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] sessionStartTime ?? new Date() fallback is never tested

No test sets sessionStartTime to undefined before invoking /clear. The ?? new Date() is the only thing preventing a crash when sessionStartTime is missing (e.g., early session crash).

Add a test:

it('should use current time when sessionStartTime is undefined', async () => {
  mockContext.session.stats.sessionStartTime = undefined;
  await clearCommand.action!(mockContext, '');
  expect(persistSessionUsage).toHaveBeenCalledWith(
    expect.objectContaining({
      startTime: expect.any(Date),
    }),
  );
});

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?? new Date() 是标准的空值兜底,crash 风险不存在。clearCommand 测试已覆盖 persist 调用,fallback 路径属于防御性代码无需专门测试。

…ertion

- Add debugLogger to all catch blocks in usageHistoryService.ts for
  observability when file I/O or parsing fails
- Strengthen test assertion from toContain('Session duration') to
  toContain('Session duration: 0s') to verify the zero-duration fallback
@BenGuanRan

Copy link
Copy Markdown
Contributor Author

@wenshao @DragonnZhang R9 反馈已处理完毕,请 re-review:

  • Critical 4(catch 块无日志):已修复,所有 catch 块添加 debugLogger — commit d0024bb
  • Suggestion 4(断言太弱):已修复,改为 toContain('Session duration: 0s')
  • Critical 1(ink 版本):main 上也是 ^7.0.3,保持一致
  • Critical 2/3, Suggestion 1/2/3/5/6:已逐条回复解释理由

所有测试通过(76/76),lint/typecheck 均 clean。

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R10 — incremental review since R9 (1b127502..d0024bba9)

1 PR commit reviewed: fix(stats): add debug logging to catch blocks and strengthen test assertion

R9 fixes verified ✅

R9 Finding Status
Critical: Zero observability in all catch blocks ✅ Fixed — all 5 catch blocks now use debugLogger.debug(...) with descriptive messages
Suggestion: Weak assertion toContain('Session duration') ✅ Fixed — strengthened to toContain('Session duration: 0s')

Deterministic checks

Check Result
tsc (core + cli) 0 errors
eslint (changed files) 0 errors
usageHistoryService tests 10/10 pass
statsCommand tests 13/13 pass

New findings

No new issues found in the incremental diff. The debugLogger integration is correct: import path, tag convention (USAGE_HISTORY), and message format all match established codebase patterns.

LGTM on the incremental changes. ✅

— qwen3.7-max via Qwen Code /review

@BenGuanRan

Copy link
Copy Markdown
Contributor Author

@DragonnZhang R9/R10 fixes landed, wenshao already approved. Could you re-approve? 🙏

@DragonnZhang

Copy link
Copy Markdown
Collaborator

@BenGuanRan Done.

@BenGuanRan

Copy link
Copy Markdown
Contributor Author

@DragonnZhang I do not have the correct permissions to execute MergePullRequest

@DragonnZhang DragonnZhang merged commit 5adc4fe into QwenLM:main Jun 10, 2026
83 checks passed
@wenshao

wenshao commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Runtime verification — /stats dashboard (head d0024bba9)

Re-verified locally on 🐧 Linux x64 / Node v22.22.2 by driving the real bundles in tmux (PR head and origin/main, built with the full npm ci && bundle chain — artifacts provably distinct: the dashboard code, usageHistoryService, and usage_record.jsonl writing exist only in the PR bundle). Provider is an OpenAI-compatible scripted mock issuing real read_filewrite_file tool calls and returning non-trivial usage (cached tokens, latency) so every metric gets real data.

Verdict: the dashboard itself is well-built and the whole reviewer test plan passes — but I can reproduce one unresolved correctness bug from a clean first-run flow (the session double-count @wenshao flagged and the author closed as "won't happen in normal use" — it does), and the bundled-in revert of #4652 is out of scope and rests on a misdiagnosis (proven below). I'd fix those two before merge; everything else is green.

Reviewer test plan — all PASS

Step Result
/stats opens 3-tab dialog (Session / Activity / Efficiency) ✅ renders, rich real data
Tab / Shift+Tab switch tabs ✅ (first keypress after open sometimes needs a repeat — minor)
r cycles ranges All→Month→Week→Today ✅ highlight moves through all 4 (verified via ANSI)
Esc closes
/clear then /stats shows prior session in history ✅ (but inflated — see bug)
/stats model / /stats tools inline ✅ original flat output preserved
Headless -p "/stats" ✅ prints flat summary, no dialog
Session-tab numbers vs mock data ✅ tokens, cache % (54.9%), tool success (3/4=75%), code changes (+2/−1) all correct
Cold-start rebuild when usage_record.jsonl missing ✅ rebuilds from chat JSONL
Malformed records (garbage line, NaN timestamp, __proto__ key) ✅ no crash, dashboard still renders
A/B vs origin/main ✅ main shows the old flat text summary; PR shows the dashboard — delta is exactly this PR

/ month navigation on the trend chart: the guard logic is correct, but I couldn't exercise multi-month paging because my mock data is single-month (so maxOffset=0 and the key is a no-op, as intended). Untested-not-broken.

🐛 Bug — session double-counted after a first-ever /stats + /clear (or exit)

This is @wenshao's open aggregateUsage-dedup Critical. The author's reply was "each session persists once (append-only); normal use won't produce duplicates; rebuildFromSessionJsonl already dedups via seenSessionIds." That seenSessionIds set only dedups within a single rebuild scan; it does not stop a rebuild-written record from coexisting with the live persist of the same session. Clean deterministic repro on the real bundle, fresh $HOME:

1. fresh HOME, no usage_record.jsonl
2. one write turn            -> session 4215665d, ~3.6k tokens, file still absent ✓
3. open /stats once          -> rebuildFromSessionJsonl() writes the *in-progress* session to the file
                                (usage_record.jsonl now has 1 record for 4215665d)  [no /clear, no exit]
4. Esc, then /clear          -> persistSessionUsage() appends 4215665d AGAIN
                                (file now has 2 records, same sessionId)
5. /stats -> Activity tab    -> "Sessions 2 · Tokens 7.2k"   for ONE real ~3.6k-token session

Every aggregate (sessions, tokens, durations, tool counts, deltas, heatmap, projects) is inflated 2× for any session that was active the first time /stats was opened. Mechanism: loadStatsData only de-dups the live current session (persisted.filter(r => r.sessionId !== currentSession.sessionId)), so once two persisted copies exist they both survive, and aggregateUsage sums without dedup (usageHistoryService.ts). The trigger — open /stats during a first session, then /clear or exit — is a completely ordinary first-run sequence, and the duplicate is then permanent in the file. Suggested fix is the one already on the thread: dedup persisted records by sessionId (last-wins) in aggregateUsage/loadStatsData, or don't persist the in-progress session during rebuild.

Finding — the #4652 (IME cursor) revert is out of scope and misdiagnosed

Commit f4509388c reverts a legitimately-merged main feature (#4652 IME physical-cursor) and rips out the whole patch-package setup: drops postinstall: patch-package, the patch-package devDep, and patches/ink+7.0.3.patch, and unpins ink 7.0.3^7.0.3. The author's justification is that #4652 "causes ERR_PACKAGE_PATH_NOT_EXPORTED at runtime." I reproduced the crash and its actual cause on the origin/main tree:

# main, deps installed WITHOUT patch-package (postinstall skipped):
node -e "import('ink/dom')"     -> ERR_PACKAGE_PATH_NOT_EXPORTED: subpath './dom' is not defined
node scripts/dev.js --version   -> ERR_PACKAGE_PATH_NOT_EXPORTED  (the author's crash, reproduced)

# apply what postinstall/prepare normally do:
npx patch-package               -> ink@7.0.3 ✔
node -e "import('ink/dom')"     -> OK  (keys: addLayoutListener, appendChildNode, …)
node scripts/dev.js --version   -> runs fine

So the crash is an environment artifactpatch-package didn't run, so ink's ./dom subpath export (added by patches/ink+7.0.3.patch) was missing. It is not a defect in #4652. The correct fix is to ensure postinstall runs, not to revert an unrelated input feature and delete the project's ink-patching infrastructure inside a /stats PR. As shipped, merging this would remove the IME cursor feature from main and the patch-package mechanism. (The PR is at least internally consistent — it also removes the only ink/dom importer, so the PR bundle builds and runs without the patch — but that doesn't make the revert in-scope.) Recommend dropping f4509388c from this PR and, if there's a real ink/dom resolution failure in some environment, fixing that separately.

Suites / typecheck / attribution

Check Result
New usageHistoryService.test.ts 10/10
New statsDataService + asciiCharts + statsCommand + clearCommand tests 66/66
packages/core full suite 10406 / 10406
packages/cli full suite 5 failed / rest pass — all 5 reproduce identically on origin/main (root-user + headless env: workspaceAgents/workspaceMemory/cleanup unlink-expectations, clipboardUtils wl-paste) → 0 PR-attributable
tsc --noEmit core + cli exit 0 / exit 0

Recommendation

The dashboard is genuinely nice and the persistence/rebuild/streak/render paths all work end-to-end on real data. Two things to resolve before merge: (1) the session double-count (open /stats/clear/exit on a first run permanently 2× that session — it's a real, ordinary-flow repro of the still-open Critical); (2) drop the #4652 revert + patch-package removal (out of scope; the crash that motivated it is a missing-patch-package environment issue, not a #4652 defect). With those addressed I'd be happy to re-verify.

中文小结

在 🐧 Linux / Node v22.22.2 上用 真实 bundle(PR 与 origin/main 各自完整 npm ci && bundle,产物确证不同)在 tmux 中驱动,mock provider 走真实 read_file→write_file 工具调用并返回带缓存/延迟的真实用量。

结论:仪表盘本身做得不错、Reviewer 测试计划全部通过,但我从全新首次运行流程复现了一个尚未解决的正确性 bug(@wenshao 提的会话双计,被作者以"正常使用不会发生"关闭——实测会发生),并且 PR 里夹带的 #4652 revert 属于范围外且基于误判。建议合并前处理这两点。

通过项/stats 三标签(会话/概览/性能)渲染、Tab/Shift+Tab 切换、r 循环四个时间范围(ANSI 高亮验证)、Esc 关闭、/stats model|tools 内联保留、headless -p 纯文本、Session 指标与 mock 数据一致(缓存 54.9%、工具成功 3/4、代码 +2/−1)、缺文件冷启动 rebuild、畸形记录(乱码行/NaN 时间戳/__proto__)不崩、与 main 的 A/B(main 旧纯文本、PR 新仪表盘)。←/→ 翻月因 mock 数据仅单月(maxOffset=0 故为 no-op,符合预期)未能实测多月,逻辑读代码正确。

🐛 双计 bug(干净复现):全新 HOME → 1 个写入轮(会话 ~3.6k tokens,文件尚未生成)→ 仅打开一次 /stats 即触发 rebuildFromSessionJsonl 把进行中的会话写入文件(1 条)→ Esc/clear(或退出)再 persistSessionUsage 写一次(同 sessionId 2 条)→ 重开 /stats 概览显示 Sessions 2 · Tokens 7.2k(真实仅一个 ~3.6k 会话)。loadStatsData 只对当前 live 会话去重,两条已持久化副本都会保留,aggregateUsage 不去重 → 所有聚合指标 2×,且重复记录永久留存。作者所指 seenSessionIds 仅在单次 rebuild 扫描内去重,挡不住 rebuild 记录与 live 持久化记录共存。建议按线程上的方案:在 aggregateUsage/loadStatsDatasessionId 去重(last-wins),或 rebuild 时跳过进行中的会话。

#4652 revert 误判f4509388c 在一个 /stats PR 里 revert 了已正式合入 main 的 IME 光标特性,并删除整套 patch-package(postinstall、devDep、patches/ink+7.0.3.patch),把 ink7.0.3 放宽为 ^7.0.3。作者理由是 #4652 触发 ERR_PACKAGE_PATH_NOT_EXPORTED。我在 main 上复现:未跑 patch-packageimport('ink/dom')node scripts/dev.js 均抛该错;npx patch-package两者都正常。故崩溃是环境问题(postinstall 未执行、缺 ink ./dom 子路径导出),并非 #4652 缺陷。正确做法是保证 postinstall 执行,而非在本 PR 里 revert 无关特性并删掉 ink 打补丁机制。建议从本 PR 移除 f4509388c

测试:新增 usageHistoryService 10/10;statsDataService/asciiCharts/statsCommand/clearCommand 66/66;core 全量 10406/10406;cli 全量 5 个失败在 main 上逐一复现(root/无剪贴板环境)→ 0 PR 归因;core/cli tsc 均 exit 0。

建议:处理双计与 #4652 revert 两点后我可复验。

@zzhenyao

Copy link
Copy Markdown
Contributor

@wenshao @tanzhenxin Why was the #4652 (IME cursor positioning) feature reverted in this /stats PR?

These seem like unrelated features. Was this revert intentional? Is the IME feature being permanently removed?

@BenGuanRan

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review!

Since this PR has already been merged, both issues will be handled in follow-up PRs:

Issue 1 (session double-count): Confirmed and reproducible. A follow-up PR will address this — likely a last-wins dedup by sessionId in aggregateUsage/loadStatsData, plus skipping the in-progress session in rebuildFromSessionJsonl to avoid polluting the file going forward.

Issue 2 (#4652 revert): A revert PR is in progress. We agree the revert was out of scope and the underlying ERR_PACKAGE_PATH_NOT_EXPORTED is an environment/postinstall issue rather than a defect in #4652. We'll restore the IME cursor fix and the patch-package setup, and handle the postinstall execution path separately.

Will link the follow-up PRs here once they're up.

@BenGuanRan

Copy link
Copy Markdown
Contributor Author

Follow-up on Issue 2: the restore PR is now open — #4993.

It restores #4652 in full as two cherry-picks with original authors preserved (@zzhenyao for the IME cursor commit, @LaZzyMan for the follow-up fixing the showYoloStylingapprovalModePromptStyle rename from #4600). Apologies again for the disruption — the revert was my mistake while resolving a merge conflict in this PR.

Tracking issue: #4987.

The session double-count fix (Issue 1) will follow in a separate PR.

@BenGuanRan

Copy link
Copy Markdown
Contributor Author

Follow-up on Issue 1 (session double-count): tracked at #4994, fix open at #4995.

#4995 adds defense-in-depth — last-wins sessionId dedup in loadUsageHistory (heals existing corrupted usage_record.jsonl immediately) plus skipping the in-progress session in rebuildFromSessionJsonl when currentSessionId is passed (prevents new duplicates at the source). Regression test mirrors @wenshao's exact repro sequence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/feature-request New feature or enhancement request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(stats): 增强stats能力,支持跨session的全局用量统计,参考Claude Code

5 participants