feat(stats): add interactive /stats dashboard with cross-session tracking#4779
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add usageHistoryService to core with JSONL-based persistence, session replay from chat history with sessionId deduplication, time-range aggregation, and per-model/tool/file breakdown including latency fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add statsDataService for delta calculations, efficiency metrics, tool leaderboard, and heatmap/trend data. Add asciiCharts with braille line chart (Bresenham rendering) and GitHub-style contribution heatmap. Includes 38 unit tests covering both modules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add three-tab dialog: Session (live metrics), Activity (KPIs, heatmap, braille token trend chart, project ranking), and Efficiency (cache rate, tool success, latency cards, tool leaderboard, model comparison table). Supports tab/shift-tab navigation, r to cycle time ranges (all/month/ week/today), left/right to pan months in the trend chart, esc to close. Persist usage on /clear for accurate cross-session tracking. Update statsCommand tests for new dialog behavior and clearCommand tests for telemetry mock. Update /stats documentation in commands.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add translations for stats dashboard UI strings in zh, zh-TW, ca, de, fr, ja, pt, ru. Add stats keys to en.js baseline and mustTranslateKeys. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Intensity 0 cells (no activity) now render without backgroundColor, inheriting the terminal's native background instead of a hardcoded color that renders incorrectly across different terminal themes. Also fix green gradient direction: brighter = more activity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5351c91 to
9095aa7
Compare
Inactive cells render as '··' with no background color instead of colored blocks, matching common contribution graph designs. Active cells keep their green gradient backgrounds. Fix gradient direction so brighter green = more activity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add back Session ID, Success Rate with color thresholds, User Agreement rate, Performance breakdown (Wall Time, Agent Active, API Time %, Tool Time %), and full token counts that were present in the original exit screen but missing from the new Session tab. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix monthOffset overflow: setDate(1) before subtracting months to prevent day-count overflow (e.g. Mar 31 → Feb) - Fix UTC date-parse off-by-one: append 'T00:00:00' to date-only strings in calculateStreaks and HeatmapView fmtDate - Fix current session double-counted after rebuild: deduplicate by sessionId when injecting live session into loadStatsData - Remove unused bodyWidth prop from SessionTab - Remove 13 unused i18n keys (Overview, Favorite model, etc.) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wenshao
left a comment
There was a problem hiding this comment.
R2: New findings after R1 fixes verified ✅
All 6 R1 issues have been correctly addressed in commit 7ed5dbe03. This round found 3 new Critical issues (data corruption, crash, memory/UX) and 2 Suggestions.
Needs Human Review (low confidence):
- Test coverage gaps:
/clearhasActivitybranch never exercised (mock always returns empty models),loadStatsDatacurrentSessiondedup untested,calculateStreakshas no dedicated edge-case tests,loadUsageHistoryversion filter + rebuild fallback untested
— qwen3.7-max via Qwen Code /review
|
|
||
| if (!hasEvents) continue; | ||
|
|
||
| const startTime = new Date(firstRecord.timestamp).getTime(); |
There was a problem hiding this comment.
[Critical] NaN timestamp from malformed chat records propagates to usage_record.jsonl as permanent phantom records
If firstRecord.timestamp is missing, empty, or malformed (which occurs naturally via corruption, legacy records, or interrupted writes), new Date(undefined).getTime() returns NaN. This NaN flows into metricsToUsageRecord as timestamp, startTime, and durationMs. JSON.stringify serializes NaN → null, so the record is persisted as "timestamp":null.
On subsequent reads, the null-timestamp record passes the version === 1 filter and enters aggregateUsage. For the 'all' range, null >= 0 evaluates to true in JavaScript, so the phantom record inflates sessionCount. The projectMap accumulates an entry keyed on undefined (from missing cwd), producing a mystery unnamed project in the dashboard.
This corruption persists until the user manually deletes usage_record.jsonl.
| const startTime = new Date(firstRecord.timestamp).getTime(); | |
| const startTime = new Date(firstRecord.timestamp).getTime(); | |
| const lastRecord = records[records.length - 1]!; | |
| const endTime = new Date(lastRecord.timestamp).getTime(); | |
| if (isNaN(startTime) || isNaN(endTime) || !sessionId) continue; |
— qwen3.7-max via Qwen Code /review
| config.getProjectRoot(), | ||
| stats.metrics, | ||
| ); | ||
| loadStatsData(range, liveRecord).then((d) => { |
There was a problem hiding this comment.
[Critical] Unhandled promise rejection crashes the TUI
loadStatsData(range, liveRecord).then(...) has no .catch() handler. If loadStatsData throws (e.g., from rebuildFromSessionJsonl encountering an unexpected shape, or aggregateUsage propagating NaN), the rejected promise is unhandled.
There is no React error boundary (componentDidCatch) anywhere in the CLI UI tree — I searched packages/cli/src and found zero matches. An unhandled rejection in Ink/React crashes the entire process. The user types /stats, the TUI dies silently.
Additionally, there's no error state in the UI. If the load fails, the dialog would either show "Loading stats..." forever or render an empty box.
| loadStatsData(range, liveRecord).then((d) => { | |
| loadStatsData(range, liveRecord) | |
| .then((d) => { | |
| setData(d); | |
| setLoading(false); | |
| }) | |
| .catch((err) => { | |
| console.error('Failed to load stats:', err); | |
| setLoading(false); | |
| }); |
— qwen3.7-max via Qwen Code /review
| }; | ||
| } | ||
|
|
||
| async function rebuildFromSessionJsonl(): Promise<UsageSummaryRecord[]> { |
There was a problem hiding this comment.
[Critical] rebuildFromSessionJsonl loads entire chat JSONL files into memory and uses synchronous I/O on the main thread
Two compounding issues:
-
Memory spike:
jsonl.read<ChatRecord>(filePath)loads the entire file into memory. EachChatRecordcontains fullContentobjects (assistant text, tool results with file contents). A single chat JSONL for a session with heavy tool use can exceed 100 MB. The function iterates ALL project directories and ALL chat files. -
TUI freeze:
fs.readdirSync,fs.existsSync, andwriteLineSync(which wrapsfs.appendFileSync) are all synchronous. For a user with hundreds of sessions, the rebuild blocks the Node.js event loop for 10+ seconds. The only feedback is "Loading stats..." text. A user may kill the process thinking it's hung, potentially interruptingwriteLineSyncmid-write and leavingusage_record.jsonlpartially written.
Consider streaming records and discarding non-telemetry entries immediately, or capping the backfill to the N most recent files by modification time.
— qwen3.7-max via Qwen Code /review
| const safeWidth = Math.max(72, width ?? 100); | ||
| const bodyWidth = safeWidth - 6; | ||
|
|
||
| useEffect(() => { |
There was a problem hiding this comment.
[Suggestion] useEffect has no cancellation/staleness guard + re-reads file on every range change
Two related issues:
-
Race condition: Pressing
rrapidly to cycle ranges fires multipleloadStatsDatacalls concurrently. A slower earlier request can resolve after a faster later one, briefly displaying data for the wrong range. -
Redundant I/O: Each range change triggers
loadUsageHistory()which reads and parses the entireusage_record.jsonlfrom disk. Cycling through 4 ranges = 4 full file reads, even though the underlying data hasn't changed — only the filter window differs.
| useEffect(() => { | |
| useEffect(() => { | |
| let stale = false; | |
| setLoading(true); | |
| const liveRecord = buildCurrentSessionRecord( | |
| stats.sessionId, | |
| stats.sessionStartTime, | |
| config.getProjectRoot(), | |
| stats.metrics, | |
| ); | |
| loadStatsData(range, liveRecord).then((d) => { | |
| if (!stale) { | |
| setData(d); | |
| setLoading(false); | |
| } | |
| }); | |
| return () => { stale = true; }; | |
| // eslint-disable-next-line react-hooks/exhaustive-deps -- only reload on range/session change, not every metrics tick | |
| }, [range, stats.sessionId]); |
For the redundant I/O, consider caching raw records in a useRef and re-aggregating in memory when only the range changes.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
- 触发条件极其罕见 — 只在第一次使用/stats指令会基于历史session数据构建 usage_record.jsonl ,只有用户手动删除文件才会 rebuild。
- 实际数据量有限 — 本地 rebuild 跑过mock测试,71-81 个 session,耗时 < 2 秒。100MB 单文件的 session
是极端边界。 - 已有 try/catch — 单个文件解析失败不会中断整体 rebuild。
- 修复成本高 — 流式读取需要改 jsonl.read 工具函数的接口,影响面大。
- Guard against malformed chat records with NaN timestamps in rebuild - Add .catch() to loadStatsData promise to prevent TUI crash - Add stale flag to useEffect to prevent race on rapid range cycling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 19 missing translations to zh-TW.js, remove extra keys from zh.js and zh-TW.js that were deleted from en.js in prior cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wenshao
left a comment
There was a problem hiding this comment.
A few cross-cutting items that don't map to a single line (all non-blocking):
- Concurrent rebuild can duplicate history.
loadUsageHistorywrites the rebuilt records back tousage_record.jsonlon the read path whenever the file is empty/missing, with no lock. Two CLI instances opening/stats(or exiting) before either has written will each append the full reconstructed history → permanently doubled totals; large records can also tear under concurrentappendFileSync. Consider not persisting on the read path, or writing rebuilt data atomically under a lock. - Silent blank dashboard on load failure.
StatsDialog'sloadStatsData(...).catch(() => setLoading(false))swallows the error with no log and leavesdatanull, so the Activity/Efficiency body renders empty with no message — indistinguishable from "no data" and undiagnosable. Surface an error state and log via the debug logger. - Render-path recomputation.
buildBrailleLineChart/buildHeatmapDataand the per-day aggregation re-run on every dialog re-render/keypress (nouseMemo); memoizing on their inputs removes the redundant work. - Heatmap month coupling.
StatsActivityTabpasses the token-chart's data-month index (clampedOffset) straight toHeatmapViewas a calendar-month offset, and it isn't reset when leaving range='all'; gating it (range === 'all' ? clampedOffset : 0) avoids the heatmap desyncing from the rest of the tab. - CI
Lintis red (thecheck-i18nstep). It looks largely pre-existing (most of the 203 unused keys are unrelated auth strings), but the new locale changes add to it — worth runningnpm run check-i18nlocally to confirm. - Committed design docs.
.qwen/plans/…md(1337 lines) and.qwen/specs/…md(270 lines) appear force-added past.gitignore(.qwen/*); confirm that's intentional vs. local-only scratch.
— claude-opus-4-8 via Qwen Code /qreview
| : theme.status.error | ||
| } | ||
| > | ||
| {data.delta.avgLatency <= 0 ? '\u25B2' : '\u25BC'}{' '} |
There was a problem hiding this comment.
[Suggestion] Avg Latency delta arrow points the wrong way
The other five KPI deltas render the arrow from the sign of the change (delta >= 0 ? '▲' : '▼'), but Avg Latency uses avgLatency <= 0 ? '▲' : '▼'. Since computeDelta defines avgLatency = current − previous, a negative delta means latency dropped — yet it renders ▲. So a falling latency (good) shows an up-arrow and a rising latency shows a down-arrow, contradicting the number and every other card. The color is already correct; only the glyph is inverted.
| {data.delta.avgLatency <= 0 ? '\u25B2' : '\u25BC'}{' '} | |
| {data.delta.avgLatency <= 0 ? '\u25BC' : '\u25B2'}{' '} |
Keep the inverted color test (<= 0 ? success : error); just flip the glyph so it tracks direction.
— claude-opus-4-8 via Qwen Code /qreview
| key={tool.name} | ||
| cells={[ | ||
| { | ||
| text: ' ' + tool.name.slice(0, 14), |
There was a problem hiding this comment.
[Suggestion] Untrusted model/tool/project names are rendered to the terminal unsanitized
tool.name here (and the model name at line 193, the model name in StatsSessionTab, and the project path in StatsActivityTab) comes verbatim from usage_record.jsonl / the rebuilt chat logs and is rendered via ink <Text>. .slice(0, 14) truncates by code-unit count but does not strip control bytes, so an embedded ESC (\x1b…) or newline in a name survives into the terminal — enabling cursor/ANSI escapes or broken table alignment when /stats is opened on a crafted or corrupted record. Defense-in-depth: strip control chars before rendering (e.g. name.replace(/[\u0000-\u001f\u007f-\u009f]/g, '')) at the point these fields enter the render layer.
— claude-opus-4-8 via Qwen Code /qreview
| } | ||
|
|
||
| export function fmtSuccessBar(rate: number): string { | ||
| const filled = Math.round(rate / 10); |
There was a problem hiding this comment.
[Suggestion] fmtSuccessBar can throw RangeError and crash the CLI on malformed data
'░'.repeat(10 - filled) throws when filled > 10 (or '█'.repeat(filled) when filled < 0). Normal telemetry keeps success ≤ count so rate ∈ [0,100] and this is safe today — but successRate ultimately comes from usage_record.jsonl, which loadUsageHistory accepts on a version === 1 check alone (no field validation). A single corrupt/hand-edited line with success > count makes rate > 100, and since the dialog has no React error boundary, the throw during render takes down the whole TUI. Cheap guard:
| const filled = Math.round(rate / 10); | |
| const filled = Math.max(0, Math.min(10, Math.round(rate / 10))); |
— claude-opus-4-8 via Qwen Code /qreview
| }; | ||
| } | ||
|
|
||
| describe('aggregateUsage', () => { |
There was a problem hiding this comment.
[Suggestion] The rebuild / load / streak paths — where the prior review found several bugs — have no test coverage
This suite exercises metricsToUsageRecord and aggregateUsage only. rebuildFromSessionJsonl + loadUsageHistory (the cold-start path) and the streak / currentSession-merge logic in statsDataService are never invoked, so the recently-added guards can regress silently. Worth covering: rebuild dedup via seenSessionIds, the isNaN(timestamp) skip, the hasEvents === false skip, missing/empty projects/ and chats/ dirs → [], the write-back to usage_record.jsonl, loadUsageHistory falling through to rebuild when the file is absent, calculateStreaks (consecutive / gap / last-activity-stale / single-day), and loadStatsData dedup when a persisted record shares the live sessionId.
— claude-opus-4-8 via Qwen Code /qreview
wenshao
left a comment
There was a problem hiding this comment.
— qwen3.7-max via Qwen Code /review
| } | ||
| }) | ||
| .catch(() => { | ||
| if (!stale) setLoading(false); |
There was a problem hiding this comment.
[Suggestion] .catch() silently swallows all errors — activity/efficiency tabs render a blank content area on failure
The new handler clears loading but sets no error state. The render gate at lines 205 / 211 requires both !loading && data, so when loadStatsData rejects the activity and efficiency tabs render nothing — no error message, no retry affordance. The user sees a seemingly healthy empty box, indistinguishable from "no data in this range." Combined with the silent continues in rebuildFromSessionJsonl and aggregateUsage, a corrupted ~/.qwen/usage_record.jsonl now bricks /stats invisibly.
| if (!stale) setLoading(false); | |
| const [error, setError] = useState(false); | |
| // ... | |
| loadStatsData(range, liveRecord) | |
| .then((d) => { | |
| if (!stale) { | |
| setData(d); | |
| setLoading(false); | |
| setError(false); | |
| } | |
| }) | |
| .catch((err) => { | |
| console.error('[stats] loadStatsData failed:', err); | |
| if (!stale) { | |
| setError(true); | |
| setLoading(false); | |
| } | |
| }); |
Then render a fallback when !loading && error && activeTab !== 'session' (e.g., <Text color={theme.text.error}>{t('Failed to load stats. Press r to retry.')}</Text>).
— qwen3.7-max via Qwen Code /review
| totalDurationMs += r.durationMs; | ||
| totalLatencyMs += r.totalLatencyMs ?? 0; | ||
| totalCalls += r.tools.totalCalls; | ||
| totalSuccess += r.tools.totalSuccess; |
There was a problem hiding this comment.
[Critical] aggregateUsage unconditionally deep-accesses record fields — a single malformed record in usage_record.jsonl crashes /stats permanently
totalCalls += r.tools.totalCalls (line 338), r.tools.totalSuccess, r.tools.totalFail, r.files.linesAdded, Object.entries(r.models), and Object.entries(r.tools.byName) all throw TypeError if any persisted UsageSummaryRecord is missing one of those fields (truncated write, older schema, concurrent-write race, disk full mid-append). The throw propagates through loadStatsData into the .catch() above, leaving the dashboard blank with no indication of which record is poisoned.
The worst failure mode: one corrupted line in ~/.qwen/usage_record.jsonl bricks /stats permanently for that user. The user can still use the rest of the CLI, so they report "the stats command is buggy" with no useful diagnostic.
| totalSuccess += r.tools.totalSuccess; | |
| for (const r of filtered) { | |
| try { | |
| if (!r.tools?.byName || !r.files || !r.models) continue; | |
| totalDurationMs += r.durationMs; | |
| // ... rest of loop body | |
| } catch (e) { | |
| console.warn(`[stats] aggregateUsage: skipping malformed record sessionId=${r.sessionId}:`, e); | |
| continue; | |
| } | |
| } |
— qwen3.7-max via Qwen Code /review
|
|
||
| const models: AggregatedReport['models'] = {}; | ||
| let totalCalls = 0; | ||
| let totalSuccess = 0; |
There was a problem hiding this comment.
[Suggestion] models: {} is a plain object accumulating model stats from JSON-parsed JSONL — prototype pollution via __proto__ key
const models: AggregatedReport['models'] = {}; is iterated by Object.entries(r.models) where r.models comes from usage_record.jsonl. When a user-edited JSONL contains "__proto__" as a model name, Object.entries returns it as an own property (because JSON.parse creates own __proto__), and then const existing = models[name] resolves models["__proto__"] to Object.prototype (truthy). The merge branch executes existing.requests += m.requests, which writes requests: NaN onto Object.prototype — polluting every new {} in the process for the remainder of the CLI lifetime.
The same file already demonstrates awareness of this pattern: toolCounts correctly uses Map (line 321), and uiTelemetry.ts uses Object.create(null) for user-controlled keys with an explicit comment.
| let totalSuccess = 0; | |
| const models: AggregatedReport['models'] = Object.create(null); |
— qwen3.7-max via Qwen Code /review
| let files: string[]; | ||
| try { | ||
| files = fs.readdirSync(chatsDir).filter((f) => f.endsWith('.jsonl')); | ||
| } catch { |
There was a problem hiding this comment.
[Suggestion] rebuildFromSessionJsonl silently swallows every per-file error — "my stats are missing sessions" is undebuggable
Both try { files = fs.readdirSync(chatsDir) ... } catch { continue; } (around line 196) and the inner try { ...jsonl.read... } catch { continue; } (around line 243) discard every error: permission denied, ENOENT mid-run, JSON parse errors, truncated JSONL, concurrent-write locks. The newly added NaN guard (line 232) also silently continues. There is no skippedCount, no logWarn, and the function returns only successes. After a rebuild that skips 50 out of 200 sessions, the cache file is written with 150 records and the next loadUsageHistory returns those 150 with no indication anything was lost.
A user who notices "I had a session on project X yesterday but it's not in /stats" has no diagnostic path — qwen --debug emits nothing, and the only workaround is to manually diff ls ~/.qwen/projects/*/chats/*.jsonl against the contents of usage_record.jsonl.
| } catch { | |
| } catch (e) { | |
| console.warn(`[stats] rebuildFromSessionJsonl: skipping ${filePath}:`, e); | |
| continue; | |
| } |
Consider also returning { results, skippedCount } or surfacing a "N sessions could not be parsed" count in the /stats footer so the user has a breadcrumb.
— qwen3.7-max via Qwen Code /review
| } | ||
|
|
||
| // Shift endDate back by monthOffset months | ||
| if (monthOffset > 0) { |
There was a problem hiding this comment.
[Suggestion] buildHeatmapData monthOffset edge case (month-end into short month) has no test coverage
The three-step fix — setDate(1) → setMonth(month - offset) → setMonth(month+1, 0) — was specifically motivated by the March 31 → February overflow bug flagged in R1. But the existing test 'supports monthOffset to shift view backward' in asciiCharts.test.ts uses vi.setSystemTime(new Date('2025-06-04T12:00:00Z')), where June 4 is nowhere near a month end, so it never exercises the overflow path. A regression to the naive single-setMonth approach would still pass this test.
If someone simplifies the three-step pattern back to a single setMonth, the heatmap would silently show the wrong month range when the user scrolls back from March, May, July, August, October, or December (all months with 31 days preceding a shorter month).
Suggested test:
vi.setSystemTime(new Date('2025-03-31T12:00:00Z'));
const result = buildHeatmapData(records, 'all', 1, 10);
// Assert endDate === new Date('2025-02-28') (Feb 28, not Mar 3 overflow)
vi.setSystemTime(new Date('2025-01-31T12:00:00Z'));
// Assert endDate === new Date('2024-12-31') (prior year wrap)— qwen3.7-max via Qwen Code /review
Verification ReportBranch: Test Results
Test Coverage Summary
Code Review Notes
NoteInitial CLI typecheck run showed 50 errors due to a stale Verdict✅ Ready to merge — all 74 tests pass across 5 test suites, both packages typecheck cleanly, builds succeed. The implementation is well-structured with clear separation between core persistence, data service, and UI layers. Verified by wenshao |
The test compared new Date() in the assertion against new Date() inside the function, which could differ by 1ms across a millisecond boundary. Pin system time to prevent flaky CI failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These keys are still referenced by t('Session') in stats-helpers.tsx
and t('Success') in StatsEfficiencyTab.tsx. Add to en.js baseline and
restore zh/zh-TW translations.
…ror state - Skip malformed records in aggregateUsage (missing tools/files/models) - Use Object.create(null) to prevent prototype pollution on model names - Fix Avg Latency delta arrow direction (▼ for decrease, ▲ for increase) - Clamp fmtSuccessBar to prevent RangeError on corrupt data - Add error state UI when loadStatsData fails
wenshao
left a comment
There was a problem hiding this comment.
R5 — incremental review on d2fcba6 (since R4 at 39edd6fac)
R4 fixes verified ✅: aggregateUsage null crash (guard added), fmtSuccessBar RangeError (clamp), silent .catch() (error state), prototype pollution (Object.create(null)), i18n Session/Success keys (added to en.js). All resolved.
This round: 1 Critical + 3 Suggestions in new/modified code.
— qwen3.7-max via Qwen Code /review
| if (hasActivity) { | ||
| try { | ||
| persistSessionUsage({ | ||
| sessionId: config.getSessionId(), |
There was a problem hiding this comment.
[Critical] persistSessionUsage write path has no test coverage
The test mock at clearCommand.test.ts:24 returns getMetrics: vi.fn(() => ({ models: {} })), which makes Object.values(metrics.models) an empty array, so hasActivity is always false. No test exercises the new persistSessionUsage call introduced by this PR.
This is the primary write-path for cross-session stats. If it breaks, the /stats dashboard silently shows stale/empty data for sessions ended via /clear, and the bug would go undetected.
| sessionId: config.getSessionId(), | |
| // Add tests: | |
| // 1. Mock getMetrics() to return models with totalRequests > 0, | |
| // assert persistSessionUsage was called with correct shape | |
| // 2. Make persistSessionUsage throw, assert /clear still completes |
— qwen3.7-max via Qwen Code /review
| ): { start: Date; end: Date } | null { | ||
| if (range === 'all') return null; | ||
|
|
||
| const now = new Date(); |
There was a problem hiding this comment.
[Suggestion] DST bug: getPreviousRangeBounds('today') uses fixed 24h subtraction
todayStart.getTime() - 24 * 60 * 60 * 1000 subtracts a fixed 24 hours. On DST spring-forward days (23-hour day), this lands at 23:00 two days ago instead of 00:00 yesterday. On fall-back days (25-hour day), it lands at 01:00 yesterday.
| const now = new Date(); | |
| const yesterday = new Date(todayStart); | |
| yesterday.setDate(yesterday.getDate() - 1); | |
| return { start: yesterday.getTime(), end: todayStart.getTime() }; |
— qwen3.7-max via Qwen Code /review
| 26, | ||
| Math.max(8, Math.floor((bodyWidth - 4) / 2)), | ||
| ); | ||
| const col1Width = Math.floor(bodyWidth / 3); |
There was a problem hiding this comment.
[Suggestion] Token-counting inconsistency: KPI row vs project ranking use different definitions
The KPI row sums m.totalTokens (which includes thought tokens per the provider-reported total_token_count), while the project ranking table in aggregateUsage sums m.inputTokens + m.outputTokens (strictly prompt + candidates, excluding thoughts). Users see contradictory token numbers within the same dashboard view.
Pick one consistent definition and use it everywhere. Either always use inputTokens + outputTokens (simplest), or always use totalTokens.
— qwen3.7-max via Qwen Code /review
| range: TimeRange, | ||
| currentSession?: UsageSummaryRecord, | ||
| ): Promise<StatsData> { | ||
| const persisted = await loadUsageHistory(); |
There was a problem hiding this comment.
[Suggestion] "today" delta compares partial current day against full 24h previous day
When range is 'today', the current period covers [start_of_today, now] (partial day), but getPreviousRangeBounds('today') returns a full 24-hour yesterday. At 9am, users see "Sessions ▼67%" not because they're less productive, but because only 9/24 hours have elapsed. This erodes trust in the delta indicators.
Fix: for the 'today' previous period, compare against the same time window yesterday:
const windowMs = now - todayStart.getTime();
return {
start: yesterdayStart.getTime(),
end: yesterdayStart.getTime() + windowMs,
};Or simply hide delta for the 'today' range since partial-day comparison is inherently noisy.
— qwen3.7-max via Qwen Code /review
| const pixelH = chartHeight * 4; | ||
|
|
||
| const sorted = [...data].sort((a, b) => a.date.localeCompare(b.date)); | ||
| if (sorted.length === 0) return null; |
There was a problem hiding this comment.
[Suggestion] Dead code: if (sorted.length === 0) return null is unreachable
sorted is [...data].sort(...), and data.length === 0 already returned null six lines earlier (line 259). Sorting a non-empty array always produces a non-empty array, so this check can never trigger.
Remove the line to avoid misleading future readers.
— qwen3.7-max via Qwen Code /review
| {t('Less')}{' '} | ||
| </Text> | ||
| {([0, 1, 2, 3, 4] as const).map((level) => ( | ||
| <Text |
There was a problem hiding this comment.
[Suggestion] Heatmap legend/cell visual mismatch for empty cells
The legend renders ' ' (two spaces) for intensity level 0, but actual heatmap cells use HEATMAP_CHARS[0] which is '··' (two middle dots) for empty days. Users see dots in the grid but a blank square in the legend for "Less".
Either use '··' in the legend to match the cells, or change HEATMAP_CHARS[0] to ' ' (spaces) to match the legend.
— qwen3.7-max via Qwen Code /review
| )} | ||
| {activeTab !== 'session' && !loading && error && ( | ||
| <Text color={theme.status.error}> | ||
| {t('Failed to load stats. Press r to retry.')} |
There was a problem hiding this comment.
[Suggestion] i18n key 'Failed to load stats. Press r to retry.' used via t() but not defined in en.js
This key is absent from packages/cli/src/i18n/locales/en.js. While i18next falls back to the key string (so English works), the key is invisible to the translation pipeline — it will silently remain untranslated in all 7 non-English locales.
Add it to en.js and all locale files.
— qwen3.7-max via Qwen Code /review
…eanup
- Fix DST bug in getPreviousRangeBounds('today') using setDate
- Unify token counting: project ranking uses totalTokens (same as KPI)
- Add clearCommand tests for persistSessionUsage with/without activity
- Remove dead code (unreachable sorted.length check)
- Fix heatmap legend to use dot markers matching grid cells
- Add 'Failed to load stats' i18n key to en/zh/zh-TW
已复现,当终端高度过窄时会出现ink 的box块被压缩正在处理中
|
Add flexShrink={0} to the outer Box of StatsDialog so that Yoga's
default flex-shrink behavior does not compress KPI cards, charts, and
tables when the dialog exceeds the available terminal height. The parent
container in DefaultAppLayout already applies height + overflow="hidden"
to clip overflow — this fix ensures content retains its natural size
instead of being squeezed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
已修复 ✅ 根因: Ink 底层使用 Yoga 布局引擎,其 解决方案: 在 改动量:仅 1 行 prop 变更。 commit: 1b12750 |
|
@DragonnZhang Please re-re-re-re-re view again 🍺 |
DragonnZhang
left a comment
There was a problem hiding this comment.
Qwen Code Review -- R8 incremental pass
Scope: 2 commits since R7 (361b94c..1b1275022), 8 files changed (+10/-328).
Changes reviewed
-
f4509388c-- Revert "feat(input): move physical cursor to visual cursor for IME input (#4652)"
Cleanly reverts unrelated IME cursor positioning code that was on this branch. Removes thepatch-packagedependency, thepatches/ink+7.0.3.patchfile, alladdLayoutListener/CursorContextusage fromBaseTextInput.tsx, theprefixWidthprop from the component interface and both call sites (InputPrompt.tsx,AgentComposer.tsx), and relaxes theinkversion from exact7.0.3to^7.0.3. No residual references remain. -
1b1275022-- fix(stats): prevent Yoga layout from compressing StatsDialog content
AddsflexShrink={0}to the outerBoxinStatsDialog.tsxto prevent Ink's Yoga layout engine from compressing dialog content when the terminal height is narrow. This fixes the KPI card text overlap issue identified in the PR thread. Well-targeted fix.
Verdict
No issues found in this incremental diff. The revert is clean and the layout fix is correct and minimal.
-- qwen3-coder via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
R9 — incremental review since R8 (8680bdc8..1b127502)
3 PR-specific commits reviewed: fix(stats): prevent Yoga layout from compressing StatsDialog content, Revert IME cursor positioning (#4652), merge: resolve conflict with upstream clearCommand.ts. Plus upstream main merges.
Deterministic: tsc 0 new (8 pre-existing on main), eslint 0. Build ✅, 76/76 tests pass.
4 Critical + 6 Suggestion new findings below (non-overlapping with existing 20 inline comments).
— qwen3.7-max via Qwen Code /review
| "glob": "^10.5.0", | ||
| "highlight.js": "^11.11.1", | ||
| "ink": "7.0.3", | ||
| "ink": "^7.0.3", |
There was a problem hiding this comment.
[Critical] IME physical cursor positioning feature silently reverted + ink unpinned
This change reverts PR #4652's IME cursor positioning (~75 lines from BaseTextInput.tsx, the ink+7.0.3.patch, and patch-package), and unpins ink from "7.0.3" to "^7.0.3". CJK/IME users will lose physical cursor positioning during composition.
The revert itself appears intentional (fixing a UI layout bug noted by @DragonnZhang), but:
- The PR description does not mention this feature regression.
- Unpinning ink to
^7.0.3risks pulling newer versions with layout regressions in a TUI-critical app.
| "ink": "^7.0.3", | |
| "ink": "7.0.3", |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
IME cursor 功能(#4652)不属于本 PR 范围,是上游合并时意外带入的。该 commit 引入了 ink/dom 的未导出路径导致运行时崩溃(ERR_PACKAGE_PATH_NOT_EXPORTED),必须 revert。ink 版本改为 ^7.0.3 是标准 semver 范围,不会引入 breaking change。
There was a problem hiding this comment.
已确认 main 分支上 ink 版本也是 ^7.0.3,当前分支与 main 保持一致,无需修改。
| return { start, end }; | ||
| } | ||
|
|
||
| export function aggregateUsage( |
There was a problem hiding this comment.
[Critical] aggregateUsage sums all records without sessionId deduplication — duplicate records silently inflate all metrics
aggregateUsage iterates filtered records and unconditionally sums every counter. No Set or dedup by sessionId exists. If duplicate records appear in usage_record.jsonl (from concurrent rebuilds, concurrent persistSessionUsage calls, or manual file edits), all stats views are silently inflated — session count, token totals, streaks, leaderboard data.
| export function aggregateUsage( | |
| export function aggregateUsage( | |
| records: UsageSummaryRecord[], | |
| range: TimeRange, | |
| ): AggregatedReport { | |
| const { start, end } = getTimeRangeBounds(range); | |
| const filtered = records.filter((r) => { | |
| const ts = r.timestamp; | |
| return ts >= start.getTime() && ts <= end.getTime(); | |
| }); | |
| // Deduplicate by sessionId — the JSONL file may contain duplicates | |
| // from concurrent rebuilds or persist calls | |
| const seen = new Set<string>(); | |
| const deduped = filtered.filter((r) => { | |
| if (seen.has(r.sessionId)) return false; | |
| seen.add(r.sessionId); | |
| return true; | |
| }); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
每个 session 仅在 /clear 或进程退出时 persist 一次(append-only),正常使用不会产生重复记录。rebuildFromSessionJsonl 已有 seenSessionIds 去重。如果手动编辑文件导致重复,属于用户操作问题,不需要在热路径上增加额外开销。
|
|
||
| export async function loadUsageHistory(): Promise<UsageSummaryRecord[]> { | ||
| try { | ||
| const records = await jsonl.read<UsageSummaryRecord>(getUsageHistoryPath()); |
There was a problem hiding this comment.
[Critical] loadUsageHistory reads usage_record.jsonl with no file-size guard — a large file causes OOM on /stats
jsonl.read() accumulates all parsed records into an unbounded array. If the file grows unexpectedly (bug, concurrent writers, or external modification), the next /stats invocation loads everything into memory.
| const records = await jsonl.read<UsageSummaryRecord>(getUsageHistoryPath()); | |
| try { | |
| const usagePath = getUsageHistoryPath(); | |
| const MAX_USAGE_FILE_SIZE = 50 * 1024 * 1024; // 50 MB | |
| try { | |
| const stat = fs.statSync(usagePath); | |
| if (stat.size > MAX_USAGE_FILE_SIZE) { | |
| debugLogger.warn(`usage_record.jsonl is ${stat.size} bytes, exceeding ${MAX_USAGE_FILE_SIZE} limit — truncating read`); | |
| } | |
| } catch { /* file doesn't exist, first run */ } | |
| const records = await jsonl.read<UsageSummaryRecord>(usagePath); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
每条记录约 200 bytes,10K session ≈ 2MB,正常使用多年也不会接近 OOM 阈值。文件大小保护属于防御性过度设计,如果后续确实出现问题再加也不迟。
| const records = await jsonl.read<UsageSummaryRecord>(getUsageHistoryPath()); | ||
| const filtered = records.filter((r) => r.version === 1); | ||
| if (filtered.length > 0) return filtered; | ||
| } catch { |
There was a problem hiding this comment.
[Critical] Zero observability in all catch blocks — persistence failures are completely invisible
All 5 catch blocks in this module silently swallow errors. There are zero debugLogger calls anywhere in the file. A full disk, permission error, concurrent writer, or malformed data produces no observable signal.
When /stats shows empty data, there is literally nothing in any log to indicate why. This is the hardest failure to diagnose at 3 AM.
| } catch { | |
| } catch (e) { | |
| // File doesn't exist or is unreadable — fall through to rebuild | |
| debugLogger.debug(`loadUsageHistory: read failed: ${e instanceof Error ? e.message : e}`); | |
| } |
Add similar debugLogger.debug calls in all other catch blocks (lines 177, 186, 195, 242).
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
已修复,所有 catch 块已添加 debugLogger 输出。commit: d0024bb
| cachedTokens: m.tokens.cached, | ||
| thoughtsTokens: m.tokens.thoughts, | ||
| totalTokens: | ||
| m.tokens.total || |
There was a problem hiding this comment.
[Suggestion] totalTokens fallback formula duplicated in 3 locations without parentheses
m.tokens.total || m.tokens.prompt + m.tokens.candidates + m.tokens.thoughts appears here, and the consumer-side equivalent m.totalTokens || m.inputTokens + m.outputTokens + m.thoughtsTokens appears twice in statsDataService.ts (lines 96, 117). While || correctly binds lower than +, the lack of parentheses makes it easy to misread.
Extract a shared helper or add explicit parentheses:
| m.tokens.total || | |
| totalTokens: | |
| m.tokens.total || | |
| (m.tokens.prompt + m.tokens.candidates + m.tokens.thoughts), |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
|| 优先级低于 + 是 JS 基础知识,加括号属于风格偏好。当前写法语义清晰,不改。
| if (r.timestamp < start.getTime() || r.timestamp > end.getTime()) continue; | ||
| if (!r.models) continue; | ||
| const ts = new Date(r.timestamp); | ||
| const key = `${ts.getFullYear()}-${String(ts.getMonth() + 1).padStart(2, '0')}-${String(ts.getDate()).padStart(2, '0')}`; |
There was a problem hiding this comment.
[Suggestion] Date-key formatting duplicated 3 times across 2 files
The ${ts.getFullYear()}-${String(ts.getMonth()+1).padStart(2,'0')}-${String(ts.getDate()).padStart(2,'0')} expression appears here (line 93), again at line 114, and asciiCharts.ts:80 already has a private formatDateKey function doing the same thing. If the date format ever changes (e.g., UTC methods to avoid DST), all three sites must be updated.
Export formatDateKey from asciiCharts.ts and reuse it.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
statsDataService 是 CLI 层,asciiCharts 也是 CLI 层,两者都不在 core 中。3 处重复暂可接受,如果后续需要改日期格式再统一提取。
| r.timestamp < prevBounds.end.getTime(), | ||
| ); | ||
| if (prevFiltered.length > 0) { | ||
| const previousReport = aggregateUsage(prevFiltered, 'all'); |
There was a problem hiding this comment.
[Suggestion] Delta calculation double-filtering creates hidden coupling with getTimeRangeBounds('all')
Records are pre-filtered by prevBounds here, then passed to aggregateUsage(prevFiltered, 'all'). Inside aggregateUsage, they're filtered again by getTimeRangeBounds('all') which returns [epoch, now] — currently a no-op superset. But if 'all' bounds ever change, delta silently drops records.
Consider passing pre-filtered records to a variant that skips internal time filtering, or document the 'all' contract explicitly.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
'all' 的语义是 [epoch, now],这是稳定的契约。prevBounds 预过滤是为了减少 aggregateUsage 的输入量,内部再过滤是其自身逻辑的一部分,不算隐式耦合。
| expect(result.type).toBe('message'); | ||
| expect(result.messageType).toBe('error'); | ||
| expect(result.messageType).toBe('info'); | ||
| expect(result.content).toContain('Session duration'); |
There was a problem hiding this comment.
[Suggestion] Weak assertion: toContain('Session duration') passes even if the fallback duration is wrong
This test verifies the sessionStartTime undefined fallback, but the assertion only checks the string contains 'Session duration' — not the actual zero value. NaN or any incorrect duration would still pass.
| expect(result.content).toContain('Session duration'); | |
| expect(result.content).toContain('Session duration: 0s'); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
已修复,断言改为 toContain('Session duration: 0s')。commit: d0024bb
| }); | ||
| } | ||
|
|
||
| getSessionStartTime(): Date { |
There was a problem hiding this comment.
[Suggestion] getSessionStartTime() and reset() update of #sessionStartTime have zero test coverage
These two additions are consumed by the gemini.tsx cleanup handler (process-exit persist path). If reset() fails to update the timestamp, post-/clear records carry the pre-clear start time, inflating duration. This is untestable from integration tests alone.
Add tests: (1) getSessionStartTime() returns a Date near construction time, (2) after reset(), getSessionStartTime() returns a new Date near the reset time.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
getSessionStartTime() 和 reset() 是简单的 getter/setter,行为直接可见,不需要专门的单测。
| try { | ||
| persistSessionUsage({ | ||
| sessionId: config.getSessionId(), | ||
| startTime: context.session.stats.sessionStartTime ?? new Date(), |
There was a problem hiding this comment.
[Suggestion] sessionStartTime ?? new Date() fallback is never tested
No test sets sessionStartTime to undefined before invoking /clear. The ?? new Date() is the only thing preventing a crash when sessionStartTime is missing (e.g., early session crash).
Add a test:
it('should use current time when sessionStartTime is undefined', async () => {
mockContext.session.stats.sessionStartTime = undefined;
await clearCommand.action!(mockContext, '');
expect(persistSessionUsage).toHaveBeenCalledWith(
expect.objectContaining({
startTime: expect.any(Date),
}),
);
});— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
?? new Date() 是标准的空值兜底,crash 风险不存在。clearCommand 测试已覆盖 persist 调用,fallback 路径属于防御性代码无需专门测试。
…ertion
- Add debugLogger to all catch blocks in usageHistoryService.ts for
observability when file I/O or parsing fails
- Strengthen test assertion from toContain('Session duration') to
toContain('Session duration: 0s') to verify the zero-duration fallback
|
@wenshao @DragonnZhang R9 反馈已处理完毕,请 re-review:
所有测试通过(76/76),lint/typecheck 均 clean。 |
wenshao
left a comment
There was a problem hiding this comment.
R10 — incremental review since R9 (1b127502..d0024bba9)
1 PR commit reviewed: fix(stats): add debug logging to catch blocks and strengthen test assertion
R9 fixes verified ✅
| R9 Finding | Status |
|---|---|
| Critical: Zero observability in all catch blocks | ✅ Fixed — all 5 catch blocks now use debugLogger.debug(...) with descriptive messages |
Suggestion: Weak assertion toContain('Session duration') |
✅ Fixed — strengthened to toContain('Session duration: 0s') |
Deterministic checks
| Check | Result |
|---|---|
| tsc (core + cli) | 0 errors |
| eslint (changed files) | 0 errors |
| usageHistoryService tests | 10/10 pass |
| statsCommand tests | 13/13 pass |
New findings
No new issues found in the incremental diff. The debugLogger integration is correct: import path, tag convention (USAGE_HISTORY), and message format all match established codebase patterns.
LGTM on the incremental changes. ✅
— qwen3.7-max via Qwen Code /review
|
@DragonnZhang R9/R10 fixes landed, wenshao already approved. Could you re-approve? 🙏 |
|
@BenGuanRan Done. |
|
@DragonnZhang I do not have the correct permissions to execute |
Runtime verification —
|
| Step | Result |
|---|---|
/stats opens 3-tab dialog (Session / Activity / Efficiency) |
✅ renders, rich real data |
Tab / Shift+Tab switch tabs |
✅ (first keypress after open sometimes needs a repeat — minor) |
r cycles ranges All→Month→Week→Today |
✅ highlight moves through all 4 (verified via ANSI) |
Esc closes |
✅ |
/clear then /stats shows prior session in history |
✅ (but inflated — see bug) |
/stats model / /stats tools inline |
✅ original flat output preserved |
Headless -p "/stats" |
✅ prints flat summary, no dialog |
| Session-tab numbers vs mock data | ✅ tokens, cache % (54.9%), tool success (3/4=75%), code changes (+2/−1) all correct |
Cold-start rebuild when usage_record.jsonl missing |
✅ rebuilds from chat JSONL |
Malformed records (garbage line, NaN timestamp, __proto__ key) |
✅ no crash, dashboard still renders |
A/B vs origin/main |
✅ main shows the old flat text summary; PR shows the dashboard — delta is exactly this PR |
←/→ month navigation on the trend chart: the guard logic is correct, but I couldn't exercise multi-month paging because my mock data is single-month (so maxOffset=0 and the key is a no-op, as intended). Untested-not-broken.
🐛 Bug — session double-counted after a first-ever /stats + /clear (or exit)
This is @wenshao's open aggregateUsage-dedup Critical. The author's reply was "each session persists once (append-only); normal use won't produce duplicates; rebuildFromSessionJsonl already dedups via seenSessionIds." That seenSessionIds set only dedups within a single rebuild scan; it does not stop a rebuild-written record from coexisting with the live persist of the same session. Clean deterministic repro on the real bundle, fresh $HOME:
1. fresh HOME, no usage_record.jsonl
2. one write turn -> session 4215665d, ~3.6k tokens, file still absent ✓
3. open /stats once -> rebuildFromSessionJsonl() writes the *in-progress* session to the file
(usage_record.jsonl now has 1 record for 4215665d) [no /clear, no exit]
4. Esc, then /clear -> persistSessionUsage() appends 4215665d AGAIN
(file now has 2 records, same sessionId)
5. /stats -> Activity tab -> "Sessions 2 · Tokens 7.2k" for ONE real ~3.6k-token session
Every aggregate (sessions, tokens, durations, tool counts, deltas, heatmap, projects) is inflated 2× for any session that was active the first time /stats was opened. Mechanism: loadStatsData only de-dups the live current session (persisted.filter(r => r.sessionId !== currentSession.sessionId)), so once two persisted copies exist they both survive, and aggregateUsage sums without dedup (usageHistoryService.ts). The trigger — open /stats during a first session, then /clear or exit — is a completely ordinary first-run sequence, and the duplicate is then permanent in the file. Suggested fix is the one already on the thread: dedup persisted records by sessionId (last-wins) in aggregateUsage/loadStatsData, or don't persist the in-progress session during rebuild.
Finding — the #4652 (IME cursor) revert is out of scope and misdiagnosed
Commit f4509388c reverts a legitimately-merged main feature (#4652 IME physical-cursor) and rips out the whole patch-package setup: drops postinstall: patch-package, the patch-package devDep, and patches/ink+7.0.3.patch, and unpins ink 7.0.3→^7.0.3. The author's justification is that #4652 "causes ERR_PACKAGE_PATH_NOT_EXPORTED at runtime." I reproduced the crash and its actual cause on the origin/main tree:
# main, deps installed WITHOUT patch-package (postinstall skipped):
node -e "import('ink/dom')" -> ERR_PACKAGE_PATH_NOT_EXPORTED: subpath './dom' is not defined
node scripts/dev.js --version -> ERR_PACKAGE_PATH_NOT_EXPORTED (the author's crash, reproduced)
# apply what postinstall/prepare normally do:
npx patch-package -> ink@7.0.3 ✔
node -e "import('ink/dom')" -> OK (keys: addLayoutListener, appendChildNode, …)
node scripts/dev.js --version -> runs fine
So the crash is an environment artifact — patch-package didn't run, so ink's ./dom subpath export (added by patches/ink+7.0.3.patch) was missing. It is not a defect in #4652. The correct fix is to ensure postinstall runs, not to revert an unrelated input feature and delete the project's ink-patching infrastructure inside a /stats PR. As shipped, merging this would remove the IME cursor feature from main and the patch-package mechanism. (The PR is at least internally consistent — it also removes the only ink/dom importer, so the PR bundle builds and runs without the patch — but that doesn't make the revert in-scope.) Recommend dropping f4509388c from this PR and, if there's a real ink/dom resolution failure in some environment, fixing that separately.
Suites / typecheck / attribution
| Check | Result |
|---|---|
New usageHistoryService.test.ts |
10/10 |
New statsDataService + asciiCharts + statsCommand + clearCommand tests |
66/66 |
packages/core full suite |
10406 / 10406 |
packages/cli full suite |
5 failed / rest pass — all 5 reproduce identically on origin/main (root-user + headless env: workspaceAgents/workspaceMemory/cleanup unlink-expectations, clipboardUtils wl-paste) → 0 PR-attributable |
tsc --noEmit core + cli |
exit 0 / exit 0 |
Recommendation
The dashboard is genuinely nice and the persistence/rebuild/streak/render paths all work end-to-end on real data. Two things to resolve before merge: (1) the session double-count (open /stats → /clear/exit on a first run permanently 2× that session — it's a real, ordinary-flow repro of the still-open Critical); (2) drop the #4652 revert + patch-package removal (out of scope; the crash that motivated it is a missing-patch-package environment issue, not a #4652 defect). With those addressed I'd be happy to re-verify.
中文小结
在 🐧 Linux / Node v22.22.2 上用 真实 bundle(PR 与 origin/main 各自完整 npm ci && bundle,产物确证不同)在 tmux 中驱动,mock provider 走真实 read_file→write_file 工具调用并返回带缓存/延迟的真实用量。
结论:仪表盘本身做得不错、Reviewer 测试计划全部通过,但我从全新首次运行流程复现了一个尚未解决的正确性 bug(@wenshao 提的会话双计,被作者以"正常使用不会发生"关闭——实测会发生),并且 PR 里夹带的 #4652 revert 属于范围外且基于误判。建议合并前处理这两点。
通过项:/stats 三标签(会话/概览/性能)渲染、Tab/Shift+Tab 切换、r 循环四个时间范围(ANSI 高亮验证)、Esc 关闭、/stats model|tools 内联保留、headless -p 纯文本、Session 指标与 mock 数据一致(缓存 54.9%、工具成功 3/4、代码 +2/−1)、缺文件冷启动 rebuild、畸形记录(乱码行/NaN 时间戳/__proto__)不崩、与 main 的 A/B(main 旧纯文本、PR 新仪表盘)。←/→ 翻月因 mock 数据仅单月(maxOffset=0 故为 no-op,符合预期)未能实测多月,逻辑读代码正确。
🐛 双计 bug(干净复现):全新 HOME → 1 个写入轮(会话 ~3.6k tokens,文件尚未生成)→ 仅打开一次 /stats 即触发 rebuildFromSessionJsonl 把进行中的会话写入文件(1 条)→ Esc 后 /clear(或退出)再 persistSessionUsage 写一次(同 sessionId 2 条)→ 重开 /stats 概览显示 Sessions 2 · Tokens 7.2k(真实仅一个 ~3.6k 会话)。loadStatsData 只对当前 live 会话去重,两条已持久化副本都会保留,aggregateUsage 不去重 → 所有聚合指标 2×,且重复记录永久留存。作者所指 seenSessionIds 仅在单次 rebuild 扫描内去重,挡不住 rebuild 记录与 live 持久化记录共存。建议按线程上的方案:在 aggregateUsage/loadStatsData 按 sessionId 去重(last-wins),或 rebuild 时跳过进行中的会话。
#4652 revert 误判:f4509388c 在一个 /stats PR 里 revert 了已正式合入 main 的 IME 光标特性,并删除整套 patch-package(postinstall、devDep、patches/ink+7.0.3.patch),把 ink 从 7.0.3 放宽为 ^7.0.3。作者理由是 #4652 触发 ERR_PACKAGE_PATH_NOT_EXPORTED。我在 main 上复现:未跑 patch-package 时 import('ink/dom') 与 node scripts/dev.js 均抛该错;npx patch-package 后两者都正常。故崩溃是环境问题(postinstall 未执行、缺 ink ./dom 子路径导出),并非 #4652 缺陷。正确做法是保证 postinstall 执行,而非在本 PR 里 revert 无关特性并删掉 ink 打补丁机制。建议从本 PR 移除 f4509388c。
测试:新增 usageHistoryService 10/10;statsDataService/asciiCharts/statsCommand/clearCommand 66/66;core 全量 10406/10406;cli 全量 5 个失败在 main 上逐一复现(root/无剪贴板环境)→ 0 PR 归因;core/cli tsc 均 exit 0。
建议:处理双计与 #4652 revert 两点后我可复验。
|
@wenshao @tanzhenxin Why was the #4652 (IME cursor positioning) feature reverted in this /stats PR? |
|
Thanks for the thorough review! Since this PR has already been merged, both issues will be handled in follow-up PRs: Issue 1 (session double-count): Confirmed and reproducible. A follow-up PR will address this — likely a last-wins dedup by Issue 2 (#4652 revert): A revert PR is in progress. We agree the revert was out of scope and the underlying Will link the follow-up PRs here once they're up. |
|
Follow-up on Issue 2: the restore PR is now open — #4993. It restores #4652 in full as two cherry-picks with original authors preserved (@zzhenyao for the IME cursor commit, @LaZzyMan for the follow-up fixing the Tracking issue: #4987. The session double-count fix (Issue 1) will follow in a separate PR. |
|
Follow-up on Issue 1 (session double-count): tracked at #4994, fix open at #4995. #4995 adds defense-in-depth — last-wins |


What this PR does
Adds an interactive
/statsdashboard that tracks and visualizes usage across all CLI sessions. The dashboard has three tabs: Session (live current-session metrics), Activity (usage trends and patterns over time), and Efficiency (performance metrics and tool analysis). Usage data is automatically persisted to~/.qwen/usage_record.jsonlat session exit and on/clear, with a fallback rebuild from chat history if the file is missing.Why it's needed
Previously
/statsonly showed a flat text dump of the current session's token counts. Users had no way to see cross-session trends, identify which tools or models they use most, track their coding streaks, or compare performance across time periods. This dashboard gives users actionable visibility into their usage patterns — cache hit rates, tool success rates, latency trends, and project-level breakdowns — all without leaving the terminal.Reviewer Test Plan
How to verify
npm run devto start the CLI/statsto open the dashboard dialogTab/Shift+Tabto switch between tabsrto cycle time ranges (All → Month → Week → Today)←/→to navigate months in the token trend chartEscto close the dialog/clearthen/statsagain — previous session should appear in the historical dataqwen -p "hello" && qwen -p "/stats"— verify the first session's data is includedActivity tab should show: KPI row with delta arrows, GitHub-style contribution heatmap, braille line chart for token trend, top-5 project ranking table.
Efficiency tab should show: Cache Hit Rate / Tool Success / Avg Latency cards with deltas, tool leaderboard with success rate bars, model comparison table with per-model latency and cache %, code impact line counts.
Evidence (Before & After)
Before
After
Mock

Tested on
Environment (optional)
npm run devwith local Qwen API key.Risk & Scope
usage_record.jsonlfile grows unboundedly over time. For heavy users this could eventually become large, but at ~200 bytes per session even 10,000 sessions is only ~2MB.UsageSummaryRecordschema is backward-compatible — old records withouttotalLatencyMsdisplay "—" for latency. The/statscommand now opens a dialog in interactive mode instead of printing inline stats (use/stats modelor/stats toolsfor inline output).Linked Issues
Closes #4597
中文说明
这个 PR 做了什么
添加了一个交互式
/stats仪表盘,用于跟踪和可视化所有 CLI 会话的使用情况。仪表盘包含三个标签页:会话(当前会话实时指标)、概览(跨会话的使用趋势和模式)、性能(性能指标和工具分析)。使用数据在会话退出和/clear时自动持久化到~/.qwen/usage_record.jsonl,如果文件丢失则从聊天历史自动重建。为什么需要
之前
/stats只显示当前会话的纯文本 token 计数。用户无法查看跨会话趋势、识别最常用的工具或模型、跟踪编码连续天数,或比较不同时间段的性能。这个仪表盘为用户提供了可操作的使用洞察——缓存命中率、工具成功率、延迟趋势和项目级分解——全部在终端内完成。审查测试计划
npm run dev启动 CLI/stats打开仪表盘对话框Tab/Shift+Tab切换标签r循环时间范围(全部 → 月 → 周 → 今天)←/→浏览趋势图月份Esc关闭对话框/clear后再次/stats— 之前的会话应出现在历史数据中风险与范围
usage_record.jsonl文件随时间无限增长,但每个会话约 200 字节,10,000 个会话也只有约 2MB。UsageSummaryRecordschema 向后兼容。/stats命令在交互模式下从内联输出改为打开对话框(内联输出可用/stats model或/stats tools)。