Skip to content

fix(analytics): avg duration handles both quoted and unquoted duration_s#1728

Open
RyanAlberts wants to merge 1 commit into
garrytan:mainfrom
RyanAlberts:fix-analytics-avg-duration-quoted
Open

fix(analytics): avg duration handles both quoted and unquoted duration_s#1728
RyanAlberts wants to merge 1 commit into
garrytan:mainfrom
RyanAlberts:fix-analytics-avg-duration-quoted

Conversation

@RyanAlberts

Copy link
Copy Markdown

Summary

gstack-analytics (the local-usage dashboard) silently displayed (avg 0s) next to any /skill whose completion events emit "duration_s" as a quoted JSON string. The completion-status preamble that every skill runs (scripts/resolvers/preamble/generate-completion-status.ts:70) and bin/gstack-codex-probe:84 both emit "duration_s":"<N>", so the dominant in-the-wild case rendered as 0s and most per-skill averages in the dashboard have been wrong since this code path was added. bin/gstack-telemetry-log emits the unquoted numeric form, which is why the bug wasn't visible for every skill.

The companion Total time: line at the bottom of the dashboard was always correct — it uses a different parser, so the bug was confined to the per-/skill (avg X) column.

Current behavior on upstream main

Repro on origin/main (cf50443b):

mkdir -p /tmp/repro/.gstack/analytics
cat > /tmp/repro/.gstack/analytics/skill-usage.jsonl <<'JSONL'
{"skill":"review","duration_s":"120","outcome":"success","ts":"2026-05-25T10:00:00Z"}
{"skill":"review","duration_s":"180","outcome":"success","ts":"2026-05-25T11:00:00Z"}
{"skill":"review","duration_s":"60","outcome":"success","ts":"2026-05-25T12:00:00Z"}
{"skill":"qa","duration_s":300,"outcome":"success","ts":"2026-05-25T13:00:00Z"}
{"skill":"qa","duration_s":420,"outcome":"success","ts":"2026-05-25T14:00:00Z"}
JSONL
GSTACK_STATE_DIR=/tmp/repro/.gstack bin/gstack-analytics all

Before:

gstack usage (all time)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  /review               ████████████████████  3 runs  (avg 0s)   <-- WRONG, should be 2m
  /qa                   █████████████  2 runs  (avg 6m)
Total time: 18m

Total time: 18m matches the real sum (120 + 180 + 60 + 300 + 420 = 1080s = 18m), confirming the total-time parser is already correct. Only the per-skill (avg ...) column is broken.

Root cause

The AVG_DUR block in bin/gstack-analytics:165-177 splits each line on the literal "duration_s": prefix and then strips non-numeric characters from parts[2]:

n = split($0, parts, "\"duration_s\":")
if (n >= 2) {
  gsub(/[^0-9.].*/, "", parts[2])      # greedy .*
  if (parts[2]+0 > 0) { total += parts[2]; count++ }
}
  • Unquoted form 600,"ts":...gsub matches at the comma → parts[2] becomes 600 → accumulates.
  • Quoted form "600","ts":...gsub matches at the leading " (position 0), and the trailing .* consumes the rest of the line → parts[2] becomes the empty string → "" + 0 > 0 is false → never accumulates → count stays 0 → average renders as 0.

TOTAL_DURATION (bin/gstack-analytics:114-125) uses -F'[:,]' field splitting and a non-greedy gsub(/[^0-9.]/, "", val), which isolates the value before stripping. That parser handles both forms and isn't affected.

Fix

Strip an optional leading double-quote with sub(/^"/, "", parts[2]) before the existing gsub so quoted and unquoted forms take the same path through the existing digit-extraction:

sub(/^"/, "", parts[2])                # NEW
gsub(/[^0-9.].*/, "", parts[2])

Three lines in bin/gstack-analytics (one new statement + two-line comment update). No producer-side changes — the parser fix accommodates both emitter formats that are already in the wild, so existing JSONL files on users' disks render correctly without migration.

Edge cases verified by hand:

Input value After sub(/^"/) After gsub(/[^0-9.].*/) Accumulated?
600,"ts":... (unquoted) 600,"ts":... 600 yes
"600","ts":... (quoted) 600","ts":... 600 yes (was: no)
null,"ts":... null,"ts":... `` (empty) no — "" + 0 > 0 is false
"1.5","ts":... (decimal) 1.5","ts":... 1.5 yes

Validation

After the fix, same repro renders correctly:

  /review               ████████████████████  3 runs  (avg 2m)   <-- (120+180+60)/3 = 120s = 2m
  /qa                   █████████████  2 runs  (avg 6m)
Total time: 18m

New regression test in test/gstack-analytics-avg-duration.test.ts exercises four cases (quoted-only, unquoted-only, mixed quoted+unquoted for the same skill, and total-time aggregation). I confirmed it fails on the pre-fix tree (expected "2m", got "0s") by stashing the patch and re-running, then passes after popping the stash.

$ bun test test/gstack-analytics-avg-duration.test.ts
 4 pass
 0 fail
 4 expect() calls

$ bun test test/telemetry.test.ts test/gstack-analytics-avg-duration.test.ts
 39 pass
 0 fail
 87 expect() calls

$ bun test
exit 0

Scope

bin/gstack-analytics (3 lines) + 1 new test file. No SKILL.md/template changes, no VERSION/CHANGELOG bumps (per the wave-process note in CONTRIBUTING.md). The producer-side emitters (generate-completion-status.ts, gstack-codex-probe, gstack-telemetry-log) are untouched — fixing the parser keeps existing JSONL on users' machines readable without a migration.

The per-skill (avg X) column in gstack-analytics silently rendered as
"0s" for any /skill whose completion events emit "duration_s" as a
quoted JSON string. The completion-status preamble that every skill
runs (scripts/resolvers/preamble/generate-completion-status.ts:70) and
gstack-codex-probe both emit "duration_s":"<N>", so the dominant case
was wrong; gstack-telemetry-log emits the unquoted form which is why
the bug didn't show up everywhere.

The AVG_DUR awk block split each line on the literal "duration_s":
prefix and then ran gsub(/[^0-9.].*/, "", parts[2]). For unquoted
600,"ts":... the gsub matches at the comma and yields "600". For
quoted "600","ts":... it matches at the leading double-quote
(position 0) and the trailing .* consumes the entire rest of the
line, leaving an empty string that never accumulates.

The companion TOTAL_DURATION calculation (line 114) uses a different
parser keyed on -F'[:,]' field splitting + non-greedy gsub, so it
isolates the value before stripping and was always correct. Only the
per-skill average was broken.

Fix: strip an optional leading double-quote with sub(/^"/, ...) before
the existing gsub so quoted and unquoted forms take the same path.

Adds test/gstack-analytics-avg-duration.test.ts covering quoted-only,
unquoted-only, mixed, and total-time aggregation. The test fails on
the pre-fix tree (expected "2m", got "0s") and passes after the fix.
@jbetala7

Copy link
Copy Markdown
Contributor

Independent validation — this looks correct.

Confirmed the bug on origin/main: the emit sites do write quoted strings — scripts/resolvers/preamble/generate-completion-status.ts:70 and bin/gstack-codex-probe:84 both emit "duration_s":"<N>", which is the dominant in-the-wild form. With the pre-fix awk, parts[2] for a quoted value is "120",...; the leading " is non-numeric, so gsub(/[^0-9.].*/, "", parts[2]) erases the whole token → 0, and those rows silently drop out of both the per-skill avg and the total.

Running this PR's test/gstack-analytics-avg-duration.test.ts against origin/main fails 2/4 (the mixed case yields 50s instead of 40s because the quoted "30" is dropped; the all-quoted review case averages to 0). On the PR branch all 4 pass. The sub(/^"/, "", parts[2]) strip is the minimal correct fix and leaves the unquoted numeric path (from gstack-telemetry-log) untouched.

No competing open PR touches bin/gstack-analytics for this. Nice catch and good regression coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants