v0.41.6.0 feat(ci): CI test speedup — 23min → ~9min via matrix 4→6 + weight-aware sharding + auto SHA cache + parallel verify by garrytan · Pull Request #1444 · garrytan/gbrain

garrytan · 2026-05-25T19:59:37Z

Summary

CI wallclock on every PR drops from ~23 min to ~9-10 min, and to ≤2 min on cache-hit PRs. Five orthogonal levers ship in one PR, split into atomic, bisect-friendly commits.

Performance levers

Restructured test.yml jobs: verify and serial-tests are extracted from shard 1 into their own runners, so shard 1 no longer carries ~3 min of verify + serial overhead on top of its matrix work.
Matrix 4 → 6 shards — stays under GitHub free-tier ~20-job concurrency budget on multi-PR days (6 shards × 2 PRs + verify + serial + gitleaks + cache jobs = ~24 jobs).
Weight-aware LPT bin-packer: scripts/sharding.ts replaces the FNV-1a path-hash partition. The 6-shard projection from real weights estimates every shard at 534s = 8.9 min wallclock, compared with the current p100 of 23-26 min on the unluckiest shard.
Mined CI-log weights: scripts/mine-shard-weights.ts scrapes per-file wallclock from gh run view --log, using the timestamp delta between consecutive ##[group]test/foo.test.ts: headers within a shard. The data is free and real-world, and it measures the right thing: runtime inside a CI shard, not isolated cold-start time.
Auto SHA cache: scripts/ci-cache-hash.sh produces a 16-char sha256 over every git-tracked file EXCEPT the deny-list (CHANGELOG, TODOS, README, LICENSE, and Markdown/.txt files under docs/). actions/cache/restore@v4.2.3 in lookup-only mode probes the cache key first; on a hit, every gated job skips and the test-status aggregator goes green. The cache is written only after everything passes (if: success() && ...), so it never blesses a bad state.
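A minimal TypeScript sketch of the cache-key idea follows (the PR's script is shell; this is not its code). It assumes the input is git ls-files -s output, where each line carries the blob SHA, so any content edit to a non-denied file changes the digest. The DENY regexes and the cacheHash name are hypothetical stand-ins for the PR's real deny-list.

```typescript
import { createHash } from "node:crypto";

// Sketch only: the idea behind scripts/ci-cache-hash.sh, not its implementation.
// ASSUMPTIONS: input is `git ls-files -s` output ("<mode> <blob-sha> <stage>\t<path>"),
// and these regexes are an illustrative stand-in for the PR's real deny-list.
const DENY: RegExp[] = [
  /^CHANGELOG(\.md)?$/,
  /^TODOS(\.md)?$/,
  /^README(\.md)?$/,
  /^LICENSE$/,
  /^docs\/.*\.(md|txt)$/,
];

function cacheHash(lsFiles: string): string {
  const kept = lsFiles
    .split("\n")
    .filter((line) => line.includes("\t")) // drop blank or malformed lines
    .filter((line) => {
      const path = line.slice(line.indexOf("\t") + 1);
      return !DENY.some((re) => re.test(path));
    })
    // Code-unit comparison instead of localeCompare, so ordering is locale-independent.
    .sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  // The blob SHA is part of each kept line, so any content edit changes the digest.
  return createHash("sha256").update(kept.join("\n")).digest("hex").slice(0, 16);
}
```

Because CLAUDE.md, AGENTS.md and skills/**/SKILL.md never match DENY, editing them changes the key; that is the false-pass guard the CRITICAL tests pin.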

Plan + reviews

The plan and 11 captured decisions live at ~/.claude/plans/system-instruction-you-are-working-graceful-platypus.md. The eng review (cleared) and a Codex outside-voice review produced four material plan changes, all included in this PR: (a) verified that e2e.yml takes 3-5 min (NOT the critical path), confirming that targeting test.yml is right; (b) corrected the deny-list to keep CLAUDE.md and AGENTS.md IN the hash (8+ test files read them, so deny-listing them would create false-pass holes); (c) replaced the original draft's isolated per-file profiling with log mining; (d) added the job restructure that the original plan was missing.

Test Coverage

8 CRITICAL false-pass guards in test/scripts/ci-cache-hash.test.ts:

CLAUDE.md edit → DIFFERENT hash
AGENTS.md edit → DIFFERENT hash
skills/foo/SKILL.md edit → DIFFERENT hash
src/core/db.ts edit → DIFFERENT hash
test/foo.test.ts edit → DIFFERENT hash
package.json edit → DIFFERENT hash
bun.lock edit → DIFFERENT hash
.github/workflows/test.yml edit → DIFFERENT hash

7 SAFE deny-list invariants (edits to CHANGELOG, README, TODOS, LICENSE, docs/*.md, docs/sub/*.md, docs/*.txt → SAME hash). 9 edge cases (symlinks, rename detection, untracked files excluded, new file types included by default, deny-list typo guard, locale-stable sort, determinism, usage errors).

LPT bin-packer: 23 cases (happy path, fallback semantics, full coverage, determinism, balance ratio ≤1.5 on a synthetic Zipf corpus, trivial N=1, malformed weights). Verify dispatcher: 6 cases. Mine-shard-weights: 15 cases. test-shard.slow.test.ts is extended with an LPT balance check and a slow-file inclusion regression.

Tests: 120/120 green across test/scripts/. Full local fast loop: 10,195 unit pass + 475 serial pass. Verify chain: 21/21 parallel checks pass in 13s (was 27s sequential).

Pre-Landing Review

Eng review CLEARED via /plan-eng-review. The Codex outside-voice review also ran: 4 findings produced material plan changes, and the rest were considered and noted. The plan document carries the full ## GSTACK REVIEW REPORT block.

Eval Results

No prompt-related files changed — evals skipped.

Plan Completion

All 8 implementation tasks DONE per ~/.claude/plans/system-instruction-you-are-working-graceful-platypus.md.

Test plan

bun run verify (parallel): 21/21 green in 13s
bun test test/scripts/: 120/120 green
bun run test full fast loop: 10,195 unit pass + 475 serial pass
6-shard balance projection from real mined weights: ~9 min per shard
First CI run on this PR validates the workflow restructure end-to-end
Second push (same SHA or CHANGELOG-only) validates cache hit → ≤2 min

🤖 Generated with Claude Code

feat(ci): scripts/run-verify-parallel.sh, parallel verify dispatcher

Fans out the 21 pre-test grep guards in parallel via & + wait, captures each check's exit code in a tempdir, and on failure reports every failing check by name with its log tail on stderr. Wallclock drops from 27s sequential to 13s parallel locally (2x). The bigger CI win is deloading shard 1 (the workflow restructure in a later commit). Pinned by test/scripts/run-verify-parallel.test.ts (6 cases: CLI contract + synthetic dispatcher failure-surfacing).
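The fan-out/aggregate pattern described in this commit can be sketched in TypeScript, as a rough analogue of shell & + wait (the real dispatcher is a shell script). The Check shape, the runAll and summarize names, and the 5-line tail are hypothetical; each check is assumed to be a shell command.

```typescript
import { spawn } from "node:child_process";

// Sketch of a parallel check dispatcher: start every check at once, wait for all,
// then report failing checks by name with a short log tail.
// ASSUMPTION: each check is a shell command; names and commands are hypothetical.
interface Check { name: string; cmd: string }
interface Result { name: string; code: number; log: string }

function runCheck(c: Check): Promise<Result> {
  return new Promise((resolve) => {
    const child = spawn("sh", ["-c", c.cmd]);
    let log = "";
    child.stdout.on("data", (d) => (log += d));
    child.stderr.on("data", (d) => (log += d));
    child.on("error", (err) => resolve({ name: c.name, code: 127, log: String(err) }));
    // A null exit code means the process was killed by a signal; treat it as a failure.
    child.on("close", (code) => resolve({ name: c.name, code: code ?? 1, log }));
  });
}

// Failing checks only, each as "FAIL <name> (exit N)" plus the last few log lines.
function summarize(results: Result[], tailLines = 5): string[] {
  return results
    .filter((r) => r.code !== 0)
    .map((r) => `FAIL ${r.name} (exit ${r.code})\n` + r.log.trimEnd().split("\n").slice(-tailLines).join("\n"));
}

async function runAll(checks: Check[]): Promise<number> {
  const results = await Promise.all(checks.map(runCheck)); // all checks run concurrently
  const failures = summarize(results);
  for (const f of failures) process.stderr.write(f + "\n");
  return failures.length === 0 ? 0 : 1;
}
```

Wallclock becomes roughly the slowest single check rather than the sum of all checks, which is where a 27s → 13s style speedup comes from.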

feat(ci): weight-aware LPT bin-packer + auto SHA cache hash

scripts/sharding.ts (NEW): pure TypeScript LPT bin-packer. Sorts weights descending and assigns each file to the shard with the current minimum total. Worst-case makespan is within 4/3 of optimal, in O(n log n). Missing weights fall back to the corpus median (not 0), so a new test file ships immediately without regenerating weights. Pinned by test/scripts/sharding.test.ts (23 cases).

scripts/mine-shard-weights.ts (NEW): scrapes per-file timing from gh run view --log via the timestamp delta between ##[group]test/foo.test.ts: headers within a shard. Three input modes: --run <ID>, --from-file <PATH>, stdin. Stable JSON output (sorted keys). Initial weights mined from run 26398061007. Pinned by test/scripts/mine-shard-weights.test.ts (15 cases).

scripts/ci-cache-hash.sh (NEW): deterministic 16-char sha256 over git ls-files -s minus the deny-list (CHANGELOG/TODOS/README/LICENSE/docs/**/*.md). CLAUDE.md, AGENTS.md and skills/**/* are deliberately INCLUDED (8+ test files read them; deny-listing would create false-pass holes). ~40ms on 1891 files. Pinned by test/scripts/ci-cache-hash.test.ts (24 cases: 8 CRITICAL false-pass guards + 7 SAFE deny-list invariants + 9 edge cases).

scripts/test-weights.json (NEW): 712 weights. Total 3306s observed runtime; median 30ms; max 6 min outlier.
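The sharding.ts description (sort descending, greedy least-loaded shard, median fallback) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: weights are taken to be milliseconds keyed by test path, ties go to the lowest shard index, and lptShard and median are hypothetical names.

```typescript
// Minimal LPT (Longest Processing Time first) sharding sketch.
// ASSUMPTIONS (not the PR's code): weights are ms keyed by test path, unknown files
// get the median of the known weights, and ties go to the lowest shard index.
function median(xs: number[]): number {
  if (xs.length === 0) return 0;
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

function lptShard(files: string[], weights: Record<string, number>, shardCount: number): string[][] {
  const fallback = median(Object.values(weights));
  const has = (f: string) => Object.prototype.hasOwnProperty.call(weights, f);
  const items = files.map((file) => ({ file, w: has(file) ? weights[file] : fallback }));
  // Heaviest first; path order breaks ties so the output is deterministic.
  items.sort((a, b) => b.w - a.w || (a.file < b.file ? -1 : a.file > b.file ? 1 : 0));
  const shards: string[][] = Array.from({ length: shardCount }, () => []);
  const totals: number[] = new Array(shardCount).fill(0);
  for (const { file, w } of items) {
    // Least-loaded shard by linear scan: O(k) per file, negligible for k = 6.
    let best = 0;
    for (let i = 1; i < shardCount; i++) if (totals[i] < totals[best]) best = i;
    shards[best].push(file);
    totals[best] += w;
  }
  return shards;
}
```

The sort dominates, so the cost is O(n log n) for a fixed shard count. Graham's classic bound for LPT is a makespan of at most (4/3 − 1/(3m)) × optimal for m shards; at m = 6 that is 23/18 ≈ 1.28, consistent with the "within 4/3" claim.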
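The log-mining step in mine-shard-weights.ts can be illustrated with a small parser that measures the gap between consecutive ##[group]test/...test.ts: headers within one job. The line layout assumed here ("<job>\t<step>\t<ISO timestamp> <message>") is an assumption, not a documented guarantee of the gh log format, and mineWeights is a hypothetical name rather than the PR's API.

```typescript
// Sketch: per-file durations from a CI log via timestamp deltas between consecutive
// "##[group]test/...test.ts:" headers within one job (shard).
// ASSUMPTION: lines look like "<job>\t<step>\t<ISO-8601 timestamp> <message>";
// the real log layout and the PR's parser may differ.
const HEADER = /^##\[group\](test\/\S+\.test\.ts):/;

function mineWeights(log: string): Record<string, number> {
  const starts = new Map<string, { file: string; t: number }[]>();
  const lastSeen = new Map<string, number>();
  for (const line of log.split("\n")) {
    const cols = line.split("\t");
    if (cols.length < 3) continue;
    const job = cols[0];
    const rest = cols.slice(2).join("\t");
    const space = rest.indexOf(" ");
    if (space < 0) continue;
    // Trim sub-millisecond digits so Date.parse sees a standard ISO string.
    const t = Date.parse(rest.slice(0, space).replace(/(\.\d{3})\d+/, "$1"));
    if (Number.isNaN(t)) continue;
    lastSeen.set(job, t);
    const m = HEADER.exec(rest.slice(space + 1));
    if (m) {
      const list = starts.get(job) ?? [];
      list.push({ file: m[1], t });
      starts.set(job, list);
    }
  }
  const weights: Record<string, number> = {};
  for (const [job, list] of starts) {
    list.forEach((s, i) => {
      // End = next header in the same job, or the job's last timestamped line.
      const end = i + 1 < list.length ? list[i + 1].t : (lastSeen.get(job) ?? s.t);
      weights[s.file] = end - s.t;
    });
  }
  // Sorted keys (code-unit order) so JSON.stringify output is stable across runs.
  const keys = Object.keys(weights).sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  return Object.fromEntries(keys.map((k) => [k, weights[k]]));
}
```

Measuring inside the shard's own log captures the real CI cost of each file, including warm-process effects, which isolated per-file profiling would miss.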

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Master shipped v0.41.6.0 (CI speedup: 23min → ~9min via matrix 4→6 + weight-aware sharding + auto SHA cache + parallel verify, #1444). Master now holds the v0.41.6.0 slot that our branch claimed before the v0.41.9.0 retarget. Resolved VERSION + package.json + CHANGELOG conflicts. Our v0.41.9.0 remains correct: it deliberately skipped past master's allocator to avoid a collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* upstream/master:
  v0.41.10.1 fix-wave: dream.* config + batch retry + extract_atoms idempotency + ze-switch env-gate (garrytan#1445)
  v0.41.10.0 feat: orphan reduction via --by-mention + UTF-16 surrogate-pair fix (garrytan#1442)
  v0.41.9.0 — UX/reliability fix wave (5 defects from production report) (garrytan#1440)
  v0.41.8.0 fix(pglite): search/query/get exit cleanly + garrytan#1340 hint + garrytan#1342 breadcrumbs (garrytan#1405)
  v0.41.7.0 feat: compact list-format resolver + 300-skill scaling tutorial (garrytan#1407)
  v0.41.6.0 feat(ci): CI test speedup — 23min → ~9min via matrix 4→6 + weight-aware sharding + auto SHA cache + parallel verify (garrytan#1444)
  v0.41.5.0 fix-wave: warm-narwhal — 6 community PRs + E2E reliability (garrytan#1374)

# Conflicts:
# src/core/ai/recipes/openai.ts


garrytan and others added 3 commits May 25, 2026 12:58

chore: bump version and changelog (v0.41.6.0)

e84875d

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

garrytan merged commit 3a2605e into master May 25, 2026
14 checks passed

garrytan merged 3 commits into master from garrytan/ci-speedup
