v0.41.6.0 feat(ci): CI test speedup — 23min → ~9min via matrix 4→6 + weight-aware sharding + auto SHA cache + parallel verify #1444

Merged: garrytan merged 3 commits into master from garrytan/ci-speedup on May 25, 2026

@garrytan (Owner)

Summary

CI wallclock on every PR drops from ~23 min to ~9-10 min, with ≤2 min on cache-hit PRs. Five orthogonal levers, one cathedral PR with atomic bisect-friendly commits.

Performance levers

  • Restructured test.yml jobs: verify + serial-tests extracted from shard 1 into their own runners. Shard 1 stops carrying ~3min of verify + serial overhead on top of its matrix work.
  • Matrix 4 → 6 shards — stays under GitHub free-tier ~20-job concurrency budget on multi-PR days (6 shards × 2 PRs + verify + serial + gitleaks + cache jobs = ~24 jobs).
  • Weight-aware LPT bin-packer: scripts/sharding.ts replaces FNV-1a path-hash partition. Real-weight 6-shard projection: every shard estimated at 534s = 8.9 min wallclock (compare to current p100 of 23-26 min on the unlucky shard).
  • Mined CI-log weights: scripts/mine-shard-weights.ts scrapes per-file wallclock from gh run view --log (timestamp delta between ##[group]test/foo.test.ts: headers within a shard). Free, real-world data, methodologically right (measures CI shard runtime, not isolated cold-start).
  • Auto SHA cache: scripts/ci-cache-hash.sh produces a 16-char sha256 over every git-tracked file EXCEPT the deny-list (CHANGELOG, TODOS, README, LICENSE, docs/*.md, docs/*.txt). actions/cache/restore@v4.2.3 in lookup-only mode probes the cache key first; on hit, every gated job skips and the test-status aggregator goes green. Cache write is post-all-pass (if: success() && ...), so it never blesses a bad state.
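
The weight-aware LPT lever in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in scripts/sharding.ts: `packShards`, `Shard`, and `medianOf` are hypothetical names, the default of 1 for an empty weight table is an arbitrary choice, and the lightest shard is found with a linear scan (simple, and cheap at 6 shards) rather than a heap.

```typescript
// Hypothetical sketch of longest-processing-time-first (LPT) packing.
// Names and defaults are illustrative assumptions, not the PR's actual API.
type Weights = Record<string, number>;

interface Shard {
  files: string[];
  total: number; // summed weight in seconds
}

function medianOf(values: number[]): number {
  if (values.length === 0) return 1; // assumption: arbitrary default with no data
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function packShards(files: string[], weights: Weights, shardCount: number): Shard[] {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new Error("shardCount must be a positive integer");
  }
  // Files without a mined weight fall back to the corpus median, not 0.
  const fallback = medianOf(Object.values(weights));
  const weighted = files.map((file) => ({ file, weight: weights[file] ?? fallback }));
  // Heaviest first; ties broken by path so the result is deterministic.
  weighted.sort(
    (a, b) => b.weight - a.weight || (a.file < b.file ? -1 : a.file > b.file ? 1 : 0),
  );

  const shards: Shard[] = Array.from({ length: shardCount }, () => ({ files: [], total: 0 }));
  for (const { file, weight } of weighted) {
    // Pick the shard with the smallest running total (first one wins ties).
    let lightest = shards[0];
    for (const shard of shards) {
      if (shard.total < lightest.total) lightest = shard;
    }
    lightest.files.push(file);
    lightest.total += weight;
  }
  return shards;
}
```

Because a missing weight resolves to the median, a newly added test file can be scheduled without regenerating the weight table first.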

Plan + reviews

Plan + 11 captured decisions at ~/.claude/plans/system-instruction-you-are-working-graceful-platypus.md. Eng review (cleared) + Codex outside-voice review produced four material plan changes baked into this ship: (a) verified e2e.yml is 3-5 min (NOT the critical path), confirming test.yml targeting is right; (b) corrected deny-list to keep CLAUDE.md and AGENTS.md IN the hash (8+ test files read them, deny-listing would create false-pass holes); (c) replaced original draft's isolated per-file profiling with log-mining; (d) added the job restructure missing from original plan.

Test Coverage

8 CRITICAL false-pass guards in test/scripts/ci-cache-hash.test.ts:

  • CLAUDE.md edit → DIFFERENT hash
  • AGENTS.md edit → DIFFERENT hash
  • skills/foo/SKILL.md edit → DIFFERENT hash
  • src/core/db.ts edit → DIFFERENT hash
  • test/foo.test.ts edit → DIFFERENT hash
  • package.json edit → DIFFERENT hash
  • bun.lock edit → DIFFERENT hash
  • .github/workflows/test.yml edit → DIFFERENT hash

7 SAFE deny-list invariants (CHANGELOG, README, TODOS, LICENSE, docs/*.md, docs/sub/*.md, docs/*.txt → SAME hash). 9 edge cases (symlinks, rename detection, untracked-file-excluded, new-file-type-discovery defaults to include, deny-list typo guard, locale-stable sort, determinism, usage errors).
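
The two families of guards above (edits that must change the hash, edits that must not) can be illustrated with a small TypeScript sketch. The real implementation is a shell script; this version only mirrors the idea. `cacheKey` and `DENY` are hypothetical names, the exact deny-listed file names (for example `CHANGELOG.md` rather than `CHANGELOG`) are assumptions, and the docs pattern is a simplification. The input is assumed to be `git ls-files -s` output, where each line is `<mode> <object> <stage>`, a tab, then the path.

```typescript
import { createHash } from "node:crypto";

// Assumption: illustrative deny-list; the real script's patterns may differ.
const DENY: RegExp[] = [
  /^CHANGELOG\.md$/,
  /^TODOS\.md$/,
  /^README\.md$/,
  /^LICENSE$/,
  /^docs\/.*\.(md|txt)$/,
];

// Hypothetical helper: derive a 16-char key from `git ls-files -s` output.
function cacheKey(lsFilesOutput: string): string {
  const kept = lsFilesOutput
    .split("\n")
    .filter((line) => line.length > 0)
    .filter((line) => {
      const tab = line.indexOf("\t");
      const path = tab === -1 ? line : line.slice(tab + 1);
      // Anything not explicitly deny-listed stays in the hash (default: include).
      return !DENY.some((pattern) => pattern.test(path));
    })
    .sort(); // default sort compares UTF-16 code units, so it ignores locale
  return createHash("sha256").update(kept.join("\n")).digest("hex").slice(0, 16);
}
```

Since each kept line carries the blob object id, any content change to an included file changes the key, while a deny-listed file drops out before hashing and cannot affect it.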

LPT bin-packer: 23 cases (happy path, fallback semantics, full coverage, determinism, balance ratio ≤1.5 on synthetic Zipf corpus, N=1 trivial, malformed weights). Verify dispatcher: 6 cases. Mine-shard-weights: 15 cases. Extended test-shard.slow.test.ts with LPT balance + slow-file inclusion regression.
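
For the mine-shard-weights cases mentioned above, the timestamp-delta idea can be sketched like this. It is a hedged illustration, not scripts/mine-shard-weights.ts: `mineWeights` and `stampMs` are hypothetical names, the log is assumed to contain one shard only, each relevant line is assumed to carry an ISO-8601 UTC timestamp before the `##[group]test/foo.test.ts:` header, and the last file in the shard is assumed to end at the last timestamped line.

```typescript
// Assumed line shape: "<anything> 2026-05-25T12:00:00.0000000Z ##[group]test/foo.test.ts:"
const STAMP = /(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z/;
const HEADER = /##\[group\](test\/\S+\.test\.ts):/;

// Parse the timestamp on a line into epoch milliseconds, or null if absent.
function stampMs(line: string): number | null {
  const match = STAMP.exec(line);
  if (!match) return null;
  const whole = Date.parse(`${match[1]}Z`);
  // Fractional seconds are handled separately so long fractions parse safely.
  const fraction = match[2] ? Number(`0.${match[2]}`) * 1000 : 0;
  return whole + fraction;
}

// Hypothetical helper: per-file seconds from one shard's log text.
function mineWeights(shardLog: string): Record<string, number> {
  const raw: Record<string, number> = {};
  let current: { file: string; start: number } | null = null;
  let lastSeen: number | null = null;

  const closeCurrent = (end: number): void => {
    if (current) raw[current.file] = (end - current.start) / 1000;
  };

  for (const line of shardLog.split("\n")) {
    const at = stampMs(line);
    if (at === null) continue;
    lastSeen = at;
    const header = HEADER.exec(line);
    if (!header) continue;
    closeCurrent(at); // the next header ends the previous file
    current = { file: header[1], start: at };
  }
  if (lastSeen !== null) closeCurrent(lastSeen);

  // Emit keys in sorted order so serialized output is stable.
  const sorted: Record<string, number> = {};
  for (const key of Object.keys(raw).sort()) sorted[key] = raw[key];
  return sorted;
}
```

In this sketch a file that appears twice in the same log keeps only its last measurement; a real miner would need to decide whether to sum, average, or keep the maximum.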

Tests: 120/120 green across test/scripts/. Full local fast loop: 10,195 unit pass + 475 serial pass. Verify chain: 21/21 parallel checks pass in 13s (was 27s sequential).
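
The parallel verify numbers above come from fanning independent checks out at once, so wallclock approaches the slowest check rather than the sum. The PR does this in shell with background jobs and wait; the sketch below swaps in `Promise.all` to show the same pattern in TypeScript. `Check`, `Failure`, and `runChecks` are hypothetical names, and each check is modeled as a function resolving to an exit code rather than a real subprocess.

```typescript
// Hypothetical model of one verify check: resolves to a process-style exit code.
interface Check {
  name: string;
  run: () => Promise<number>;
}

interface Failure {
  name: string;
  code: number;
}

// Start every check at once, wait for all of them, then report the failures by name.
async function runChecks(checks: Check[]): Promise<Failure[]> {
  const codes = await Promise.all(
    checks.map(async (check) => {
      try {
        return await check.run();
      } catch {
        return 1; // assumption: a check that throws counts as exit code 1
      }
    }),
  );
  return checks
    .map((check, index) => ({ name: check.name, code: codes[index] }))
    .filter((result) => result.code !== 0);
}
```

Collecting every exit code before reporting, instead of stopping at the first failure, is what lets one run name all failing checks together.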

Pre-Landing Review

Eng review CLEARED via /plan-eng-review. Codex outside-voice ran (4 findings produced material plan changes; rest considered and noted). Plan document carries the full ## GSTACK REVIEW REPORT block.

Eval Results

No prompt-related files changed — evals skipped.

Plan Completion

All 8 implementation tasks DONE per ~/.claude/plans/system-instruction-you-are-working-graceful-platypus.md.

Test plan

  • bun run verify (parallel): 21/21 green in 13s
  • bun test test/scripts/: 120/120 green
  • bun run test full fast loop: 10,195 unit pass + 475 serial pass
  • 6-shard balance projection from real mined weights: ~9 min per shard
  • First CI run on this PR validates the workflow restructure end-to-end
  • Second push (same SHA or CHANGELOG-only) validates cache hit → ≤2 min

🤖 Generated with Claude Code

garrytan and others added 3 commits May 25, 2026 12:58

* feat(ci): scripts/run-verify-parallel.sh — parallel verify dispatcher

Fans out the 21 pre-test grep guards via & + wait, captures per-check
exit codes in a tempdir, aggregates failures with named check + log
tail to stderr on miss. Wallclock 27s sequential → 13s parallel
locally (2x). Bigger CI win is shard 1 deload (workflow restructure
in a later commit).

Pinned by test/scripts/run-verify-parallel.test.ts (6 cases: CLI
contract + synthetic dispatcher failure-surfacing).

* feat(ci): weight-aware LPT bin-packer + auto SHA cache hash

scripts/sharding.ts (NEW) — pure TypeScript LPT bin-packer. Sort
weights desc, assign each file to the shard with current minimum total.
Worst-case makespan within 4/3 of optimal, O(n log n). Missing weights
fall back to corpus median (not 0). New test file → ships immediately
without regenerating weights. Pinned by test/scripts/sharding.test.ts
(23 cases).

scripts/mine-shard-weights.ts (NEW) — scrapes per-file timing from
gh run view --log via timestamp delta between ##[group]test/foo.test.ts:
headers within a shard. Three input modes: --run <ID>, --from-file
<PATH>, stdin. Stable JSON output (sorted keys). Initial weights mined
from run 26398061007. Pinned by test/scripts/mine-shard-weights.test.ts
(15 cases).

scripts/ci-cache-hash.sh (NEW) — deterministic 16-char sha256 over
git ls-files -s minus deny-list (CHANGELOG/TODOS/README/LICENSE/
docs/**/*.md). CLAUDE.md, AGENTS.md, skills/**/* deliberately
INCLUDED (8+ test files read them; deny-listing would create
false-pass holes). ~40ms on 1891 files. Pinned by
test/scripts/ci-cache-hash.test.ts (24 cases: 8 CRITICAL false-pass
guards + 7 SAFE deny-list invariants + 9 edge cases).

scripts/test-weights.json (NEW) — 712 weights. Total 3306s observed
runtime; median 30ms; max 6 min outlier.

* chore: bump version and changelog (v0.41.6.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
garrytan merged commit 3a2605e into master on May 25, 2026
14 checks passed
garrytan added a commit that referenced this pull request May 25, 2026
Master shipped v0.41.6.0 (CI speedup: 23min → ~9min via matrix 4→6 +
weight-aware sharding + auto SHA cache + parallel verify, #1444).
Master now holds the v0.41.6.0 slot that our branch previously claimed
before the v0.41.9.0 retarget.

Resolved VERSION + package.json + CHANGELOG conflicts. Our v0.41.9.0
remains correct — it deliberately skipped past master's allocator to
avoid collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master:
  v0.41.10.1 fix-wave: dream.* config + batch retry + extract_atoms idempotency + ze-switch env-gate (garrytan#1445)
  v0.41.10.0 feat: orphan reduction via --by-mention + UTF-16 surrogate-pair fix (garrytan#1442)
  v0.41.9.0 — UX/reliability fix wave (5 defects from production report) (garrytan#1440)
  v0.41.8.0 fix(pglite): search/query/get exit cleanly + garrytan#1340 hint + garrytan#1342 breadcrumbs (garrytan#1405)
  v0.41.7.0 feat: compact list-format resolver + 300-skill scaling tutorial (garrytan#1407)
  v0.41.6.0 feat(ci): CI test speedup — 23min → ~9min via matrix 4→6 + weight-aware sharding + auto SHA cache + parallel verify (garrytan#1444)
  v0.41.5.0 fix-wave: warm-narwhal — 6 community PRs + E2E reliability (garrytan#1374)

# Conflicts:
#	src/core/ai/recipes/openai.ts
garrytan-agents pushed a commit to garrytan-agents/gbrain that referenced this pull request Jun 13, 2026
…weight-aware sharding + auto SHA cache + parallel verify (garrytan#1444)

* feat(ci): scripts/run-verify-parallel.sh — parallel verify dispatcher

* feat(ci): weight-aware LPT bin-packer + auto SHA cache hash

* chore: bump version and changelog (v0.41.6.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>