v0.26.4 test: parallel unit-test loop (12x speedup, failure-first logging) by garrytan · Pull Request #605 · garrytan/gbrain

garrytan · 2026-05-04T03:09:55Z

Summary

bun run test finishes in ~85 seconds. It was 18 minutes. 12x speedup via 8-shard parallel fan-out + dedicated failure-log file.

Test infra (commits 1-2):

New scripts/run-unit-parallel.sh (~340 LOC). Spawns N=min(8, cpu_count) shards via existing scripts/run-unit-shard.sh. Per-shard 600s gtimeout/timeout/bg-pid wallclock cap. Single-writer post-shard failure aggregation (no concurrent-write hazards). 10s heartbeat to stderr proving it isn't wedged.
New scripts/run-serial-tests.sh runs *.serial.test.ts files at --max-concurrency=1 after the parallel pass.
scripts/run-unit-shard.sh accepts --max-concurrency=N; excludes *.serial.test.ts alongside *.slow.test.ts.
package.json script tier split: test (fast loop) / verify (CI-narrow gates) / test:full (verify + parallel + slow + smart e2e) / test:serial / test:e2e / check:all.
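
The shard-count rule and the per-shard wallclock cap described above can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/run-unit-parallel.sh: the function names `cpu_count`, `shard_count`, and `run_with_cap` are hypothetical, and the background-PID watchdog is only one plausible reading of the "bg-pid" fallback.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; NOT the contents of scripts/run-unit-parallel.sh.

# Best-effort CPU count (nproc on Linux, sysctl on macOS, else 1).
cpu_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif command -v sysctl >/dev/null 2>&1; then
    sysctl -n hw.ncpu 2>/dev/null || echo 1
  else
    echo 1
  fi
}

# N = min(8, cpu_count).
shard_count() {
  local cpus
  cpus="$(cpu_count)"
  if [ "$cpus" -gt 8 ]; then echo 8; else echo "$cpus"; fi
}

# Run a command under a wallclock cap: gtimeout, then timeout, then a
# background-PID watchdog when neither binary is installed.
run_with_cap() {
  local secs="$1"
  shift
  if command -v gtimeout >/dev/null 2>&1; then
    gtimeout "$secs" "$@"
  elif command -v timeout >/dev/null 2>&1; then
    timeout "$secs" "$@"
  else
    "$@" &
    local pid=$!
    # Watchdog: kill the command if it is still running after $secs seconds.
    ( sleep "$secs"; kill "$pid" 2>/dev/null ) >/dev/null 2>&1 &
    local watchdog=$!
    wait "$pid"
    local rc=$?
    kill "$watchdog" 2>/dev/null
    return "$rc"
  fi
}
```

A call such as `run_with_cap 600 scripts/run-unit-shard.sh ...` would then return non-zero both when the shard fails and when it exceeds the cap; the exact shard arguments are left elided here because the PR does not spell them out.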

Failure-first logging (commits 1-3):

.context/test-failures.log — extracted failure blocks per shard, prefixed --- shard N: <test name> ---. Falls back to /tmp/ if .context/ is unwritable.
.context/test-summary.txt — one-line-per-shard pass=X fail=Y skip=Z rc=W.
Stderr banner with absolute path + tail-30 of failure log on any failure. Survives | head / | tail / agent log truncation.
.gitignore adds .context/.
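
A rough sketch of the failure-log contract listed above, assuming the aggregation runs in a single process after all shards have finished. All function names are hypothetical, and the banner wording is invented for illustration; only the `--- shard N: <test name> ---` prefix, the summary fields, the /tmp fallback, and the tail-30 come from the description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; function names are illustrative, not taken from the PR.

# Use the preferred directory if it can be created and written, else /tmp.
pick_log_dir() {
  local dir="${1:-.context}"
  if mkdir -p "$dir" 2>/dev/null && [ -w "$dir" ]; then
    echo "$dir"
  else
    echo "/tmp"
  fi
}

# Append one failure block, prefixed "--- shard N: <test name> ---".
append_failure() {
  local log="$1" shard="$2" name="$3" body="$4"
  {
    printf '%s\n' "--- shard $shard: $name ---"
    printf '%s\n\n' "$body"
  } >> "$log"
}

# One summary line per shard: "shard N/M: pass=X fail=Y skip=Z rc=W".
summary_line() {
  printf 'shard %s/%s: pass=%s fail=%s skip=%s rc=%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Banner on stderr: absolute log path plus the last 30 lines of the log.
print_banner() {
  local log="$1" abs
  abs="$(cd "$(dirname "$log")" && pwd)/$(basename "$log")"
  {
    echo "==== TEST FAILURES: full log at $abs ===="
    tail -n 30 "$log"
  } >&2
}
```

Because the banner goes to stderr and names an absolute path, the pointer to the log should still be visible when stdout is piped through `head` or `tail`.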

CI gate tightening (commit 3):

.github/workflows/test.yml now runs bun run verify (was: 4 specific scripts inlined). Privacy check now actually fires on every CI run; previously it ran only when somebody manually invoked the old bun run test chain. Caught two pre-existing "Wintermute" references in src/core/mounts-cache.ts and replaced them with "your OpenClaw" per the CLAUDE.md privacy rule.

Quarantine (commit 3):

test/brain-registry.test.ts → test/brain-registry.serial.test.ts (1 case fails under cross-file contention)
test/reconcile-links.test.ts → test/reconcile-links.serial.test.ts (beforeEach hook timeout under contention)

Both pass alone. The proper architectural fix (sweep ~58 PGLite + ~40 env-mutation + 2 mock.module sites + add --concurrent flag) is filed as a P0 TODO for v0.27+.

Regression tests (commit 4, 4 files, 13 cases):

test/scripts/run-unit-parallel.test.ts — exit-code propagation + failure-log contract (uses 4-fixture tempdir, ~500ms)
test/scripts/run-unit-shard.test.ts — exclusion symmetry (slow + serial + e2e all excluded)
test/scripts/serial-files.test.ts — discovery + concurrency=1 + disjoint from unit-shard set
test/privacy-script-wired.test.ts — updated to assert verify chains check:privacy AND test.yml calls bun run verify
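
The exclusion-symmetry and disjointness contracts that these regression tests pin can be illustrated with a small sketch. It assumes the file lists are built with `find`; the function names and exact expressions are hypothetical, and the real ones live in scripts/run-unit-shard.sh and scripts/run-serial-tests.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real find expressions live in the repo's scripts.

# Files eligible for the parallel pass: plain *.test.ts only.
list_unit_files() {
  local root="${1:-test}"
  find "$root" -type f -name '*.test.ts' \
    -not -name '*.slow.test.ts' \
    -not -name '*.serial.test.ts' \
    -not -path "$root/e2e/*" | sort
}

# Files quarantined to the serial pass.
list_serial_files() {
  local root="${1:-test}"
  find "$root" -type f -name '*.serial.test.ts' -not -path "$root/e2e/*" | sort
}

# Succeeds only when no file appears in both lists.
lists_are_disjoint() {
  local root="${1:-test}" dup
  dup="$( { list_unit_files "$root"; list_serial_files "$root"; } | sort | uniq -d )"
  [ -z "$dup" ]
}
```

Dropping either `-not -name` clause from `list_unit_files` would make `lists_are_disjoint` fail or let slow files into the parallel pass, which is the kind of silent regression the tests are meant to catch.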

Docs (commit 5):

CLAUDE.md Testing section rewritten with tier table, intentional CI-vs-local divergence section, failure-log contract, file taxonomy.
CHANGELOG.md ## [0.26.4] entry per voice rules.
TODOS.md adds P0: intra-file parallelism via --concurrent (~1-2 weeks; target bun run test <30s) and P1: E2E template-DB parallelism.
llms-full.txt regenerated.

Merge with master (v0.26.3 admin observability landed during this branch): done. verify now also runs master's check:admin-build.sh gate (vite build of the admin React app). Tests are still green at 88s post-merge.

Test Coverage

4 new test files, 13 cases for the new wrapper / shard / serial / privacy contracts.
All ran green inline against a 4-fixture tempdir.
Tests: ~3650 → ~3650 + 13 (regression suite for v0.26.4 itself).

Pre-Landing Review

Eng review and Codex outside-voice review were run interactively during planning. Codex flagged 4 critical structural issues (Bun native --shard non-functional on file lists, parity test impossible by design, freshPglite() contradicts existing resetPglite() helper, verify was redefining the ship gate). All 4 resolved by user via AskUserQuestion. Plan file: ~/.claude/plans/system-instruction-you-are-working-tranquil-ladybug.md — full GSTACK REVIEW REPORT included.

TODOS

NEW P0: Intra-file parallelism via --concurrent flag. Sweep ~58 PGLite + ~40 env-mutation + 2 mock.module sites using existing test/helpers/reset-pglite.ts (do NOT introduce freshPglite() — codex correctly flagged that the repo already rejected that direction). Target bun run test <30s.
NEW P1: E2E parallelism via Postgres template databases. ~1-2 days.

Test plan

bun run verify green (privacy + jsonb + progress + wasm + admin-build + typecheck)
bun run test 88s, 3657 pass / 0 fail / 0 skip (8 parallel shards + 34 serial)
Failure-log contract verified: deliberately failing test produces --- shard N: prefix in .context/test-failures.log + loud stderr banner
All 13 regression tests green
CI workflow updated to call bun run verify (single-source-of-truth for ship gate)

🤖 Generated with Claude Code


Lay foundation for v0.26.4 parallel test loop:

- scripts/run-unit-parallel.sh: spawns N shards (default min(8, cpu_count)) via run-unit-shard.sh, captures per-shard logs, post-shard single-writer failure-log aggregation at .context/test-failures.log, 10s heartbeat to stderr, per-shard 600s timeout (gtimeout/timeout/bg-pid fallback chain), loud final banner with absolute path + tail-30 of failures, summary file for at-a-glance status. Single writer eliminates concurrent-write hazards on the failure log.
- scripts/run-serial-tests.sh: discovers *.serial.test.ts files (concurrency-unsafe by design), runs them with --max-concurrency=1. Invoked after the parallel pass.
- scripts/run-unit-shard.sh: now accepts --max-concurrency=N (forwarded to bun test); --dry-run-list moved into argv parsing alongside; excludes *.serial.test.ts in addition to *.slow.test.ts.
- bunfig.toml: trim stale comment about typecheck-chained timeout.
- .gitignore: add .context/ (Conductor workspace artifacts directory; the failure log + summary + per-shard logs all live here).

No package.json changes yet (commit 2). No test reorganization yet (commits 4-7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…commit 2/8)

Per Codex Tension #4 (verify scope), distinguish three tiers cleanly:

- `bun run test` = fast loop, file-level parallel fan-out via the new wrapper (scripts/run-unit-parallel.sh). No pre-checks, no typecheck, no wasm compile in the hot path. ~15s of pre-test gates removed.
- `bun run verify` = CI's authoritative gate set: check:jsonb + check:progress + check:wasm + typecheck. Matches what .github/workflows/test.yml runs on shard 1, no scope drift. The 4 checks not in CI (privacy, no-legacy-getconnection, trailing-newline, exports-count) move to `bun run check:all` for opt-in local use.
- `bun run test:full` = verify + parallel + slow + smart e2e (runs e2e only if DATABASE_URL is set; else loud skip notice to stderr per Open Item #7). The local equivalent of "everything CI runs."

Adds `bun run test:serial` for the *.serial.test.ts subset (concurrency-unsafe files run with --max-concurrency=1).

Bumps VERSION + package.json to 0.26.4. Both move together per the CI version-gate contract in CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
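
The "smart e2e" behaviour mentioned in this commit (run e2e only when DATABASE_URL is set, otherwise print a loud skip notice to stderr) might look something like the sketch below. The function name `run_e2e_if_configured` and the notice wording are assumptions, not taken from the repo.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a DATABASE_URL-gated e2e step.

# Run the given command only when DATABASE_URL is set and non-empty;
# otherwise print a skip notice to stderr and report success.
run_e2e_if_configured() {
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "SKIP: DATABASE_URL is not set; e2e tests were NOT run." >&2
    return 0
  fi
  "$@"
}
```

Returning 0 on skip keeps `test:full` usable on machines without a database, while the stderr notice makes it hard to mistake a skipped run for a passing one.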

Wave: makes the new wrapper actually green and tightens the CI gate it exposed.

Wrapper bug fixes (scripts/run-unit-parallel.sh):

- grep_count helper: avoids the `grep -c ... || echo 0` double-output bug where 0 matches yields a 2-line "0\n0" string and breaks arithmetic.
- bun_summary_count helper: parses Bun's actual end-of-shard summary format (`N pass` / `N fail` / `N skip`), not the per-test markers (which are `✓` / `(fail)`, never `(pass)` / `(skip)`).
- Heartbeat now reads `^\s+✓` (Bun's per-test pass marker) for live progress mid-run; final summary still uses the summary-line counts for accuracy.

Privacy gate tightening:

- Move scripts/check-privacy.sh into `bun run verify` (was previously only in the now-removed `bun run test` chain). Without this, after commit 2 the privacy check ran in nothing automatic.
- .github/workflows/test.yml now calls `bun run verify` instead of inlining the gate list. Single source of truth for "what's the ship gate." This is what verify == CI was supposed to mean per Codex T#4.
- Pre-existing `Wintermute` references in src/core/mounts-cache.ts:6 and :324 caught by the now-running gate; replaced with `your OpenClaw` per CLAUDE.md privacy rule (verify gate now passes on master HEAD).
- test/privacy-script-wired.test.ts updated: regression guard now asserts verify includes check:privacy AND that test.yml runs `bun run verify`, replacing the obsolete "test script includes check-privacy.sh" assertion.

Quarantine 2 cross-file-contention flakes:

- test/brain-registry.test.ts: 28 tests pass alone (41ms); 1 test ("empty/null/undefined id routes to host") fails when run alongside other files in the same shard. Renamed → *.serial.test.ts so it runs in scripts/run-serial-tests.sh's serial pass after the parallel pass completes.
- test/reconcile-links.test.ts: 6 tests pass alone (1s); a beforeEach hook times out (~896s) under cross-file contention. Same treatment.

Both flakes are bun-process-level shared-state leaks (PGLite singletons or top-level imports). Fixing them properly is the v0.27.0+ intra-file parallelism project (TODO P0; see commit 5).

Measurement after this commit:

- bun run test = 94s (was 18 min sequential)
- 3639 pass, 0 fail, 0 skip across 8 parallel shards + 34 serial tests
- Failure-log + heartbeat + summary all working

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
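
The double-output bug and the two helpers named in this commit can be re-created in a few lines. The bodies below are a hypothetical reconstruction from the commit message alone, not the code in scripts/run-unit-parallel.sh; in particular, using `awk` to read the summary lines is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical re-creation of the two helpers named in the commit message.

# Buggy form, for reference: on zero matches `grep -c` prints "0" AND exits 1,
# so `$(grep -c PATTERN FILE || echo 0)` captures the two lines "0" and "0".

# Count matching lines; always prints exactly one integer.
grep_count() {
  local pattern="$1" file="$2" n
  n="$(grep -c -e "$pattern" "$file" 2>/dev/null)" || true
  echo "${n:-0}"
}

# Read a summary line of the form "<N> <label>" (e.g. " 12 pass") from a
# shard log; prints 0 when the line or the file is missing.
bun_summary_count() {
  local label="$1" file="$2"
  [ -f "$file" ] || { echo 0; return 0; }
  awk -v label="$label" '
    NF == 2 && $1 ~ /^[0-9]+$/ && $2 == label { n = $1 }
    END { print n + 0 }
  ' "$file"
}
```

Capturing the count into a variable and defaulting it with `${n:-0}` keeps the output to a single line, so it is safe to use inside `$(( ... ))` arithmetic.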

…commit 4/5)

Three regression suites pin the v0.26.4 contracts. Without these, future refactors of the wrapper or shard scripts could silently regress the work in commits 1-3.

test/scripts/run-unit-shard.test.ts (4 cases, gap b):

- Asserts the unit-shard `--dry-run-list` output excludes every *.slow.test.ts and *.serial.test.ts file, plus the test/e2e/ subtree.
- Catches a future `find` expression that drops one of the `-not -name` clauses and silently un-quarantines slow/serial files into the parallel pass.

test/scripts/serial-files.test.ts (3 cases, gap e):

- Every checked-in *.serial.test.ts (via `git ls-files`) is listed by scripts/run-serial-tests.sh's `--dry-run-list`.
- The script's source contains `bun test --max-concurrency=1` (the serial-pass guarantee that quarantined files don't run intra-file concurrent and reintroduce the contention they were quarantined for).
- Disjoint set: a file is never in both the unit-shard list AND the serial list; this pins the carve-out contract.

test/scripts/run-unit-parallel.test.ts (6 cases, gaps a + d):

- Exit-code propagation (a): wrapper exits non-zero when ANY shard has a failing test; exits zero when all pass. The hardest contract to silently break in a fan-out wrapper (after a `for ...; do cmd & done` loop, a bare `wait` returns 0 however the children exited, and `wait` with a list of PIDs returns only the LAST child's status, not any failure's).
- Failure-log contract (d): on failure, .context/test-failures.log exists, is non-empty, contains the `--- shard N:` prefix and the failing test's describe text. Stderr banner contains the absolute log path. On success, the log is cleared (no stale content).
- Summary file format: `shard N/M: pass=X fail=Y skip=Z rc=W` per shard, machine-parseable for future tooling.

The wrapper test runs against a 4-file tempdir (3 pass + 1 fail) so it executes in ~500ms; spawning the wrapper against the real test suite would take ~90s and isn't worth the cost in a regression suite.

All 13 cases pass on first run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
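
The exit-code propagation contract can be illustrated with a minimal fan-out that waits on each child PID individually. This is a sketch under assumptions: `run_fanout` is a hypothetical name, and the real wrapper launches shard scripts rather than arbitrary command strings.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run each argument as a shell command in the background
# and return non-zero if ANY of them failed.
run_fanout() {
  local pids="" rc=0 cmd pid
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done
  # Wait on each PID individually so no child's exit status is lost.
  for pid in $pids; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

With `run_fanout "exit 0" "exit 3" "exit 0"` the function returns non-zero even though the last child succeeded, which is the case a single trailing `wait` would miss.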

…mmit 5/5)

Closes the v0.26.4 ship.

CLAUDE.md Testing section rewritten:

- New tier table: test (fast loop, 85s) / verify (CI gates, 12s) / test:full (everything local) / test:slow / test:serial / test:e2e / check:all. Each row names its scope, wallclock, and when to use.
- Intentional CI vs local divergence section: CI matrix (test-shard.sh, hash-bucketed, includes slow) vs local fast loop (run-unit-shard.sh, round-robin, excludes slow + serial). Codex correctly flagged that a parity test would always fail by design; this is the documentation that explains why.
- Failure-first logging contract: .context/test-failures.log format, stderr banner, summary file, wedge handling.
- File taxonomy: *.test.ts / *.slow.test.ts / *.serial.test.ts / test/e2e/. Names the two currently-quarantined files and points at the intra-file P0 TODO for the proper fix.

CHANGELOG.md `## [0.26.4]` entry per voice rules:

- Two-line headline: "bun run test finishes in 85 seconds. Was 18 minutes." + failure-log directive.
- Lead paragraph names what shipped and why.
- Numbers-that-matter table: BEFORE / AFTER / Δ for wallclock, pre-test gates, failure visibility, shards, pipe-survival.
- "What this means for you" closing tied to the inner-loop user.
- "To take advantage of v0.26.4" block per the v0.13+ self-repair template (gbrain upgrade + contributor steps).
- Itemized changes by area (new scripts, script extensions, package.json tier split, CI tightening, failure-first logging, quarantine, regression tests, bunfig).
- "What did NOT ship" section names the intra-file project + E2E template-DB project as P0/P1 follow-ups with concrete acceptance criteria.
- Process section names the codex review + scope-correction loop honestly: "snapped back to ship today once empirical measurement showed Bun's --max-concurrency does nothing on tests not marked test.concurrent()."
- For-contributors note on portability + single-writer + fallback paths.

TODOS.md adds two P-rated entries:

- P0: intra-file parallelism via --concurrent flag. Sweep ~58 PGLite sites + ~40 env mutations + 2 mock.module sites. Target: bun run test < 30s. ~1-2 weeks. Detailed acceptance criteria. References Codex findings and plan-file rationale.
- P1: E2E parallelism via Postgres template databases. CREATE DATABASE TEMPLATE gbrain_template per test file. ~1-2 days.

llms.txt + llms-full.txt regenerated via `bun run build:llms` to absorb the CLAUDE.md changes (per CLAUDE.md's "After any release ship that touches the Key Files annotations in CLAUDE.md, run bun run build:llms" rule). The build-llms regression test was firing in shard 7 of the parallel pass; it caught the drift, and regeneration cleared it.

Final measurement after fix: 94s wallclock, 3652 pass, 0 fail across 8 parallel shards + 34 serial tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
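
The round-robin assignment attributed to the local fast loop can be sketched as a one-line filter over a sorted file list. This is an assumption about how such a split could work, not the logic in scripts/run-unit-shard.sh; `shard_round_robin` is a hypothetical name, and the hash-bucketed CI variant is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the lines of stdin that belong to shard K of N
# (K is 1-indexed), dealing lines out round-robin.
shard_round_robin() {
  local shard="$1" total="$2"
  awk -v k="$shard" -v n="$total" '(NR - 1) % n == k - 1'
}
```

For example, `printf 'a\nb\nc\nd\ne\n' | shard_round_robin 1 2` selects lines a, c, and e, and shard 2 receives b and d. Every input line lands in exactly one shard, so the union of all shards is the full list.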

…ests # Conflicts: # CHANGELOG.md # VERSION # package.json # src/core/mounts-cache.ts

Master added two more commits:

- d97f159 v0.26.4 test: parallel unit-test loop (12x speedup, #605)
- 0de9eb6 v0.26.5 feat: destructive operation guard end-to-end (#600)

Resolved three conflicts (all version bookkeeping):

- VERSION: kept 0.26.6 (this branch's version, ahead of master's 0.26.5)
- package.json: kept 0.26.6
- CHANGELOG.md: my v0.26.6 entry on top, then master's new v0.26.5 + v0.26.4 blocks below. Final order: 0.26.6 → 0.26.5 → 0.26.4 → 0.26.3 → 0.26.2 → 0.26.1 → 0.26.0 (top to bottom, contiguous).

Schema-drift gate sanity check post-merge:

- Master's v0.26.5 destructive-guard work added pages.deleted_at (with partial index pages_deleted_at_purge_idx) and three columns on sources (archived, archived_at, archive_expires_at). All four are present in BOTH src/schema.sql AND src/core/pglite-schema.ts; master kept them in lockstep, so the gate is satisfied automatically.
- access_tokens.id is still UUID DEFAULT gen_random_uuid() in both engines (my v0.26.3 D6 fix preserved across the merge).
- Typecheck clean, schema-diff unit tests 17/17 pass, privacy script clean (master's v0.26.4 work fixed the Wintermute references in mounts-cache.ts that I had patched earlier; converged independently).

Regenerated llms-full.txt to match the merged CHANGELOG.

garrytan and others added 6 commits May 3, 2026 17:05

Merge remote-tracking branch 'origin/master' into garrytan/parallel-t…

401defc


garrytan merged commit d97f159 into master May 4, 2026
7 checks passed

garrytan merged 6 commits into master from garrytan/parallel-tests
