ci: shard the e2e gate (sequential + smoketest) into parallel matrices#990
Conversation
|
🤖 Claude Code Review Status: Complete Current Review: No issues found. The PR successfully implements CI sharding with robust error handling. Key improvements in latest commit (323fe54):
Implementation quality:
History:
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-30 17:15 UTC |
…th 2 extra compiles)
…mpty test list Addresses PR bsv-blockchain#990 review findings bsv-blockchain#2 (shard/db interaction) and bsv-blockchain#3 (silent empty-list pass). Finding bsv-blockchain#1 (list_test_shard index increment) declined: the current skip-without-increment is correct even distribution; see PR thread.
|
ordishs
left a comment
There was a problem hiding this comment.
Approve. Solid, well-reasoned change — recovering the 4 orphaned sequential tests is a real correctness win beyond the speedup, and the write-up is honest about tradeoffs and rejected configs.
One suggested fix before/with merge (non-blocking but worth it):
Makefile smoketest recipe doesn't propagate a list_test_shard.sh failure. Make runs recipe lines with sh -c (no set -e), so if list_test_shard.sh exits non-zero the command substitution yields empty stdout, RUN_ARG="", and -run "" matches every test. A shard whose enumeration failed would silently run the full unsharded suite and pass green — the exact outcome the script's careful fail-loud handling was written to prevent.
Suggested guard:
RUN_ARG="$$(test/scripts/list_test_shard.sh ...)" || exit 1;(or set -e at the top of the recipe block). Quick check: make smoketest TOTAL=3 SHARD=9 should fail the job rather than run all tests.
Minor follow-ups, none blocking:
- Empty shard selection emits
^$→ runs 0 tests and passes green; consider warning/failing instead. - Wiring
shard_selftest.shinto CI (pure--list-only, no compile) would lock in the exhaustive+disjoint invariant.



Phase 2 of the CI-speed work (Phase 1 = #986). Targets the PR-feedback bottleneck:
pr-smoke≈ 21 min, set by its two longest jobs.Problem
sequential-postgres(~19 m) andsequential-aerospike(~21 m) are not two databases —run_tests_sequentially.sh --db Xis a substring match on the test function name, so the two jobs ran disjoint, uneven name-partitioned subsets (tests self-provision their backend via testcontainers).smoketest(~17 m) ran its package with-parallel 1. These three set the gate.Surfaced while mapping the suite: 4 sequential tests ran nowhere —
TestConcurrentDuplicateDetection,TestEarlyDuplicateAcrossSubtrees,TestMultipleEarlyDuplicatesInSameBlock,TestValidBlockWithSpentAndUnrelatedmatched none of the sqlite/postgres/aerospike name buckets. (The sqlite bucket was empty too — the no-op job removed in #986 tested zero.)Change
run_tests_sequentially.shgains--shard i --total N(even index partition) and--list-only. Newshard_selftest.shproves the partition is exhaustive and disjoint offline.sequential-*jobs collapse into onesequentialmatrix job, 7 even shards. The full 116-function suite is partitioned, so the 4 orphaned tests now run.smoketestbecomes a 3-shard matrix via newlist_test_shard.sh(go test -listpartition, enumerated with the same build config the run uses so no test is silently dropped). Existing-skiplist preserved.Results (tuned over 4 CI runs on this PR)
Shard counts were tuned empirically. Per-shard wall-clock, slowest shard per suite:
pr-smokegate: ~21 m → ~10 m. Finalseq=7 / smoke=3: both suites land ~9.2–9.3 m, at thepr-testsfloor (~9 m, unsharded). All checks green, including the 4 newly-run tests.seq=8/smoke=4was tested and rejected:smoke=4regressed (the even-index split isolated one heavy test into a shard; more shards can't balance a single fat test), andseq=8doesn't move the gate (smoke-bound). Going below ~9 m is wasted —pr-testsfloors the PR there. Further gains would need duration-weighted sharding or a build cache (out of scope).Risk
Sequential shards still run
-parallel 1internally — no in-job concurrency change, just spread across runners. Raising-parallelis a separate later step behind an isolation audit. More concurrent runner-minutes, roughly flat total compute; measured stagger from concurrency was negligible (~0.1–0.8 m at up to 15 concurrent jobs). Validated: shard self-test green, partition exhaustive + disjoint, all CI checks pass.