ci: shard the e2e gate (sequential + smoketest) into parallel matrices by oskarszoon · Pull Request #990 · bsv-blockchain/teranode

oskarszoon · 2026-05-29T19:12:45Z

Phase 2 of the CI-speed work (Phase 1 was #986). It targets the PR-feedback bottleneck: the pr-smoke gate takes ≈ 21 min, set by its slowest jobs.

Problem

sequential-postgres (~19 m) and sequential-aerospike (~21 m) were not testing two different databases. run_tests_sequentially.sh --db X is a substring match on the test function name, so the two jobs ran disjoint, uneven subsets partitioned by name (each test self-provisions its backend via testcontainers). smoketest (~17 m) ran its package with -parallel 1. These three jobs set the gate.

Surfaced while mapping the suite: 4 sequential tests ran nowhere. TestConcurrentDuplicateDetection, TestEarlyDuplicateAcrossSubtrees, TestMultipleEarlyDuplicatesInSameBlock and TestValidBlockWithSpentAndUnrelated matched none of the sqlite/postgres/aerospike name buckets. (The sqlite bucket was empty too: the no-op job removed in #986 ran zero tests.)
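A minimal sketch of how name-substring bucketing can leave tests orphaned, and how such tests could be listed. It assumes each bucket is a plain, case-sensitive substring match on the function name, as described above for --db X; `find_orphans` is a hypothetical helper, not part of the PR, and the bucket spellings in the usage line are illustrative.

```bash
#!/usr/bin/env bash
# find_orphans BUCKET... < test_names
# Hypothetical helper: reads one test name per line and prints every Test*
# name that contains none of the bucket substrings (case-sensitive).
find_orphans() {
  local name bucket matched
  while IFS= read -r name; do
    # Ignore anything that is not a test function, e.g. the trailing
    # "ok <pkg> <time>" line that `go test -list` prints.
    case "$name" in Test*) ;; *) continue ;; esac
    matched=0
    for bucket in "$@"; do
      case "$name" in
        *"$bucket"*) matched=1; break ;;
      esac
    done
    # A name that no bucket selects would never run under any --db X job.
    if [ "$matched" -eq 0 ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, `printf 'TestBlockPostgres\nTestConcurrentDuplicateDetection\n' | find_orphans Sqlite Postgres Aerospike` prints only TestConcurrentDuplicateDetection.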

Change

run_tests_sequentially.sh gains --shard i --total N (an even index partition) and --list-only. A new shard_selftest.sh checks offline that the partition is exhaustive and disjoint.
The two sequential-* jobs collapse into one sequential matrix job with 7 even shards. The full 116-function suite is partitioned, so the 4 orphaned tests now run.
smoketest becomes a 3-shard matrix via the new list_test_shard.sh (a go test -list partition, enumerated with the same build config the run uses, so no test is silently dropped). The existing -skip list is preserved.
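A minimal sketch of an even index partition as described above, assuming the test names arrive one per line in a stable order (as `go test -list` prints them for a given build) and that skipped tests are given as exact names. `shard_run_regex` is a hypothetical helper, not the PR's list_test_shard.sh; it mirrors the skip handling noted in the review, where skipped tests do not advance the counter.

```bash
#!/usr/bin/env bash
# shard_run_regex SHARD TOTAL [SKIP_NAME...] < test_names
# Hypothetical helper: keeps every TOTAL-th non-skipped test starting at
# index SHARD and prints an anchored regex suitable for `go test -run`.
shard_run_regex() {
  local shard="$1" total="$2"
  shift 2
  local idx=0 name skip is_skipped
  local -a selected=()
  while IFS= read -r name; do
    # Assumption: only Test* lines are tests; this also drops the
    # trailing "ok <pkg> <time>" line from `go test -list`.
    case "$name" in Test*) ;; *) continue ;; esac
    is_skipped=0
    for skip in "$@"; do
      if [ "$name" = "$skip" ]; then is_skipped=1; break; fi
    done
    # Skipped tests do not consume an index, so the remaining tests
    # still spread evenly across all shards.
    [ "$is_skipped" -eq 1 ] && continue
    if [ $((idx % total)) -eq "$shard" ]; then
      selected+=("$name")
    fi
    idx=$((idx + 1))
  done
  # Join with "|" and anchor so -run matches whole names only
  # (Go test names are identifiers, so no regex escaping is needed).
  local IFS='|'
  printf '^(%s)$\n' "${selected[*]}"
}
```

Shard 0 of 2 over TestA..TestE with TestB skipped yields `^(TestA|TestD)$`, and shard 1 yields `^(TestC|TestE)$`.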

Results (tuned over 4 CI runs on this PR)

Shard counts were tuned empirically. Wall-clock time of the slowest shard in each suite:

seq shards	smoke shards	slowest seq shard	slowest smoke shard	workflow wall-clock
4	3	13m09	9m35	~13 m
7	3	9m12	9m18	~10 m ← chosen
8	4	8m45	9m47	~10 m

pr-smoke gate: ~21 m → ~10 m. With the final seq=7 / smoke=3, both suites land at ~9.2–9.3 m, which is the pr-tests floor (~9 m, unsharded). All checks are green, including the 4 newly run tests.

seq=8/smoke=4 was tested and rejected: smoke=4 regressed (the even-index split isolated one heavy test in a shard, and adding shards cannot split a single heavy test), and seq=8 does not move the gate because the gate is smoke-bound. Going below ~9 m buys nothing, since pr-tests sets the PR floor there. Further gains would need duration-weighted sharding or a build cache (out of scope).
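Whatever the shard count, the slowest shard takes at least as long as the heaviest single test, which is why an even index split stops paying off. The duration-weighted sharding mentioned as out of scope could be sketched with the greedy longest-processing-time-first (LPT) heuristic, a swapped-in technique that this PR does not implement: sort tests by recorded duration, heaviest first, and give each one to the currently lightest shard. The "name seconds" input format and `lpt_shards` are hypothetical; durations would have to come from earlier CI runs.

```bash
#!/usr/bin/env bash
# lpt_shards TOTAL < "name seconds" lines
# Greedy longest-processing-time-first: heaviest test first, each one going
# to the shard with the smallest accumulated duration so far.
# Prints "shard<TAB>name" for every test.
lpt_shards() {
  local total="$1"
  # Sort by duration, descending; break ties by name for a stable result.
  sort -k2,2nr -k1,1 | awk -v total="$total" '
    BEGIN { for (s = 0; s < total; s++) load[s] = 0 }
    NF >= 2 {
      best = 0
      for (s = 1; s < total; s++) if (load[s] < load[best]) best = s
      load[best] += $2
      printf "%d\t%s\n", best, $1
    }'
}
```

With 2 shards and durations Heavy 600, A 200, B 200, C 150, D 100, Heavy gets a shard to itself and the other four share the second (650 s). That is optimal for this input: any shard holding Heavy plus another test would exceed 650 s.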

Risk

Sequential shards still run with -parallel 1 internally: there is no in-job concurrency change, the tests are just spread across runners. Raising -parallel is a separate, later step gated on an isolation audit. The change uses more concurrent runner-minutes but roughly flat total compute; the measured stagger from concurrency was negligible (~0.1–0.8 m with up to 15 concurrent jobs). Validated: shard self-test green, partition exhaustive and disjoint, all CI checks pass.
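The exhaustive-and-disjoint property can be checked without compiling anything, by collecting every shard's list-only output. A minimal sketch in the spirit of shard_selftest.sh, not its actual contents: `check_partition` is hypothetical, and the lister it calls (for example a wrapper around the script's --list-only mode) is assumed to take the shard index and total as its last two arguments and print one selected name per line.

```bash
#!/usr/bin/env bash
# check_partition TOTAL ALL_NAMES_FILE LISTER...
# Runs "LISTER... SHARD TOTAL" for SHARD = 0..TOTAL-1 and fails if any name
# is selected by more than one shard (not disjoint) or the union of all
# shards differs from ALL_NAMES_FILE (not exhaustive, or extra names).
check_partition() {
  local total="$1" all="$2"
  shift 2
  local tmp shard dups
  tmp="$(mktemp)" || return 1
  for ((shard = 0; shard < total; shard++)); do
    if ! "$@" "$shard" "$total" >>"$tmp"; then
      echo "lister failed for shard $shard" >&2
      rm -f "$tmp"
      return 1
    fi
  done
  # Disjoint: no name may appear twice across all shard outputs.
  dups="$(sort "$tmp" | uniq -d)"
  if [ -n "$dups" ]; then
    echo "not disjoint, selected more than once: $dups" >&2
    rm -f "$tmp"
    return 1
  fi
  # Exhaustive: the sorted union must equal the sorted full list.
  if ! diff <(sort "$all") <(sort "$tmp") >/dev/null; then
    echo "not exhaustive: shard union differs from the full list" >&2
    rm -f "$tmp"
    return 1
  fi
  rm -f "$tmp"
  return 0
}
```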

github-actions · 2026-05-29T19:13:45Z

🤖 Claude Code Review

Status: Complete

Current Review:

No issues found. The PR successfully implements CI sharding with robust error handling.

Key improvements in latest commit (323fe54):

Added mutual exclusion guard between --db and --shard/--total flags
Enhanced error handling in list_test_shard.sh to fail loudly on empty test lists or go test -list failures
Prevents silent green passes when no tests execute
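A sketch of the argument guards listed above, assuming the flag layout --db X, --shard i, --total N and --list-only named in the PR description. `parse_shard_args` and the globals it sets are hypothetical; the real script's parsing and messages may differ. It also rejects out-of-range shards such as shard 9 of 3.

```bash
#!/usr/bin/env bash
# parse_shard_args ARGS...
# Hypothetical parser: sets DB, SHARD, TOTAL and LIST_ONLY, and returns 2
# (with a message on stderr) for any invalid or conflicting combination.
parse_shard_args() {
  DB="" SHARD="" TOTAL="" LIST_ONLY=0
  while [ $# -gt 0 ]; do
    case "$1" in
      --db | --shard | --total)
        if [ $# -lt 2 ]; then
          echo "error: $1 requires a value" >&2
          return 2
        fi
        case "$1" in
          --db) DB="$2" ;;
          --shard) SHARD="$2" ;;
          --total) TOTAL="$2" ;;
        esac
        shift 2
        ;;
      --list-only)
        LIST_ONLY=1
        shift
        ;;
      *)
        echo "error: unknown argument: $1" >&2
        return 2
        ;;
    esac
  done
  # --db selects by name substring, --shard/--total by index; mixing them
  # would quietly run a subset of a subset, so refuse the combination.
  if [ -n "$DB" ] && { [ -n "$SHARD" ] || [ -n "$TOTAL" ]; }; then
    echo "error: --db cannot be combined with --shard/--total" >&2
    return 2
  fi
  if [ -n "$SHARD" ] || [ -n "$TOTAL" ]; then
    # Both must be given, as non-negative integers, with 0 <= SHARD < TOTAL.
    if ! [[ "$SHARD" =~ ^[0-9]+$ && "$TOTAL" =~ ^[0-9]+$ ]]; then
      echo "error: --shard and --total must both be non-negative integers" >&2
      return 2
    fi
    if [ "$TOTAL" -lt 1 ] || [ "$SHARD" -ge "$TOTAL" ]; then
      echo "error: need 0 <= shard < total, got shard=$SHARD total=$TOTAL" >&2
      return 2
    fi
  fi
  return 0
}
```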

Implementation quality:

Proper input validation with clear error messages
Exhaustive + disjoint partition verified by shard_selftest.sh
Correct handling of skipped tests (counter only increments for non-skipped tests)
Comprehensive documentation in comments

History:

✅ Previous review finding #1 (index increment): Correctly declined; the current logic achieves even distribution
✅ Previous review finding #2 (--db/--shard interaction): Fixed with mutual exclusion guard
✅ Previous review finding #3 (silent empty-list pass): Fixed with explicit error checks

github-actions · 2026-05-29T19:28:15Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-990 (0197e40)

Summary

Regressions: 0
Improvements: 0
Unchanged: 144
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.639µ	1.730µ	~	0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	73.45n	71.11n	~	0.100
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	71.07n	71.36n	~	0.100
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	71.17n	71.35n	~	0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	34.40n	34.82n	~	1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	61.94n	57.63n	~	0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	157.5n	175.3n	~	0.700
MiningCandidate_Stringify_Short-4	222.6n	226.9n	~	0.100
MiningCandidate_Stringify_Long-4	1.661µ	1.665µ	~	0.300
MiningSolution_Stringify-4	863.3n	864.1n	~	1.000
BlockInfo_MarshalJSON-4	1.769µ	1.762µ	~	0.700
NewFromBytes-4	123.9n	132.7n	~	0.600
AddTxBatchColumnar_Validation-4	2.631µ	2.498µ	~	0.200
OffsetValidationLoop-4	543.6n	544.0n	~	1.000
Mine_EasyDifficulty-4	66.88µ	68.01µ	~	0.100
Mine_WithAddress-4	7.090µ	7.011µ	~	0.400
DiskTxMap_SetIfNotExists-4	3.530µ	3.278µ	~	0.100
DiskTxMap_SetIfNotExists_Parallel-4	3.310µ	3.221µ	~	0.400
DiskTxMap_ExistenceOnly-4	315.3n	300.3n	~	0.100
Queue-4	192.9n	193.3n	~	1.000
AtomicPointer-4	4.922n	4.583n	~	1.000
ReorgOptimizations/DedupFilterPipeline/Old/10K-4	952.8µ	916.6µ	~	0.400
ReorgOptimizations/DedupFilterPipeline/New/10K-4	851.5µ	837.8µ	~	0.200
ReorgOptimizations/AllMarkFalse/Old/10K-4	120.9µ	115.7µ	~	0.100
ReorgOptimizations/AllMarkFalse/New/10K-4	62.73µ	62.75µ	~	1.000
ReorgOptimizations/HashSlicePool/Old/10K-4	62.17µ	56.99µ	~	0.200
ReorgOptimizations/HashSlicePool/New/10K-4	12.79µ	12.23µ	~	0.100
ReorgOptimizations/NodeFlags/Old/10K-4	4.755µ	4.797µ	~	1.000
ReorgOptimizations/NodeFlags/New/10K-4	1.610µ	1.608µ	~	1.000
ReorgOptimizations/DedupFilterPipeline/Old/100K-4	9.938m	9.754m	~	0.700
ReorgOptimizations/DedupFilterPipeline/New/100K-4	10.345m	9.958m	~	0.700
ReorgOptimizations/AllMarkFalse/Old/100K-4	1.128m	1.143m	~	1.000
ReorgOptimizations/AllMarkFalse/New/100K-4	691.3µ	682.9µ	~	0.100
ReorgOptimizations/HashSlicePool/Old/100K-4	688.2µ	601.4µ	~	0.100
ReorgOptimizations/HashSlicePool/New/100K-4	289.3µ	289.4µ	~	1.000
ReorgOptimizations/NodeFlags/Old/100K-4	50.51µ	51.29µ	~	1.000
ReorgOptimizations/NodeFlags/New/100K-4	17.83µ	17.96µ	~	0.700
TxMapSetIfNotExists-4	53.55n	52.39n	~	0.100
TxMapSetIfNotExistsDuplicate-4	41.30n	40.61n	~	0.400
ChannelSendReceive-4	630.9n	615.6n	~	0.100
BlockAssembler_AddTx-4	0.02931n	0.02551n	~	0.200
AddNode-4	10.81	11.04	~	0.100
AddNodeWithMap-4	11.28	11.96	~	0.400
DirectSubtreeAdd/4_per_subtree-4	55.69n	57.57n	~	1.000
DirectSubtreeAdd/64_per_subtree-4	29.03n	28.88n	~	0.700
DirectSubtreeAdd/256_per_subtree-4	27.77n	28.49n	~	0.100
DirectSubtreeAdd/1024_per_subtree-4	26.51n	26.48n	~	1.000
DirectSubtreeAdd/2048_per_subtree-4	26.10n	26.03n	~	0.700
SubtreeProcessorAdd/4_per_subtree-4	296.1n	299.9n	~	1.000
SubtreeProcessorAdd/64_per_subtree-4	288.6n	302.1n	~	0.400
SubtreeProcessorAdd/256_per_subtree-4	286.7n	298.6n	~	0.100
SubtreeProcessorAdd/1024_per_subtree-4	277.1n	276.8n	~	1.000
SubtreeProcessorAdd/2048_per_subtree-4	279.0n	276.7n	~	0.100
SubtreeProcessorRotate/4_per_subtree-4	284.8n	282.9n	~	0.100
SubtreeProcessorRotate/64_per_subtree-4	286.9n	281.5n	~	0.100
SubtreeProcessorRotate/256_per_subtree-4	283.8n	281.5n	~	0.100
SubtreeProcessorRotate/1024_per_subtree-4	284.4n	278.9n	~	0.200
SubtreeNodeAddOnly/4_per_subtree-4	54.94n	55.20n	~	0.700
SubtreeNodeAddOnly/64_per_subtree-4	36.04n	36.16n	~	0.100
SubtreeNodeAddOnly/256_per_subtree-4	35.14n	35.08n	~	1.000
SubtreeNodeAddOnly/1024_per_subtree-4	34.55n	34.82n	~	0.700
SubtreeCreationOnly/4_per_subtree-4	110.7n	111.7n	~	0.100
SubtreeCreationOnly/64_per_subtree-4	350.2n	350.5n	~	0.400
SubtreeCreationOnly/256_per_subtree-4	1.231µ	1.236µ	~	0.400
SubtreeCreationOnly/1024_per_subtree-4	3.780µ	3.765µ	~	0.600
SubtreeCreationOnly/2048_per_subtree-4	6.816µ	6.778µ	~	0.400
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	280.9n	279.8n	~	0.400
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	279.7n	277.0n	~	0.200
ParallelGetAndSetIfNotExists/1k_nodes-4	2.014m	2.002m	~	0.400
ParallelGetAndSetIfNotExists/10k_nodes-4	5.219m	5.223m	~	1.000
ParallelGetAndSetIfNotExists/50k_nodes-4	7.108m	7.157m	~	0.200
ParallelGetAndSetIfNotExists/100k_nodes-4	9.665m	9.756m	~	0.700
SequentialGetAndSetIfNotExists/1k_nodes-4	1.795m	1.794m	~	1.000
SequentialGetAndSetIfNotExists/10k_nodes-4	4.418m	4.634m	~	0.700
SequentialGetAndSetIfNotExists/50k_nodes-4	13.41m	13.81m	~	0.100
SequentialGetAndSetIfNotExists/100k_nodes-4	24.80m	25.71m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	2.059m	2.039m	~	0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	8.483m	8.422m	~	0.400
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	13.27m	13.32m	~	0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	1.821m	1.809m	~	1.000
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	8.179m	7.981m	~	0.200
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	46.59m	44.02m	~	0.100
CalcBlockWork-4	503.9n	500.5n	~	0.400
CalculateWork-4	687.4n	691.2n	~	1.000
BuildBlockLocatorString_Helpers/Size_10-4	1.366µ	1.356µ	~	0.100
BuildBlockLocatorString_Helpers/Size_100-4	16.12µ	16.39µ	~	1.000
BuildBlockLocatorString_Helpers/Size_1000-4	128.6µ	130.7µ	~	0.200
CatchupWithHeaderCache-4	104.6m	104.6m	~	1.000
_BufferPoolAllocation/16KB-4	4.099µ	4.009µ	~	0.200
_BufferPoolAllocation/32KB-4	8.506µ	11.504µ	~	0.100
_BufferPoolAllocation/64KB-4	16.80µ	17.44µ	~	0.700
_BufferPoolAllocation/128KB-4	26.11µ	32.41µ	~	0.100
_BufferPoolAllocation/512KB-4	120.3µ	114.3µ	~	0.100
_BufferPoolConcurrent/32KB-4	20.00µ	19.63µ	~	0.400
_BufferPoolConcurrent/64KB-4	31.71µ	32.89µ	~	0.700
_BufferPoolConcurrent/512KB-4	158.7µ	155.8µ	~	1.000
_SubtreeDeserializationWithBufferSizes/16KB-4	711.2µ	679.0µ	~	0.100
_SubtreeDeserializationWithBufferSizes/32KB-4	704.2µ	635.0µ	~	0.100
_SubtreeDeserializationWithBufferSizes/64KB-4	706.5µ	639.7µ	~	0.100
_SubtreeDeserializationWithBufferSizes/128KB-4	706.4µ	641.1µ	~	0.100
_SubtreeDeserializationWithBufferSizes/512KB-4	637.5µ	641.5µ	~	1.000
_SubtreeDataDeserializationWithBufferSizes/16KB-4	37.18m	37.28m	~	0.400
_SubtreeDataDeserializationWithBufferSizes/32KB-4	37.15m	37.52m	~	0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4	36.97m	37.44m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4	36.94m	37.40m	~	0.200
_SubtreeDataDeserializationWithBufferSizes/512KB-4	36.69m	37.17m	~	0.100
_PooledVsNonPooled/Pooled-4	744.3n	739.5n	~	0.100
_PooledVsNonPooled/NonPooled-4	8.545µ	8.054µ	~	0.100
_MemoryFootprint/Current_512KB_32concurrent-4	6.644µ	7.086µ	~	0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4	11.01µ	11.71µ	~	0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4	9.982µ	11.430µ	~	0.100
SubtreeSizes/10k_tx_4_per_subtree-4	1.401m	1.453m	~	0.700
SubtreeSizes/10k_tx_16_per_subtree-4	333.9µ	342.0µ	~	0.100
SubtreeSizes/10k_tx_64_per_subtree-4	78.76µ	82.51µ	~	0.100
SubtreeSizes/10k_tx_256_per_subtree-4	19.65µ	20.22µ	~	0.100
SubtreeSizes/10k_tx_512_per_subtree-4	9.742µ	9.925µ	~	0.100
SubtreeSizes/10k_tx_1024_per_subtree-4	4.838µ	4.966µ	~	0.100
SubtreeSizes/10k_tx_2k_per_subtree-4	2.457µ	2.462µ	~	0.500
BlockSizeScaling/10k_tx_64_per_subtree-4	77.20µ	78.65µ	~	0.100
BlockSizeScaling/10k_tx_256_per_subtree-4	19.50µ	19.65µ	~	0.700
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.876µ	4.835µ	~	0.700
BlockSizeScaling/50k_tx_64_per_subtree-4	406.5µ	405.1µ	~	0.700
BlockSizeScaling/50k_tx_256_per_subtree-4	96.57µ	96.80µ	~	0.400
BlockSizeScaling/50k_tx_1024_per_subtree-4	24.16µ	24.28µ	~	0.700
SubtreeAllocations/small_subtrees_exists_check-4	163.0µ	161.5µ	~	0.400
SubtreeAllocations/small_subtrees_data_fetch-4	168.6µ	170.3µ	~	0.700
SubtreeAllocations/small_subtrees_full_validation-4	337.7µ	339.2µ	~	1.000
SubtreeAllocations/medium_subtrees_exists_check-4	9.747µ	9.652µ	~	0.200
SubtreeAllocations/medium_subtrees_data_fetch-4	10.19µ	10.04µ	~	0.100
SubtreeAllocations/medium_subtrees_full_validation-4	19.74µ	19.76µ	~	0.700
SubtreeAllocations/large_subtrees_exists_check-4	2.333µ	2.310µ	~	0.700
SubtreeAllocations/large_subtrees_data_fetch-4	2.460µ	2.453µ	~	0.800
SubtreeAllocations/large_subtrees_full_validation-4	4.917µ	4.919µ	~	1.000
_prepareTxsPerLevel-4	407.1m	402.4m	~	0.700
_prepareTxsPerLevelOrdered-4	3.764m	3.635m	~	0.100
_prepareTxsPerLevel_Comparison/Original-4	395.9m	397.5m	~	0.400
_prepareTxsPerLevel_Comparison/Optimized-4	3.637m	3.787m	~	0.200
StoreBlock_Sequential/BelowCSVHeight-4	328.9µ	329.5µ	~	1.000
StoreBlock_Sequential/AboveCSVHeight-4	331.7µ	332.9µ	~	0.700
GetUtxoHashes-4	263.9n	257.4n	~	0.700
GetUtxoHashes_ManyOutputs-4	42.46µ	42.51µ	~	1.000
_NewMetaDataFromBytes-4	228.1n	229.2n	~	0.200
_Bytes-4	399.1n	403.3n	~	0.100
_MetaBytes-4	138.0n	138.7n	~	0.700

Threshold: >10% with p < 0.05 | Generated: 2026-05-30 17:15 UTC


…mpty test list. Addresses PR #990 review findings #2 (shard/db interaction) and #3 (silent empty-list pass). Finding #1 (list_test_shard index increment) declined: the current skip-without-increment already gives an even distribution; see PR thread.

sonarqubecloud · 2026-05-30T17:12:30Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

ordishs

Approve. Solid, well-reasoned change — recovering the 4 orphaned sequential tests is a real correctness win beyond the speedup, and the write-up is honest about tradeoffs and rejected configs.

One suggested fix before/with merge (non-blocking but worth it):

Makefile smoketest recipe doesn't propagate a list_test_shard.sh failure. Make runs recipe lines with sh -c (no set -e), so if list_test_shard.sh exits non-zero the command substitution yields empty stdout, RUN_ARG="", and -run "" matches every test. A shard whose enumeration failed would silently run the full unsharded suite and pass green — the exact outcome the script's careful fail-loud handling was written to prevent.

Suggested guard:

RUN_ARG="$$(test/scripts/list_test_shard.sh ...)" || exit 1;

(or set -e at the top of the recipe block). Quick check: make smoketest TOTAL=3 SHARD=9 should fail the job rather than run all tests.
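A small demonstration of the failure mode described above, written as shell functions rather than a Makefile recipe (in a recipe, `$` is written `$$` and `exit 1` replaces `return 1`). `failing_lister` is a hypothetical stand-in for a list_test_shard.sh call that exits non-zero; the `./test/...` path and the echoed go test line are placeholders, printed rather than executed.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a list_test_shard.sh call whose enumeration fails.
failing_lister() {
  echo "error: go test -list failed" >&2
  return 1
}

# Unguarded: the lister's failure is discarded, RUN_ARG ends up empty, and
# `go test -run ""` would match every test in the package.
unguarded_run_arg() {
  RUN_ARG="$(failing_lister)"
  echo "go test -run \"$RUN_ARG\" ./test/..."
}

# Guarded: a plain assignment takes the exit status of its command
# substitution, so `|| return 1` stops before the go test line is reached.
# (Writing `local RUN_ARG="$(...)"` would lose the status, because the
# exit status of `local` itself replaces it.)
guarded_run_arg() {
  RUN_ARG="$(failing_lister)" || return 1
  echo "go test -run \"$RUN_ARG\" ./test/..."
}
```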

Minor follow-ups, none blocking:

Empty shard selection emits ^$ → runs 0 tests and passes green; consider warning/failing instead.
Wiring shard_selftest.sh into CI (pure --list-only, no compile) would lock in the exhaustive+disjoint invariant.
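The empty-selection follow-up could be handled with a guard like this hypothetical `require_nonempty_selection`, which fails instead of passing a degenerate selection through. It treats `^$`, `^()$` (what joining an empty list might produce) and an empty string as "no usable selection"; only `^$` is known from the review to be what the real script emits.

```bash
#!/usr/bin/env bash
# require_nonempty_selection RUN_REGEX
# Hypothetical guard: passes a -run regex through unchanged, but fails when
# it is empty (which -run treats as "match everything") or a select-nothing
# pattern such as ^$, so a bad shard goes red instead of passing green.
require_nonempty_selection() {
  case "$1" in
    '' | '^$' | '^()$')
      echo "error: this shard has no usable test selection (run regex: '$1')" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$1"
}
```

A caller would wrap it the same way as the lister itself, e.g. `RUN_ARG="$(require_nonempty_selection "$RUN_ARG")" || exit 1`.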

oskarszoon added 3 commits May 29, 2026 19:43

test: even N-way sharding + --list-only for sequential suite

81f2cae

ci: shard sequential e2e suite across parallel matrix jobs

a8640eb

ci: shard smoketest across parallel matrix jobs

9328915

github-actions Bot reviewed May 29, 2026


Comment thread test/scripts/list_test_shard.sh

oskarszoon added 4 commits May 29, 2026 21:40

ci: tune shard counts seq=7 smoke=3

c24988a

ci: tune shard counts seq=8 smoke=4

f702c84

ci: revert to seq=7 smoke=3 (cost-optimal; seq=8/smoke=4 gain not worth 2 extra compiles)

df0bfc2

oskarszoon requested review from icellan, ordishs and sugh01 May 30, 2026 19:06

ordishs approved these changes Jun 1, 2026


sugh01 approved these changes Jun 1, 2026


oskarszoon merged commit 3fa8134 into bsv-blockchain:main Jun 1, 2026
34 checks passed

oskarszoon mentioned this pull request Jun 2, 2026

test(ci): pilot smoketest -parallel 2 (measure t.Parallel speedup) #1014

Closed

oskarszoon merged 7 commits into bsv-blockchain:main from oskarszoon:ci-speed/phase-2-gate

3 participants
