
ci: shard the e2e gate (sequential + smoketest) into parallel matrices #990

Merged: oskarszoon merged 7 commits into bsv-blockchain:main from oskarszoon:ci-speed/phase-2-gate on Jun 1, 2026

oskarszoon (Contributor) commented May 29, 2026

Phase 2 of the CI-speed work (Phase 1 = #986). Targets the PR-feedback bottleneck: pr-smoke ≈ 21 min, set by its two longest jobs.

Problem

sequential-postgres (~19 m) and sequential-aerospike (~21 m) do not run one suite against two databases. run_tests_sequentially.sh --db X is a substring match on the test function name, so the two jobs ran disjoint, uneven, name-partitioned subsets (tests self-provision their backend via testcontainers). smoketest (~17 m) ran its package with -parallel 1. These three jobs set the gate.

Surfaced while mapping the suite: 4 sequential tests ran nowhere. TestConcurrentDuplicateDetection, TestEarlyDuplicateAcrossSubtrees, TestMultipleEarlyDuplicatesInSameBlock, and TestValidBlockWithSpentAndUnrelated matched none of the sqlite/postgres/aerospike name buckets. (The sqlite bucket was empty too; the no-op job removed in #986 tested zero.)
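
To illustrate how name-substring bucketing can orphan tests, here is a minimal sketch. The `bucket_for` helper, the case-insensitive match, and the two bucketed sample names are hypothetical; the real matching lives in run_tests_sequentially.sh and may differ in detail.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: assign a test name to the first db bucket whose name
# appears as a substring of the test name (case-insensitive here, which is an
# assumption). A name that matches no bucket is reported as "orphan".
bucket_for() {
  local name_lc
  name_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  for db in sqlite postgres aerospike; do
    case "$name_lc" in
      *"$db"*) printf '%s\n' "$db"; return 0 ;;
    esac
  done
  printf 'orphan\n'
}

# TestFooPostgres and TestBarAerospike are made-up names for illustration;
# the other two are orphaned tests named in the PR description.
for t in TestFooPostgres TestBarAerospike \
         TestConcurrentDuplicateDetection TestEarlyDuplicateAcrossSubtrees; do
  printf '%s -> %s\n' "$t" "$(bucket_for "$t")"
done
```

A test whose name mentions no backend falls through every bucket, which is how a name-based filter can leave tests unrun without any job failing.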

Change

  • run_tests_sequentially.sh gains --shard i --total N (even index partition) and --list-only. New shard_selftest.sh proves the partition is exhaustive and disjoint offline.
  • The two sequential-* jobs collapse into one sequential matrix job, 7 even shards. The full 116-function suite is partitioned, so the 4 orphaned tests now run.
  • smoketest becomes a 3-shard matrix via new list_test_shard.sh (go test -list partition, enumerated with the same build config the run uses so no test is silently dropped). Existing -skip list preserved.
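
The even index partition can be sketched as a round-robin over a stably sorted test list. This is an illustration under assumptions: `shard_of_list`, the 0-based shard index, and the sort order are hypothetical, and the actual semantics of --shard/--total may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an even index partition: read test names on stdin,
# sort them for a stable order, and keep the names whose 0-based position
# modulo TOTAL equals SHARD.
shard_of_list() {
  local shard="$1" total="$2"
  LC_ALL=C sort | awk -v s="$shard" -v n="$total" '(NR - 1) % n == s'
}

# Five made-up test names split across 2 shards.
names='TestA
TestB
TestC
TestD
TestE'

printf '%s\n' "$names" | shard_of_list 0 2   # prints TestA, TestC, TestE
printf '%s\n' "$names" | shard_of_list 1 2   # prints TestB, TestD
```

Because every position maps to exactly one residue modulo TOTAL, each name lands in exactly one shard regardless of what the name contains.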

Results (tuned over 4 CI runs on this PR)

Shard counts were tuned empirically. Per-shard wall-clock, slowest shard per suite:

| seq N | smoke N | seq slowest | smoke slowest | workflow wall |
|---|---|---|---|---|
| 4 | 3 | 13m09 | 9m35 | ~13 m |
| 7 | 3 | 9m12 | 9m18 | ~10 m ← chosen |
| 8 | 4 | 8m45 | 9m47 | ~10 m |

pr-smoke gate: ~21 m → ~10 m. Final seq=7 / smoke=3: both suites land ~9.2–9.3 m, at the pr-tests floor (~9 m, unsharded). All checks green, including the 4 newly-run tests.

seq=8/smoke=4 was tested and rejected: smoke=4 regressed (the even-index split isolated one heavy test into a shard; more shards can't balance a single fat test), and seq=8 doesn't move the gate (smoke-bound). Going below ~9 m is wasted — pr-tests floors the PR there. Further gains would need duration-weighted sharding or a build cache (out of scope).
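
The point that more shards cannot balance a single fat test can be shown with a small calculation. The durations below are made up for illustration (they are not measurements from this PR), and `slowest_shard` is a hypothetical helper using a 0-based round-robin split.

```shell
#!/usr/bin/env bash
# Hypothetical durations in minutes, one per test, in index order. The 6 is
# the single heavy test; all numbers are invented for illustration.
durations='1 1 6 1 1 1 1 1'

# Print the wall-clock of the slowest shard under a 0-based round-robin
# (index modulo N) split into N shards.
slowest_shard() {
  # $durations is intentionally unquoted so each number becomes one line.
  printf '%s\n' $durations | awk -v n="$1" '
    { sum[(NR - 1) % n] += $1 }
    END { for (k in sum) if (sum[k] > max) max = sum[k]; print max }'
}

for n in 2 4 8; do
  printf 'N=%s slowest shard: %s min\n' "$n" "$(slowest_shard "$n")"
done
# N=2 slowest shard: 9 min
# N=4 slowest shard: 7 min
# N=8 slowest shard: 6 min
```

Total work here is 13 minutes, so a perfect 8-way balance would be under 2 minutes per shard, yet the slowest shard never drops below the 6-minute test.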

Risk

Sequential shards still run -parallel 1 internally — no in-job concurrency change, just spread across runners. Raising -parallel is a separate later step behind an isolation audit. More concurrent runner-minutes, roughly flat total compute; measured stagger from concurrency was negligible (~0.1–0.8 m at up to 15 concurrent jobs). Validated: shard self-test green, partition exhaustive + disjoint, all CI checks pass.

github-actions (bot) commented May 29, 2026

🤖 Claude Code Review

Status: Complete


Current Review:

No issues found. The PR successfully implements CI sharding with robust error handling.

Key improvements in latest commit (323fe54):

  • Added mutual exclusion guard between --db and --shard/--total flags
  • Enhanced error handling in list_test_shard.sh to fail loudly on empty test lists or go test -list failures
  • Prevents silent green passes when no tests execute
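
The two guards listed above might look roughly like the following. This is a sketch only: the function names, messages, and exit codes are assumptions, not the scripts' actual code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the guards described above.

# Reject mixing the name filter (--db) with index sharding (--shard/--total).
# Arguments are the parsed flag values; an empty string means "not given".
check_flags() {
  local db="$1" shard="$2" total="$3"
  if [ -n "$db" ] && { [ -n "$shard" ] || [ -n "$total" ]; }; then
    echo "error: --db cannot be combined with --shard/--total" >&2
    return 2
  fi
}

# Fail loudly when enumeration produced no test names, so an empty list
# cannot turn into a silently green run.
require_nonempty() {
  if [ -z "$1" ]; then
    echo "error: test enumeration returned no tests" >&2
    return 1
  fi
}

check_flags "" 0 7 && echo "flags ok"
check_flags postgres 0 7 || echo "rejected: --db with --shard"
require_nonempty "" || echo "rejected: empty test list"
```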

Implementation quality:

  • Proper input validation with clear error messages
  • Exhaustive + disjoint partition verified by shard_selftest.sh
  • Correct handling of skipped tests (counter only increments for non-skipped tests)
  • Comprehensive documentation in comments

Comment thread test/scripts/list_test_shard.sh
github-actions (bot) commented May 29, 2026

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-990 (0197e40)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05

All benchmark results (sec/op)

| Benchmark | Baseline | Current | Change | p-value |
|---|---|---|---|---|
| _NewBlockFromBytes-4 | 1.639µ | 1.730µ | ~ | 0.100 |
| SplitSyncedParentMap_SetIfNotExists/256_buckets-4 | 73.45n | 71.11n | ~ | 0.100 |
| SplitSyncedParentMap_SetIfNotExists/16_buckets-4 | 71.07n | 71.36n | ~ | 0.100 |
| SplitSyncedParentMap_SetIfNotExists/1_bucket-4 | 71.17n | 71.35n | ~ | 0.400 |
| SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... | 34.40n | 34.82n | ~ | 1.000 |
| SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... | 61.94n | 57.63n | ~ | 0.700 |
| SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... | 157.5n | 175.3n | ~ | 0.700 |
| MiningCandidate_Stringify_Short-4 | 222.6n | 226.9n | ~ | 0.100 |
| MiningCandidate_Stringify_Long-4 | 1.661µ | 1.665µ | ~ | 0.300 |
| MiningSolution_Stringify-4 | 863.3n | 864.1n | ~ | 1.000 |
| BlockInfo_MarshalJSON-4 | 1.769µ | 1.762µ | ~ | 0.700 |
| NewFromBytes-4 | 123.9n | 132.7n | ~ | 0.600 |
| AddTxBatchColumnar_Validation-4 | 2.631µ | 2.498µ | ~ | 0.200 |
| OffsetValidationLoop-4 | 543.6n | 544.0n | ~ | 1.000 |
| Mine_EasyDifficulty-4 | 66.88µ | 68.01µ | ~ | 0.100 |
| Mine_WithAddress-4 | 7.090µ | 7.011µ | ~ | 0.400 |
| DiskTxMap_SetIfNotExists-4 | 3.530µ | 3.278µ | ~ | 0.100 |
| DiskTxMap_SetIfNotExists_Parallel-4 | 3.310µ | 3.221µ | ~ | 0.400 |
| DiskTxMap_ExistenceOnly-4 | 315.3n | 300.3n | ~ | 0.100 |
| Queue-4 | 192.9n | 193.3n | ~ | 1.000 |
| AtomicPointer-4 | 4.922n | 4.583n | ~ | 1.000 |
| ReorgOptimizations/DedupFilterPipeline/Old/10K-4 | 952.8µ | 916.6µ | ~ | 0.400 |
| ReorgOptimizations/DedupFilterPipeline/New/10K-4 | 851.5µ | 837.8µ | ~ | 0.200 |
| ReorgOptimizations/AllMarkFalse/Old/10K-4 | 120.9µ | 115.7µ | ~ | 0.100 |
| ReorgOptimizations/AllMarkFalse/New/10K-4 | 62.73µ | 62.75µ | ~ | 1.000 |
| ReorgOptimizations/HashSlicePool/Old/10K-4 | 62.17µ | 56.99µ | ~ | 0.200 |
| ReorgOptimizations/HashSlicePool/New/10K-4 | 12.79µ | 12.23µ | ~ | 0.100 |
| ReorgOptimizations/NodeFlags/Old/10K-4 | 4.755µ | 4.797µ | ~ | 1.000 |
| ReorgOptimizations/NodeFlags/New/10K-4 | 1.610µ | 1.608µ | ~ | 1.000 |
| ReorgOptimizations/DedupFilterPipeline/Old/100K-4 | 9.938m | 9.754m | ~ | 0.700 |
| ReorgOptimizations/DedupFilterPipeline/New/100K-4 | 10.345m | 9.958m | ~ | 0.700 |
| ReorgOptimizations/AllMarkFalse/Old/100K-4 | 1.128m | 1.143m | ~ | 1.000 |
| ReorgOptimizations/AllMarkFalse/New/100K-4 | 691.3µ | 682.9µ | ~ | 0.100 |
| ReorgOptimizations/HashSlicePool/Old/100K-4 | 688.2µ | 601.4µ | ~ | 0.100 |
| ReorgOptimizations/HashSlicePool/New/100K-4 | 289.3µ | 289.4µ | ~ | 1.000 |
| ReorgOptimizations/NodeFlags/Old/100K-4 | 50.51µ | 51.29µ | ~ | 1.000 |
| ReorgOptimizations/NodeFlags/New/100K-4 | 17.83µ | 17.96µ | ~ | 0.700 |
| TxMapSetIfNotExists-4 | 53.55n | 52.39n | ~ | 0.100 |
| TxMapSetIfNotExistsDuplicate-4 | 41.30n | 40.61n | ~ | 0.400 |
| ChannelSendReceive-4 | 630.9n | 615.6n | ~ | 0.100 |
| BlockAssembler_AddTx-4 | 0.02931n | 0.02551n | ~ | 0.200 |
| AddNode-4 | 10.81 | 11.04 | ~ | 0.100 |
| AddNodeWithMap-4 | 11.28 | 11.96 | ~ | 0.400 |
| DirectSubtreeAdd/4_per_subtree-4 | 55.69n | 57.57n | ~ | 1.000 |
| DirectSubtreeAdd/64_per_subtree-4 | 29.03n | 28.88n | ~ | 0.700 |
| DirectSubtreeAdd/256_per_subtree-4 | 27.77n | 28.49n | ~ | 0.100 |
| DirectSubtreeAdd/1024_per_subtree-4 | 26.51n | 26.48n | ~ | 1.000 |
| DirectSubtreeAdd/2048_per_subtree-4 | 26.10n | 26.03n | ~ | 0.700 |
| SubtreeProcessorAdd/4_per_subtree-4 | 296.1n | 299.9n | ~ | 1.000 |
| SubtreeProcessorAdd/64_per_subtree-4 | 288.6n | 302.1n | ~ | 0.400 |
| SubtreeProcessorAdd/256_per_subtree-4 | 286.7n | 298.6n | ~ | 0.100 |
| SubtreeProcessorAdd/1024_per_subtree-4 | 277.1n | 276.8n | ~ | 1.000 |
| SubtreeProcessorAdd/2048_per_subtree-4 | 279.0n | 276.7n | ~ | 0.100 |
| SubtreeProcessorRotate/4_per_subtree-4 | 284.8n | 282.9n | ~ | 0.100 |
| SubtreeProcessorRotate/64_per_subtree-4 | 286.9n | 281.5n | ~ | 0.100 |
| SubtreeProcessorRotate/256_per_subtree-4 | 283.8n | 281.5n | ~ | 0.100 |
| SubtreeProcessorRotate/1024_per_subtree-4 | 284.4n | 278.9n | ~ | 0.200 |
| SubtreeNodeAddOnly/4_per_subtree-4 | 54.94n | 55.20n | ~ | 0.700 |
| SubtreeNodeAddOnly/64_per_subtree-4 | 36.04n | 36.16n | ~ | 0.100 |
| SubtreeNodeAddOnly/256_per_subtree-4 | 35.14n | 35.08n | ~ | 1.000 |
| SubtreeNodeAddOnly/1024_per_subtree-4 | 34.55n | 34.82n | ~ | 0.700 |
| SubtreeCreationOnly/4_per_subtree-4 | 110.7n | 111.7n | ~ | 0.100 |
| SubtreeCreationOnly/64_per_subtree-4 | 350.2n | 350.5n | ~ | 0.400 |
| SubtreeCreationOnly/256_per_subtree-4 | 1.231µ | 1.236µ | ~ | 0.400 |
| SubtreeCreationOnly/1024_per_subtree-4 | 3.780µ | 3.765µ | ~ | 0.600 |
| SubtreeCreationOnly/2048_per_subtree-4 | 6.816µ | 6.778µ | ~ | 0.400 |
| SubtreeProcessorOverheadBreakdown/64_per_subtree-4 | 280.9n | 279.8n | ~ | 0.400 |
| SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 | 279.7n | 277.0n | ~ | 0.200 |
| ParallelGetAndSetIfNotExists/1k_nodes-4 | 2.014m | 2.002m | ~ | 0.400 |
| ParallelGetAndSetIfNotExists/10k_nodes-4 | 5.219m | 5.223m | ~ | 1.000 |
| ParallelGetAndSetIfNotExists/50k_nodes-4 | 7.108m | 7.157m | ~ | 0.200 |
| ParallelGetAndSetIfNotExists/100k_nodes-4 | 9.665m | 9.756m | ~ | 0.700 |
| SequentialGetAndSetIfNotExists/1k_nodes-4 | 1.795m | 1.794m | ~ | 1.000 |
| SequentialGetAndSetIfNotExists/10k_nodes-4 | 4.418m | 4.634m | ~ | 0.700 |
| SequentialGetAndSetIfNotExists/50k_nodes-4 | 13.41m | 13.81m | ~ | 0.100 |
| SequentialGetAndSetIfNotExists/100k_nodes-4 | 24.80m | 25.71m | ~ | 0.100 |
| ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 | 2.059m | 2.039m | ~ | 0.700 |
| ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 | 8.483m | 8.422m | ~ | 0.400 |
| ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 | 13.27m | 13.32m | ~ | 0.700 |
| ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 | 1.821m | 1.809m | ~ | 1.000 |
| ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 | 8.179m | 7.981m | ~ | 0.200 |
| ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 | 46.59m | 44.02m | ~ | 0.100 |
| CalcBlockWork-4 | 503.9n | 500.5n | ~ | 0.400 |
| CalculateWork-4 | 687.4n | 691.2n | ~ | 1.000 |
| BuildBlockLocatorString_Helpers/Size_10-4 | 1.366µ | 1.356µ | ~ | 0.100 |
| BuildBlockLocatorString_Helpers/Size_100-4 | 16.12µ | 16.39µ | ~ | 1.000 |
| BuildBlockLocatorString_Helpers/Size_1000-4 | 128.6µ | 130.7µ | ~ | 0.200 |
| CatchupWithHeaderCache-4 | 104.6m | 104.6m | ~ | 1.000 |
| _BufferPoolAllocation/16KB-4 | 4.099µ | 4.009µ | ~ | 0.200 |
| _BufferPoolAllocation/32KB-4 | 8.506µ | 11.504µ | ~ | 0.100 |
| _BufferPoolAllocation/64KB-4 | 16.80µ | 17.44µ | ~ | 0.700 |
| _BufferPoolAllocation/128KB-4 | 26.11µ | 32.41µ | ~ | 0.100 |
| _BufferPoolAllocation/512KB-4 | 120.3µ | 114.3µ | ~ | 0.100 |
| _BufferPoolConcurrent/32KB-4 | 20.00µ | 19.63µ | ~ | 0.400 |
| _BufferPoolConcurrent/64KB-4 | 31.71µ | 32.89µ | ~ | 0.700 |
| _BufferPoolConcurrent/512KB-4 | 158.7µ | 155.8µ | ~ | 1.000 |
| _SubtreeDeserializationWithBufferSizes/16KB-4 | 711.2µ | 679.0µ | ~ | 0.100 |
| _SubtreeDeserializationWithBufferSizes/32KB-4 | 704.2µ | 635.0µ | ~ | 0.100 |
| _SubtreeDeserializationWithBufferSizes/64KB-4 | 706.5µ | 639.7µ | ~ | 0.100 |
| _SubtreeDeserializationWithBufferSizes/128KB-4 | 706.4µ | 641.1µ | ~ | 0.100 |
| _SubtreeDeserializationWithBufferSizes/512KB-4 | 637.5µ | 641.5µ | ~ | 1.000 |
| _SubtreeDataDeserializationWithBufferSizes/16KB-4 | 37.18m | 37.28m | ~ | 0.400 |
| _SubtreeDataDeserializationWithBufferSizes/32KB-4 | 37.15m | 37.52m | ~ | 0.200 |
| _SubtreeDataDeserializationWithBufferSizes/64KB-4 | 36.97m | 37.44m | ~ | 0.100 |
| _SubtreeDataDeserializationWithBufferSizes/128KB-4 | 36.94m | 37.40m | ~ | 0.200 |
| _SubtreeDataDeserializationWithBufferSizes/512KB-4 | 36.69m | 37.17m | ~ | 0.100 |
| _PooledVsNonPooled/Pooled-4 | 744.3n | 739.5n | ~ | 0.100 |
| _PooledVsNonPooled/NonPooled-4 | 8.545µ | 8.054µ | ~ | 0.100 |
| _MemoryFootprint/Current_512KB_32concurrent-4 | 6.644µ | 7.086µ | ~ | 0.100 |
| _MemoryFootprint/Proposed_32KB_32concurrent-4 | 11.01µ | 11.71µ | ~ | 0.100 |
| _MemoryFootprint/Alternative_64KB_32concurrent-4 | 9.982µ | 11.430µ | ~ | 0.100 |
| SubtreeSizes/10k_tx_4_per_subtree-4 | 1.401m | 1.453m | ~ | 0.700 |
| SubtreeSizes/10k_tx_16_per_subtree-4 | 333.9µ | 342.0µ | ~ | 0.100 |
| SubtreeSizes/10k_tx_64_per_subtree-4 | 78.76µ | 82.51µ | ~ | 0.100 |
| SubtreeSizes/10k_tx_256_per_subtree-4 | 19.65µ | 20.22µ | ~ | 0.100 |
| SubtreeSizes/10k_tx_512_per_subtree-4 | 9.742µ | 9.925µ | ~ | 0.100 |
| SubtreeSizes/10k_tx_1024_per_subtree-4 | 4.838µ | 4.966µ | ~ | 0.100 |
| SubtreeSizes/10k_tx_2k_per_subtree-4 | 2.457µ | 2.462µ | ~ | 0.500 |
| BlockSizeScaling/10k_tx_64_per_subtree-4 | 77.20µ | 78.65µ | ~ | 0.100 |
| BlockSizeScaling/10k_tx_256_per_subtree-4 | 19.50µ | 19.65µ | ~ | 0.700 |
| BlockSizeScaling/10k_tx_1024_per_subtree-4 | 4.876µ | 4.835µ | ~ | 0.700 |
| BlockSizeScaling/50k_tx_64_per_subtree-4 | 406.5µ | 405.1µ | ~ | 0.700 |
| BlockSizeScaling/50k_tx_256_per_subtree-4 | 96.57µ | 96.80µ | ~ | 0.400 |
| BlockSizeScaling/50k_tx_1024_per_subtree-4 | 24.16µ | 24.28µ | ~ | 0.700 |
| SubtreeAllocations/small_subtrees_exists_check-4 | 163.0µ | 161.5µ | ~ | 0.400 |
| SubtreeAllocations/small_subtrees_data_fetch-4 | 168.6µ | 170.3µ | ~ | 0.700 |
| SubtreeAllocations/small_subtrees_full_validation-4 | 337.7µ | 339.2µ | ~ | 1.000 |
| SubtreeAllocations/medium_subtrees_exists_check-4 | 9.747µ | 9.652µ | ~ | 0.200 |
| SubtreeAllocations/medium_subtrees_data_fetch-4 | 10.19µ | 10.04µ | ~ | 0.100 |
| SubtreeAllocations/medium_subtrees_full_validation-4 | 19.74µ | 19.76µ | ~ | 0.700 |
| SubtreeAllocations/large_subtrees_exists_check-4 | 2.333µ | 2.310µ | ~ | 0.700 |
| SubtreeAllocations/large_subtrees_data_fetch-4 | 2.460µ | 2.453µ | ~ | 0.800 |
| SubtreeAllocations/large_subtrees_full_validation-4 | 4.917µ | 4.919µ | ~ | 1.000 |
| _prepareTxsPerLevel-4 | 407.1m | 402.4m | ~ | 0.700 |
| _prepareTxsPerLevelOrdered-4 | 3.764m | 3.635m | ~ | 0.100 |
| _prepareTxsPerLevel_Comparison/Original-4 | 395.9m | 397.5m | ~ | 0.400 |
| _prepareTxsPerLevel_Comparison/Optimized-4 | 3.637m | 3.787m | ~ | 0.200 |
| StoreBlock_Sequential/BelowCSVHeight-4 | 328.9µ | 329.5µ | ~ | 1.000 |
| StoreBlock_Sequential/AboveCSVHeight-4 | 331.7µ | 332.9µ | ~ | 0.700 |
| GetUtxoHashes-4 | 263.9n | 257.4n | ~ | 0.700 |
| GetUtxoHashes_ManyOutputs-4 | 42.46µ | 42.51µ | ~ | 1.000 |
| _NewMetaDataFromBytes-4 | 228.1n | 229.2n | ~ | 0.200 |
| _Bytes-4 | 399.1n | 403.3n | ~ | 0.100 |
| _MetaBytes-4 | 138.0n | 138.7n | ~ | 0.700 |

Threshold: >10% with p < 0.05 | Generated: 2026-05-30 17:15 UTC

…mpty test list

Addresses PR #990 review findings #2 (shard/db interaction) and #3 (silent
empty-list pass). Finding #1 (list_test_shard index increment) declined: the
current skip-without-increment gives the correct even distribution; see PR thread.
oskarszoon requested review from icellan, ordishs and sugh01 on May 30, 2026 19:06

ordishs (Collaborator) left a comment

Approve. Solid, well-reasoned change — recovering the 4 orphaned sequential tests is a real correctness win beyond the speedup, and the write-up is honest about tradeoffs and rejected configs.

One suggested fix before/with merge (non-blocking but worth it):

Makefile smoketest recipe doesn't propagate a list_test_shard.sh failure. Make runs recipe lines with sh -c (no set -e), so if list_test_shard.sh exits non-zero the command substitution yields empty stdout, RUN_ARG="", and -run "" matches every test. A shard whose enumeration failed would silently run the full unsharded suite and pass green — the exact outcome the script's careful fail-loud handling was written to prevent.

Suggested guard:

RUN_ARG="$$(test/scripts/list_test_shard.sh ...)" || exit 1;

(or set -e at the top of the recipe block). Quick check: make smoketest TOTAL=3 SHARD=9 should fail the job rather than run all tests.
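
The failure mode and the guard can be reproduced in plain shell, outside Make. This is a sketch, assuming a stand-in `fake_list_shard` function in place of list_test_shard.sh; in a Makefile recipe each `$` would be doubled as `$$`, and `return 1` would be `exit 1`.

```shell
#!/usr/bin/env bash
# Make's default recipe shell runs without -e; mirror that here.
set +e

# Hypothetical stand-in for a failing enumeration script: prints nothing to
# stdout and exits non-zero.
fake_list_shard() {
  echo "enumeration failed" >&2
  return 1
}

# Unguarded: the failure is lost, RUN_ARG ends up empty, and the next command
# still runs (with go test, an empty -run pattern selects every test).
unguarded() {
  RUN_ARG="$(fake_list_shard 2>/dev/null)"
  echo "unguarded: RUN_ARG='${RUN_ARG}' (would run everything)"
}

# Guarded: a plain assignment takes the exit status of its command
# substitution, so '|| return 1' stops before any tests would run.
guarded() {
  RUN_ARG="$(fake_list_shard 2>/dev/null)" || return 1
  echo "guarded: not reached when enumeration fails"
}

unguarded
guarded || echo "guarded: stopped with a non-zero status"
```

The guard works because a variable assignment with no command word returns the status of the last command substitution it performed.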

Minor follow-ups, none blocking:

  • Empty shard selection emits ^$ → runs 0 tests and passes green; consider warning/failing instead.
  • Wiring shard_selftest.sh into CI (pure --list-only, no compile) would lock in the exhaustive+disjoint invariant.
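
The exhaustive-and-disjoint invariant can be checked offline by concatenating all shards and comparing the result with the full list. This is a sketch under assumptions: `shard_of_list` is a hypothetical stand-in for the real --list-only output, and `check_partition` is not the logic of shard_selftest.sh itself.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for per-shard listing: 0-based round-robin over the
# sorted names read from stdin.
shard_of_list() {
  LC_ALL=C sort | awk -v s="$1" -v n="$2" '(NR - 1) % n == s'
}

# Succeeds only if the shards, concatenated and sorted, equal the sorted full
# list. Assuming the input names are unique, a missing name (not exhaustive)
# or a repeated name (not disjoint) makes the two sides differ.
check_partition() {
  local all="$1" total="$2" i=0 union
  union=$(
    while [ "$i" -lt "$total" ]; do
      printf '%s\n' "$all" | shard_of_list "$i" "$total"
      i=$((i + 1))
    done | LC_ALL=C sort
  )
  [ "$union" = "$(printf '%s\n' "$all" | LC_ALL=C sort)" ]
}

# Ten made-up test names, Test01 through Test10.
names=$(printf 'Test%02d\n' 1 2 3 4 5 6 7 8 9 10)
check_partition "$names" 3 && echo "3 shards: exhaustive and disjoint"
```

A check of this shape only lists names and never compiles or runs tests, which is what makes it cheap enough to run on every CI invocation.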

oskarszoon merged commit 3fa8134 into bsv-blockchain:main on Jun 1, 2026 (34 checks passed)