test(multinode): un-skip scenario_04 block-assembly isolation #995

Merged
liam merged 1 commit into bsv-blockchain:main from liam:liam/unskip-scenario-04-followup
Jun 1, 2026

Conversation

@liam liam commented Jun 1, 2026

Collaborator

Un-skips scenario_04 (block-assembly isolation) in the split-per-service chaos suite. This was the capstone of the chaos-harness work: the scenario was merged skipped (#958) pending teranode robustness fixes, all of which have now landed.

What the scenario proves

In the split topology each node runs nine per-service containers, so we can kill blockassembly alone. The scenario kills teranode3-blockassembly, mines 5 blocks on node 1, and asserts node 3 stays pinned at baseline while its blockvalidation container is healthy — proving the hard runtime dependency (WaitForBlockAssemblyReady gates every inbound block). After restarting blockassembly, node 3 catches up and all three converge. This failure mode is invisible in the all-in-one topology, where you can't kill blockassembly without taking the whole node down.
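
The kill / mine / assert-stall / restart / converge sequence can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: `node`, `cluster`, and `runScenario` are hypothetical names invented here, not the chaos harness API, and the "gate" is reduced to a boolean that stands in for WaitForBlockAssemblyReady.

```go
package main

import "fmt"

// node is a toy stand-in for one teranode node in the split topology.
// All names in this sketch are hypothetical, not harness symbols.
type node struct {
	height          int  // best block height this node has accepted
	blockAssemblyUp bool // whether the node's blockassembly container is running
}

// cluster models three nodes following one chain tip.
type cluster struct {
	tip   int
	nodes []*node
}

// sync lets every node whose blockassembly is up advance to the tip.
// A node with blockassembly down accepts nothing, which mimics the
// WaitForBlockAssemblyReady gate on inbound blocks.
func (c *cluster) sync() {
	for _, nd := range c.nodes {
		if nd.blockAssemblyUp {
			nd.height = c.tip
		}
	}
}

// mine extends the chain by n blocks and offers them to every node.
func (c *cluster) mine(n int) {
	c.tip += n
	c.sync()
}

// runScenario plays the scenario steps and returns node 3's height
// during the outage and after recovery.
func runScenario(baseline, blocks int) (stalled, recovered int) {
	c := &cluster{tip: baseline}
	for i := 0; i < 3; i++ {
		c.nodes = append(c.nodes, &node{height: baseline, blockAssemblyUp: true})
	}
	n3 := c.nodes[2]

	n3.blockAssemblyUp = false // kill teranode3-blockassembly
	c.mine(blocks)             // mine on node 1
	stalled = n3.height        // expected to stay pinned at baseline

	n3.blockAssemblyUp = true // restart blockassembly
	c.sync()                  // node 3 catches up to the survivors
	recovered = n3.height
	return stalled, recovered
}

func main() {
	stalled, recovered := runScenario(100, 5)
	fmt.Println("node 3 during outage:", stalled)
	fmt.Println("node 3 after restart:", recovered)
}
```

The real scenario additionally asserts that the blockvalidation container stays healthy during the outage, which this model does not represent.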

Blockers, all now fixed

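Bringing this scenario up surfaced a cascade of utxopersister bugs, each fix exposing the next:

  • #969: CreateUTXOSet nil-pointer panic at startup
  • #971 / #979: utxopersister double-reading the fileformat magic (the misdiagnosed "unknown magic" crash)
  • #985: CreateUTXOSet crashing the core sidecar on the 16-byte footer of a previous UTXO set during consolidation, surfaced by this very scenario

The comment block is rewritten to record this full story.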

Verification

Ran end-to-end against current main (which includes all three fixes):

✓  test/multinode_split (2m47s)
DONE 1 tests

Full run, no skip — node 3 stalls at baseline while blockassembly is down, catches up after restart, all three nodes converge.

Note: this suite is behind the network_chaos build tag and runs via make network-chaos-test-split (≈32 containers); it is not part of the standard PR checks, so CI here will not execute it.

The three teranode robustness bugs that blocked this split-per-service
chaos scenario have all landed:

  bsv-blockchain#969        - CreateUTXOSet nil-pointer panic at startup
  bsv-blockchain#971 / bsv-blockchain#979 - utxopersister double-reading the fileformat magic
                (the misdiagnosed "unknown magic" crash)
  bsv-blockchain#985        - CreateUTXOSet crashing the core sidecar on the 16-byte
                footer of a previous UTXO set during consolidation,
                surfaced by this very scenario

Remove the t.Skip and rewrite the blocker comment to record the full
story (including that the "unknown magic" was the fileformat double-read,
not a legacy wire bug as first assumed).

Verified end-to-end against current main: the scenario runs in full
(node 3 stalls at baseline while its blockassembly is down, catches up
to the survivors after restart, all three nodes converge) - 2m47s, no
skip. Runs via 'make network-chaos-test-split'; not part of the standard
PR checks.

github-actions Bot commented Jun 1, 2026

Contributor

🤖 Claude Code Review

Status: Complete

No issues found.

This PR cleanly un-skips the block-assembly isolation test after all blocking bugs were fixed. The documentation accurately describes the test behavior and correctly references the three merged fix PRs (#969, #971/#979, #985). The test logic is sound: it verifies that killing the blockassembly service stalls block validation on node 3 (proving the WaitForBlockAssemblyReady gate works), then confirms catch-up occurs after restart.

github-actions Bot commented Jun 1, 2026

Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-995 (cf33f96)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.636µ 1.627µ ~ 1.000
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.17n 71.33n ~ 0.100
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 72.01n 71.25n ~ 0.200
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.20n 71.13n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 35.66n 34.07n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 58.55n 56.62n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 159.9n 149.9n ~ 0.400
MiningCandidate_Stringify_Short-4 220.8n 221.1n ~ 0.700
MiningCandidate_Stringify_Long-4 1.663µ 1.660µ ~ 0.800
MiningSolution_Stringify-4 855.0n 856.9n ~ 0.700
BlockInfo_MarshalJSON-4 1.767µ 1.757µ ~ 0.100
NewFromBytes-4 125.0n 144.9n ~ 0.100
AddTxBatchColumnar_Validation-4 2.706µ 2.530µ ~ 0.400
OffsetValidationLoop-4 545.6n 545.9n ~ 0.800
Mine_EasyDifficulty-4 67.03µ 66.90µ ~ 1.000
Mine_WithAddress-4 6.971µ 7.023µ ~ 0.100
BlockAssembler_AddTx-4 0.02625n 0.02652n ~ 0.400
AddNode-4 11.02 11.52 ~ 0.200
AddNodeWithMap-4 12.97 13.70 ~ 1.000
DirectSubtreeAdd/4_per_subtree-4 57.11n 58.24n ~ 0.400
DirectSubtreeAdd/64_per_subtree-4 46.41n 28.80n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 27.89n 28.51n ~ 0.700
DirectSubtreeAdd/1024_per_subtree-4 26.51n 26.52n ~ 0.500
DirectSubtreeAdd/2048_per_subtree-4 26.15n 26.08n ~ 1.000
SubtreeProcessorAdd/4_per_subtree-4 297.2n 291.1n ~ 0.400
SubtreeProcessorAdd/64_per_subtree-4 290.9n 296.7n ~ 0.200
SubtreeProcessorAdd/256_per_subtree-4 286.7n 290.5n ~ 0.400
SubtreeProcessorAdd/1024_per_subtree-4 278.6n 281.0n ~ 0.200
SubtreeProcessorAdd/2048_per_subtree-4 277.2n 281.3n ~ 0.400
SubtreeProcessorRotate/4_per_subtree-4 286.2n 282.2n ~ 0.700
SubtreeProcessorRotate/64_per_subtree-4 277.7n 282.0n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 277.9n 281.3n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 283.1n 283.2n ~ 1.000
SubtreeNodeAddOnly/4_per_subtree-4 55.11n 55.31n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 36.30n 36.21n ~ 0.700
SubtreeNodeAddOnly/256_per_subtree-4 35.21n 35.07n ~ 0.600
SubtreeNodeAddOnly/1024_per_subtree-4 34.55n 34.63n ~ 0.300
SubtreeCreationOnly/4_per_subtree-4 110.3n 111.9n ~ 0.100
SubtreeCreationOnly/64_per_subtree-4 354.8n 353.6n ~ 1.000
SubtreeCreationOnly/256_per_subtree-4 1.237µ 1.240µ ~ 0.400
SubtreeCreationOnly/1024_per_subtree-4 3.781µ 3.820µ ~ 1.000
SubtreeCreationOnly/2048_per_subtree-4 6.897µ 6.855µ ~ 0.600
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 284.3n 282.2n ~ 0.700
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 282.3n 280.0n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 2.018m 2.005m ~ 1.000
ParallelGetAndSetIfNotExists/10k_nodes-4 5.319m 5.193m ~ 0.200
ParallelGetAndSetIfNotExists/50k_nodes-4 7.349m 7.470m ~ 0.700
ParallelGetAndSetIfNotExists/100k_nodes-4 10.10m 10.25m ~ 0.400
SequentialGetAndSetIfNotExists/1k_nodes-4 1.794m 1.830m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 4.467m 4.624m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 13.48m 13.95m ~ 0.200
SequentialGetAndSetIfNotExists/100k_nodes-4 24.91m 25.19m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.064m 2.068m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.536m 8.436m ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 13.68m 13.38m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.812m 1.827m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 8.038m 8.155m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 43.56m 44.49m ~ 0.100
DiskTxMap_SetIfNotExists-4 4.043µ 3.940µ ~ 0.700
DiskTxMap_SetIfNotExists_Parallel-4 3.514µ 3.550µ ~ 0.700
DiskTxMap_ExistenceOnly-4 303.2n 313.7n ~ 0.200
Queue-4 194.0n 194.7n ~ 1.000
AtomicPointer-4 8.119n 8.122n ~ 0.600
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 763.3µ 756.5µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 727.2µ 719.9µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 109.9µ 109.6µ ~ 0.700
ReorgOptimizations/AllMarkFalse/New/10K-4 58.32µ 58.22µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/10K-4 58.64µ 64.34µ ~ 0.200
ReorgOptimizations/HashSlicePool/New/10K-4 11.77µ 11.79µ ~ 0.200
ReorgOptimizations/NodeFlags/Old/10K-4 5.337µ 5.729µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.999µ 2.363µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.449m 9.975m ~ 0.400
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.341m 10.012m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.133m 1.096m ~ 1.000
ReorgOptimizations/AllMarkFalse/New/100K-4 731.4µ 730.4µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/100K-4 621.2µ 574.0µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 323.8µ 321.6µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/100K-4 47.97µ 50.45µ ~ 0.700
ReorgOptimizations/NodeFlags/New/100K-4 16.59µ 17.22µ ~ 0.200
TxMapSetIfNotExists-4 52.13n 53.03n ~ 0.400
TxMapSetIfNotExistsDuplicate-4 48.14n 48.09n ~ 0.700
ChannelSendReceive-4 663.4n 651.3n ~ 0.100
CalcBlockWork-4 475.9n 472.3n ~ 0.700
CalculateWork-4 654.3n 639.8n ~ 0.400
BuildBlockLocatorString_Helpers/Size_10-4 1.369µ 1.362µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_100-4 16.19µ 13.41µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 130.0µ 129.5µ ~ 0.700
CatchupWithHeaderCache-4 104.5m 104.6m ~ 0.200
_BufferPoolAllocation/16KB-4 5.114µ 5.367µ ~ 1.000
_BufferPoolAllocation/32KB-4 8.565µ 9.109µ ~ 0.200
_BufferPoolAllocation/64KB-4 18.22µ 16.06µ ~ 0.100
_BufferPoolAllocation/128KB-4 33.40µ 28.06µ ~ 0.400
_BufferPoolAllocation/512KB-4 114.8µ 113.6µ ~ 0.100
_BufferPoolConcurrent/32KB-4 20.05µ 20.07µ ~ 1.000
_BufferPoolConcurrent/64KB-4 32.38µ 32.31µ ~ 1.000
_BufferPoolConcurrent/512KB-4 150.1µ 154.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 641.9µ 662.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 647.8µ 684.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 652.0µ 667.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 653.3µ 685.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 652.3µ 620.0µ ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.57m 37.13m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.75m 37.17m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 37.21m 37.27m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/128KB-4 37.07m 37.07m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/512KB-4 37.38m 36.88m ~ 0.400
_PooledVsNonPooled/Pooled-4 738.9n 740.0n ~ 0.200
_PooledVsNonPooled/NonPooled-4 8.105µ 8.003µ ~ 0.200
_MemoryFootprint/Current_512KB_32concurrent-4 6.870µ 6.858µ ~ 1.000
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.981µ 10.238µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.535µ 9.639µ ~ 0.200
_prepareTxsPerLevel-4 429.9m 412.9m ~ 0.100
_prepareTxsPerLevelOrdered-4 3.365m 3.466m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 412.6m 408.1m ~ 0.400
_prepareTxsPerLevel_Comparison/Optimized-4 3.455m 3.800m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.277m 1.243m ~ 0.400
SubtreeSizes/10k_tx_16_per_subtree-4 299.6µ 303.1µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 71.91µ 70.93µ ~ 0.400
SubtreeSizes/10k_tx_256_per_subtree-4 18.27µ 17.49µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.049µ 8.687µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.463µ 4.311µ ~ 0.200
SubtreeSizes/10k_tx_2k_per_subtree-4 2.193µ 2.148µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 69.66µ 69.96µ ~ 0.700
BlockSizeScaling/10k_tx_256_per_subtree-4 17.63µ 17.38µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.353µ 4.304µ ~ 0.200
BlockSizeScaling/50k_tx_64_per_subtree-4 373.7µ 368.6µ ~ 0.200
BlockSizeScaling/50k_tx_256_per_subtree-4 88.29µ 87.51µ ~ 0.400
BlockSizeScaling/50k_tx_1024_per_subtree-4 21.78µ 21.46µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 149.3µ 149.2µ ~ 1.000
SubtreeAllocations/small_subtrees_data_fetch-4 157.1µ 159.3µ ~ 0.400
SubtreeAllocations/small_subtrees_full_validation-4 304.9µ 306.2µ ~ 0.200
SubtreeAllocations/medium_subtrees_exists_check-4 8.837µ 8.750µ ~ 0.200
SubtreeAllocations/medium_subtrees_data_fetch-4 9.274µ 9.226µ ~ 0.700
SubtreeAllocations/medium_subtrees_full_validation-4 17.75µ 17.59µ ~ 0.700
SubtreeAllocations/large_subtrees_exists_check-4 2.066µ 2.083µ ~ 0.200
SubtreeAllocations/large_subtrees_data_fetch-4 2.236µ 2.223µ ~ 0.200
SubtreeAllocations/large_subtrees_full_validation-4 4.395µ 4.313µ ~ 0.400
StoreBlock_Sequential/BelowCSVHeight-4 314.7µ 315.3µ ~ 0.700
StoreBlock_Sequential/AboveCSVHeight-4 321.7µ 318.4µ ~ 0.400
GetUtxoHashes-4 260.4n 251.7n ~ 0.700
GetUtxoHashes_ManyOutputs-4 42.97µ 42.95µ ~ 0.400
_NewMetaDataFromBytes-4 228.8n 230.8n ~ 0.400
_Bytes-4 397.6n 397.6n ~ 1.000
_MetaBytes-4 139.0n 137.0n ~ 0.200

Threshold: >10% with p < 0.05 | Generated: 2026-06-01 09:33 UTC

sonarqubecloud Bot commented Jun 1, 2026


@liam liam requested review from freemans13 and ordishs June 1, 2026 10:22

@ordishs ordishs left a comment

Collaborator

Approve — clean, low-risk test-only change. Verified all four referenced fixes (#969, #971, #979, #985) are on main, all harness symbols exist, and go vet -tags network_chaos passes clean. The rewritten comment block preserving the debugging narrative (including the misdiagnosis of the 'unknown magic' crash) is excellent hygiene. No production code touched; gated behind the network_chaos tag.

@liam liam merged commit 97fd8b7 into bsv-blockchain:main Jun 1, 2026
29 of 30 checks passed
liam added a commit to liam/teranode that referenced this pull request Jun 11, 2026
…6 (subtreevalidation pause)

Two new split-topology chaos scenarios building on the harness landed in bsv-blockchain#958
and the un-skip work in bsv-blockchain#995, plus the split-mode settings fix that scenario 05
turns out to depend on.

Scenario 05: kills teranode3-validator, mines 5 blocks on node 1, asserts
node 3 stalls at baseline (block-tx validation walks blockvalidation ->
subtreevalidation -> validator), restarts validator, asserts catch-up and
3-node convergence.

This only holds when subtreevalidation calls the standalone validator
container over gRPC, NOT when it embeds an in-process validator.
settings.conf:1212 ships useLocalValidator=true (the right default for
all-in-one), so a vanilla docker.teranode{N}.test context would build the
in-process validator and ignore the validator container entirely - making
scenario 05 a no-op (raised in PR review on bsv-blockchain#1069). The split-mode overlay
generated by compose/cmd/gennodes/templates/settings.conf.tmpl now flips
useLocalValidator=false per node so the kill is actually observable. The
override is gated on {{if not $.AllInOne}}, so all-in-one mode is unchanged.

Scenario 06: PAUSES teranode3-subtreevalidation via docker pause (SIGSTOP)
rather than killing, so the gRPC call from blockvalidation hangs (the frozen
dependency failure mode, distinct from a process that has exited). First
scenario to exercise the pause/unpause verbs. Uses defer UnpauseService so a
failed assertion leaves the shared stack healthy for the next scenario's Reset.

Both are gated on //go:build network_chaos like the rest of the split-topology
suite.
liam added a commit to liam/teranode that referenced this pull request Jun 11, 2026
…6 (subtreevalidation pause)

Two new split-topology chaos scenarios building on the harness landed in bsv-blockchain#958
and the un-skip work in bsv-blockchain#995, plus the split-mode useLocalValidator override
that scenario 05 depends on and a chaos_unpause idempotency fix that
scenario 06 depends on.

Scenario 05: kills teranode3-validator, mines 5 blocks on node 1, asserts
node 3 stalls at baseline (block-tx validation walks blockvalidation ->
subtreevalidation -> validator), restarts validator, asserts catch-up and
3-node convergence.

This only holds when subtreevalidation calls the standalone validator
container over gRPC, NOT when it embeds an in-process validator.
settings.conf ships useLocalValidator=true (the right default for all-in-one),
so a vanilla docker.teranode{N}.test context would build the in-process
validator and ignore the validator container entirely - making scenario 05
a silent no-op (raised in PR review on bsv-blockchain#1069). The split-mode overlay
generated by compose/cmd/gennodes/templates/settings.conf.tmpl now flips
useLocalValidator=false per node so the kill is actually observable. The
override is gated on {{if not $.AllInOne}}, so all-in-one mode is unchanged.

Scenario 06: PAUSES teranode3-subtreevalidation via docker pause (SIGSTOP)
rather than killing, so the gRPC call from blockvalidation hangs (the frozen
dependency failure mode, distinct from a process that has exited). First
scenario to exercise the pause/unpause verbs. Uses defer UnpauseService so a
failed assertion leaves the shared stack healthy for the next scenario's Reset.

Bash fix: chaos_unpause is now idempotent across all three branches. The
single-service and bulk-aio branches were previously running a bare
'docker unpause', which exits non-zero on an already-running container.
Scenario 06 runs a defensive defer-unpause alongside an explicit one (defer
is the safety net for assertion failure; explicit unblocks the catch-up
assertion), and the bare docker call turned the second one into t.Fatalf,
failing the test on a green run. The bulk-split branch already had the
'|| true' pattern; this extends it for consistency. Same fix surfaced
independently in PR review on bsv-blockchain#1070 against scenario 08. Idempotent
semantics are what 'chaos cleanup' actually wants anyway.

Both scenarios are gated on //go:build network_chaos like the rest of the
split-topology suite.
3 participants