Skip to content

fix(subtreevalidation): resolve cross-subtree tx dependencies in block order#717

Merged
freemans13 merged 6 commits into
bsv-blockchain:mainfrom
freemans13:stu/subtree-validation-convergence
May 11, 2026
Merged

fix(subtreevalidation): resolve cross-subtree tx dependencies in block order#717
freemans13 merged 6 commits into
bsv-blockchain:mainfrom
freemans13:stu/subtree-validation-convergence

Conversation

@freemans13

@freemans13 freemans13 commented Apr 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

CheckBlockSubtrees validates a block's subtrees in three phases:

  1. Phase 1processTransactionsInLevels processes each batch of subtrees' transactions in dependency-level order, populating the UTXO store and validator cache.
  2. Phase 2 — all subtrees are validated in parallel (CheckBlockSubtreesConcurrency). Failures are collected for phase 3.
  3. Phase 3 — failed subtrees are re-validated sequentially.

Phase 2's parallelism is justified at scale (e.g. on dev-scale-1, ~600 subtrees × ~1M txs per block) because each subtree's validation is largely independent I/O + cache lookups. But concurrency races on cross-subtree parent dependencies: when subtree B spends an output created by a tx in subtree A and they run at the same time, B may observe the cache before A has populated it and fail with TX_MISSING_PARENT.

When this code path runs

This is the catch-up / sync path, not steady-state. A well-connected teranode processes each subtree via the subtree-notify flow and builds its local copy ahead of the block announcement; the missingSubtrees list at the top of CheckBlockSubtrees is empty and the function returns immediately. Only when a teranode falls behind (or missed a subtree notification) does it arrive here with subtrees that need to be fetched from a peer and validated — meaning this code primarily affects sync throughput, not block-time latency.

The bug

Phase 2 collected failures into a shared slice via a mutex-guarded append from inside the per-subtree goroutine:

revalidateSubtrees := make([]chainhash.Hash, 0, len(missingSubtrees))

for _, subtreeHash := range missingSubtrees {
    g.Go(func() error {
        ...
        if err != nil {
            revalidateSubtreesMutex.Lock()
            revalidateSubtrees = append(revalidateSubtrees, subtreeHash) // arrival order, not block order
            revalidateSubtreesMutex.Unlock()
            return nil
        }
        ...
    })
}

Goroutines run concurrently and append as they finish failing, so the order of revalidateSubtrees reflects goroutine-completion order, which is non-deterministic. Phase 3's for _, subtreeHash := range revalidateSubtrees does iterate sequentially, but it iterates the wrong list — a child subtree can sit ahead of its parent and fail re-validation for the same TX_MISSING_PARENT reason it failed in phase 2.

On top of that, phase 1 failed the whole block with a processing error as soon as any missing-parent tx appeared, so phase 3 often never ran at all.

Observed on teratestnet: block 15,631 (9,216 txs, 1,305 with cross-subtree parents) consistently failed and stalled sync at height 15,630.

Fix

Two small changes:

1. Phase 2 collects failures positionally (block order)

Instead of a mutex-guarded slice of failures in arrival order, phase 2 writes into a failedParallel []bool indexed by the subtree's position in missingSubtrees (the canonical block order).

2. Phase 3 becomes a single ordered pass

Iterate missingSubtrees and re-validate any entry where failedParallel[i] is set. Because block subtrees are in a dependency-respecting order and each earlier subtree is re-validated first, one pass suffices — no convergence loop, no retry limits, no error bookkeeping.

3. Phase 1 defers when all errors are missing-parent

processTransactionsInLevels still counts missing-parent errors separately. When every error is a missing-parent error, it returns nil instead of failing — letting later batches run and phase 3 resolve the dependencies. Non-recoverable errors (invalid txs, processing errors) still fail fatally.

Why this is simpler than a convergence loop

A convergence loop (retry until every subtree succeeds or no progress is made) also works, but its complexity exists only to compensate for unordered failure collection. Once phase 3 walks failures in block order, each subtree's predecessors have always been validated before it, so one pass is always sufficient. If a subtree fails in that pass, the failure is real — a retry wouldn't help, and surfacing it immediately is the correct behavior.

Net change: ~26 source lines added, half of which are explanatory comments.

Performance note: dense vs sparse dependency chains

Phase 2's parallelism value scales inversely with cross-subtree dependency density:

  • Sparse-dep sync (mixed real-world traffic) — most subtrees are independent, parallel validation is a genuine win, phase 3 cleans up a small residual.
  • Dense-dep sync (dev-scale-1 tx-blaster chains) — every subtree has many (1–10K) txs whose parent is in the previous subtree. Only subtree 0 succeeds in parallel; the other ~599 race on deps they can't satisfy, fail fast on cache misses, and phase 3 does essentially the entire block's work sequentially.

The fix is correct in both cases: phase 3's ordered single-pass converges in one sweep regardless of density, so worst-case wall time is bounded by the sequential cost of cross-subtree dep resolution (which is inescapable — chains this tight must serialize). The convergence loop was O(n) passes in the worst case; this is O(1).

Whether phase 2 is worth keeping for dense-chain sync is a separate perf question (a future tunable or density heuristic could skip it). For now it's cheap to leave in because failed attempts fail fast on cache misses, and removing it would regress sparse sync.

Test plan

  • go test -tags testtxmetacache ./services/subtreevalidation/... — 281 / 281 pass
  • go build ./... clean
  • Integration verification: deploy to teratestnet and observe sync past 15,631
  • Optional perf measurement on dev-scale-1 to confirm phase 2's overhead is negligible in the dense case

The change is additive on the happy path: blocks where every subtree succeeds in phase 2 skip phase 3 entirely, byte-for-byte identical to before.

Relationship to #715 / #646

This supersedes the earlier convergence-loop version of this PR (and the unrelated gate-logic work in #646, handled by #715).

🤖 Generated with Claude Code

@github-actions

github-actions Bot commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Summary: No issues found. This PR implements a well-reasoned fix for cross-subtree transaction dependency ordering during block validation catch-up.

Key changes verified:

  • Phase 2 parallel validation now tracks failures positionally (by block order) rather than in goroutine-completion order
  • Phase 3 sequential revalidation walks failures in strict block order, resolving cross-subtree parent dependencies in a single pass
  • processTransactionsInLevels defers missing-parent errors to phase 3 instead of failing fatally, allowing cross-batch parent resolution
  • Dead code cleanup: removed redundant nested errors.Is check

Test coverage:

  • New TestValidateMissingSubtreesWithOrderedRetry with 4 subtests rigorously validates the phase-2/phase-3 ordering contract
  • Updated 3 existing tests to reflect deferred (non-fatal) missing-parent behavior
  • Tests confirm the fix prevents the original goroutine-completion-order bug

Documentation: PR description and inline comments clearly explain the three-phase validation approach, performance characteristics across different dependency densities, and relationship to real-world failure (teratestnet block 15,631).


History:

  • ✅ Fixed: Copilots test coverage request addressed in commit bab48f3

@freemans13 freemans13 self-assigned this Apr 17, 2026
@github-actions

github-actions Bot commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-717 (a4f3e93)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.573µ 1.573µ ~ 0.800
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.17n 72.26n ~ 0.100
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.68n 71.23n ~ 0.100
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.01n 71.16n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 33.38n 32.98n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 55.85n 55.66n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 125.5n 125.3n ~ 1.000
MiningCandidate_Stringify_Short-4 219.1n 227.0n ~ 0.100
MiningCandidate_Stringify_Long-4 1.635µ 1.640µ ~ 0.400
MiningSolution_Stringify-4 851.3n 850.9n ~ 1.000
BlockInfo_MarshalJSON-4 1.728µ 1.719µ ~ 0.100
NewFromBytes-4 128.7n 128.2n ~ 0.400
Mine_EasyDifficulty-4 60.54µ 60.31µ ~ 0.100
Mine_WithAddress-4 6.806µ 6.861µ ~ 0.700
DiskTxMap_SetIfNotExists-4 3.436µ 3.610µ ~ 0.200
DiskTxMap_SetIfNotExists_Parallel-4 3.336µ 3.302µ ~ 0.100
DiskTxMap_ExistenceOnly-4 295.7n 298.4n ~ 0.100
Queue-4 193.4n 191.4n ~ 0.100
AtomicPointer-4 4.163n 4.312n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 836.8µ 837.6µ ~ 1.000
ReorgOptimizations/DedupFilterPipeline/New/10K-4 810.5µ 807.9µ ~ 1.000
ReorgOptimizations/AllMarkFalse/Old/10K-4 120.9µ 105.6µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 61.96µ 61.73µ ~ 0.400
ReorgOptimizations/HashSlicePool/Old/10K-4 62.13µ 62.51µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/10K-4 11.44µ 11.56µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/10K-4 5.235µ 5.219µ ~ 1.000
ReorgOptimizations/NodeFlags/New/10K-4 1.799µ 1.781µ ~ 0.400
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.698m 9.405m ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.749m 10.030m ~ 0.400
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.132m 1.168m ~ 0.200
ReorgOptimizations/AllMarkFalse/New/100K-4 680.1µ 678.2µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/100K-4 693.9µ 630.0µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 341.5µ 337.6µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/100K-4 56.67µ 55.23µ ~ 0.400
ReorgOptimizations/NodeFlags/New/100K-4 19.87µ 19.78µ ~ 0.400
TxMapSetIfNotExists-4 51.33n 51.41n ~ 1.000
TxMapSetIfNotExistsDuplicate-4 38.47n 37.87n ~ 0.100
ChannelSendReceive-4 615.4n 621.9n ~ 0.200
DirectSubtreeAdd/4_per_subtree-4 59.15n 61.19n ~ 0.100
DirectSubtreeAdd/64_per_subtree-4 29.28n 29.44n ~ 1.000
DirectSubtreeAdd/256_per_subtree-4 27.64n 27.70n ~ 1.000
DirectSubtreeAdd/1024_per_subtree-4 26.45n 26.46n ~ 1.000
DirectSubtreeAdd/2048_per_subtree-4 26.09n 26.05n ~ 0.200
SubtreeProcessorAdd/4_per_subtree-4 283.1n 285.9n ~ 0.400
SubtreeProcessorAdd/64_per_subtree-4 277.8n 275.7n ~ 0.700
SubtreeProcessorAdd/256_per_subtree-4 279.9n 276.4n ~ 0.100
SubtreeProcessorAdd/1024_per_subtree-4 269.7n 266.8n ~ 0.600
SubtreeProcessorAdd/2048_per_subtree-4 271.9n 269.0n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 273.1n 273.2n ~ 0.700
SubtreeProcessorRotate/64_per_subtree-4 271.2n 277.0n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 271.0n 273.0n ~ 0.200
SubtreeProcessorRotate/1024_per_subtree-4 270.6n 271.7n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 55.45n 55.59n ~ 0.700
SubtreeNodeAddOnly/64_per_subtree-4 36.14n 36.12n ~ 0.500
SubtreeNodeAddOnly/256_per_subtree-4 35.22n 35.18n ~ 0.700
SubtreeNodeAddOnly/1024_per_subtree-4 34.58n 34.55n ~ 0.600
SubtreeCreationOnly/4_per_subtree-4 109.0n 109.8n ~ 0.700
SubtreeCreationOnly/64_per_subtree-4 347.9n 344.6n ~ 0.100
SubtreeCreationOnly/256_per_subtree-4 1.203µ 1.302µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 3.750µ 4.017µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 6.797µ 7.329µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 272.1n 272.8n ~ 0.400
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 272.7n 274.3n ~ 0.200
ParallelGetAndSetIfNotExists/1k_nodes-4 538.5µ 540.8µ ~ 0.700
ParallelGetAndSetIfNotExists/10k_nodes-4 1.339m 1.409m ~ 0.200
ParallelGetAndSetIfNotExists/50k_nodes-4 6.613m 6.765m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 13.29m 13.45m ~ 0.400
SequentialGetAndSetIfNotExists/1k_nodes-4 623.0µ 670.6µ ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 2.949m 3.012m ~ 1.000
SequentialGetAndSetIfNotExists/50k_nodes-4 11.30m 11.56m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 21.69m 22.00m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 589.4µ 599.3µ ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 4.598m 4.545m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 16.79m 16.90m ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 675.4µ 677.4µ ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 6.348m 6.318m ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 40.63m 39.74m ~ 0.100
BlockAssembler_AddTx-4 0.03047n 0.02989n ~ 1.000
AddNode-4 12.05 12.03 ~ 0.700
AddNodeWithMap-4 12.43 12.28 ~ 0.400
CalcBlockWork-4 467.3n 467.2n ~ 1.000
CalculateWork-4 628.3n 622.4n ~ 0.400
BuildBlockLocatorString_Helpers/Size_10-4 1.521µ 1.922µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 14.63µ 14.61µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 158.1µ 145.3µ ~ 0.700
CatchupWithHeaderCache-4 105.0m 104.7m ~ 0.100
_prepareTxsPerLevel-4 415.0m 402.3m ~ 0.400
_prepareTxsPerLevelOrdered-4 3.525m 3.445m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 412.4m 412.7m ~ 1.000
_prepareTxsPerLevel_Comparison/Optimized-4 3.477m 3.468m ~ 0.400
SubtreeSizes/10k_tx_4_per_subtree-4 1.361m 1.365m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 327.8µ 320.2µ ~ 0.700
SubtreeSizes/10k_tx_64_per_subtree-4 76.37µ 77.21µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 19.09µ 19.17µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.618µ 9.594µ ~ 1.000
SubtreeSizes/10k_tx_1024_per_subtree-4 4.752µ 4.739µ ~ 0.700
SubtreeSizes/10k_tx_2k_per_subtree-4 2.355µ 2.357µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 75.87µ 75.66µ ~ 0.700
BlockSizeScaling/10k_tx_256_per_subtree-4 19.05µ 19.14µ ~ 1.000
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.736µ 4.777µ ~ 0.400
BlockSizeScaling/50k_tx_64_per_subtree-4 397.3µ 404.9µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 95.26µ 94.48µ ~ 1.000
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.26µ 23.71µ ~ 0.200
SubtreeAllocations/small_subtrees_exists_check-4 158.9µ 160.8µ ~ 0.700
SubtreeAllocations/small_subtrees_data_fetch-4 165.1µ 166.1µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 328.8µ 327.5µ ~ 0.700
SubtreeAllocations/medium_subtrees_exists_check-4 9.417µ 9.517µ ~ 0.200
SubtreeAllocations/medium_subtrees_data_fetch-4 9.759µ 9.904µ ~ 0.200
SubtreeAllocations/medium_subtrees_full_validation-4 19.29µ 19.22µ ~ 1.000
SubtreeAllocations/large_subtrees_exists_check-4 2.240µ 2.298µ ~ 0.400
SubtreeAllocations/large_subtrees_data_fetch-4 2.384µ 2.393µ ~ 1.000
SubtreeAllocations/large_subtrees_full_validation-4 4.817µ 4.787µ ~ 0.700
_BufferPoolAllocation/16KB-4 3.446µ 3.463µ ~ 0.700
_BufferPoolAllocation/32KB-4 9.128µ 7.428µ ~ 0.200
_BufferPoolAllocation/64KB-4 15.78µ 15.37µ ~ 0.700
_BufferPoolAllocation/128KB-4 32.03µ 29.29µ ~ 0.100
_BufferPoolAllocation/512KB-4 138.2µ 117.9µ ~ 0.100
_BufferPoolConcurrent/32KB-4 18.67µ 18.32µ ~ 0.400
_BufferPoolConcurrent/64KB-4 28.23µ 27.66µ ~ 0.100
_BufferPoolConcurrent/512KB-4 159.4µ 160.2µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/16KB-4 684.9µ 660.0µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 675.2µ 651.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 670.2µ 646.6µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 669.6µ 645.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 681.1µ 649.0µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.48m 37.05m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.33m 37.21m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/64KB-4 37.54m 36.91m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 37.40m 36.96m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/512KB-4 37.13m 36.93m ~ 0.400
_PooledVsNonPooled/Pooled-4 821.3n 818.4n ~ 0.100
_PooledVsNonPooled/NonPooled-4 7.455µ 6.828µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 8.069µ 7.156µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 11.72µ 10.54µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 10.43µ 10.06µ ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 265.6µ 258.1µ ~ 0.100
StoreBlock_Sequential/AboveCSVHeight-4 262.4µ 257.5µ ~ 0.200
GetUtxoHashes-4 271.2n 270.7n ~ 1.000
GetUtxoHashes_ManyOutputs-4 46.45µ 46.92µ ~ 0.700
_NewMetaDataFromBytes-4 232.4n 231.4n ~ 0.200
_Bytes-4 609.4n 608.2n ~ 0.700
_MetaBytes-4 558.2n 565.7n ~ 0.400

Threshold: >10% with p < 0.05 | Generated: 2026-05-06 13:01 UTC

…k order

Block validation's parallel subtree pass raced on cross-subtree parent
dependencies. When subtree B spent an output from subtree A and both
were validated concurrently, B could observe the cache before A had
populated it and fail with TX_MISSING_PARENT. The subsequent sequential
pass walked failures in goroutine-completion order, so dependent
subtrees still appeared before their parents — producing the same
failure on retry.

Two small changes fix this:

1. Collect parallel-validation failures positionally into a bool slice
   indexed by block-subtree order. The sequential pass then iterates
   missingSubtrees and only re-runs entries whose slot is marked
   failed. Because each subtree's predecessors are validated before it,
   a single pass suffices — no retry loop, no convergence tracking.

2. processTransactionsInLevels defers the failure when every error is
   a missing-parent error, so later batches get a chance to run and
   surface the parent. Non-recoverable errors (invalid txs, processing
   errors) still fail fatally.

Observed on teratestnet at block 15,631 (9,216 txs, 1,305 with
cross-subtree parents) where the previous behaviour stalled sync
indefinitely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@freemans13 freemans13 force-pushed the stu/subtree-validation-convergence branch from 3d6d7d2 to 35da5ec Compare April 22, 2026 09:02
@freemans13 freemans13 changed the title fix(subtreevalidation): converge on cross-subtree tx dependencies fix(subtreevalidation): resolve cross-subtree tx dependencies in block order Apr 22, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes sync/catch-up failures caused by cross-subtree transaction dependencies by ensuring failed subtree revalidation runs in block order (so parents are validated before dependents).

Changes:

  • Collects parallel subtree-validation failures positionally (by subtree index) instead of goroutine-completion order.
  • Re-validates failed subtrees in a single ordered pass over missingSubtrees.
  • Updates processTransactionsInLevels to treat “all errors are missing-parent” as non-fatal (defer to ordered revalidation), and adjusts unit tests accordingly.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
services/subtreevalidation/check_block_subtrees.go Switches failure tracking to positional indexing and revalidates failures in block order; defers batch failure when errors are exclusively missing-parent.
services/subtreevalidation/check_block_subtrees_test.go Updates processTransactionsInLevels tests to reflect missing-parent errors being deferred (non-fatal).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +362 to +366
// Parallel validation failures are collected positionally so the sequential
// revalidation pass below walks them in block-subtree order. Cross-subtree
// parent dependencies within a block only resolve left-to-right; arbitrary
// goroutine-completion order would leave children ahead of their parents.
failedParallel := make([]bool, len(missingSubtrees))

Copilot AI Apr 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new phase-2 positional failure tracking + phase-3 ordered single-pass revalidation is not covered by a test in this PR. Given this is a consensus-adjacent validation path and the ordering behavior is the core fix, please add a test that fails in parallel due to a cross-subtree parent dependency and then succeeds when revalidated in missingSubtrees order (and would have failed if revalidated in goroutine-completion order). This helps prevent regressions in the phase-2/phase-3 interaction.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added unit test coverage in commit bab48f3: TestValidateMissingSubtreesWithOrderedRetry with four subtests covering the phase-2/phase-3 contract. The two ordering-critical subtests (CrossSubtreeDependencies_RevalidateInBlockOrder and RevalidationOrderWouldFailIfNotBlockOrder) model chain dependencies where phase 2 fails for everything except subtree 0, then assert phase 3 resolves the failures in strict missingSubtrees order — any other walk order would reproduce the original TxMissingParent bug. The PersistentFailureInPhase3_IsReturned subtest asserts real failures still surface. The phase-2/phase-3 logic was extracted into a helper validateMissingSubtreesWithOrderedRetry taking a validate closure so these invariants can be tested without full subtree/peer/store infrastructure.

Extracts the phase-2 parallel + phase-3 ordered-sequential revalidation
into a testable helper (validateMissingSubtreesWithOrderedRetry) that
takes a validate function, so the ordering contract can be unit tested
without full subtree/peer/store infrastructure.

Adds TestValidateMissingSubtreesWithOrderedRetry with four subtests:

- AllParallelSucceed_NoRevalidation — phase 3 is a no-op when phase 2
  succeeds for every subtree.
- CrossSubtreeDependencies_RevalidateInBlockOrder — chain-dependency
  case where phase 2 only succeeds for subtree 0, and phase 3 must
  resolve subtrees 1..n in block order.
- PersistentFailureInPhase3_IsReturned — a non-recoverable failure in
  phase 3 surfaces to the caller.
- RevalidationOrderWouldFailIfNotBlockOrder — every subtree fails in
  phase 2; phase 3's ordered walk is the only path to success. Any
  non-block-order walk would reproduce the original bug.

Addresses Copilot review comment on PR bsv-blockchain#717.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

[Critical] Redundant error check at line 868-869

In check_block_subtrees.go:868-869, there is a redundant errors.Is(err, errors.ErrTxInvalid) check that is already nested inside a conditional (line 864) that guarantees this condition is true.

This means the code will always return err for any invalid transaction, preventing the function from reaching line 875 and completing all levels.

Impact: This defeats the purpose of the missing-parent deferral logic added in lines 910-922. When a truly invalid transaction is encountered, the function returns early instead of continuing to process remaining levels and deferring to phase-3 revalidation.

Suggested fix: Remove the redundant nested check on line 868. The outer condition already ensures we are handling invalid (non-policy) transactions.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The outer branch at line 864 already guarantees ErrTxInvalid &&
!ErrTxPolicy, so the nested `if errors.Is(err, errors.ErrTxInvalid)`
was always true and the return unconditional. Inline the return and
note that phase 3 can't resolve these, so the fail-fast is intentional
(unlike missing-parent errors, which take the ErrTxMissingParent branch
and are deferred to phase 3).

Pre-existing dead check, flagged during PR bsv-blockchain#717 review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@freemans13

Copy link
Copy Markdown
Collaborator Author

Fixed in a0b2534 — inlined the return.

One correction on the impact analysis: this branch doesn't defeat the missing-parent deferral. Missing-parent errors take the ErrTxMissingParent branch at line 851 and never reach this ErrTxInvalid branch, so the deferral logic at lines 910-922 is unaffected. The early return for truly invalid (non-policy) txs is intentional — phase 3 revalidation can't resolve a genuinely invalid tx, so failing the level fast is the correct behavior.

Also noting for context: the redundant check was pre-existing (introduced 2025-08-28, not by this PR), but worth cleaning up while the file is open.

@sonarqubecloud

sonarqubecloud Bot commented May 6, 2026

Copy link
Copy Markdown

@freemans13 freemans13 requested review from icellan and liam May 7, 2026 12:24
@freemans13 freemans13 merged commit fff4425 into bsv-blockchain:main May 11, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants