Skip to content

feat: adaptive subtree fetch strategy for CheckBlockSubtrees#646

Closed
freemans13 wants to merge 16 commits into
bsv-blockchain:mainfrom
freemans13:feat/smart-subtree-fetch
Closed

feat: adaptive subtree fetch strategy for CheckBlockSubtrees#646
freemans13 wants to merge 16 commits into
bsv-blockchain:mainfrom
freemans13:feat/smart-subtree-fetch

Conversation

@freemans13

Copy link
Copy Markdown
Collaborator

Summary

  • Replace broken localTxsAvailable heuristic (count-based) with adaptive pre-check + sample strategy that checks actual TxMetaCache content
  • Block assembly pre-check fast-paths obvious cases (empty BA = bulk download); sample ~100 tx hashes verifies local availability when BA has data
  • Phase 2 sequential revalidation uses convergence loop instead of fail-on-first, handling cross-subtree dependencies where parent tx is in a different subtree

Problem

On dev-scale-2, blocks from dev-scale-1 were frequently marked invalid. The root cause: localTxsAvailable compared block assembly's total pool count against the block's transaction count. With TX blasters feeding dev-scale-2 continuously, the pool count was always high — but specific transactions (coinbase splitting txs created only on dev-scale-1) were missing. This caused processTransactionsInLevels() to be skipped entirely, and Phase 2 revalidation failed on the first subtree error without giving parent subtrees a chance to resolve.

Test plan

  • sampleLocalAvailability unit tests (6 cases: empty subtree, all cached, none cached, coinbase skip, sample size limits, no cache impl)
  • All existing TestCheckBlockSubtrees tests pass
  • Full subtreevalidation test suite passes (34.7s)
  • make lint passes (0 issues)
  • make test passes (8183 tests, no failures in changed code)
  • make smoketest — no failures in changed code

🤖 Generated with Claude Code

Replace the broken localTxsAvailable heuristic with an adaptive
pre-check + sample strategy. The old code compared block assembly's
total pool count against the block's transaction count — a false
positive that skipped downloading subtree data even when specific
transactions (e.g. coinbase splitting txs) were missing locally.

New approach:
- Block assembly pre-check fast-paths obvious cases (empty BA = bulk download)
- Sample ~100 tx hashes from TxMetaCache to verify local availability
- Phase 2 revalidation uses convergence loop instead of fail-on-first

This fixes frequent invalid block errors on dev-scale-2 caused by
cross-subtree dependencies on transactions only present on dev-scale-1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Mar 30, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

Critical Issue:

  • Missing subtree hash verification at check_block_subtrees.go:209 (see inline comment)

Minor Issues:

  • All previously identified issues have been resolved

Summary:
The PR implements an adaptive subtree fetch strategy with proper tests and documentation. The approach is sound: pre-check block assembly size, then sample TxMetaCache hit rate to decide between bulk download vs. fallback. One critical security fix needed before merge.

History:

  • ✅ Fixed: Settings wiring, error handling, validation, clamping logic
  • ✅ Fixed: All test coverage gaps addressed
  • ⚠️ Remains: Hash verification for peer-supplied sample subtree

@freemans13 freemans13 self-assigned this Mar 30, 2026
BSV peers take 7+ minutes to prepare block data for blocks deep in
the chain, but the stall handler disconnects after 5 minutes. This
causes an infinite sync loop where the node never receives block data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
if hash.Equal(subtreepkg.CoinbasePlaceholderHashValue) {
continue
}
found, _ := cache.GetMetaCached(ctx, hash, &meta.Data{})

@github-actions github-actions Bot Mar 30, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Minor - Acceptable] GetMetaCached returns (data, found bool) without error parameter in current implementation. The sampling logic treats cache misses and errors uniformly as found=false, which is appropriate for a best-effort heuristic. No change needed.

@github-actions

github-actions Bot commented Mar 30, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-646 (23ff248)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.644µ 1.605µ ~ 0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.12n 71.37n ~ 0.400
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.30n 71.53n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.27n 71.23n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 33.28n 33.20n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 55.81n 54.68n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 131.3n 128.7n ~ 0.400
MiningCandidate_Stringify_Short-4 220.7n 220.9n ~ 1.000
MiningCandidate_Stringify_Long-4 1.643µ 1.637µ ~ 1.000
MiningSolution_Stringify-4 851.7n 846.2n ~ 0.100
BlockInfo_MarshalJSON-4 1.723µ 1.732µ ~ 0.200
NewFromBytes-4 131.5n 146.0n ~ 0.400
AddTxBatchColumnar_Validation-4 2.482µ 2.442µ ~ 0.200
OffsetValidationLoop-4 636.2n 641.1n ~ 0.100
Mine_EasyDifficulty-4 60.69µ 60.54µ ~ 0.700
Mine_WithAddress-4 6.939µ 6.893µ ~ 0.400
BlockAssembler_AddTx-4 0.02775n 0.02766n ~ 0.700
AddNode-4 10.90 10.30 ~ 0.100
AddNodeWithMap-4 12.12 11.19 ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 61.45n 58.48n ~ 0.400
DirectSubtreeAdd/64_per_subtree-4 29.83n 31.74n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 28.95n 30.76n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 27.89n 29.28n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 27.38n 28.78n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 282.2n 284.0n ~ 1.000
SubtreeProcessorAdd/64_per_subtree-4 278.4n 276.6n ~ 0.200
SubtreeProcessorAdd/256_per_subtree-4 279.4n 278.8n ~ 0.400
SubtreeProcessorAdd/1024_per_subtree-4 270.5n 270.8n ~ 1.000
SubtreeProcessorAdd/2048_per_subtree-4 271.0n 268.4n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 277.2n 275.2n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 276.8n 273.0n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 274.1n 273.3n ~ 0.700
SubtreeProcessorRotate/1024_per_subtree-4 276.4n 275.1n ~ 0.500
SubtreeNodeAddOnly/4_per_subtree-4 54.12n 54.12n ~ 1.000
SubtreeNodeAddOnly/64_per_subtree-4 34.21n 34.26n ~ 0.400
SubtreeNodeAddOnly/256_per_subtree-4 33.37n 33.28n ~ 0.500
SubtreeNodeAddOnly/1024_per_subtree-4 32.51n 32.44n ~ 0.400
SubtreeCreationOnly/4_per_subtree-4 114.1n 113.0n ~ 0.400
SubtreeCreationOnly/64_per_subtree-4 394.8n 389.8n ~ 0.400
SubtreeCreationOnly/256_per_subtree-4 1.338µ 1.323µ ~ 0.200
SubtreeCreationOnly/1024_per_subtree-4 4.448µ 4.450µ ~ 1.000
SubtreeCreationOnly/2048_per_subtree-4 8.041µ 8.073µ ~ 1.000
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 270.8n 285.0n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 270.8n 277.9n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 2.197m 2.207m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.274m 5.373m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.312m 7.970m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 10.22m 10.71m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 1.967m 1.942m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 4.381m 4.348m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 12.16m 12.73m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 22.03m 22.45m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.230m 2.270m ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.317m 8.231m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 13.78m 13.20m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.994m 1.990m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.663m 7.505m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 39.63m 39.93m ~ 0.400
DiskTxMap_SetIfNotExists-4 4.001µ 4.006µ ~ 0.400
DiskTxMap_SetIfNotExists_Parallel-4 3.825µ 3.710µ ~ 0.200
DiskTxMap_ExistenceOnly-4 540.3n 415.2n ~ 0.400
Queue-4 192.4n 189.5n ~ 0.700
AtomicPointer-4 3.627n 3.276n ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 858.3µ 851.5µ ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/10K-4 759.6µ 804.7µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 109.5µ 119.6µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 64.90µ 64.24µ ~ 0.400
ReorgOptimizations/HashSlicePool/Old/10K-4 63.84µ 62.64µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/10K-4 11.21µ 10.95µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/10K-4 5.103µ 4.781µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.987µ 1.569µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.543m 10.696m ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4 10.35m 11.02m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.148m 1.110m ~ 0.400
ReorgOptimizations/AllMarkFalse/New/100K-4 709.9µ 707.4µ ~ 0.200
ReorgOptimizations/HashSlicePool/Old/100K-4 532.8µ 602.8µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/100K-4 224.4µ 214.6µ ~ 0.200
ReorgOptimizations/NodeFlags/Old/100K-4 48.66µ 46.81µ ~ 1.000
ReorgOptimizations/NodeFlags/New/100K-4 17.45µ 17.10µ ~ 0.200
TxMapSetIfNotExists-4 49.96n 49.63n ~ 0.100
TxMapSetIfNotExistsDuplicate-4 41.19n 41.29n ~ 0.700
ChannelSendReceive-4 625.7n 615.9n ~ 0.200
CalcBlockWork-4 472.3n 478.0n ~ 0.100
CalculateWork-4 652.3n 676.3n ~ 0.400
BuildBlockLocatorString_Helpers/Size_10-4 1.344µ 1.355µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_100-4 12.84µ 13.14µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_1000-4 127.0µ 130.7µ ~ 0.400
CatchupWithHeaderCache-4 104.7m 104.6m ~ 1.000
_prepareTxsPerLevel-4 405.7m 402.8m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.583m 4.066m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 390.9m 410.4m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 3.687m 4.072m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.003m 1.003m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 238.9µ 233.9µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 57.33µ 56.52µ ~ 0.700
SubtreeSizes/10k_tx_256_per_subtree-4 14.21µ 14.11µ ~ 1.000
SubtreeSizes/10k_tx_512_per_subtree-4 6.820µ 7.159µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 3.418µ 3.442µ ~ 0.200
SubtreeSizes/10k_tx_2k_per_subtree-4 1.726µ 1.740µ ~ 0.400
BlockSizeScaling/10k_tx_64_per_subtree-4 54.67µ 55.12µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 13.75µ 14.16µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 3.410µ 3.459µ ~ 0.400
BlockSizeScaling/50k_tx_64_per_subtree-4 289.6µ 293.0µ ~ 0.400
BlockSizeScaling/50k_tx_256_per_subtree-4 69.40µ 69.05µ ~ 1.000
BlockSizeScaling/50k_tx_1024_per_subtree-4 17.33µ 17.15µ ~ 0.400
SubtreeAllocations/small_subtrees_exists_check-4 123.4µ 116.5µ ~ 0.100
SubtreeAllocations/small_subtrees_data_fetch-4 132.2µ 123.6µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 255.5µ 236.2µ ~ 0.100
SubtreeAllocations/medium_subtrees_exists_check-4 7.504µ 6.995µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 7.833µ 7.336µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 14.79µ 13.66µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 1.781µ 1.643µ ~ 0.100
SubtreeAllocations/large_subtrees_data_fetch-4 1.922µ 1.754µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 3.683µ 3.466µ ~ 0.100
_BufferPoolAllocation/16KB-4 4.355µ 3.834µ ~ 0.700
_BufferPoolAllocation/32KB-4 9.908µ 9.371µ ~ 0.700
_BufferPoolAllocation/64KB-4 20.58µ 16.46µ ~ 0.100
_BufferPoolAllocation/128KB-4 38.06µ 32.60µ ~ 0.100
_BufferPoolAllocation/512KB-4 139.6µ 130.2µ ~ 0.400
_BufferPoolConcurrent/32KB-4 23.16µ 20.89µ ~ 0.100
_BufferPoolConcurrent/64KB-4 35.36µ 31.72µ ~ 0.100
_BufferPoolConcurrent/512KB-4 176.4µ 183.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 706.0µ 712.8µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/32KB-4 709.7µ 698.7µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/64KB-4 696.7µ 688.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 691.4µ 694.3µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/512KB-4 703.0µ 703.7µ ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/16KB-4 38.70m 38.62m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/32KB-4 38.39m 38.08m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/64KB-4 38.49m 38.46m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4 38.65m 38.56m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/512KB-4 38.89m 38.54m ~ 1.000
_PooledVsNonPooled/Pooled-4 827.0n 829.0n ~ 0.400
_PooledVsNonPooled/NonPooled-4 9.010µ 8.222µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 8.633µ 8.571µ ~ 1.000
_MemoryFootprint/Proposed_32KB_32concurrent-4 13.38µ 13.18µ ~ 0.200
_MemoryFootprint/Alternative_64KB_32concurrent-4 12.12µ 12.17µ ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 330.5µ 329.8µ ~ 0.700
StoreBlock_Sequential/AboveCSVHeight-4 332.6µ 331.6µ ~ 0.700
GetUtxoHashes-4 275.2n 271.2n ~ 0.400
GetUtxoHashes_ManyOutputs-4 48.64µ 45.13µ ~ 0.100
_NewMetaDataFromBytes-4 213.7n 213.0n ~ 0.400
_Bytes-4 396.2n 391.8n ~ 0.400
_MetaBytes-4 139.2n 138.6n ~ 0.700

Threshold: >10% with p < 0.05 | Generated: 2026-06-02 09:10 UTC

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates CheckBlockSubtrees to make subtree fetching/validation more robust by replacing the previous block-assembly count heuristic with an adaptive pre-check + TxMetaCache sampling strategy, and by adding a convergence-style sequential retry loop for phase-2 revalidation to better handle cross-subtree dependencies.

Changes:

  • Add configurable sampling parameters (LocalAvailabilitySampleSize, LocalAvailabilityThreshold) to decide whether to bulk-download subtree_data or rely on local availability + fallback fetching.
  • Introduce sampleLocalAvailability (+ unit tests) to estimate local TxMetaCache coverage using evenly-spaced hash sampling.
  • Change phase-2 sequential revalidation to retry failed subtrees until progress stops (convergence loop).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
settings/subtreevalidation_settings.go Adds new settings for sampling size/threshold used by the adaptive subtree fetch decision.
services/subtreevalidation/sample_local_availability.go Implements TxMetaCache hit-rate sampling for subtree tx-hash availability.
services/subtreevalidation/sample_local_availability_test.go Unit tests covering key sampling behaviors (empty/all/none/coinbase/sample limits/no cache).
services/subtreevalidation/check_block_subtrees.go Replaces old heuristic with pre-check + sampling-based bulk-download decision; adds convergence loop for phase-2 revalidation; adjusts subtree_data fetch/extraction flow based on bulkDownload.
services/legacy/peer/peer.go Increases block stall response timeout from 5m to 15m.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +32 to +33
LocalAvailabilitySampleSize int `key:"subtreevalidation_local_availability_sample_size" desc:"Number of tx hashes to sample for local availability check" default:"100" category:"SubtreeValidation" usage:"Sample size for deciding bulk vs skip download" type:"int" longdesc:"### Purpose\nControls how many transaction hashes are sampled from a subtree to estimate local availability during block validation.\n\n### How It Works\nWhen a block arrives and block assembly has data, this many evenly-spaced transaction hashes are checked against the TxMetaCache. If the hit rate exceeds LocalAvailabilityThreshold, the bulk subtree_data download is skipped.\n\n### Recommendations\n- **100** (default) - Fast (~microseconds) and statistically reliable\n- Increase for very large subtrees where sampling accuracy matters more"`
LocalAvailabilityThreshold float64 `key:"subtreevalidation_local_availability_threshold" desc:"Hit rate above which bulk subtree_data download is skipped" default:"0.9" category:"SubtreeValidation" usage:"Threshold for local availability sampling" type:"float64" longdesc:"### Purpose\nSets the minimum TxMetaCache hit rate from sampling that triggers skipping the bulk subtree_data download.\n\n### How It Works\nAfter sampling transaction hashes from a subtree, if the hit rate is above this threshold, the system assumes most transactions are available locally and skips the expensive subtree_data download. ValidateSubtreeInternal handles the few missing transactions via its 3-stage fallback.\n\n### Recommendations\n- **0.9** (default) - Skip download when 90%+ of sampled txs are cached locally\n- Lower values skip downloads more aggressively (risk more fallback work)\n- Higher values trigger bulk downloads more often (safer but more bandwidth)"`

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Resolved] Settings are now properly wired in settings.go:579-580 with getInt and getFloat64.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added getInt and getFloat64 wiring in settings.go for both LocalAvailabilitySampleSize (default 100) and LocalAvailabilityThreshold (default 0.9). Also added clamping at the call site as a safety net.

Comment on lines +28 to +35
if sampleSize > len(nodes) {
sampleSize = len(nodes)
}

// Evenly spaced sampling across the subtree for deterministic, representative coverage
step := len(nodes) / sampleSize
if step == 0 {
step = 1

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Resolved] Guard added at line 27-29 of sample_local_availability.go clamping sampleSize to 100 when <= 0.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added a guard that clamps sampleSize to 100 when <= 0, preventing the divide-by-zero.

Comment on lines +179 to +183
hitRate := u.sampleLocalAvailability(ctx, sampleSubtree, u.settings.SubtreeValidation.LocalAvailabilitySampleSize)
if hitRate < u.settings.SubtreeValidation.LocalAvailabilityThreshold {
bulkDownload = true
u.logger.Infof("[CheckBlockSubtrees] Sample check: %.1f%% hit rate (threshold %.1f%%) — switching to bulk download",
hitRate*100, u.settings.SubtreeValidation.LocalAvailabilityThreshold*100)

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Resolved] Settings properly wired and clamping logic added at check_block_subtrees.go:206-213.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added clamping with safe defaults (sampleSize=100, threshold=0.9) before use, and also wired the settings in settings.go with getInt/getFloat64.

Comment on lines +153 to +154
sampleSubtree, _ = subtreepkg.NewSubtreeFromReader(subtreeReader)
subtreeReader.Close()

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Resolved] Errors are now handled gracefully at check_block_subtrees.go:158-163, falling back to nil/bulk download on failure.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — now handling the NewSubtreeFromReader error (falls back to nil/bulk download on failure) and using defer for proper Close with error logging.

Comment on lines +163 to +172
sampleSubtree, _ = subtreepkg.NewIncompleteTreeByLeafCount(len(subtreeNodeBytes) / chainhash.HashSize)
if sampleSubtree != nil {
var nodeHash chainhash.Hash
for i := 0; i < len(subtreeNodeBytes)/chainhash.HashSize; i++ {
copy(nodeHash[:], subtreeNodeBytes[i*chainhash.HashSize:(i+1)*chainhash.HashSize])
if nodeHash.Equal(subtreepkg.CoinbasePlaceholderHashValue) {
_ = sampleSubtree.AddCoinbaseNode()
} else {
_ = sampleSubtree.AddNode(nodeHash, 0, 0)
}

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Resolved] Validation added at check_block_subtrees.go:171-173, with proper error handling for AddCoinbaseNode and AddNode at lines 186-197.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added validation for response size (must be multiple of hash size), error handling for AddCoinbaseNode/AddNode calls (sets sampleSubtree to nil on failure), and error handling for NewIncompleteTreeByLeafCount.

Comment on lines +530 to +535
if len(remainingSubtrees) > 0 && len(remainingSubtrees) >= previousCount {
// No progress — return error for the first still-failing subtree
return nil, errors.WrapGRPC(errors.NewProcessingError(
"[CheckBlockSubtreesRequest] Failed to validate %d subtrees after %d convergence attempts, first: %s",
len(remainingSubtrees), attempt+1, remainingSubtrees[0].String(), lastErr,
))

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Resolved] The error tracking was fixed - no longer applicable to current code structure.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — now tracking errors per subtree in a map so the wrapped error corresponds to the reported subtree hash.

Comment on lines 75 to +78
// stallResponseTimeoutBlocks is the maximum amount of time a peer will
// wait for a block message to be received after sending a getdata
// message.
stallResponseTimeoutBlocks = 5 * time.Minute
stallResponseTimeoutBlocks = 15 * time.Minute

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Acknowledged] Timeout change is intentional per commit a2387a5 to prevent premature peer disconnects during large block validation.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout change from 5m to 15m was an intentional separate fix (commit a2387a5) to prevent premature disconnections from slower peers during large block downloads. It's on this branch because it was discovered during testing. Happy to move it to a separate PR if preferred.

@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
68.9% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

- Wire LocalAvailabilitySampleSize and LocalAvailabilityThreshold in settings.go
- Add safe default clamping for sample size and threshold at call site
- Guard against sampleSize <= 0 in sampleLocalAvailability
- Validate peer response size (multiple of hash size) and handle AddNode errors
- Handle NewSubtreeFromReader errors with proper defer Close
- Track per-subtree errors so error message matches reported subtree hash

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +175 to +179
leafCount := len(subtreeNodeBytes) / chainhash.HashSize
var treeErr error
sampleSubtree, treeErr = subtreepkg.NewIncompleteTreeByLeafCount(leafCount)
if treeErr != nil {
u.logger.Infof("[CheckBlockSubtrees] Failed to create sample subtree with %d leaves: %v", leafCount, treeErr)

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Duplicate] See updated assessment in comment 3011363207.

Copilot uses AI. Check for mistakes.

@github-actions github-actions Bot Mar 30, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Security Issue: Missing subtree hash verification

After constructing sampleSubtree from peer-supplied bytes (lines 176-211), the code must verify the computed root hash matches firstHash before using it for sampling decisions.

Risk: A malicious peer can send arbitrary subtree data that does not match the expected hash, causing incorrect bulk download decisions that waste bandwidth or skip necessary downloads.

Fix: Add validation after line 209 to verify sampleSubtree.RootHash().Equal(firstHash), setting sampleSubtree to nil if the hashes do not match (falling back to bulk download).

This mirrors the hash verification already present at line 417 for the main subtree fetch path.

if hash.Equal(subtreepkg.CoinbasePlaceholderHashValue) {
continue
}
found, _ := cache.GetMetaCached(ctx, hash, &meta.Data{})

Copilot AI Mar 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Minor - Acceptable] GetMetaCached returns (data, found bool) - the error parameter was removed in the current implementation. Errors manifest as found=false, which is the correct behavior for sampling (treat errors as misses). Current code is correct.

Copilot uses AI. Check for mistakes.
freemans13 and others added 4 commits March 30, 2026 18:57
Re-add blockAssemblyClient to subtreevalidation Server after PR bsv-blockchain#647
revert removed it. Wire it up in daemon and fix test callers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
freemans13 added a commit to freemans13/teranode that referenced this pull request Apr 17, 2026
Block validation's sequential revalidation step (end of CheckBlockSubtrees)
did a single pass through the list of subtrees and returned on the first
validation failure. This breaks for blocks with many cross-subtree
transaction dependencies: if subtree B spends an output created by a tx
in subtree A, and B is attempted before A finishes populating the
internal tx cache, B fails with TX_MISSING_PARENT even though the parent
becomes available as soon as A's revalidation completes.

Symptom observed on teratestnet: block 15631 (9,216 txs, 1,305 with
cross-subtree parents) consistently failed with:

    [processTransactionsInLevels] Completed processing with 1305 errors,
    1305 transactions added to orphanage

The orphanage kept the children but block validation is all-or-nothing —
they could never be retried, so teratestnet stalled indefinitely at 15,630.

Fix: replace the single-pass revalidation with a convergence loop that
retries still-failing subtrees until either every subtree succeeds or
an iteration makes no progress (meaning the failures are real, not
ordering artefacts). Maximum iterations is bounded by the number of
subtrees since each iteration must make at least one unit of progress
to continue.

This is the "Phase 2 convergence loop" from PR bsv-blockchain#646, which I extracted
into this standalone PR as recommended in the spec for PR bsv-blockchain#715.

Testing:
- make lint: 0 issues
- go test -tags testtxmetacache ./services/subtreevalidation/... -run TestCheckBlockSubtrees: 22 passed
- Integration test: deploy to teratestnet and verify sync past 15631

🤖 Generated with [Claude Code](https://claude.com/claude-code)

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has a merge conflict with main — please rebase before review.

# Conflicts:
#	services/subtreevalidation/check_block_subtrees.go
#	test/utils/aerospike/aerospike.go
# Conflicts:
#	services/subtreevalidation/check_block_subtrees.go
@sonarqubecloud

sonarqubecloud Bot commented Jun 2, 2026

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
32.2% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@freemans13

Copy link
Copy Markdown
Collaborator Author

Superseded by #745.

Both PRs rewrite the same decision point in CheckBlockSubtrees — whether to download subtreeData from the peer — and they conflict directly:

#745 is the more general approach and sits in the maintained lineage (replaces #539 and #715, deliberately avoids the wall-clock/FSM gating that got #598 reverted via #647, and ships a regression guard against reintroducing it). Keeping both would mean two competing adaptive-fetch mechanisms in the same code path, so closing this one in favour of #745.

Note for the #745 deploy-verify: this PR was the fix for blocks from dev-scale-1 being marked invalid on dev-scale-2 — the old count-based localTxsAvailable heuristic skipped processTransactionsInLevels entirely when specific txs (e.g. coinbase-splitting txs) were absent despite a high pool count. Please confirm #745's pessimistic bootstrap + miss-recording actually recovers that exact scenario before relying on it, since #745's own description flags that pessimistic-mode observations are currently synthetic and the mode gauge can stick on optimistic. The dev-scale verification checkbox on #745 is still unticked.

@freemans13 freemans13 closed this Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants