Skip to content

perf(blockchain): main-chain fast-path for block locator builder (#1021)#1023

Merged
oskarszoon merged 13 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/sql-sync-perf-3
Jun 4, 2026
Merged

perf(blockchain): main-chain fast-path for block locator builder (#1021)#1023
oskarszoon merged 13 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/sql-sync-perf-3

Conversation

@oskarszoon

Copy link
Copy Markdown
Contributor

Summary

Adds a main-chain fast path to the block locator builder (getBlockLocator), closing #1021.

Previously, building a locator from the tip issued ~12–21 sequential recursive-CTE calls (GetBlockInChainByHeightHash per entry), a cumulative O(chain-height) parent_id traversal to produce ~33 hashes. For a start hash on the main chain this is now a single indexed query.

  • computeLocatorHeights — the height schedule extracted as pure arithmetic (no DB), unit-tested against golden sequences including the height-skip boundary.
  • MainChainBlockHashesByHeights (new store method) — fetches the main-chain block hash at every locator height in one query over idx_on_main_chain_height. Gated by an on_main_chain preflight and the mainChainRebuilding guard; the height ceiling is derived from the start hash's own height (not trusted from the caller). Mirrors the existing GetLatestBlockHeaderFromBlockLocator pattern.
  • Fork tips, mid-rebuild, unknown hashes, or any incomplete result fall back to the original per-height walk — produced locator is byte-identical to before.

Performance

Cold benchmark (5000-deep chain, in-memory SQLite, no network), fast path vs per-height CTE walk:

ns/op B/op allocs/op
fast-path-batch 59,913 9,394 314
per-height-cte-walk 3,426,367 145,972 3,627

~57× faster, ~15× fewer bytes, ~11× fewer allocs. The gap widens with chain depth and with real PostgreSQL network round-trips (12–21 round-trips collapse to one).

Test Plan

  • computeLocatorHeights golden test (heights 0, 1, 12, 13, 255, 1000)
  • Fast-path vs walk equivalence (MockStore, linked chain, tips 0/5/12/13/255/1000)
  • Store unit tests (sqlitememory): fork-tip / mid-rebuild / missing-height / unknown-hash fallback; fast-path agrees with CTE walk; hash-derived ceiling
  • PostgreSQL integration (testcontainers): same hashes as CTE; EXPLAIN confirms idx_on_main_chain_height (planner pinned via enable_seqscan=off); fork-tip fallback
  • Consumer packages (asset / blockvalidation / legacy) green — locator hash sequence unchanged
  • go build ./..., go vet, -race, golangci-lint clean

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

No issues found. The implementation is well-structured with comprehensive tests and defensive fallback logic.

Highlights:

  • Correctness: Main-chain linearity guarantees matched by height ceiling enforcement
  • Safety: Multiple fallback conditions (fork tips, reorg guard, incomplete results)
  • Testing: Unit, integration, PostgreSQL, benchmark, and equivalence tests all present
  • Performance: ~57× speedup with proper index usage verification

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1023 (dc2d522)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.725µ 1.724µ ~ 1.000
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.58n 61.65n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.59n 61.55n ~ 0.400
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.63n 61.75n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 30.44n 30.48n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 51.77n 51.61n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 112.9n 109.0n ~ 0.100
MiningCandidate_Stringify_Short-4 261.0n 260.7n ~ 0.600
MiningCandidate_Stringify_Long-4 1.888µ 1.879µ ~ 0.400
MiningSolution_Stringify-4 962.2n 960.5n ~ 0.700
BlockInfo_MarshalJSON-4 1.775µ 1.773µ ~ 0.800
NewFromBytes-4 131.2n 130.1n ~ 0.100
AddTxBatchColumnar_Validation-4 2.505µ 2.478µ ~ 0.400
OffsetValidationLoop-4 636.4n 640.3n ~ 0.100
Mine_EasyDifficulty-4 60.68µ 61.11µ ~ 0.100
Mine_WithAddress-4 7.752µ 6.899µ ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 57.89n 58.20n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 30.10n 30.09n ~ 1.000
DirectSubtreeAdd/256_per_subtree-4 29.01n 29.12n ~ 0.800
DirectSubtreeAdd/1024_per_subtree-4 27.98n 28.03n ~ 0.800
DirectSubtreeAdd/2048_per_subtree-4 27.61n 27.61n ~ 1.000
SubtreeProcessorAdd/4_per_subtree-4 283.8n 279.1n ~ 0.700
SubtreeProcessorAdd/64_per_subtree-4 276.0n 277.8n ~ 0.700
SubtreeProcessorAdd/256_per_subtree-4 279.5n 278.3n ~ 1.000
SubtreeProcessorAdd/1024_per_subtree-4 269.3n 269.3n ~ 0.800
SubtreeProcessorAdd/2048_per_subtree-4 270.2n 271.7n ~ 0.500
SubtreeProcessorRotate/4_per_subtree-4 275.4n 276.8n ~ 0.700
SubtreeProcessorRotate/64_per_subtree-4 275.0n 277.0n ~ 0.200
SubtreeProcessorRotate/256_per_subtree-4 274.3n 275.3n ~ 0.500
SubtreeProcessorRotate/1024_per_subtree-4 272.0n 278.5n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 54.58n 54.35n ~ 0.700
SubtreeNodeAddOnly/64_per_subtree-4 34.43n 34.29n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 33.42n 33.26n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 32.62n 32.55n ~ 0.400
SubtreeCreationOnly/4_per_subtree-4 116.0n 113.6n ~ 0.200
SubtreeCreationOnly/64_per_subtree-4 421.0n 397.4n ~ 0.100
SubtreeCreationOnly/256_per_subtree-4 1.419µ 1.349µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.390µ 4.444µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 8.322µ 8.144µ ~ 0.700
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 273.1n 273.2n ~ 1.000
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 270.8n 272.5n ~ 0.400
ParallelGetAndSetIfNotExists/1k_nodes-4 2.171m 2.213m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.303m 5.379m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.183m 7.224m ~ 0.200
ParallelGetAndSetIfNotExists/100k_nodes-4 9.830m 9.896m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 1.936m 1.947m ~ 0.700
SequentialGetAndSetIfNotExists/10k_nodes-4 4.350m 4.351m ~ 1.000
SequentialGetAndSetIfNotExists/50k_nodes-4 12.21m 12.45m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 21.91m 22.16m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.242m 2.244m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.149m 8.158m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 13.01m 12.99m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.988m 1.969m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.544m 7.457m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 40.90m 40.89m ~ 0.700
DiskTxMap_SetIfNotExists-4 3.835µ 3.887µ ~ 0.700
DiskTxMap_SetIfNotExists_Parallel-4 3.502µ 3.639µ ~ 0.700
DiskTxMap_ExistenceOnly-4 344.3n 399.5n ~ 0.400
Queue-4 189.0n 193.2n ~ 0.100
AtomicPointer-4 3.669n 3.647n ~ 1.000
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 854.5µ 842.6µ ~ 0.200
ReorgOptimizations/DedupFilterPipeline/New/10K-4 777.6µ 809.7µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 108.3µ 115.9µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 64.13µ 64.09µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/10K-4 60.34µ 68.84µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/10K-4 11.13µ 11.29µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/10K-4 4.553µ 5.137µ ~ 0.200
ReorgOptimizations/NodeFlags/New/10K-4 1.564µ 1.944µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.322m 10.468m ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.928m 11.269m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.105m 1.093m ~ 1.000
ReorgOptimizations/AllMarkFalse/New/100K-4 707.0µ 706.5µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/100K-4 610.7µ 527.3µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 198.4µ 197.5µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/100K-4 49.73µ 48.23µ ~ 0.200
ReorgOptimizations/NodeFlags/New/100K-4 17.30µ 16.51µ ~ 0.100
TxMapSetIfNotExists-4 49.44n 49.60n ~ 0.700
TxMapSetIfNotExistsDuplicate-4 41.41n 41.33n ~ 1.000
ChannelSendReceive-4 620.2n 621.0n ~ 1.000
BlockAssembler_AddTx-4 0.02865n 0.02913n ~ 0.600
AddNode-4 12.28 11.99 ~ 0.100
AddNodeWithMap-4 12.42 12.63 ~ 0.100
CalcBlockWork-4 516.5n 510.8n ~ 0.700
CalculateWork-4 705.1n 715.1n ~ 0.400
BuildBlockLocatorString_Helpers/Size_10-4 1.467µ 1.345µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_100-4 12.82µ 15.44µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_1000-4 126.8µ 127.5µ ~ 1.000
CatchupWithHeaderCache-4 104.4m 104.3m ~ 0.700
_prepareTxsPerLevel-4 416.7m 416.0m ~ 1.000
_prepareTxsPerLevelOrdered-4 4.076m 3.671m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 421.7m 421.0m ~ 1.000
_prepareTxsPerLevel_Comparison/Optimized-4 3.740m 3.846m ~ 0.400
_BufferPoolAllocation/16KB-4 4.008µ 3.791µ ~ 0.100
_BufferPoolAllocation/32KB-4 9.439µ 9.940µ ~ 0.700
_BufferPoolAllocation/64KB-4 20.16µ 19.60µ ~ 0.700
_BufferPoolAllocation/128KB-4 37.63µ 33.89µ ~ 0.100
_BufferPoolAllocation/512KB-4 127.8µ 126.3µ ~ 0.400
_BufferPoolConcurrent/32KB-4 19.03µ 18.65µ ~ 0.700
_BufferPoolConcurrent/64KB-4 30.53µ 29.40µ ~ 0.100
_BufferPoolConcurrent/512KB-4 142.6µ 147.1µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/16KB-4 619.0µ 665.6µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 635.7µ 642.2µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/64KB-4 645.8µ 652.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 643.7µ 654.5µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/512KB-4 643.2µ 583.5µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 36.76m 37.86m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.83m 37.58m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.68m 37.38m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.76m 37.72m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.24m 37.15m ~ 0.100
_PooledVsNonPooled/Pooled-4 830.7n 833.0n ~ 0.700
_PooledVsNonPooled/NonPooled-4 7.909µ 8.575µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.679µ 6.513µ ~ 0.700
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.567µ 10.585µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.174µ 9.521µ ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.310m 1.284m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 304.2µ 304.2µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 74.06µ 72.54µ ~ 0.200
SubtreeSizes/10k_tx_256_per_subtree-4 18.30µ 18.43µ ~ 1.000
SubtreeSizes/10k_tx_512_per_subtree-4 9.225µ 8.994µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.473µ 4.479µ ~ 0.700
SubtreeSizes/10k_tx_2k_per_subtree-4 2.227µ 2.201µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 70.24µ 71.03µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 17.91µ 17.87µ ~ 0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.419µ 4.399µ ~ 0.400
BlockSizeScaling/50k_tx_64_per_subtree-4 373.2µ 368.8µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 88.53µ 87.57µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 21.71µ 21.50µ ~ 0.700
SubtreeAllocations/small_subtrees_exists_check-4 149.4µ 151.7µ ~ 0.200
SubtreeAllocations/small_subtrees_data_fetch-4 160.4µ 160.8µ ~ 1.000
SubtreeAllocations/small_subtrees_full_validation-4 306.1µ 309.9µ ~ 0.400
SubtreeAllocations/medium_subtrees_exists_check-4 9.068µ 8.883µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 9.447µ 9.326µ ~ 0.700
SubtreeAllocations/medium_subtrees_full_validation-4 17.86µ 17.41µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.135µ 2.116µ ~ 0.400
SubtreeAllocations/large_subtrees_data_fetch-4 2.280µ 2.240µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 4.436µ 4.364µ ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 337.3µ 334.9µ ~ 0.700
StoreBlock_Sequential/AboveCSVHeight-4 334.9µ 343.8µ ~ 0.200
GetUtxoHashes-4 282.8n 284.1n ~ 1.000
GetUtxoHashes_ManyOutputs-4 46.55µ 47.59µ ~ 0.100
_NewMetaDataFromBytes-4 214.1n 215.6n ~ 0.700
_Bytes-4 393.9n 401.5n ~ 0.100
_MetaBytes-4 139.2n 139.6n ~ 0.700

Threshold: >10% with p < 0.05 | Generated: 2026-06-04 07:38 UTC

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — well-scoped, well-tested perf change with a conservative correctness posture (fall back whenever the fast path can't be proven equivalent). Built the changed packages and ran the locator + store unit tests locally, all green.

Correctness — solid

  • Fast-path equivalence is sound: the main chain is linear, so the on_main_chain=true block at height h <= startHeight is the unique ancestor of startHash at h, identical to the CTE walk. Verified by Test_getBlockLocator_FastPathMatchesWalk and TestMainChainBlockHashesByHeights_AgreesWithCTEWalk.
  • Ceiling derived from the start hash's own height rather than trusting the caller — Test...CeilingFromHashHeight pins this. Good defensive design.
  • Fallback coverage is complete (fork tip, mid-rebuild, unknown hash, empty inputs, incomplete/short result, collapsed duplicate heights) and closely mirrors the existing GetLatestBlockHeaderFromBlockLocator pattern.
  • No injection risk — both branches fully parameterized.

Comments (non-blocking)

  1. uint8 cast replaces the safe-conversion helper (Server.go:3167, maxEntries = uint8(blockHeaderHeight) + 1). Functionally safe (guarded by <= 12) but diverges from the codebase's safeconversion convention and is exactly the uint32->uint8 narrowing gosec G115 flags. Please confirm gosec config; since maxEntries is only a capacity hint, computing it as an int would sidestep it entirely.
  2. TOCTOU note dropped. The reference GetLatestBlockHeaderFromBlockLocator documents the non-atomic gap between the rebuild-guard check and the main query (bounded/self-healing under the single-writer model). The same race exists in MainChainBlockHashesByHeights but isn't commented — worth porting the paragraph for parity.
  3. Preflight swallows all DB errors as "not on main chain." Acceptable and documented, but there's no signal distinguishing an expected fork-tip fallback from an unhealthy DB silently disabling the fast path. Consider a trace attribute / debug counter on the fallback reason.
  4. Nit: genesis being on_main_chain=true is the load-bearing assumption behind the benchmark numbers (otherwise every build silently falls back — still correct, just un-optimized). Covered by the rebuild walking to height 0; flagging as the key assumption.

Tests — strong

Golden height schedules, fast-vs-walk equivalence, all fallback branches, hash-derived ceiling, PostgreSQL testcontainer parity, and an EXPLAIN test that pins the index path with enable_seqscan=off. The cold benchmark honestly clears responseCache outside the timer and documents why, keeping the 57x claim credible. Minor smell: benchLocatorHeights duplicates computeLocatorHeights to avoid a service->store import — commented and benchmark-only, fine as-is.

…gnitive complexity

Rename MainChainBlockHashesByHeights{,_test}.go to snake_case per
docs/references/codingConventions.md, and split the store method into
mainChainHeightsQuery + scanHeightHashes helpers to drop SonarQube
cognitive complexity from 19 to below the 15 threshold.
@sonarqubecloud

sonarqubecloud Bot commented Jun 4, 2026

Copy link
Copy Markdown

@freemans13 freemans13 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Well-scoped main-chain fast path that faithfully mirrors the existing GetLatestBlockHeaderFromBlockLocator/GetBlockHeaders pattern, with byte-identical output guaranteed by an exhaustive fallback (fork tip, mid-rebuild, unknown hash, missing/incomplete result).

Verified locally: go build + go vet clean on both packages; targeted unit tests (computeLocatorHeights, FastPathMatchesWalk, MainChainBlockHashesByHeights*) pass. Worked through the hash-derived ceiling in both H_caller>H_actual and H_caller<H_actual directions — consistent with the walk in both.

Non-blocking nit: the real-store equivalence tests (sqlitememory/Postgres) request only heights {3,2,1}/{2,1}; since every production locator ends at genesis (0), adding height 0 to one real-store assertion would guard against a silent always-fallback regression if genesis were ever not flagged on_main_chain.

@oskarszoon oskarszoon merged commit 17a9777 into bsv-blockchain:main Jun 4, 2026
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants