Skip to content

perf(blockchain/sql): fast-path GetBlockInChainByHeightHash on on_main_chain (#1018)#1022

Merged
oskarszoon merged 1 commit into
bsv-blockchain:mainfrom
oskarszoon:fix/sql-sync-perf
Jun 2, 2026
Merged

perf(blockchain/sql): fast-path GetBlockInChainByHeightHash on on_main_chain (#1018)#1022
oskarszoon merged 1 commit into
bsv-blockchain:mainfrom
oskarszoon:fix/sql-sync-perf

Conversation

@oskarszoon

Copy link
Copy Markdown
Contributor

Closes #1018 (for the remaining sites).

What

8 of the 10 sites in #1018 already had the on_main_chain fast path merged. This finishes the list:

  • GetBlockInChainByHeightHash — added the fast path. Because the function is parameterized by a startHash that may be a fork tip, it uses the preflight variant (same as GetLatestBlockHeaderFromBlockLocator), not the plain mainChainRebuilding-only gate: it checks whether startHash is itself on_main_chain and only then replaces the recursive ChainBlocks walk with a single WHERE on_main_chain = true AND height = $1 lookup on idx_on_main_chain_height. Fork-tip starts and reorg windows (mainChainRebuilding > 0) fall back to the CTE.
    • The fast path keeps AND height <= startHash.height so it is exactly equivalent to the CTE (no row when height > startHash.height).
  • GetForkedBlockHeaders — intentionally kept on the CTE. Its semantic is "all blocks NOT in the ancestor set of startHash", which has no equivalent on_main_chain predicate (NOT on_main_chain drops the main-chain ancestors when the start is a main tip; on_main_chain = false is wrong when the start is a fork tip). Documented inline. It also has no production callers outside the store.

The other 6 WITH RECURSIVE sites in the package (GetSuitableBlock, GetBlockHeadersFromTill, CheckBlockIsAncestorOfBlock, GetBlockHeadersFromOldest, GetHashOfAncestorBlock, GetBlocks) are not in the #1018 list and are genuinely fork-aware — left unchanged.

Correctness

The fast path is only taken when startHash is on the main chain, so the ancestor at any lower height is also on the main chain by definition. The mainChainRebuilding reference counter brackets every window where on_main_chain is transiently inconsistent (StoreBlock, InvalidateBlock, RevalidateBlock, rebuildOnMainChainFlag); when it is non-zero the CTE path is forced. The preflight/main-query TOCTOU is bounded and self-healing under the store's single-writer model (same reasoning as the existing locator fast path).

Tests

Added TestSQL_GetBlockInChainByHeightHash_FastPathEquivalence: asserts the fast path, the CTE (forced via the rebuilding flag), and GetBlockByHeight all agree across every height for an on-main start, plus the height == startHash.height and height > startHash.height edges. The pre-existing fork-tip test already covers the CTE fallback.

  • go build ./stores/blockchain/... ./services/blockchain/... — ok
  • go vet / staticcheck / golangci-lint — clean
  • go test ./stores/blockchain/sql/ — ok
  • go test -race ./stores/blockchain/sql/ — ok

Note on #1019

#1019 (drop 4 indexes) was investigated alongside this. Conclusion: do not drop them — all four have real readers; the "0 scans" was a measurement-window artefact. Details posted on that issue.

…n_chain (bsv-blockchain#1018)

GetBlockInChainByHeightHash unconditionally walked the parent_id chain via the
recursive ChainBlocks CTE even when startHash sits on the canonical chain. Add
the same on_main_chain fast path the other locator/height queries already use:
preflight whether startHash is on the main chain and, when it is (and no
main-chain rebuild is in progress), replace the recursive walk with a single
WHERE on_main_chain = true AND height = $1 index lookup on idx_on_main_chain_height.

The fast path keeps the AND height <= startHash.height bound so it stays exactly
equivalent to the CTE (no row returned when height > startHash.height). Fork-tip
starts (on_main_chain = false) and reorg windows (mainChainRebuilding > 0) fall
back to the recursive CTE, preserving fork-aware behaviour.

GetForkedBlockHeaders keeps its CTE: its "all blocks NOT in the ancestor set of
startHash" semantic has no equivalent on_main_chain predicate; documented inline.

Adds an equivalence test asserting the fast path, the CTE, and GetBlockByHeight
agree across all heights for an on-main start, plus the rebuilding-gate and
height-boundary edges.
@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Summary: No issues found. This is a well-crafted performance optimization with solid correctness guarantees.

Key Strengths:

  1. Correct fast-path guard: The preflight check verifies startHash is on the main chain before using the fast path, ensuring semantic equivalence with the CTE for fork-aware queries.

  2. Proper concurrency control: The mainChainRebuilding counter correctly brackets all windows where on_main_chain flags are transiently inconsistent, forcing fallback to the authoritative CTE path during reorgs.

  3. Comprehensive test coverage: TestSQL_GetBlockInChainByHeightHash_FastPathEquivalence verifies both paths agree across all heights and correctly tests the guard forcing CTE fallback.

  4. Documentation clarity: The inline comments explain the TOCTOU window, why GetForkedBlockHeaders cannot use the fast path, and the correctness reasoning for the height upper bound.

Verification per AGENTS.md:

The PR states verification commands were run (go test, go test -race, go vet, staticcheck, golangci-lint). Test implementation confirms the equivalence property under the documented concurrency model.

@sonarqubecloud

sonarqubecloud Bot commented Jun 2, 2026

Copy link
Copy Markdown

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1022 (a1030dd)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.821µ 1.749µ ~ 0.500
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 63.73n 61.52n ~ 0.100
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.56n 61.64n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.70n 61.64n ~ 0.800
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 30.14n 30.55n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 52.13n 50.50n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 104.9n 106.5n ~ 0.400
MiningCandidate_Stringify_Short-4 259.5n 260.1n ~ 0.400
MiningCandidate_Stringify_Long-4 1.891µ 1.892µ ~ 1.000
MiningSolution_Stringify-4 971.2n 969.4n ~ 0.400
BlockInfo_MarshalJSON-4 1.766µ 1.761µ ~ 0.200
NewFromBytes-4 125.1n 124.5n ~ 0.700
AddTxBatchColumnar_Validation-4 2.471µ 2.415µ ~ 1.000
OffsetValidationLoop-4 718.3n 720.0n ~ 0.700
Mine_EasyDifficulty-4 66.99µ 66.92µ ~ 0.200
Mine_WithAddress-4 7.061µ 7.003µ ~ 1.000
BlockAssembler_AddTx-4 0.02773n 0.02614n ~ 0.400
AddNode-4 10.99 11.31 ~ 0.100
AddNodeWithMap-4 11.84 11.80 ~ 1.000
DiskTxMap_SetIfNotExists-4 3.609µ 3.594µ ~ 1.000
DiskTxMap_SetIfNotExists_Parallel-4 3.414µ 3.425µ ~ 1.000
DiskTxMap_ExistenceOnly-4 321.5n 320.8n ~ 0.700
Queue-4 190.5n 189.7n ~ 1.000
AtomicPointer-4 4.427n 4.319n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 931.4µ 904.8µ ~ 0.400
ReorgOptimizations/DedupFilterPipeline/New/10K-4 840.5µ 844.2µ ~ 1.000
ReorgOptimizations/AllMarkFalse/Old/10K-4 111.4µ 111.1µ ~ 1.000
ReorgOptimizations/AllMarkFalse/New/10K-4 62.67µ 61.65µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/10K-4 57.41µ 71.73µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 12.21µ 11.98µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/10K-4 4.863µ 5.355µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.631µ 1.646µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.802m 10.207m ~ 0.200
ReorgOptimizations/DedupFilterPipeline/New/100K-4 10.17m 10.13m ~ 0.700
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.167m 1.115m ~ 0.700
ReorgOptimizations/AllMarkFalse/New/100K-4 685.6µ 693.5µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/100K-4 551.2µ 552.3µ ~ 1.000
ReorgOptimizations/HashSlicePool/New/100K-4 290.0µ 290.9µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/100K-4 48.55µ 50.41µ ~ 0.400
ReorgOptimizations/NodeFlags/New/100K-4 16.57µ 15.96µ ~ 0.100
TxMapSetIfNotExists-4 52.92n 54.14n ~ 0.100
TxMapSetIfNotExistsDuplicate-4 39.69n 39.68n ~ 1.000
ChannelSendReceive-4 600.4n 628.8n ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 59.50n 56.92n ~ 0.200
DirectSubtreeAdd/64_per_subtree-4 29.02n 28.98n ~ 1.000
DirectSubtreeAdd/256_per_subtree-4 28.58n 28.10n ~ 1.000
DirectSubtreeAdd/1024_per_subtree-4 26.57n 26.56n ~ 1.000
DirectSubtreeAdd/2048_per_subtree-4 26.14n 26.19n ~ 0.700
SubtreeProcessorAdd/4_per_subtree-4 291.9n 297.0n ~ 0.200
SubtreeProcessorAdd/64_per_subtree-4 287.8n 305.6n ~ 0.100
SubtreeProcessorAdd/256_per_subtree-4 285.8n 295.7n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 282.0n 282.5n ~ 1.000
SubtreeProcessorAdd/2048_per_subtree-4 279.1n 280.4n ~ 0.700
SubtreeProcessorRotate/4_per_subtree-4 280.7n 288.4n ~ 0.200
SubtreeProcessorRotate/64_per_subtree-4 280.2n 284.0n ~ 0.200
SubtreeProcessorRotate/256_per_subtree-4 277.4n 279.8n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 277.0n 280.8n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 55.02n 54.81n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 36.03n 36.71n ~ 0.700
SubtreeNodeAddOnly/256_per_subtree-4 35.12n 35.16n ~ 1.000
SubtreeNodeAddOnly/1024_per_subtree-4 34.50n 34.40n ~ 0.500
SubtreeCreationOnly/4_per_subtree-4 111.0n 109.9n ~ 0.600
SubtreeCreationOnly/64_per_subtree-4 351.5n 350.6n ~ 0.400
SubtreeCreationOnly/256_per_subtree-4 1.236µ 1.240µ ~ 0.200
SubtreeCreationOnly/1024_per_subtree-4 3.792µ 3.841µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 6.841µ 6.796µ ~ 0.700
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 278.4n 279.9n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 277.0n 280.5n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 1.988m 1.992m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.162m 5.165m ~ 0.700
ParallelGetAndSetIfNotExists/50k_nodes-4 7.244m 7.667m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 9.723m 10.345m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 1.787m 1.771m ~ 0.400
SequentialGetAndSetIfNotExists/10k_nodes-4 4.452m 4.495m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 13.41m 13.52m ~ 0.400
SequentialGetAndSetIfNotExists/100k_nodes-4 24.80m 25.32m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.037m 2.038m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.345m 8.376m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 13.17m 13.49m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.792m 1.816m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.950m 8.006m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 43.21m 43.62m ~ 0.400
CalcBlockWork-4 485.2n 477.3n ~ 0.700
CalculateWork-4 651.3n 653.2n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.322µ 1.315µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_100-4 12.57µ 12.60µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_1000-4 156.7µ 147.5µ ~ 1.000
CatchupWithHeaderCache-4 103.8m 103.9m ~ 0.400
SubtreeSizes/10k_tx_4_per_subtree-4 1.334m 1.339m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 312.8µ 318.5µ ~ 0.100
SubtreeSizes/10k_tx_64_per_subtree-4 74.79µ 75.30µ ~ 0.700
SubtreeSizes/10k_tx_256_per_subtree-4 18.57µ 18.70µ ~ 0.700
SubtreeSizes/10k_tx_512_per_subtree-4 9.222µ 9.275µ ~ 0.700
SubtreeSizes/10k_tx_1024_per_subtree-4 4.617µ 4.701µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.297µ 2.331µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 73.94µ 75.35µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 18.52µ 18.99µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.591µ 4.647µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 387.5µ 392.6µ ~ 0.400
BlockSizeScaling/50k_tx_256_per_subtree-4 93.79µ 93.65µ ~ 1.000
BlockSizeScaling/50k_tx_1024_per_subtree-4 22.83µ 23.10µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 156.8µ 161.0µ ~ 0.200
SubtreeAllocations/small_subtrees_data_fetch-4 163.7µ 162.2µ ~ 0.400
SubtreeAllocations/small_subtrees_full_validation-4 321.5µ 323.3µ ~ 0.100
SubtreeAllocations/medium_subtrees_exists_check-4 9.272µ 9.608µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 9.523µ 9.463µ ~ 0.400
SubtreeAllocations/medium_subtrees_full_validation-4 18.70µ 19.01µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.217µ 2.274µ ~ 0.400
SubtreeAllocations/large_subtrees_data_fetch-4 2.299µ 2.296µ ~ 0.700
SubtreeAllocations/large_subtrees_full_validation-4 4.652µ 4.767µ ~ 0.100
_BufferPoolAllocation/16KB-4 5.840µ 3.852µ ~ 0.100
_BufferPoolAllocation/32KB-4 8.436µ 8.151µ ~ 0.400
_BufferPoolAllocation/64KB-4 17.36µ 20.02µ ~ 0.700
_BufferPoolAllocation/128KB-4 35.22µ 29.39µ ~ 0.100
_BufferPoolAllocation/512KB-4 125.9µ 127.0µ ~ 1.000
_BufferPoolConcurrent/32KB-4 19.45µ 20.11µ ~ 0.400
_BufferPoolConcurrent/64KB-4 30.88µ 32.27µ ~ 0.200
_BufferPoolConcurrent/512KB-4 155.6µ 158.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 640.2µ 729.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 634.8µ 716.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 634.6µ 742.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 658.2µ 783.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 660.8µ 628.8µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 38.28m 37.32m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.99m 37.33m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 37.72m 37.07m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 37.64m 37.01m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/512KB-4 37.45m 36.89m ~ 0.100
_PooledVsNonPooled/Pooled-4 833.2n 842.8n ~ 0.100
_PooledVsNonPooled/NonPooled-4 7.847µ 8.316µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 7.427µ 8.197µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.586µ 12.377µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.436µ 10.891µ ~ 0.100
_prepareTxsPerLevel-4 405.9m 402.0m ~ 0.400
_prepareTxsPerLevelOrdered-4 3.785m 3.491m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 405.0m 394.7m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 3.863m 3.469m ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 321.4µ 326.9µ ~ 0.100
StoreBlock_Sequential/AboveCSVHeight-4 327.0µ 325.4µ ~ 1.000
GetUtxoHashes-4 271.0n 269.6n ~ 1.000
GetUtxoHashes_ManyOutputs-4 44.19µ 45.38µ ~ 0.100
_NewMetaDataFromBytes-4 229.5n 227.4n ~ 0.100
_Bytes-4 399.3n 401.9n ~ 0.400
_MetaBytes-4 138.3n 138.9n ~ 1.000

Threshold: >10% with p < 0.05 | Generated: 2026-06-02 12:11 UTC

@oskarszoon oskarszoon self-assigned this Jun 2, 2026

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Clean, conservative read-path optimization behind a correct preflight guard, mirroring the established GetLatestHeaderFromBlockLocator pattern line-for-line.

  • Path selection is sound: fast path only when !rebuilding && startOnMain; CTE fallback in every ambiguous case (fork-tip start, rebuilding window, DB error, unknown hash).
  • Parameter binding ($1=height, $2=hash) is consistent across both branches.
  • CTE equivalence verified at the edges (height == / > startHash.height, unknown hash).
  • TOCTOU correctly bounded by mainChainRebuilding and self-healing under the single-writer model.
  • Good test: equivalence across every height + edges, dynamic tip discovery, and cache flush between branches so the CTE path isn't masked. Passes -race locally.
  • Correctly leaves GetForkedBlockHeaders and the 6 fork-aware sites on the CTE, with clear inline justification.

Optional follow-ups (non-blocking): assert BlockNotFound type on the height>tip edge instead of a bare error.

@freemans13 freemans13 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Focused, well-documented performance change that completes #1018.

Correctness verified:

  • Fast-path/CTE equivalence holds: outside a rebuild window there is exactly one on_main_chain=true block per height, so for an on-main startHash the ancestor at any lower height is that block. The height <= startHash.height bound preserves the CTE's walk-from-seed semantics, including the no-row-above-seed edge.
  • Fork-tip and reorg windows correctly fall back to the CTE via the startOnMain preflight and the mainChainRebuilding guard. Pattern matches the already-merged GetLatestHeaderFromBlockLocator fast path line-for-line.
  • The on_main_chain column and mainChainRebuilding guard the fast path depends on are maintained unconditionally (independent of blockchain_use_in_memory_chain_check), so behaviour is identical with the in-memory chain check on or off.
  • GetForkedBlockHeaders correctly left on the CTE — its complement-of-ancestor-set semantic has no on_main_chain equivalent; good inline rationale.

Test coverage is solid: the new equivalence test asserts fast path == forced-CTE == GetBlockByHeight across every height plus the height edges, discovers the tip dynamically, and clears the response cache between branches. Fork-tip fallback is exercised by the pre-existing test's losing fork.

Minor non-blocking nits: preflight uses startHash.CloneBytes() where startHash[:] would avoid the alloc (locator path uses the slice form); fast-path calls now make one extra indexed round-trip for the preflight (still a clear win over the recursive CTE).

@oskarszoon oskarszoon merged commit 16075f1 into bsv-blockchain:main Jun 2, 2026
34 checks passed
freemans13 added a commit to freemans13/teranode that referenced this pull request Jun 2, 2026
…tegration

Merge stu/blockvalidation-inmemory-checkold (ca7ef31) — off-chain prefetch
for checkOldBlockIDs plus the upstream/main it carried (bsv-blockchain#1005, bsv-blockchain#993, bsv-blockchain#1022).

Conflict resolutions:
- services/legacy/netsync/manager.go: kept integration's gap-watchdog
  self-heal (requestBlocksFromTip / maybeResyncOnMissingParent /
  missingParentResyncInterval); the incoming side had no content there.
- stores/utxo/aerospike/batch_create.go: pass nil for the new GetBinsToStore
  arena param (main added *bt.Arena; integration-only BatchCreate caller
  predates it). nil = non-arena route, output-equivalent per create_arena_test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

blockchain/sql: recursive ChainBlocks CTE doesn't fast-path on_main_chain — billions of wasted PK scans on every locator build

3 participants