Skip to content

fix(blockchain/sql): snapshot-isolate on_main_chain rebuild to prevent startup race#1044

Merged
oskarszoon merged 5 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/blockchain-on-main-chain-startup-race
Jun 8, 2026
Merged

fix(blockchain/sql): snapshot-isolate on_main_chain rebuild to prevent startup race#1044
oskarszoon merged 5 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/blockchain-on-main-chain-startup-race

Conversation

@oskarszoon

Copy link
Copy Markdown
Contributor

Summary

rebuildOnMainChainFlag now runs its rebuild inside a REPEATABLE READ transaction (with a bounded retry on serialization conflicts) instead of READ COMMITTED. This closes a startup race that could leave a main-chain block flagged on_main_chain = false — corrupting the canonical chain view.

The bug

New() launches a startup goroutine that calls rebuildOnMainChainFlag(ctx, false) without holding slowPathMu. The function opens a READ COMMITTED transaction, so each statement gets its own snapshot:

  1. bestQ reads bestBlockID = genesis (captured before any blocks are stored).
  2. A concurrent StoreBlock(block1) commits block1 with on_main_chain = true.
  3. q1 then runs. Under READ COMMITTED its statement snapshot now sees the committed block1, but the main_chain CTE is still rooted at the stale bestBlockID = genesis, which walks to {genesis} only. block1 (id != genesis) matches q1's ... AND id NOT IN (SELECT id FROM main_chain) clause and is cleared to on_main_chain = false.
  4. q2 cannot repair it (block1 is not in main_chain = {genesis}).

Final state: genesis = true, block1 = false, block2 = true — an impossible chain (an on-chain block whose parent is off-chain). GetLatestBlockHeaderFromBlockLocator then can't find the common ancestor and returns BLOCK_NOT_FOUND, wedging header sync. It reproduces ~60% of the time under -race on a node that starts syncing immediately after boot, and is the dominant source of stores/blockchain/sql unit-test flakiness.

The fix

  • rebuildOnMainChainFlag's transaction now uses sql.LevelRepeatableRead, so bestQ, q1 and q2 all share one snapshot. Blocks committed after bestQ are invisible to the UPDATEs and can no longer be wrongly cleared.
  • REPEATABLE READ can abort with 40001 (serialization_failure) / 40P01 (deadlock_detected) on a write-write conflict. The one caller without slowPathMu (the startup goroutine) can hit this against a slow-path StoreBlock, so the whole transaction is retried (up to 3×). *sql.Tx bypasses usql's per-statement retry, hence the retry lives here. The raw driver error is returned on those codes so the wrapper can classify it (errors.NewStorageError would otherwise flatten the typed *pgconn.PgError).
  • SQLite never returns those SQLSTATEs, so the retry is a single pass there.
  • The hot reorg path (reconcileOnMainChain, a single-statement UPDATE that already shares one snapshot) is unchanged. InvalidateBlock/RevalidateBlock already hold slowPathMu and are unaffected.

Verification

  • Root-caused with an instrumented -race repro (block2's plain extend flipping parent block1 off-main).
  • Post-fix, under -race: the targeted repro ×60 → 0 failures (was ~60%); GetLatestBlockHeaderFromBlockLocator fork tests (postgres + sqlite) ×20 → 0 failures; full stores/blockchain/sql package → green.

Note

The retry is load-bearing: backgroundRefreshLoop does not re-run rebuildOnMainChainFlag, so a rebuild that exhausted its retries would self-heal only on the next restart. Startup forks are rare, but worth a metric/alert if this retry is ever observed firing in production.

Test plan

  • go test -race -tags testtxmetacache ./stores/blockchain/sql/...
  • header sync from a node that begins syncing immediately on boot (no BLOCK_NOT_FOUND from GetLatestBlockHeaderFromBlockLocator)

…t startup race

rebuildOnMainChainFlag ran under READ COMMITTED, so its bestQ SELECT and
q1/q2 UPDATEs used independent snapshots. The startup goroutine calls it
without slowPathMu; a block committed by a concurrent StoreBlock between
bestQ and q1 is visible to q1 but absent from the main_chain CTE (rooted at
the stale pre-block bestBlockID), so q1's NOT-IN clause clears that block's
on_main_chain flag — leaving an on-chain block with an off-chain parent and
wedging GetLatestBlockHeaderFromBlockLocator (BLOCK_NOT_FOUND).

Run the rebuild at REPEATABLE READ so all three statements share one
snapshot; retry up to 3x on 40001/40P01 (the startup goroutine holds no
lock and can conflict with a slow-path StoreBlock). Return the raw driver
error on those codes so the retry wrapper can classify it. SQLite is
unaffected (single pass); the single-statement reconcileOnMainChain reorg
path is unchanged.
@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Current Review:
No issues found. The implementation correctly addresses the race condition with proper transaction isolation and retry logic.

Key Observations:

  • REPEATABLE READ isolation ensures consistent snapshot across all statements within the transaction
  • Retry logic properly handles PostgreSQL serialization conflicts (40001/40P01) from concurrent slow-path StoreBlock operations during startup
  • Both pgx and lib/pq driver error types are correctly detected via errors.As
  • SQLite compatibility maintained (modernc driver ignores isolation level and never returns these error codes)
  • Exponential backoff with context cancellation properly implemented
  • Raw driver errors correctly returned for caller classification without losing type information
  • Test coverage comprehensive, including wrapped error handling

History:

  • ✅ Fixed: Misleading retry logging on final attempt (previously flagged by Copilot, resolved in 021b934)

This comment was marked as outdated.

@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1044 (dd02a65)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 132
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.979µ 1.937µ ~ 0.700
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 59.87n 60.45n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 59.76n 59.73n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 62.75n 60.03n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 35.52n 57.23n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 78.38n 100.10n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 218.9n 261.6n ~ 0.200
MiningCandidate_Stringify_Short-4 282.1n 278.0n ~ 0.700
MiningCandidate_Stringify_Long-4 1.920µ 2.064µ ~ 0.100
MiningSolution_Stringify-4 1.018µ 1.055µ ~ 0.400
BlockInfo_MarshalJSON-4 1.935µ 1.805µ ~ 0.200
NewFromBytes-4 129.7n 130.0n ~ 0.700
AddTxBatchColumnar_Validation-4 2.495µ 2.777µ ~ 0.100
OffsetValidationLoop-4 640.6n 644.4n ~ 0.700
Mine_EasyDifficulty-4 65.58µ 65.50µ ~ 1.000
Mine_WithAddress-4 6.962µ 7.741µ ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 59.79n 58.67n ~ 0.400
DirectSubtreeAdd/64_per_subtree-4 31.49n 31.55n ~ 0.400
DirectSubtreeAdd/256_per_subtree-4 30.45n 30.68n ~ 0.400
DirectSubtreeAdd/1024_per_subtree-4 29.28n 29.29n ~ 0.800
DirectSubtreeAdd/2048_per_subtree-4 28.86n 28.90n ~ 0.600
SubtreeProcessorAdd/4_per_subtree-4 280.2n 284.3n ~ 1.000
SubtreeProcessorAdd/64_per_subtree-4 276.1n 268.2n ~ 0.700
SubtreeProcessorAdd/256_per_subtree-4 265.2n 265.4n ~ 1.000
SubtreeProcessorAdd/1024_per_subtree-4 266.3n 267.6n ~ 0.400
SubtreeProcessorAdd/2048_per_subtree-4 268.5n 265.9n ~ 0.800
SubtreeProcessorRotate/4_per_subtree-4 268.0n 269.3n ~ 0.700
SubtreeProcessorRotate/64_per_subtree-4 263.8n 268.0n ~ 0.400
SubtreeProcessorRotate/256_per_subtree-4 265.6n 269.8n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 268.1n 268.3n ~ 1.000
SubtreeNodeAddOnly/4_per_subtree-4 53.66n 53.71n ~ 1.000
SubtreeNodeAddOnly/64_per_subtree-4 33.96n 34.19n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 32.98n 33.13n ~ 0.200
SubtreeNodeAddOnly/1024_per_subtree-4 32.27n 32.38n ~ 0.700
SubtreeCreationOnly/4_per_subtree-4 99.27n 123.50n ~ 0.100
SubtreeCreationOnly/64_per_subtree-4 466.5n 460.3n ~ 1.000
SubtreeCreationOnly/256_per_subtree-4 1.387µ 1.354µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.462µ 5.206µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 8.239µ 9.057µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 266.8n 266.7n ~ 0.700
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 265.2n 268.7n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 10.71m 13.24m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 13.58m 16.29m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 15.82m 20.10m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 18.84m 18.15m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 10.40m 12.69m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 13.64m 16.33m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 17.77m 19.44m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 26.30m 25.78m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 11.90m 11.78m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 13.46m 14.36m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 18.09m 19.20m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 12.03m 13.76m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 13.99m 14.21m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 47.55m 48.04m ~ 0.200
BlockAssembler_AddTx-4 0.02350n 0.02261n ~ 0.700
AddNode-4 9.949 9.974 ~ 1.000
AddNodeWithMap-4 10.341 9.830 ~ 0.100
DiskTxMap_SetIfNotExists-4 3.719µ 3.631µ ~ 1.000
DiskTxMap_SetIfNotExists_Parallel-4 3.394µ 3.414µ ~ 0.700
DiskTxMap_ExistenceOnly-4 313.9n 338.4n ~ 0.100
Queue-4 186.2n 186.9n ~ 0.400
AtomicPointer-4 3.618n 3.628n ~ 0.700
TxMapSetIfNotExists-4 50.16n 49.85n ~ 0.400
TxMapSetIfNotExistsDuplicate-4 41.60n 41.40n ~ 0.700
ChannelSendReceive-4 557.2n 556.9n ~ 1.000
CalcBlockWork-4 478.5n 473.8n ~ 0.700
CalculateWork-4 651.5n 657.2n ~ 1.000
CheckOldBlockIDs/on-chain-prefetch/1000-4 30.92µ 30.73µ ~ 0.200
CheckOldBlockIDs/off-chain-prefetch/1000-4 28.72µ 26.38µ ~ 1.000
CheckOldBlockIDs/on-chain-prefetch/10000-4 223.3µ 224.1µ ~ 0.400
CheckOldBlockIDs/off-chain-prefetch/10000-4 181.2µ 182.4µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_10-4 752.9n 776.0n ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 7.180µ 7.384µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_1000-4 70.05µ 72.46µ ~ 0.100
CatchupWithHeaderCache-4 102.5m 102.6m ~ 0.200
_BufferPoolAllocation/16KB-4 4.078µ 4.167µ ~ 0.200
_BufferPoolAllocation/32KB-4 9.133µ 8.787µ ~ 0.200
_BufferPoolAllocation/64KB-4 21.55µ 21.78µ ~ 0.700
_BufferPoolAllocation/128KB-4 35.79µ 35.93µ ~ 1.000
_BufferPoolAllocation/512KB-4 145.3µ 135.7µ ~ 0.400
_BufferPoolConcurrent/32KB-4 22.27µ 20.71µ ~ 0.400
_BufferPoolConcurrent/64KB-4 32.34µ 31.40µ ~ 0.700
_BufferPoolConcurrent/512KB-4 156.4µ 154.7µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/16KB-4 654.7µ 694.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 652.5µ 743.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 656.6µ 759.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 652.0µ 738.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 624.3µ 661.3µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.28m 37.60m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.09m 37.38m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 37.21m 37.12m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/128KB-4 37.24m 37.21m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.97m 37.10m ~ 0.700
_PooledVsNonPooled/Pooled-4 741.6n 738.6n ~ 0.100
_PooledVsNonPooled/NonPooled-4 8.557µ 8.483µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 7.029µ 6.934µ ~ 0.700
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.052µ 9.881µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.879µ 9.470µ ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.300m 1.326m ~ 0.400
SubtreeSizes/10k_tx_16_per_subtree-4 316.4µ 315.7µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 73.50µ 75.41µ ~ 0.700
SubtreeSizes/10k_tx_256_per_subtree-4 18.46µ 18.97µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.143µ 9.353µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.541µ 4.634µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.255µ 2.318µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 73.40µ 74.09µ ~ 0.200
BlockSizeScaling/10k_tx_256_per_subtree-4 18.45µ 18.78µ ~ 0.200
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.658µ 4.642µ ~ 1.000
BlockSizeScaling/50k_tx_64_per_subtree-4 389.6µ 385.4µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 90.93µ 93.59µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 22.40µ 23.11µ ~ 0.700
SubtreeAllocations/small_subtrees_exists_check-4 155.7µ 155.2µ ~ 1.000
SubtreeAllocations/small_subtrees_data_fetch-4 161.6µ 160.8µ ~ 0.400
SubtreeAllocations/small_subtrees_full_validation-4 318.2µ 322.5µ ~ 1.000
SubtreeAllocations/medium_subtrees_exists_check-4 9.101µ 9.104µ ~ 1.000
SubtreeAllocations/medium_subtrees_data_fetch-4 9.373µ 9.450µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 18.50µ 18.85µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.176µ 2.218µ ~ 0.400
SubtreeAllocations/large_subtrees_data_fetch-4 2.301µ 2.298µ ~ 1.000
SubtreeAllocations/large_subtrees_full_validation-4 4.583µ 4.773µ ~ 0.100
_prepareTxsPerLevel-4 317.1m 333.1m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.233m 3.307m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 317.2m 330.3m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 3.167m 3.462m ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 334.2µ 337.5µ ~ 0.200
StoreBlock_Sequential/AboveCSVHeight-4 336.6µ 345.7µ ~ 0.200
GetUtxoHashes-4 280.4n 280.2n ~ 0.800
GetUtxoHashes_ManyOutputs-4 47.31µ 49.22µ ~ 1.000
_NewMetaDataFromBytes-4 213.4n 214.6n ~ 0.100
_Bytes-4 401.7n 400.3n ~ 0.700
_MetaBytes-4 140.3n 139.0n ~ 1.000

Threshold: >10% with p < 0.05 | Generated: 2026-06-08 13:06 UTC

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve — the fix is correct, minimal, and well-documented.

Verified the load-bearing claims independently:

  • Root cause is accurate: under READ COMMITTED the recursive CTE roots at the stale bestBlockID = genesis while q1's WHERE on_main_chain = true statement-snapshot sees the freshly committed block1, so it's wrongly cleared. REPEATABLE READ makes the post-snapshot block fully invisible to the UPDATEs, and StoreBlock already owns its own flag — consistent either way.
  • SQLite no-op confirmed against the pinned modernc/sqlite v1.47.0: newTx (tx.go) ignores opts.Isolation entirely, so LevelRepeatableRead never errors there and the retry loop is a single pass.
  • usql.PgErrSerializationFail/PgErrDeadlockDetected (40001/40P01) match exactly what a REPEATABLE READ write-write conflict raises; dual pgx + lib/pq handling mirrors the existing pattern in util/usql/retry.go.
  • Returning the raw typed driver error on the retry codes is the right call — errors.NewStorageError flattens the cause and would defeat errors.As(*pgconn.PgError). The deferred rollback still fires via the named return.
  • Both other callers (Invalidate/Revalidate) hold slowPathMu and only log the result, so the unwrapped error on exhaustion is benign.

One request before merge: please add at least a unit test for isSerializationRetry. It's pure and trivially testable (the existing util/usql/retry_test.go cases are a ready template) — feed it &pq.Error{Code:"40001"}, &pgconn.PgError{Code:"40P01"}, sql.ErrNoRows, and a plain error. This satisfies the repo's test-first guardrail and locks in the classification logic that the retry path depends on.

Minor / non-blocking:

  • The retry loop re-runs immediately with no backoff, whereas usql.retryOperation uses a 100ms base delay — a small sleep/jitter would match house style.
  • isSerializationRetry partially duplicates the unexported usql.isRetriable; exporting a narrow usql.IsSerializationConflict(err) later would give both sites one source of truth.

…y logging

Address review on bsv-blockchain#1044:

- Add TestIsSerializationRetry (Simon's pre-merge request): table-driven
  coverage of the 40001/40P01 classification across both pgx (*pgconn.PgError)
  and lib/pq (*pq.Error) drivers, plus wrapped errors, non-retryable SQLSTATEs,
  sql.ErrNoRows, a plain error, and nil. Pure, no DB.

- Fix the retry loop's misleading log (Copilot): "retrying" no longer fires on
  the final attempt where no retry follows; a distinct one-shot log records
  exhaustion instead. The exhausted error is still returned raw (unwrapped) so
  callers can classify it via errors.As — wrapping would flatten the typed
  *pgconn.PgError.

- Add context-aware exponential backoff (100ms, 200ms) between attempts to match
  util/usql house style, aborting early on ctx cancellation.
…forbidigo)

golangci forbidigo forbids fmt.Errorf repo-wide. The wrapped-error case needs a
chain errors.As can traverse (the teranode errors package flattens the typed
*pgconn.PgError cause), so use a local Unwrap helper; the plain-error case uses
the sql.ErrConnDone stdlib sentinel. No behaviour change.
@oskarszoon oskarszoon requested a review from icellan June 8, 2026 12:52
@sonarqubecloud

sonarqubecloud Bot commented Jun 8, 2026

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
72.9% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@oskarszoon oskarszoon enabled auto-merge (squash) June 8, 2026 13:29
@oskarszoon oskarszoon merged commit 83807f8 into bsv-blockchain:main Jun 8, 2026
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants