Skip to content

fix(blockchain): assign block ids idempotently per hash to prevent phantom-id sync wedge#1043

Merged
oskarszoon merged 20 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/blockchain-phantom-block-id
Jun 8, 2026
Merged

fix(blockchain): assign block ids idempotently per hash to prevent phantom-id sync wedge#1043
oskarszoon merged 20 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/blockchain-phantom-block-id

Conversation

@oskarszoon

@oskarszoon oskarszoon commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Both block-ingestion paths (legacy netsync, blockvalidation catchup quick-validate) now resolve a block hash to ONE stable id via an idempotent AssignBlockID gRPC call, instead of independently calling GetNextBlockID and minting two ids for the same block.
  • CheckBlockIsInCurrentChain's in-memory fast path now treats the off-chain set as a negative filter only and confirms survivors against the authoritative on_main_chain flag, so a phantom / id-gap id can't be accepted as on-chain by a useInMemoryChainCheck=on node when an always-SQL node would reject it.

Why

A fresh testnet node wedged during IBD around height 1038100: a block was marked invalid by checkOldBlockIDs because a transaction spent an output whose UTXO mined-in block id had no row in the blocks table — a phantom id. Once a block is invalid it is cached, so the tip can never advance.

Root cause: below the highest checkpoint, two ingestion paths process the same block concurrently and each allocated an independent id from the Postgres sequence — legacy netsync wrote the UTXO mined-info under one id before commit, blockvalidation catchup committed the block row under a different id. The UTXO mined-info id and the committed block-row id diverged; the orphaned id is the phantom that later fails the on-chain check and wedges the tip.

AssignBlockID(ctx, hash) is the single idempotent authority both paths now use (committed-id lookup → in-memory reservation → nextval). Same hash always returns the same id; concurrent callers converge. The UTXO mined-info id and the committed row can no longer diverge. The chain-check change makes the toggled-on in-memory path agree with the authoritative SQL route on dangling ids.

Scope

The core change is the phantom-block-id fix (idempotent AssignBlockID + the Option C CheckBlockIsInCurrentChain consistency fix). The diff also carries a few changes that travelled with this work — calling them out for accurate scope (all on the block-ingestion / catchup path):

No peer-registry / FSM-state / catchup-orchestration changes — the sync-lease (legacy-priority coordination) was deliberately dropped.

Not in this PR

  • No sync-lease / catchup coordination. Idempotent ids make concurrent ingestion safe — it is just wasteful during IBD. The redundant double-work is an efficiency concern handled separately by the delegated-catchup + central-registry-poller orchestration work, not here.
  • checkOldBlockIDsOffChainPrefetch keeps the same latent survivor-accept residual (documented inline); fixing it obsoletes the prefetch optimization, a separate perf/design call. Phantom creation is already prevented here. Tracked as blockvalidation: off-chain-prefetch fast path can accept an id-gap block id the SQL route rejects #1055.

Follow-ups (tracked, non-blocking)

Test plan

  • go build ./... clean; go test -race -tags testtxmetacache green across stores/blockchain/..., services/blockchain/..., services/legacy/netsync/..., services/blockvalidation/...
  • Two-path race test: concurrent assigns for one hash yield a single id, no phantom row (AssignBlockID_test.go)
  • CheckBlockIsInCurrentChain: in-memory vs SQL agree on a phantom / id-gap id (CheckBlockIsInCurrentChain_test.go)

oskarszoon added 13 commits June 5, 2026 15:59
…warn on ambiguous block-id reuse

Addresses consensus-review findings: reservation entries now expire a fixed
time after creation regardless of lookups (M1), and reuseBlockIDFromUTXO logs
when a tx carries multiple mined-in ids so an invariant break surfaces (L2).
…on_main_chain

Closes the toggled/untoggled consensus split (Option C): the in-memory
CheckBlockIsInCurrentChain off-chain-set filter is a fast NEGATIVE only; a
non-existent (phantom) id <= maxBlockID survived it and was wrongly treated as
on-chain, disagreeing with the authoritative SQL route. Now survivors are
confirmed against on_main_chain (indexed fast path), so a dangling id is
rejected identically on toggled and untoggled nodes.
…t (Option C residual)

The store-level CheckBlockIsInCurrentChain is fixed to confirm survivors against
on_main_chain. The toggle-on off-chain-prefetch fast path still treats an
absent-from-off-chain id as on-chain, so it retains the latent split for a
dangling id. Making it sound removes its positive short-circuit (obsoleting a
profiled optimization) — left as a documented team decision. Phantom creation is
already prevented by the idempotent AssignBlockID fix, so this is latent.
@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

No issues found. This PR implements a robust solution to the phantom block ID problem with comprehensive coverage:

Core Changes:

  • AssignBlockID (stores/blockchain/sql/assign_block_id.go:22): Idempotent block ID assignment prevents phantom IDs through hash-keyed reservations with proper double-check locking pattern
  • CheckBlockIsInCurrentChain (stores/blockchain/sql/CheckBlockIsInCurrentChain.go:24): Now confirms survivors against authoritative on_main_chain flag, preventing phantom IDs from being accepted as on-chain

Test Coverage:

  • Comprehensive unit tests verify idempotency (assign_block_id_test.go:16-41)
  • Concurrent access correctly converges to single ID (assign_block_id_test.go:43-72)
  • Two-path race simulation confirms no phantom creation (assign_block_id_test.go:103-141)
  • Phantom ID detection verified in CheckBlockIsInCurrentChain tests (CheckBlockIsInCurrentChain_test.go:135-161)

Integration:

  • Both ingestion paths (legacy netsync, blockvalidation catchup) correctly use AssignBlockID
  • Proper error handling with storage errors surfaced rather than silently minting IDs
  • Safe uint64→uint32 narrowing with explicit overflow checks

Quality:

  • Follows AGENTS.md verification requirements
  • Clear documentation explains the idempotency mechanism
  • Appropriate use of locking for critical sections
  • TTL-based reservation cleanup prevents memory leaks

@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1043 (89b6c8f)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 132
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.626µ 1.615µ ~ 0.400
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.08n 71.26n ~ 0.200
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.24n 71.04n ~ 0.200
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 70.97n 70.93n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 33.46n 32.60n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 56.17n 55.02n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 126.9n 127.3n ~ 0.400
MiningCandidate_Stringify_Short-4 218.8n 218.5n ~ 0.600
MiningCandidate_Stringify_Long-4 1.627µ 1.623µ ~ 0.300
MiningSolution_Stringify-4 860.7n 857.3n ~ 0.100
BlockInfo_MarshalJSON-4 1.724µ 1.717µ ~ 0.100
NewFromBytes-4 130.0n 154.2n ~ 0.100
AddTxBatchColumnar_Validation-4 2.422µ 2.397µ ~ 0.400
OffsetValidationLoop-4 637.9n 639.5n ~ 0.800
Mine_EasyDifficulty-4 68.70µ 67.54µ ~ 0.200
Mine_WithAddress-4 7.010µ 7.954µ ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 57.44n 57.15n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 30.09n 29.94n ~ 0.400
DirectSubtreeAdd/256_per_subtree-4 29.06n 29.12n ~ 1.000
DirectSubtreeAdd/1024_per_subtree-4 27.73n 27.94n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 27.28n 27.57n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 287.8n 279.8n ~ 0.200
SubtreeProcessorAdd/64_per_subtree-4 282.2n 273.0n ~ 0.700
SubtreeProcessorAdd/256_per_subtree-4 270.8n 267.7n ~ 0.100
SubtreeProcessorAdd/1024_per_subtree-4 271.2n 270.4n ~ 0.400
SubtreeProcessorAdd/2048_per_subtree-4 273.0n 272.6n ~ 1.000
SubtreeProcessorRotate/4_per_subtree-4 277.4n 274.3n ~ 0.400
SubtreeProcessorRotate/64_per_subtree-4 276.4n 272.3n ~ 0.200
SubtreeProcessorRotate/256_per_subtree-4 280.2n 270.3n ~ 0.200
SubtreeProcessorRotate/1024_per_subtree-4 275.3n 269.2n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 54.26n 53.74n ~ 0.200
SubtreeNodeAddOnly/64_per_subtree-4 34.04n 34.27n ~ 0.400
SubtreeNodeAddOnly/256_per_subtree-4 33.13n 33.38n ~ 0.400
SubtreeNodeAddOnly/1024_per_subtree-4 32.49n 32.61n ~ 0.700
SubtreeCreationOnly/4_per_subtree-4 100.9n 117.3n ~ 0.400
SubtreeCreationOnly/64_per_subtree-4 460.9n 445.5n ~ 1.000
SubtreeCreationOnly/256_per_subtree-4 1.397µ 1.350µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.536µ 5.230µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 8.189µ 9.091µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 273.7n 276.2n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 276.2n 278.7n ~ 0.200
ParallelGetAndSetIfNotExists/1k_nodes-4 11.07m 13.58m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 14.01m 16.19m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 16.90m 18.49m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 18.20m 22.60m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 10.68m 14.94m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 14.47m 19.16m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 21.74m 20.64m ~ 0.400
SequentialGetAndSetIfNotExists/100k_nodes-4 30.25m 28.01m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 13.04m 12.79m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 14.29m 14.20m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 19.93m 19.67m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 12.91m 15.67m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 15.92m 15.09m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 71.08m 50.86m ~ 0.100
BlockAssembler_AddTx-4 0.02234n 0.02301n ~ 1.000
AddNode-4 9.960 10.100 ~ 1.000
AddNodeWithMap-4 10.36 10.69 ~ 1.000
DiskTxMap_SetIfNotExists-4 4.123µ 4.497µ ~ 1.000
DiskTxMap_SetIfNotExists_Parallel-4 3.673µ 3.846µ ~ 0.100
DiskTxMap_ExistenceOnly-4 326.7n 357.0n ~ 0.100
Queue-4 198.8n 202.7n ~ 0.200
AtomicPointer-4 8.120n 8.130n ~ 1.000
TxMapSetIfNotExists-4 52.13n 52.86n ~ 0.100
TxMapSetIfNotExistsDuplicate-4 48.21n 48.12n ~ 0.400
ChannelSendReceive-4 646.2n 642.3n ~ 0.400
CalcBlockWork-4 470.0n 476.2n ~ 0.200
CalculateWork-4 657.8n 702.4n ~ 0.400
CheckOldBlockIDs/on-chain-prefetch/1000-4 65.87µ 65.24µ ~ 0.200
CheckOldBlockIDs/off-chain-prefetch/1000-4 54.34µ 64.65µ ~ 1.000
CheckOldBlockIDs/on-chain-prefetch/10000-4 467.2µ 491.7µ ~ 0.100
CheckOldBlockIDs/off-chain-prefetch/10000-4 355.4µ 378.8µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_10-4 1.342µ 1.478µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 12.90µ 14.12µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_1000-4 127.2µ 136.0µ ~ 0.100
CatchupWithHeaderCache-4 104.8m 105.5m ~ 0.100
_BufferPoolAllocation/16KB-4 4.159µ 4.545µ ~ 0.100
_BufferPoolAllocation/32KB-4 9.216µ 10.218µ ~ 0.100
_BufferPoolAllocation/64KB-4 21.56µ 18.96µ ~ 0.700
_BufferPoolAllocation/128KB-4 36.09µ 39.12µ ~ 0.100
_BufferPoolAllocation/512KB-4 139.8µ 140.5µ ~ 1.000
_BufferPoolConcurrent/32KB-4 20.49µ 24.73µ ~ 0.200
_BufferPoolConcurrent/64KB-4 32.96µ 34.28µ ~ 0.100
_BufferPoolConcurrent/512KB-4 157.3µ 155.4µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/16KB-4 687.5µ 632.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 720.4µ 623.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 726.2µ 629.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 740.3µ 616.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 663.1µ 620.6µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.40m 36.84m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.47m 36.95m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 37.17m 36.85m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/128KB-4 37.29m 36.69m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/512KB-4 37.32m 36.62m ~ 0.400
_PooledVsNonPooled/Pooled-4 741.7n 740.4n ~ 0.400
_PooledVsNonPooled/NonPooled-4 8.829µ 8.565µ ~ 0.400
_MemoryFootprint/Current_512KB_32concurrent-4 6.933µ 6.895µ ~ 0.700
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.803µ 9.664µ ~ 0.400
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.535µ 9.271µ ~ 0.100
_prepareTxsPerLevel-4 410.0m 402.7m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.946m 3.615m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 397.6m 403.5m ~ 0.400
_prepareTxsPerLevel_Comparison/Optimized-4 3.818m 3.709m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.363m 1.414m ~ 0.400
SubtreeSizes/10k_tx_16_per_subtree-4 318.3µ 325.6µ ~ 0.400
SubtreeSizes/10k_tx_64_per_subtree-4 77.07µ 78.91µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 19.31µ 19.97µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.603µ 9.714µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.701µ 4.925µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.352µ 2.428µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 76.29µ 77.70µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 18.97µ 19.50µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.743µ 4.861µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 388.6µ 396.5µ ~ 0.200
BlockSizeScaling/50k_tx_256_per_subtree-4 93.67µ 97.46µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.26µ 24.54µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 153.8µ 156.0µ ~ 0.100
SubtreeAllocations/small_subtrees_data_fetch-4 164.7µ 169.7µ ~ 0.200
SubtreeAllocations/small_subtrees_full_validation-4 314.3µ 323.8µ ~ 0.100
SubtreeAllocations/medium_subtrees_exists_check-4 9.506µ 9.529µ ~ 0.700
SubtreeAllocations/medium_subtrees_data_fetch-4 10.08µ 10.41µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 19.81µ 20.24µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.322µ 2.350µ ~ 0.100
SubtreeAllocations/large_subtrees_data_fetch-4 2.475µ 2.572µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 4.871µ 5.036µ ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 329.0µ 330.7µ ~ 0.400
StoreBlock_Sequential/AboveCSVHeight-4 339.9µ 327.8µ ~ 0.100
GetUtxoHashes-4 280.9n 282.0n ~ 0.800
GetUtxoHashes_ManyOutputs-4 46.06µ 46.75µ ~ 1.000
_NewMetaDataFromBytes-4 212.4n 217.8n ~ 0.100
_Bytes-4 396.2n 398.3n ~ 0.700
_MetaBytes-4 139.3n 139.3n ~ 1.000

Threshold: >10% with p < 0.05 | Generated: 2026-06-08 14:44 UTC

…over DB-error path

Follow the documented snake_case file convention (codingConventions.md:118);
old CamelCase siblings are tolerated legacy. Add a DB-error test covering
AssignBlockID's storage-error return (AssignBlockID 76.5%->82.4%,
blockIDByHash->100%).
…cal client)

Server.AssignBlockID 0%->88.9% (success/idempotent/invalid-hash),
Client.AssignBlockID 0%->100%, LocalClient.AssignBlockID 0%->100% — the
new-code coverage gap flagged by Sonar on the gRPC surface.
@oskarszoon oskarszoon requested review from icellan and ordishs June 8, 2026 12:53

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve with follow-ups.

The central fix is correct and verified locally: go build ./... clean, and go test -race -tags testtxmetacache green across stores/blockchain/..., services/blockchain/..., services/legacy/netsync/..., and services/blockvalidation/.... The core invariant holds — StoreBlock commits under WithID(block.ID) with no second nextval, and block.ID is sourced from AssignBlockID/UTXO-recovery, so the UTXO mined-info id and the committed row id are provably the same. The mutex correctly serializes check-then-reserve with a post-lock committed re-check, and the reservation is dropped only after commit. New tests are meaningful (two-path race → single committed id, concurrent convergence, cleared-on-commit, DB-error, phantom-below-maxID rejection).

Non-blocking follow-ups:

  • (HIGH, latent) off-chain-prefetch fastPath split. checkOldBlockIDsOffChainPrefetch still treats id <= maxBlockID && id not-in off-chain-set as on-chain (the documented "Option C residual"). An id-sequence gap (reserved-but-never-committed → TTL expiry/crash) is reachable independently of the double-mint path this PR closes; if a tx's BlockIDs ever references a gap id, a UseInMemoryChainCheck=on node would accept what an =off node rejects. Phantom creation is closed, but this residual is the one place the fix is left incomplete.
  • (MEDIUM) reservation durability. Convergence holds only while the reservation is live or committed. If single-block processing exceeds the 10-min TTL, a racing caller can take the nextval branch before the first attempt writes UTXO mined-info (assign precedes createUtxos), reintroducing the split — rarer, not eliminated. Size the TTL above worst-case single-block processing, or land the block_id_reservations table already noted as a follow-up.
  • (MEDIUM) transient-error retry loop. Reclassifying ErrTxMissingParent/ErrTxNotFound as BlockIncomplete is the right call (never persists invalid), but a genuinely-invalid block spending a never-existed output now retries indefinitely rather than fast-rejecting. Consider a bounded retry/backoff so a crafted block can't pin a catchup worker.
  • (LOW) uint64→uint32 truncation at the block.ID call sites has no math.MaxUint32 guard; cheap insurance against a silent wrap breaking idempotency.
  • (LOW) reuseBlockIDFromUTXO trusts BlockIDs[0] on multi-id (reorg/re-mine) txs — guarded and logged, but the one spot the one-id-per-hash invariant is taken on faith.

Process note: the PR description says "correctness fix only ... no other changes," but the diff also includes the #1031 transient-error reclassification, a sync-peer updateLastBlockTime() refresh, and a sizable checkOldBlockIDs refactor. All look correct and related — worth aligning the description so the scope is accurate.

Replace the 6 unchecked uint32(id) // nolint:gosec narrowings of the
AssignBlockID result with safeconversion.Uint64ToUint32 (via a small
blockIDToUint32 helper in quick_validate.go), erroring on overflow rather
than silently wrapping — a wrap would alias another block's id and break
the one-id-per-hash idempotency. Addresses review note from bsv-blockchain#1043.
@oskarszoon

Copy link
Copy Markdown
Contributor Author

Addressed @ordishs's review (07b224f3e213d1ec0a):

In this PR:

Filed as tracked follow-ups (non-blocking, per the review):

The reuseBlockIDFromUTXO BlockIDs[0]-on-multi-id note is already guarded and logged — leaving as-is.

@sonarqubecloud

sonarqubecloud Bot commented Jun 8, 2026

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
78.6% Coverage on New Code (required ≥ 80%)
4.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@oskarszoon oskarszoon merged commit 2becc73 into bsv-blockchain:main Jun 8, 2026
34 checks passed
oskarszoon added a commit to oskarszoon/teranode that referenced this pull request Jun 8, 2026
…-legacysyncing

Conflict in the generated services/blockchain/blockchain_api/blockchain_api.pb.go
(upstream bsv-blockchain#1043 added the AssignBlockID RPC; this branch removed LegacySync).
Resolved by regenerating from the cleanly-merged .proto via `make gen` — both
AssignBlockID and the LegacySync/LEGACYSYNCING removal (reserved 3) are present.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants