Skip to content

fix(utxopersister): terminate previous-set read at the 16-byte footer#985

Merged
liam merged 1 commit into
bsv-blockchain:mainfrom
liam:liam/utxopersister-consolidation-footer
May 31, 2026
Merged

fix(utxopersister): terminate previous-set read at the 16-byte footer#985
liam merged 1 commit into
bsv-blockchain:mainfrom
liam:liam/utxopersister-consolidation-footer

Conversation

@liam

@liam liam commented May 29, 2026

Copy link
Copy Markdown
Collaborator

Problem

CreateUTXOSet writes a 16-byte footer (txCount + utxoCount, both little-endian uint64) after the final UTXOWrapper in every UTXO-set file. But the OUTER loop that reads a previous block's UTXO set during consolidation (services/utxopersister/UTXOSet.go) only broke out on a clean io.EOF. With the footer present, the next txid read came up 16 bytes short and io.ReadFull returned io.ErrUnexpectedEOF, which the loop treated as fatal:

STORAGE_ERROR (69): error reading previous utxo-set (...) at iteration 1
  -> STORAGE_ERROR (69): failed to read txid, expected 32 bytes got 16
  -> UNKNOWN (0): unexpected EOF

That error propagates out of the UTXOPersister service start and, via the errgroup, tears down the entire core sidecar on any non-genesis consolidation.

Why it surfaced now

This OUTER loop was dead code for non-genesis previous sets — verifyLastSet crashed earlier on the fileformat-magic double-read. #971 fixed that crash, which made this path reachable for the first time and exposed the latent footer bug. Found by the split-per-service chaos harness: a node receiving peer blocks while its blockassembly is down triggers the consolidation read, and the core sidecar dies with RPC connection-refused.

Fix

Break the loop when the wrapper read short-reads the footer, in addition to the clean io.EOF case.

The short read is matched by substring ("unexpected EOF"), and the comment is explicit that this is not a structural match: teranode's errors.New flattens a non-*Error cause to its message string (errors/errors.go:334-336), discarding the io.ErrUnexpectedEOF sentinel, so errors.Is(err, io.ErrUnexpectedEOF) would itself reduce to the same strings.Contains. This is intentionally narrower than cmd/utxovalidator (which also matches "failed to read txid"), so a real non-EOF read error stays loud rather than being swallowed. The comment also warns not to fold the io.EOF clause into errors.Is"EOF" is a substring of "unexpected EOF" and would swallow the footer error too.

Known limitation (pre-existing, not worsened here): a genuinely truncated tail is indistinguishable from the footer and is silently accepted — same as cmd/utxovalidator today. A structural fix — FromReader returning a typed sentinel, plus validating records-read == txCount against the footer across the consolidation loop, the seeder, and utxovalidator — is a follow-up. Realistic impact of the limitation is a seeded node stalling on a missing UTXO (recoverable by re-seed), not double-spend acceptance.

Test

Added TestCreateUTXOSet_PreviousSetWithFooterTerminatesCleanly, which stages a realistic previous set (68-byte header + one wrapper + 16-byte footer). It reproduces the exact production error without the fix and passes with it.

The pre-existing previous-set test passed only because it staged zero wrappers and no footer, hitting a clean io.EOF and never exercising the footer path.

Verification

  • go test ./services/utxopersister/ (full package) and -race — pass
  • go vet, gofmt, gci — clean
  • Test-first: confirmed the test fails without the fix (exact got 16 -> unexpected EOF) and passes with it
  • End-to-end: rebuilt the image with this fix and ran the split-per-service chaos scenario (scenario_04, currently skipped) with the skip removed locally — full run passed (node stalls while blockassembly is down, catches up after restart, all three nodes converge). Un-skipping that scenario is a follow-up PR.

CreateUTXOSet appends a 16-byte footer (txCount + utxoCount) after the
final UTXOWrapper, but the OUTER loop that reads a previous block's UTXO
set during consolidation only broke on a clean io.EOF. With the footer
present, the next txid read came up 16 bytes short and returned an
unexpected-EOF, which the loop treated as fatal:

  STORAGE_ERROR: error reading previous utxo-set (...) at iteration 1
  -> failed to read txid, expected 32 bytes got 16 -> unexpected EOF

That killed the UTXOPersister service and, via errgroup, the whole core
sidecar on any non-genesis consolidation. The path was dead code until
the verifyLastSet/CreateUTXOSet magic double-read fixes (bsv-blockchain#971) made it
reachable, so this latent bug only surfaced afterwards.

Break the loop on io.ErrUnexpectedEOF as well, mirroring the reader in
cmd/utxovalidator. Add a unit test that stages a realistic previous set
(header + one wrapper + footer); it reproduces the exact crash without
the fix and passes with it.

The pre-existing previous-set test passed only because it staged zero
wrappers and no footer, hitting a clean io.EOF.
@github-actions

github-actions Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Current Review:

Minor documentation inaccuracy found:

The inline comment at UTXOSet.go:641 states the fix "mirrors the reader in cmd/utxovalidator", but the validator actually uses string matching (strings.Contains(errStr, "unexpected EOF")) rather than errors.Is(err, io.ErrUnexpectedEOF). The fix here is correct and uses the proper Go pattern; consider updating the comment to say "improves upon" or filing a follow-up to fix the validator's brittle string matching.

Analysis:

  • Logic: Correct. The fix properly handles both io.EOF and io.ErrUnexpectedEOF to terminate the loop when reading the 16-byte footer
  • Test coverage: Excellent. The new test stages a realistic previous set with header + wrapper + footer and would reproduce the production crash without the fix
  • Error handling: Appropriate. The distinction between clean EOF and unexpected EOF at footer boundaries is correctly handled
  • Root cause: Well explained in the PR description—latent bug exposed by previous fix in fix(utxopersister): drop double-read of fileformat magic in verifyLastSet #971

No blocking issues. The code change is sound; the comment is just slightly overstated.

@sonarqubecloud

Copy link
Copy Markdown

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-985 (a1e813d)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.814µ 1.731µ ~ 0.700
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.62n 61.61n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.41n 61.38n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.74n 61.56n ~ 0.300
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 29.98n 30.29n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 50.59n 50.53n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 106.4n 104.5n ~ 0.700
MiningCandidate_Stringify_Short-4 260.2n 260.9n ~ 0.400
MiningCandidate_Stringify_Long-4 1.898µ 1.919µ ~ 0.400
MiningSolution_Stringify-4 966.2n 971.6n ~ 0.200
BlockInfo_MarshalJSON-4 1.760µ 1.768µ ~ 0.700
NewFromBytes-4 132.6n 165.2n ~ 0.700
AddTxBatchColumnar_Validation-4 2.577µ 2.533µ ~ 0.700
OffsetValidationLoop-4 598.7n 599.6n ~ 0.800
Mine_EasyDifficulty-4 65.77µ 65.86µ ~ 0.700
Mine_WithAddress-4 7.085µ 7.469µ ~ 0.200
BlockAssembler_AddTx-4 0.02642n 0.02715n ~ 1.000
AddNode-4 10.57 10.59 ~ 0.400
AddNodeWithMap-4 11.57 10.95 ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 58.87n 59.76n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 30.19n 30.21n ~ 0.700
DirectSubtreeAdd/256_per_subtree-4 28.92n 29.06n ~ 0.600
DirectSubtreeAdd/1024_per_subtree-4 27.86n 27.87n ~ 0.500
DirectSubtreeAdd/2048_per_subtree-4 27.46n 27.53n ~ 0.400
SubtreeProcessorAdd/4_per_subtree-4 284.2n 279.5n ~ 1.000
SubtreeProcessorAdd/64_per_subtree-4 278.7n 282.0n ~ 0.400
SubtreeProcessorAdd/256_per_subtree-4 279.7n 281.5n ~ 0.200
SubtreeProcessorAdd/1024_per_subtree-4 270.4n 271.3n ~ 1.000
SubtreeProcessorAdd/2048_per_subtree-4 270.0n 272.5n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 273.7n 277.7n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 273.0n 275.4n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 274.3n 276.9n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 272.9n 274.2n ~ 0.200
SubtreeNodeAddOnly/4_per_subtree-4 54.30n 54.60n ~ 0.200
SubtreeNodeAddOnly/64_per_subtree-4 34.24n 34.22n ~ 0.800
SubtreeNodeAddOnly/256_per_subtree-4 33.33n 33.32n ~ 0.500
SubtreeNodeAddOnly/1024_per_subtree-4 32.63n 32.58n ~ 0.700
SubtreeCreationOnly/4_per_subtree-4 113.7n 114.6n ~ 0.200
SubtreeCreationOnly/64_per_subtree-4 396.8n 395.0n ~ 0.400
SubtreeCreationOnly/256_per_subtree-4 1.357µ 1.352µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.454µ 4.438µ ~ 0.700
SubtreeCreationOnly/2048_per_subtree-4 7.983µ 8.129µ ~ 0.200
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 272.6n 274.2n ~ 0.200
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 274.1n 272.2n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 2.212m 2.252m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.525m 5.677m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.469m 7.636m ~ 0.700
ParallelGetAndSetIfNotExists/100k_nodes-4 10.16m 10.30m ~ 0.200
SequentialGetAndSetIfNotExists/1k_nodes-4 1.942m 1.957m ~ 0.200
SequentialGetAndSetIfNotExists/10k_nodes-4 4.323m 4.462m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 12.13m 12.32m ~ 0.200
SequentialGetAndSetIfNotExists/100k_nodes-4 22.07m 22.23m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.307m 2.298m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.615m 8.498m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 13.56m 13.26m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.988m 2.023m ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 8.095m 7.590m ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 41.92m 41.47m ~ 0.700
DiskTxMap_SetIfNotExists-4 3.698µ 3.653µ ~ 0.100
DiskTxMap_SetIfNotExists_Parallel-4 3.398µ 3.403µ ~ 0.500
DiskTxMap_ExistenceOnly-4 314.2n 308.9n ~ 0.600
Queue-4 185.3n 185.7n ~ 1.000
AtomicPointer-4 3.755n 3.746n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 854.5µ 823.2µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 827.9µ 750.2µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 107.7µ 117.3µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 64.45µ 63.86µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/10K-4 59.08µ 55.48µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/10K-4 11.08µ 10.91µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/10K-4 4.937µ 4.513µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.920µ 1.531µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.417m 9.259m ~ 0.400
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.207m 9.167m ~ 0.700
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.088m 1.175m ~ 0.200
ReorgOptimizations/AllMarkFalse/New/100K-4 704.8µ 703.0µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/100K-4 578.7µ 618.2µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 202.9µ 202.0µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/100K-4 49.37µ 48.37µ ~ 0.700
ReorgOptimizations/NodeFlags/New/100K-4 16.77µ 15.88µ ~ 0.200
TxMapSetIfNotExists-4 49.37n 49.37n ~ 0.700
TxMapSetIfNotExistsDuplicate-4 41.11n 41.32n ~ 0.100
ChannelSendReceive-4 584.6n 591.6n ~ 0.200
CalcBlockWork-4 469.8n 474.4n ~ 0.200
CalculateWork-4 695.2n 646.6n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.359µ 1.363µ ~ 0.800
BuildBlockLocatorString_Helpers/Size_100-4 12.95µ 13.08µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_1000-4 148.6µ 150.5µ ~ 0.700
CatchupWithHeaderCache-4 104.5m 104.6m ~ 0.700
_prepareTxsPerLevel-4 502.4m 520.1m ~ 0.700
_prepareTxsPerLevelOrdered-4 4.525m 4.785m ~ 0.400
_prepareTxsPerLevel_Comparison/Original-4 496.6m 482.1m ~ 1.000
_prepareTxsPerLevel_Comparison/Optimized-4 4.499m 4.395m ~ 1.000
SubtreeSizes/10k_tx_4_per_subtree-4 1.387m 1.360m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 335.2µ 325.4µ ~ 0.400
SubtreeSizes/10k_tx_64_per_subtree-4 78.45µ 77.34µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 19.63µ 19.17µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.861µ 9.655µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.885µ 4.811µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.396µ 2.388µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 77.18µ 77.45µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 19.33µ 19.42µ ~ 0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.749µ 4.873µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 405.2µ 400.6µ ~ 0.400
BlockSizeScaling/50k_tx_256_per_subtree-4 96.92µ 95.70µ ~ 0.200
BlockSizeScaling/50k_tx_1024_per_subtree-4 24.04µ 23.85µ ~ 0.400
SubtreeAllocations/small_subtrees_exists_check-4 161.6µ 161.5µ ~ 0.700
SubtreeAllocations/small_subtrees_data_fetch-4 167.3µ 167.8µ ~ 1.000
SubtreeAllocations/small_subtrees_full_validation-4 336.9µ 334.7µ ~ 0.400
SubtreeAllocations/medium_subtrees_exists_check-4 9.548µ 9.604µ ~ 1.000
SubtreeAllocations/medium_subtrees_data_fetch-4 9.804µ 9.800µ ~ 0.700
SubtreeAllocations/medium_subtrees_full_validation-4 19.40µ 19.52µ ~ 0.700
SubtreeAllocations/large_subtrees_exists_check-4 2.310µ 2.323µ ~ 1.000
SubtreeAllocations/large_subtrees_data_fetch-4 2.383µ 2.381µ ~ 1.000
SubtreeAllocations/large_subtrees_full_validation-4 4.910µ 4.867µ ~ 0.400
_BufferPoolAllocation/16KB-4 3.912µ 3.913µ ~ 1.000
_BufferPoolAllocation/32KB-4 9.085µ 10.758µ ~ 0.400
_BufferPoolAllocation/64KB-4 22.37µ 18.59µ ~ 0.200
_BufferPoolAllocation/128KB-4 37.14µ 36.13µ ~ 0.200
_BufferPoolAllocation/512KB-4 124.5µ 131.3µ ~ 0.100
_BufferPoolConcurrent/32KB-4 19.51µ 21.86µ ~ 0.100
_BufferPoolConcurrent/64KB-4 30.51µ 30.91µ ~ 0.400
_BufferPoolConcurrent/512KB-4 147.1µ 145.6µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/16KB-4 609.4µ 620.3µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/32KB-4 624.2µ 627.2µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/64KB-4 638.3µ 626.9µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/128KB-4 618.9µ 623.3µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/512KB-4 637.2µ 597.3µ ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.05m 37.07m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.01m 36.99m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.64m 36.99m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.26m 36.53m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/512KB-4 35.99m 36.24m ~ 0.100
_PooledVsNonPooled/Pooled-4 833.0n 829.8n ~ 0.200
_PooledVsNonPooled/NonPooled-4 8.292µ 8.274µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 7.019µ 6.599µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.924µ 9.453µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.869µ 9.195µ ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 337.9µ 347.1µ ~ 0.400
StoreBlock_Sequential/AboveCSVHeight-4 339.7µ 356.9µ ~ 0.200
GetUtxoHashes-4 258.1n 258.6n ~ 1.000
GetUtxoHashes_ManyOutputs-4 49.75µ 49.74µ ~ 1.000
_NewMetaDataFromBytes-4 227.4n 228.8n ~ 0.700
_Bytes-4 417.8n 428.7n ~ 0.100
_MetaBytes-4 138.3n 138.6n ~ 0.700

Threshold: >10% with p < 0.05 | Generated: 2026-05-29 14:48 UTC

@liam liam requested review from freemans13, ordishs and oskarszoon May 29, 2026 18:05

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blockers. Fix is correct and the regression test is faithful — verified locally it fails without the change (exact production error) and passes with it; full package test + go vet green.

Before merge — comment is misleading (services/utxopersister/UTXOSet.go:635-647):

  • "the wrapper stream carries no record count" — the 16-byte footer you skip past is txCount+utxoCount (written at L701-714). The count isn't absent; the reader just doesn't consult it.
  • "mirroring … cmd/utxovalidatorerrors.Is" — cmd/utxovalidator/utxo_validator.go:127 uses strings.Contains(errStr, "unexpected EOF"), and teranode's errors.Is here reduces to the same substring match: New() (errors/errors.go:334-336) drops the non-*Error cause, so (*Error).Is falls through to strings.Contains(e.Error(), "unexpected EOF"). It's the same fragile pattern, not a cleaner/structural one — reword so the next person doesn't trust a match that isn't there. (Side note: errors.Is(err, io.EOF) also returns true on this footer error, since "EOF" is a substring of "unexpected EOF" — a footgun if anyone later refactors the first clause.)

Follow-up (don't block this PR): the footer carries txCount — validate records-read == txCount on the terminating short read, in the consolidation loop, the seeder, and utxovalidator, and have FromReader return a real io.ErrUnexpectedEOF / typed sentinel so the match is structural rather than string-based. Today a truncated .utxo-set is silently accepted (tail dropped) on all three readers — this is pre-existing and the seeder already behaves this way (its errors.Is(err, io.EOF) matches the footer via the same substring quirk), so the PR doesn't worsen it, but the footer makes it cheaply detectable. Realistic impact is a seeded node stalling on a missing UTXO, recoverable by re-seed — not double-spend acceptance.

Nit: L632 "Read the next 36 bytes…" — a txid is 32 bytes (pre-existing, sits right above the change).

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Tight, well-tested, minimal fix.

Verified the correctness hinge: the footer-hit error is a StorageError wrapping io.ErrUnexpectedEOF (UTXO.go:218), so errors.Is (not ==) is required — and the PR uses it correctly. Both match paths hold: the stdlib walk reaches io.ErrUnexpectedEOF via Error.Unwrap(), and the custom Error.Is matches via its strings.Contains fallback. The err == io.EOF || errors.Is(err, io.ErrUnexpectedEOF) asymmetry is intentional and correct given NewUTXOWrapperFromReader returns a bare io.EOF at a clean boundary but a wrapped error on a short read.

Ran the new test locally: go test -race -run TestCreateUTXOSet_PreviousSetWithFooterTerminatesCleanly ./services/utxopersister/ → ok. The test stages the realistic on-disk shape (header + real wrapper.Bytes() + 16-byte footer) and its docstring explains why the pre-existing magic-double-read test didn't catch this.

One follow-up worth filing (non-blocking): the 16-byte footer is txCount + utxoCount, so a stronger version could read those bytes on ErrUnexpectedEOF and assert txCount equals the wrappers consumed — distinguishing the footer from a genuinely truncated final record and closing the one residual silent-truncation gap the PR acknowledges.

@liam liam merged commit c60ffea into bsv-blockchain:main May 31, 2026
33 of 34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants