fix(utxopersister): terminate previous-set read at the 16-byte footer#985
Conversation
CreateUTXOSet appends a 16-byte footer (txCount + utxoCount) after the final UTXOWrapper, but the OUTER loop that reads a previous block's UTXO set during consolidation only broke on a clean io.EOF. With the footer present, the next txid read came up 16 bytes short and returned an unexpected-EOF, which the loop treated as fatal: STORAGE_ERROR: error reading previous utxo-set (...) at iteration 1 -> failed to read txid, expected 32 bytes got 16 -> unexpected EOF That killed the UTXOPersister service and, via errgroup, the whole core sidecar on any non-genesis consolidation. The path was dead code until the verifyLastSet/CreateUTXOSet magic double-read fixes (bsv-blockchain#971) made it reachable, so this latent bug only surfaced afterwards. Break the loop on io.ErrUnexpectedEOF as well, mirroring the reader in cmd/utxovalidator. Add a unit test that stages a realistic previous set (header + one wrapper + footer); it reproduces the exact crash without the fix and passes with it. The pre-existing previous-set test passed only because it staged zero wrappers and no footer, hitting a clean io.EOF.
|
🤖 Claude Code Review Status: Complete Current Review: Minor documentation inaccuracy found: The inline comment at Analysis:
No blocking issues. The code change is sound; the comment is just slightly overstated. |
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-29 14:48 UTC |
oskarszoon
left a comment
There was a problem hiding this comment.
No blockers. Fix is correct and the regression test is faithful — verified locally it fails without the change (exact production error) and passes with it; full package test + go vet green.
Before merge — comment is misleading (services/utxopersister/UTXOSet.go:635-647):
- "the wrapper stream carries no record count" — the 16-byte footer you skip past is
txCount+utxoCount(written at L701-714). The count isn't absent; the reader just doesn't consult it. - "mirroring …
cmd/utxovalidator…errors.Is" —cmd/utxovalidator/utxo_validator.go:127usesstrings.Contains(errStr, "unexpected EOF"), and teranode'serrors.Ishere reduces to the same substring match:New()(errors/errors.go:334-336) drops the non-*Errorcause, so(*Error).Isfalls through tostrings.Contains(e.Error(), "unexpected EOF"). It's the same fragile pattern, not a cleaner/structural one — reword so the next person doesn't trust a match that isn't there. (Side note:errors.Is(err, io.EOF)also returns true on this footer error, since"EOF"is a substring of"unexpected EOF"— a footgun if anyone later refactors the first clause.)
Follow-up (don't block this PR): the footer carries txCount — validate records-read == txCount on the terminating short read, in the consolidation loop, the seeder, and utxovalidator, and have FromReader return a real io.ErrUnexpectedEOF / typed sentinel so the match is structural rather than string-based. Today a truncated .utxo-set is silently accepted (tail dropped) on all three readers — this is pre-existing and the seeder already behaves this way (its errors.Is(err, io.EOF) matches the footer via the same substring quirk), so the PR doesn't worsen it, but the footer makes it cheaply detectable. Realistic impact is a seeded node stalling on a missing UTXO, recoverable by re-seed — not double-spend acceptance.
Nit: L632 "Read the next 36 bytes…" — a txid is 32 bytes (pre-existing, sits right above the change).
ordishs
left a comment
There was a problem hiding this comment.
Approve. Tight, well-tested, minimal fix.
Verified the correctness hinge: the footer-hit error is a StorageError wrapping io.ErrUnexpectedEOF (UTXO.go:218), so errors.Is (not ==) is required — and the PR uses it correctly. Both match paths hold: the stdlib walk reaches io.ErrUnexpectedEOF via Error.Unwrap(), and the custom Error.Is matches via its strings.Contains fallback. The err == io.EOF || errors.Is(err, io.ErrUnexpectedEOF) asymmetry is intentional and correct given NewUTXOWrapperFromReader returns a bare io.EOF at a clean boundary but a wrapped error on a short read.
Ran the new test locally: go test -race -run TestCreateUTXOSet_PreviousSetWithFooterTerminatesCleanly ./services/utxopersister/ → ok. The test stages the realistic on-disk shape (header + real wrapper.Bytes() + 16-byte footer) and its docstring explains why the pre-existing magic-double-read test didn't catch this.
One follow-up worth filing (non-blocking): the 16-byte footer is txCount + utxoCount, so a stronger version could read those bytes on ErrUnexpectedEOF and assert txCount equals the wrappers consumed — distinguishing the footer from a genuinely truncated final record and closing the one residual silent-truncation gap the PR acknowledges.



Problem
CreateUTXOSetwrites a 16-byte footer (txCount+utxoCount, both little-endianuint64) after the finalUTXOWrapperin every UTXO-set file. But the OUTER loop that reads a previous block's UTXO set during consolidation (services/utxopersister/UTXOSet.go) only broke out on a cleanio.EOF. With the footer present, the next txid read came up 16 bytes short andio.ReadFullreturnedio.ErrUnexpectedEOF, which the loop treated as fatal:That error propagates out of the UTXOPersister service start and, via the errgroup, tears down the entire
coresidecar on any non-genesis consolidation.Why it surfaced now
This OUTER loop was dead code for non-genesis previous sets —
verifyLastSetcrashed earlier on the fileformat-magic double-read. #971 fixed that crash, which made this path reachable for the first time and exposed the latent footer bug. Found by the split-per-service chaos harness: a node receiving peer blocks while itsblockassemblyis down triggers the consolidation read, and thecoresidecar dies with RPC connection-refused.Fix
Break the loop when the wrapper read short-reads the footer, in addition to the clean
io.EOFcase.The short read is matched by substring (
"unexpected EOF"), and the comment is explicit that this is not a structural match: teranode'serrors.Newflattens a non-*Errorcause to its message string (errors/errors.go:334-336), discarding theio.ErrUnexpectedEOFsentinel, soerrors.Is(err, io.ErrUnexpectedEOF)would itself reduce to the samestrings.Contains. This is intentionally narrower thancmd/utxovalidator(which also matches"failed to read txid"), so a real non-EOF read error stays loud rather than being swallowed. The comment also warns not to fold theio.EOFclause intoerrors.Is—"EOF"is a substring of"unexpected EOF"and would swallow the footer error too.Known limitation (pre-existing, not worsened here): a genuinely truncated tail is indistinguishable from the footer and is silently accepted — same as
cmd/utxovalidatortoday. A structural fix —FromReaderreturning a typed sentinel, plus validatingrecords-read == txCountagainst the footer across the consolidation loop, the seeder, andutxovalidator— is a follow-up. Realistic impact of the limitation is a seeded node stalling on a missing UTXO (recoverable by re-seed), not double-spend acceptance.Test
Added
TestCreateUTXOSet_PreviousSetWithFooterTerminatesCleanly, which stages a realistic previous set (68-byte header + one wrapper + 16-byte footer). It reproduces the exact production error without the fix and passes with it.The pre-existing previous-set test passed only because it staged zero wrappers and no footer, hitting a clean
io.EOFand never exercising the footer path.Verification
go test ./services/utxopersister/(full package) and-race— passgo vet,gofmt,gci— cleangot 16 -> unexpected EOF) and passes with itscenario_04, currently skipped) with the skip removed locally — full run passed (node stalls whileblockassemblyis down, catches up after restart, all three nodes converge). Un-skipping that scenario is a follow-up PR.