Skip to content

fix(utxo): propagate panic recovery as error in Spend#762

Merged
ordishs merged 5 commits into
bsv-blockchain:mainfrom
ordishs:security/4560-spend-panic-nil-error
May 12, 2026
Merged

fix(utxo): propagate panic recovery as error in Spend#762
ordishs merged 5 commits into
bsv-blockchain:mainfrom
ordishs:security/4560-spend-panic-nil-error

Conversation

@ordishs

@ordishs ordishs commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator

Closes #4560.

Summary

Both Aerospike and SQL Spend() implementations wrap their body in a defer recover() that logs the panic but does not surface it. With unnamed returns, a recovered panic returns (nil, nil) — success — even though UTXO state may be partially mutated. Validator / ProcessConflicting / reorg paths then proceed as if the spend completed, opening a UTXO-corruption / double-spend window.

Fix

Convert both Spend signatures to named returns and set err in the deferred recover:

  • func (s *Store) Spend(...) (spends []*utxo.Spend, err error)
  • On recover: if err == nil { err = errors.NewProcessingError("panic in Spend: %v", r) }

Applied identically in stores/utxo/aerospike/spend.go and stores/utxo/sql/sql.go. The recover body is extracted into a small handleSpendPanic helper per package so the contract can be exercised directly in unit tests.

Test plan

  • Regression test that triggers a panic inside Spend; asserts the function returns a non-nil error containing "panic".
  • go test -race ./stores/utxo/sql/... passes.
  • (Aerospike if available) go test -race -tags aerospike ./stores/utxo/aerospike/... passes.

@github-actions

github-actions Bot commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Current Review:
No critical issues found. The panic recovery implementation is sound:

  • Blocker fixes verified: All three blockers from previous review are addressed (worker goroutine recovery, ERR_UNKNOWN classification, no fmt.Errorf)
  • Dual recovery layers: Both parent and per-worker defer/recover paths correctly propagate panics as errors via named returns
  • Error propagation: Workers store errors in spends[idx].Err, parent collects and returns them after g.Wait()
  • Test coverage: Both stores have end-to-end panic recovery tests plus helper unit tests
  • Stack traces: debug.Stack() included in error logs for post-mortem analysis

Known limitation (documented, acceptable for merge):
Recovered panics return ErrUnknown which does not trigger needsSpendRollback, potentially leaving partial UTXO state. The spend operations are idempotent, and callers can retry. A follow-up could add explicit panic handling to rollback logic if needed.

Security assessment: The fix correctly closes the UTXO corruption window described in #4560 by surfacing panics instead of returning silent success.

@github-actions

github-actions Bot commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-762 (49cf524)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.682µ 1.762µ ~ 0.400
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.75n 61.81n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.65n 61.65n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.70n 61.69n ~ 0.900
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 31.10n 29.96n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 55.45n 51.02n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 128.0n 116.9n ~ 0.100
MiningCandidate_Stringify_Short-4 264.7n 263.4n ~ 0.200
MiningCandidate_Stringify_Long-4 1.928µ 1.909µ ~ 0.100
MiningSolution_Stringify-4 1003.0n 996.1n ~ 0.100
BlockInfo_MarshalJSON-4 1.804µ 1.763µ ~ 0.100
NewFromBytes-4 127.3n 131.5n ~ 0.100
Mine_EasyDifficulty-4 61.04µ 60.77µ ~ 0.700
Mine_WithAddress-4 6.712µ 6.727µ ~ 0.400
BlockAssembler_AddTx-4 0.02978n 0.03091n ~ 1.000
AddNode-4 11.17 11.36 ~ 0.400
AddNodeWithMap-4 11.30 11.38 ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 63.14n 60.10n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 31.33n 31.45n ~ 0.600
DirectSubtreeAdd/256_per_subtree-4 30.30n 30.34n ~ 0.400
DirectSubtreeAdd/1024_per_subtree-4 29.10n 29.25n ~ 0.300
DirectSubtreeAdd/2048_per_subtree-4 28.62n 28.69n ~ 0.300
SubtreeProcessorAdd/4_per_subtree-4 278.5n 279.6n ~ 1.000
SubtreeProcessorAdd/64_per_subtree-4 273.9n 273.0n ~ 0.400
SubtreeProcessorAdd/256_per_subtree-4 276.6n 274.1n ~ 0.400
SubtreeProcessorAdd/1024_per_subtree-4 268.4n 268.0n ~ 1.000
SubtreeProcessorAdd/2048_per_subtree-4 267.2n 268.9n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 271.8n 273.3n ~ 0.200
SubtreeProcessorRotate/64_per_subtree-4 272.9n 270.8n ~ 0.500
SubtreeProcessorRotate/256_per_subtree-4 270.8n 271.3n ~ 0.400
SubtreeProcessorRotate/1024_per_subtree-4 270.3n 270.4n ~ 1.000
SubtreeNodeAddOnly/4_per_subtree-4 54.08n 54.62n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 34.44n 34.53n ~ 0.300
SubtreeNodeAddOnly/256_per_subtree-4 33.42n 33.57n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 32.73n 32.87n ~ 0.100
SubtreeCreationOnly/4_per_subtree-4 112.5n 112.9n ~ 0.300
SubtreeCreationOnly/64_per_subtree-4 393.0n 391.1n ~ 0.400
SubtreeCreationOnly/256_per_subtree-4 1.328µ 1.440µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.417µ 4.580µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 8.010µ 8.104µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 269.8n 269.5n ~ 1.000
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 269.4n 269.7n ~ 1.000
ParallelGetAndSetIfNotExists/1k_nodes-4 806.9µ 837.4µ ~ 0.200
ParallelGetAndSetIfNotExists/10k_nodes-4 1.596m 1.583m ~ 0.400
ParallelGetAndSetIfNotExists/50k_nodes-4 6.692m 6.553m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 13.32m 13.12m ~ 0.200
SequentialGetAndSetIfNotExists/1k_nodes-4 667.9µ 662.9µ ~ 0.400
SequentialGetAndSetIfNotExists/10k_nodes-4 2.991m 2.752m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 10.44m 10.34m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 19.91m 19.82m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 638.0µ 635.4µ ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 4.221m 4.250m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 16.39m 16.61m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 707.9µ 700.2µ ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 5.772m 5.816m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 37.53m 37.96m ~ 0.100
DiskTxMap_SetIfNotExists-4 3.495µ 3.592µ ~ 0.400
DiskTxMap_SetIfNotExists_Parallel-4 3.352µ 3.361µ ~ 1.000
DiskTxMap_ExistenceOnly-4 295.6n 300.1n ~ 0.400
Queue-4 195.3n 196.2n ~ 0.400
AtomicPointer-4 4.595n 4.536n ~ 1.000
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 903.2µ 864.3µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 854.5µ 824.1µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 124.0µ 108.3µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 62.70µ 62.27µ ~ 0.400
ReorgOptimizations/HashSlicePool/Old/10K-4 60.44µ 67.33µ ~ 0.200
ReorgOptimizations/HashSlicePool/New/10K-4 12.02µ 11.71µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/10K-4 5.283µ 6.266µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.821µ 2.540µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 10.161m 9.871m ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.755m 10.735m ~ 0.200
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.183m 1.124m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 684.9µ 691.5µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/100K-4 668.2µ 712.6µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 288.5µ 318.2µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/100K-4 55.11µ 57.73µ ~ 0.400
ReorgOptimizations/NodeFlags/New/100K-4 19.28µ 19.70µ ~ 0.100
TxMapSetIfNotExists-4 51.67n 51.88n ~ 0.200
TxMapSetIfNotExistsDuplicate-4 38.14n 37.97n ~ 1.000
ChannelSendReceive-4 637.5n 596.0n ~ 0.100
CalcBlockWork-4 466.9n 463.9n ~ 0.200
CalculateWork-4 626.4n 618.6n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.034µ 1.036µ ~ 0.800
BuildBlockLocatorString_Helpers/Size_100-4 9.904µ 11.115µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 98.83µ 98.95µ ~ 0.100
CatchupWithHeaderCache-4 103.7m 103.8m ~ 1.000
_BufferPoolAllocation/16KB-4 3.516µ 3.497µ ~ 0.400
_BufferPoolAllocation/32KB-4 9.285µ 8.087µ ~ 0.100
_BufferPoolAllocation/64KB-4 15.63µ 17.90µ ~ 0.200
_BufferPoolAllocation/128KB-4 28.73µ 33.18µ ~ 0.100
_BufferPoolAllocation/512KB-4 117.5µ 116.8µ ~ 0.700
_BufferPoolConcurrent/32KB-4 19.21µ 17.80µ ~ 0.100
_BufferPoolConcurrent/64KB-4 31.42µ 28.47µ ~ 0.100
_BufferPoolConcurrent/512KB-4 158.5µ 149.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 642.6µ 612.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 668.5µ 639.0µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 636.6µ 648.9µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/128KB-4 642.0µ 661.5µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/512KB-4 666.9µ 672.3µ ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4 36.44m 35.71m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 35.98m 35.22m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.07m 35.50m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 35.79m 35.31m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.08m 34.88m ~ 0.100
_PooledVsNonPooled/Pooled-4 740.0n 739.3n ~ 0.700
_PooledVsNonPooled/NonPooled-4 6.987µ 7.632µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.938µ 6.864µ ~ 0.200
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.315µ 9.569µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.679µ 9.617µ ~ 0.700
_prepareTxsPerLevel-4 417.6m 462.0m ~ 0.100
_prepareTxsPerLevelOrdered-4 3.816m 4.071m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 405.6m 431.3m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 3.678m 4.089m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.286m 1.316m ~ 0.400
SubtreeSizes/10k_tx_16_per_subtree-4 299.7µ 307.4µ ~ 0.200
SubtreeSizes/10k_tx_64_per_subtree-4 72.12µ 72.17µ ~ 0.700
SubtreeSizes/10k_tx_256_per_subtree-4 17.77µ 17.91µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 8.782µ 8.871µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.449µ 4.358µ ~ 0.200
SubtreeSizes/10k_tx_2k_per_subtree-4 2.182µ 2.195µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 78.22µ 69.32µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 17.81µ 17.57µ ~ 0.200
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.359µ 4.335µ ~ 0.200
BlockSizeScaling/50k_tx_64_per_subtree-4 369.3µ 366.9µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 86.82µ 93.73µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 21.40µ 22.82µ ~ 0.400
SubtreeAllocations/small_subtrees_exists_check-4 148.7µ 148.8µ ~ 0.400
SubtreeAllocations/small_subtrees_data_fetch-4 158.1µ 159.1µ ~ 0.700
SubtreeAllocations/small_subtrees_full_validation-4 310.0µ 314.3µ ~ 0.700
SubtreeAllocations/medium_subtrees_exists_check-4 8.854µ 8.985µ ~ 0.700
SubtreeAllocations/medium_subtrees_data_fetch-4 9.302µ 9.400µ ~ 0.200
SubtreeAllocations/medium_subtrees_full_validation-4 17.63µ 17.36µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.071µ 2.088µ ~ 0.600
SubtreeAllocations/large_subtrees_data_fetch-4 2.231µ 2.269µ ~ 0.200
SubtreeAllocations/large_subtrees_full_validation-4 4.314µ 4.356µ ~ 0.700
StoreBlock_Sequential/BelowCSVHeight-4 321.2µ 339.0µ ~ 0.400
StoreBlock_Sequential/AboveCSVHeight-4 339.9µ 342.2µ ~ 1.000
GetUtxoHashes-4 256.8n 258.2n ~ 1.000
GetUtxoHashes_ManyOutputs-4 47.29µ 42.94µ ~ 0.100
_NewMetaDataFromBytes-4 237.6n 238.3n ~ 0.400
_Bytes-4 624.2n 624.9n ~ 1.000
_MetaBytes-4 579.5n 565.0n ~ 0.200

Threshold: >10% with p < 0.05 | Generated: 2026-05-12 09:50 UTC

@ordishs ordishs requested review from icellan and oskarszoon May 12, 2026 08:08
@ordishs ordishs enabled auto-merge (squash) May 12, 2026 08:11

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this. The signalling fix itself is correct (named returns + if *err == nil guard in both stores, no other recover() sites in production paths). But there are real blockers before merge.

Blockers

1. CI is red on this branch. Commit 2d4481771 traded a depguard violation for a forbidigo one — fmt.Errorf is banned project-wide. Hits stores/utxo/aerospike/spend_panic_test.go:35 and stores/utxo/sql/spend_panic_test.go:50. Switch to teranode's own errors.New(errors.ERR_UNKNOWN, ...).

2. Goroutine panics are not caught. Both Spend impls do most of the actual work inside g.Go(...) lambdas (PutCtx, channel reads, slice writes, errSpent parsing). The deferred recover only catches panics in the parent goroutine. If issue #4560's reported panic originated inside one of those workers, this PR does not fix it — the process still crashes. Each worker lambda needs its own recover (or errgroup wrapping with a recover helper) before this can claim to close #4560.

3. errors.NewProcessingError collides with the block-validation retry classifier. services/blockvalidation/BlockValidation.go:1511-1513 treats ErrProcessing as transient and retries; same in services/legacy/netsync/handle_block.go:953,1037,1292. A recovered panic is not transient — repeated retries will hammer a broken path. Either introduce a distinct ErrPanic error code, or exclude the panic-derived error from the retry classifier.

Other real issues

  • No Aerospike end-to-end regression test. SQL has both the helper unit test and a real Spend(ctx, nil, h) regression. Aerospike only has the helper unit test, so the defer placement, named returns, and &err plumbing on the actual Spend aren't covered. Add a parallel Aerospike test.
  • Partial-mutation state remains unmitigated. None of the three callers (validator.spendUtxos, quick_validate.createAndSpendUTXOsForBatch, process_conflicting.ProcessConflicting) call Unspend on Spend error. The PR's own internal rollback (the needsSpendRollback block) is bypassed by the panic path because control transfers to the deferred recover before g.Wait(). The PR makes the failure visible but leaves a half-spent UTXO set on disk. At minimum, document this in the commit so a follow-up can address rollback semantics.
  • Defer asymmetry between stores. SQL puts the blockHeight == 0 check before the defer (stores/utxo/sql/sql.go:1637+); Aerospike puts the defer first (stores/utxo/aerospike/spend.go:285+). Aerospike's order is structurally safer — future code added at the top is protected by default. Align both to defer-first.

Nit

  • The logged message uses %v on the recovered value only — no stack trace. For real-world triage add debug.Stack() to the log line. Keep the error message text unchanged so the test substring assertion still passes.

Happy to re-approve once CI is green and goroutine recovery + retry-classifier handling are addressed.

errgroup.Go does not recover panics in worker bodies, so a panic inside
a spend worker would crash the process even though the parent had a
recover. Add a per-worker defer that funnels through handleSpendPanic
and writes into spends[idx].Err.

Reclassify recovered panics as ERR_UNKNOWN rather than ERR_PROCESSING so
the block-validation retry classifier does not treat them as transient
infrastructure errors and retry indefinitely. Log a stack trace alongside
the recovered value so the source is recoverable post-mortem.

SQL: move the blockHeight==0 guard below the recover so an early panic
is still captured.
@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
12.2% Duplication on New Code (required ≤ 3%)
B Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@ordishs ordishs requested a review from oskarszoon May 12, 2026 10:48

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three blockers from prior review resolved in 438c0df:

  • Lint clean — fmt.Errorf removed, golangci-lint run ./stores/utxo/... passes locally.
  • Worker-goroutine panics now caught — per-worker defer handleSpendPanic in both Aerospike and SQL stores.
  • Reclassified to ErrUnknown (not ErrProcessing) so the BlockValidation.go transient-retry classifier won't loop on a panic.

Plus the defer-asymmetry and stack-trace nits — both addressed. Aerospike end-to-end regression test added (TestStore_SpendNilTxPanicRecovery).

Two low-severity nits left, non-blocking: require.Contains(err.Error(), "panic") could be errors.Is(err, ErrUnknown); worker-panic path doesn't trigger needsSpendRollback (pre-existing, just flagging).

LGTM.

@ordishs ordishs merged commit 300574a into bsv-blockchain:main May 12, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants