Skip to content

fix(utxo/aerospike): close 4 silent-failure paths in sendStoreBatch#853

Merged
icellan merged 1 commit into
mainfrom
fix/aerospike-sendstorebatch-silent-failures
May 13, 2026
Merged

fix(utxo/aerospike): close 4 silent-failure paths in sendStoreBatch#853
icellan merged 1 commit into
mainfrom
fix/aerospike-sendstorebatch-silent-failures

Conversation

@icellan

@icellan icellan commented May 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes four code paths in sendStoreBatch where Create() could either return success while the transaction was not actually persisted, or hang the caller indefinitely with no error returned. Companion fix to #784 (which fixed the same class of bug in storeExternallyWithLock).

The 4 fixed paths

  1. Missing return after non-KEY_EXISTS top-level error — falling through into the per-record loop sent SafeSend(nil) for any record whose per-record Err was unset (the common case when the top-level call failed before reaching the server). Now returns after error notification.

  2. Wrong txHash in multi-item top-level KEY_EXISTS_ERROR — was using batch[0].txHash for every item in the batch. Now each item's error references its own hash. The comment claimed "this should only be called with 1 record" but the default utxostore_storeBatcherSize is 100.

  3. Silent hang on non-*AerospikeError per-record types — the fallback SafeSend for unknown aerospike result codes was nested inside the err.(*aerospike.AerospikeError) type-asserted branch. The Aerospike client exposes const sentinels like aerospike.ErrTimeout / ErrNetwork whose concrete type is *constAerospikeError, not *AerospikeError. If one ever appeared in BatchRec().Err, the type assertion failed, no SafeSend ran, and the caller hung on <-errCh. Fallback moved outside the type assertion.

  4. KEY_NOT_FOUND_ERROR assumption — code unconditionally treated KEY_NOT_FOUND_ERROR as a NOOP-placeholder marker and continued without notifying. A real BatchWrite returning KEY_NOT_FOUND_ERROR (Aerospike client edge cases, future client changes) would silently hang the caller. NOOP intent is now tracked explicitly via a new resultHandledElsewhere []bool so only intentional NOOPs are skipped.

Bonus fix uncovered while implementing

Top-level error notification looped over all items including those already notified by the setup loop (NewKey / GetBinsToStore / external-store failures). The resultHandledElsewhere flag suppresses those duplicates too.

Refactor for testability

Added the s.batchOperateFn seam to sendStoreBatch (already present in storeExternallyWithLock via #784) so the per-record paths can be unit-tested without an Aerospike server. Same field, same pattern.

Test plan

New file stores/utxo/aerospike/send_store_batch_test.go — 7 unit tests, no testcontainers required:

  • TopLevelNonKeyExistsError_NoDuplicateNotification — fails before fix (item receives 2 notifications)
  • TopLevelKeyExistsError_EachItemGetsOwnTxHash — fails before fix (item 1's error names batch[0]'s hash)
  • PerRecordConstError_StillNotifies — fails before fix (hangs on aerospike.ErrTimeout)
  • KeyNotFoundOnRealRecord_NotifiesError — fails before fix (silent skip on non-NOOP record)
  • AllSuccess_SingleSuccessNotificationPerItem — happy-path regression guard
  • MixedPerRecordResults — verifies classification of success + KEY_EXISTS + const-error in one batch
  • TopLevelError_DoesNotDuplicateNotifyPreSendItems — top-level error loop doesn't re-notify pre-notified items

Each error-path test was verified to fail on main before the fix, then pass after.

  • go build ./...
  • go vet ./stores/utxo/aerospike/
  • go test -race -run TestSendStoreBatch ./stores/utxo/aerospike/ — 7/7 pass

Each path could result in Create() returning success or hanging while
the transaction was not actually persisted, with no error returned:

1. Non-KEY_EXISTS top-level BatchOperate error fell through to the
   per-record loop, which then sent SafeSend(nil) over closed channels
   for items the error path had just notified. Add a return after the
   error notification and guard with resultHandledElsewhere.

2. Per-record errors of any concrete type other than *AerospikeError
   (e.g. const sentinels like aerospike.ErrTimeout) were silently
   skipped because the fallback SafeSend was nested inside the
   `if ok` type-assertion branch. Move it out so any non-nil
   per-record error is surfaced.

3. KEY_NOT_FOUND_ERROR was unconditionally treated as a NOOP-record
   marker via `continue` without notifying. Track NOOP intent
   explicitly via resultHandledElsewhere so a real BatchWrite that
   returns KEY_NOT_FOUND_ERROR is surfaced as an error.

4. Top-level KEY_EXISTS_ERROR notified every item in the batch with
   an error message hardcoded to batch[0].txHash. Use each item's
   own txHash.

Also add a batchOperateFn seam so sendStoreBatch is unit-testable,
mirroring the existing pattern from storeExternallyWithLock.

Tests in send_store_batch_test.go cover each failure mode plus
happy-path and mixed-result scenarios.
@icellan icellan requested review from Copilot, freemans13 and ordishs May 12, 2026 16:53
@github-actions

github-actions Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

No issues found. This PR fixes four critical silent-failure paths in sendStoreBatch with rigorous test coverage. The implementation follows defensive programming principles and the test suite validates all error paths.

Summary:

This fix addresses dangerous edge cases where Create() could silently succeed while transactions were not persisted, or hang callers indefinitely. The solution introduces a resultHandledElsewhere tracking mechanism to prevent duplicate notifications and ensure every caller receives exactly one result.

Strengths:

  • All 4 bugs are real, well-documented, and properly fixed
  • 7 comprehensive unit tests covering both failure modes and happy paths
  • Excellent test discipline: buffered channels detect duplicate sends, drainOne helper verifies absence of notifications
  • Code comments explain the "why" behind each fix
  • Minimal diff adheres to the Small Diff Rule from AGENTS.md
  • Test seam (batchOperateFn) enables testing without infrastructure

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens stores/utxo/aerospike batch-creation behavior by ensuring every sendStoreBatch path either notifies the caller exactly once (success or error) or intentionally delegates notification to an external-storage goroutine, eliminating several previously silent success/hang scenarios.

Changes:

  • Fixes top-level BatchOperate error handling to avoid fallthrough into per-record processing and to ensure correct per-item tx-hash reporting for KEY_EXISTS_ERROR.
  • Ensances per-record result handling so non-*AerospikeError errors (e.g., const sentinel errors like aerospike.ErrTimeout) and unexpected KEY_NOT_FOUND_ERROR cases always notify callers.
  • Adds unit tests (serverless) via a batchOperateFn seam usage and coverage of the previously silent-failure/hang paths.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
stores/utxo/aerospike/create.go Ensures sendStoreBatch reliably sends exactly one notification per item (or delegates ownership), fixes error classification gaps, and corrects per-item error messaging.
stores/utxo/aerospike/send_store_batch_test.go Adds focused unit tests covering top-level/per-record error cases and duplicate-notification regressions without requiring an Aerospike server.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@sonarqubecloud

Copy link
Copy Markdown

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-853 (d8899f5)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.717µ 1.750µ ~ 0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.89n 61.71n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.53n 61.57n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.64n 61.52n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 31.88n 31.30n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 54.63n 53.23n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 113.3n 112.3n ~ 1.000
MiningCandidate_Stringify_Short-4 264.8n 265.1n ~ 0.700
MiningCandidate_Stringify_Long-4 1.919µ 1.915µ ~ 1.000
MiningSolution_Stringify-4 983.1n 985.1n ~ 0.700
BlockInfo_MarshalJSON-4 1.788µ 1.793µ ~ 0.500
NewFromBytes-4 128.7n 128.4n ~ 0.300
Mine_EasyDifficulty-4 61.23µ 60.76µ ~ 0.400
Mine_WithAddress-4 6.772µ 6.775µ ~ 1.000
BlockAssembler_AddTx-4 0.03076n 0.03214n ~ 1.000
AddNode-4 11.78 11.90 ~ 1.000
AddNodeWithMap-4 11.89 11.82 ~ 0.700
DirectSubtreeAdd/4_per_subtree-4 62.53n 63.55n ~ 0.700
DirectSubtreeAdd/64_per_subtree-4 29.52n 29.56n ~ 1.000
DirectSubtreeAdd/256_per_subtree-4 27.61n 27.78n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 26.51n 26.66n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 26.19n 26.27n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 287.6n 283.8n ~ 0.400
SubtreeProcessorAdd/64_per_subtree-4 275.4n 274.8n ~ 0.700
SubtreeProcessorAdd/256_per_subtree-4 278.7n 278.0n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 272.5n 271.2n ~ 0.700
SubtreeProcessorAdd/2048_per_subtree-4 275.4n 271.5n ~ 0.400
SubtreeProcessorRotate/4_per_subtree-4 281.2n 276.2n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 282.3n 275.6n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 278.2n 273.9n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 278.2n 275.9n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 56.49n 54.23n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 35.11n 34.31n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 33.81n 33.39n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 32.91n 32.76n ~ 0.200
SubtreeCreationOnly/4_per_subtree-4 116.1n 117.1n ~ 1.000
SubtreeCreationOnly/64_per_subtree-4 423.5n 434.2n ~ 0.100
SubtreeCreationOnly/256_per_subtree-4 1.407µ 1.369µ ~ 0.700
SubtreeCreationOnly/1024_per_subtree-4 4.559µ 4.678µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 8.623µ 8.932µ ~ 0.400
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 273.5n 273.6n ~ 0.700
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 273.2n 276.3n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 587.8µ 602.3µ ~ 0.200
ParallelGetAndSetIfNotExists/10k_nodes-4 1.372m 1.342m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 6.712m 6.625m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 13.46m 13.30m ~ 0.400
SequentialGetAndSetIfNotExists/1k_nodes-4 664.2µ 656.3µ ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 2.760m 2.840m ~ 0.700
SequentialGetAndSetIfNotExists/50k_nodes-4 10.39m 10.91m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 19.81m 20.67m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 653.1µ 646.3µ ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 4.246m 4.239m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 16.82m 16.94m ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 728.7µ 717.2µ ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 6.087m 5.754m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 40.05m 38.71m ~ 0.100
DiskTxMap_SetIfNotExists-4 3.618µ 3.539µ ~ 0.700
DiskTxMap_SetIfNotExists_Parallel-4 3.605µ 3.287µ ~ 0.100
DiskTxMap_ExistenceOnly-4 300.3n 303.2n ~ 0.600
Queue-4 194.1n 195.2n ~ 0.100
AtomicPointer-4 4.893n 4.674n ~ 1.000
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 850.1µ 873.0µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 810.9µ 860.1µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 113.8µ 105.7µ ~ 0.400
ReorgOptimizations/AllMarkFalse/New/10K-4 61.90µ 61.54µ ~ 0.400
ReorgOptimizations/HashSlicePool/Old/10K-4 64.59µ 67.04µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 11.86µ 11.96µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/10K-4 5.730µ 6.311µ ~ 0.400
ReorgOptimizations/NodeFlags/New/10K-4 2.214µ 2.665µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 10.01m 10.07m ~ 0.400
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.585m 9.965m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.137m 1.128m ~ 1.000
ReorgOptimizations/AllMarkFalse/New/100K-4 682.2µ 681.5µ ~ 0.400
ReorgOptimizations/HashSlicePool/Old/100K-4 619.6µ 655.8µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 335.2µ 315.1µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/100K-4 52.69µ 56.73µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 19.00µ 19.97µ ~ 0.100
TxMapSetIfNotExists-4 50.87n 51.24n ~ 1.000
TxMapSetIfNotExistsDuplicate-4 37.77n 37.75n ~ 0.700
ChannelSendReceive-4 604.2n 594.0n ~ 0.100
CalcBlockWork-4 466.2n 465.2n ~ 0.700
CalculateWork-4 620.6n 620.9n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.322µ 1.336µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_100-4 12.81µ 12.72µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_1000-4 125.4µ 125.8µ ~ 0.400
CatchupWithHeaderCache-4 104.3m 104.4m ~ 0.700
_prepareTxsPerLevel-4 414.8m 434.0m ~ 0.100
_prepareTxsPerLevelOrdered-4 3.575m 3.630m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 424.4m 426.2m ~ 0.400
_prepareTxsPerLevel_Comparison/Optimized-4 3.671m 3.564m ~ 0.200
SubtreeSizes/10k_tx_4_per_subtree-4 1.417m 1.395m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 331.5µ 336.2µ ~ 0.700
SubtreeSizes/10k_tx_64_per_subtree-4 79.03µ 80.40µ ~ 1.000
SubtreeSizes/10k_tx_256_per_subtree-4 19.92µ 19.68µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.891µ 9.814µ ~ 0.700
SubtreeSizes/10k_tx_1024_per_subtree-4 4.882µ 4.861µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.446µ 2.439µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 78.32µ 77.05µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 19.56µ 19.39µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.849µ 4.856µ ~ 0.400
BlockSizeScaling/50k_tx_64_per_subtree-4 409.2µ 402.5µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 97.48µ 96.91µ ~ 0.200
BlockSizeScaling/50k_tx_1024_per_subtree-4 24.08µ 24.20µ ~ 1.000
SubtreeAllocations/small_subtrees_exists_check-4 165.5µ 162.2µ ~ 0.400
SubtreeAllocations/small_subtrees_data_fetch-4 172.0µ 171.6µ ~ 1.000
SubtreeAllocations/small_subtrees_full_validation-4 338.8µ 337.2µ ~ 0.400
SubtreeAllocations/medium_subtrees_exists_check-4 9.745µ 9.672µ ~ 0.400
SubtreeAllocations/medium_subtrees_data_fetch-4 10.24µ 10.15µ ~ 0.700
SubtreeAllocations/medium_subtrees_full_validation-4 19.78µ 19.66µ ~ 1.000
SubtreeAllocations/large_subtrees_exists_check-4 2.343µ 2.334µ ~ 1.000
SubtreeAllocations/large_subtrees_data_fetch-4 2.483µ 2.503µ ~ 1.000
SubtreeAllocations/large_subtrees_full_validation-4 4.933µ 4.923µ ~ 1.000
_BufferPoolAllocation/16KB-4 3.484µ 4.403µ ~ 0.400
_BufferPoolAllocation/32KB-4 9.135µ 6.884µ ~ 0.200
_BufferPoolAllocation/64KB-4 14.80µ 13.82µ ~ 1.000
_BufferPoolAllocation/128KB-4 29.76µ 29.18µ ~ 0.400
_BufferPoolAllocation/512KB-4 120.3µ 117.9µ ~ 0.400
_BufferPoolConcurrent/32KB-4 17.23µ 16.96µ ~ 0.700
_BufferPoolConcurrent/64KB-4 27.29µ 26.70µ ~ 0.100
_BufferPoolConcurrent/512KB-4 164.9µ 161.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 668.9µ 655.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 651.5µ 637.0µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 645.1µ 643.6µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/128KB-4 646.6µ 648.5µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/512KB-4 671.2µ 667.4µ ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.51m 37.50m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.36m 37.54m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4 37.27m 37.54m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/128KB-4 37.41m 37.56m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/512KB-4 37.35m 37.39m ~ 1.000
_PooledVsNonPooled/Pooled-4 816.1n 819.9n ~ 0.100
_PooledVsNonPooled/NonPooled-4 6.630µ 6.545µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 7.934µ 7.844µ ~ 1.000
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.69µ 10.73µ ~ 0.700
_MemoryFootprint/Alternative_64KB_32concurrent-4 10.14µ 10.11µ ~ 1.000
StoreBlock_Sequential/BelowCSVHeight-4 332.1µ 322.7µ ~ 0.100
StoreBlock_Sequential/AboveCSVHeight-4 337.9µ 334.7µ ~ 0.700
GetUtxoHashes-4 258.2n 256.8n ~ 0.700
GetUtxoHashes_ManyOutputs-4 44.41µ 43.43µ ~ 0.400
_NewMetaDataFromBytes-4 238.6n 239.3n ~ 1.000
_Bytes-4 627.9n 626.9n ~ 1.000
_MetaBytes-4 570.0n 564.4n ~ 0.100

Threshold: >10% with p < 0.05 | Generated: 2026-05-12 17:08 UTC

@icellan icellan merged commit b5d9e81 into main May 13, 2026
34 checks passed
@icellan icellan deleted the fix/aerospike-sendstorebatch-silent-failures branch May 13, 2026 08:48
@icellan icellan self-assigned this May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants