fix(setMinedMulti): enforce coverage invariant across store, model, cache, and block validation by icellan · Pull Request #850 · bsv-blockchain/teranode

icellan · 2026-05-12T12:23:47Z

Summary

Closes the gap that allowed a block to reach SetBlockMinedSet(true) without every non-coinbase transaction being durably tagged with the current blockID in the UTXO txmeta store. Adds a defence-in-depth invariant enforced at every layer (store, cache wrapper, caller, block-validation), plus a bounded retry loop so a node carrying historical-corrupt state cannot spin forever on the new check.

Coverage invariant — enforced at every layer

Contract (stores/utxo/Interface.go): when UnsetMined=false and SetMinedMulti returns nil, every submitted hash MUST be a key in the returned map and every returned slice MUST contain minedBlockInfo.BlockID. Implementations that cannot prove this MUST return a non-nil error.

Implementations:

Aerospike UDF path — handleBatchRecordError promotes KEY_NOT_FOUND_ERROR to TxNotFoundError for normal set-mined (only the UnsetMined path tolerates a missing record).
Aerospike expression path — FILTERED_OUT synthesizes {hash: [blockID]} (the filter condition already proves the durable list contains the current blockID). A new end-of-loop coverage check promotes nil-record/empty-BlockIDs paths to TxNotFoundError / ProcessingError so the expression backend fails closed identically to the UDF + SQL backends.
TxMetaCache wrapper — SetMinedMulti re-verifies coverage on the wrapped store's result before touching the cache, then evicts each hash from the cache (mined txs are not cacheable per the GetMeta policy: len(BlockIDs) == 0 && !Conflicting). SetMined now routes through SetMinedMulti to share the check.
Caller — model.UpdateTxMinedStatus re-verifies coverage in a closure that piggybacks on the existing blockIDsMap iteration (no new nested loop). Two separate counters distinguish I/O failures (setMinedErrorCount) from postcondition gaps (coverageGapCount); the final error message reports both.
Block validation — BlockValidation.processBlockMinedNotSet no longer emits the "processed block mined and set mined_set" log on the failure path. The primary setTxMinedStatus flow already orders UpdateTxMinedStatus strictly before SetBlockMinedSet.

Bounded `setMinedChan` retry (operational gate per @oskarszoon)

Pre-PR, the setMinedChan worker retried failures forever at 1Hz. After this PR the postcondition check surfaces previously-silent failures, so an unbounded loop becomes operationally hostile on nodes carrying historical-corrupt block_ids data.

Bounded exponential backoff per block hash: 1s → 2s → 4s → 8s → 16s × 6 (capped). After 10 consecutive failures (worst-case 111s total) the worker drops the block hash with an ERROR log containing the manual_intervention_required marker. Counter resets on success, MinedSet shortcut, ErrBlockNotFound, or drop.
Prometheus metrics:
- teranode_blockvalidation_setmined_retry_total{blockhash} — alert when a single label series approaches the retry ceiling.
- teranode_blockvalidation_setmined_drops_total{blockhash} — any non-zero value is page-worthy.
Operator runbook in docs/topics/services/blockValidation.md § 2.3 covers the new failure mode (how to identify, repair, and re-trigger; cluster-wide drop rate is a stop-the-rollout signal).
Per-blockhash label cardinality is intentional: an alert on a single stuck block is the only useful operator signal.

Files changed

File	Change
`stores/utxo/Interface.go`	Documents the SetMinedMulti coverage postcondition
`stores/utxo/aerospike/set_mined.go`	UDF `KEY_NOT_FOUND` → `TxNotFoundError` for set-mined
`stores/utxo/aerospike/set_mined_expressions.go`	`FILTERED_OUT` synthesizes map entry; end-of-loop coverage check for all backends
`stores/utxo/mock.go`	`MockUtxostore.SetMinedMulti` accepts a function return for realistic test maps
`stores/txmetacache/txmetacache.go`	`SetMinedMulti` re-verifies + evicts; `SetMined` routes through `SetMinedMulti`
`model/update-tx-mined.go`	`checkBatchResults` closure folds coverage check into the existing double-spend pass; split counters for I/O vs coverage
`services/blockvalidation/BlockValidation.go`	Bounded retry on `setMinedChan` with exponential backoff + `manual_intervention_required` marker; success log only on success path
`services/blockvalidation/metrics.go`	Adds `setmined_retry_total` / `setmined_drops_total` counters
`docs/topics/services/blockValidation.md`	Coverage postcondition + retry loop + operator runbook

Tests

Aerospike (unit) — TestHandleBatchRecordError_KeyNotFound (set / unset), TestParseSetMinedState_ExplicitEmptyList, TestProcessBatchResultsForSetMinedExpressions_FilteredOutSynthesizesMapEntry, _KeyNotFoundIsHardError, _CoverageGap_NilRecord, _CoverageGap_EmptyBlockIDs, _UnsetMinedToleratesGap.
Model — extended Test_updateTxMinedStatus_Internal with subtests for empty-map coverage gap, partial omission (expression path empty BlockIDs scenario), missing-blockID-in-slice, and UnsetMined tolerance.
TxMetaCache — TestTxMetaCacheSetMinedMulti_DelegatesToStoreAndEvicts proves underlying-store delegation + cache eviction; _PropagatesStoreError proves short-circuit on store error and cache preservation.
BlockValidation — TestSetMinedRetryBackoff pins the backoff curve and the worst-case budget.

Verification

go build ./...                                                     # clean
go vet ./model ./stores/utxo/aerospike ./stores/txmetacache ./services/blockvalidation  # clean
go test ./model                                                    # pass (incl. race)
go test ./services/blockvalidation -run TestSetMined               # pass
go test ./stores/txmetacache -run TestTxMetaCache                  # pass
go test ./stores/utxo/aerospike -run "TestHandle|TestParse|TestProcess"  # pass

TestContainers-based integration tests (Aerospike, Postgres) require Docker which isn't available in dev env — CI will exercise them.

Risks

Historical corrupt state will now fail loudly. A node carrying pre-existing block_ids drift will hit the new check on first restart. The bounded retry + manual_intervention_required marker + Prometheus drops counter make this visible to operators instead of an unbounded log flood. Pre-deploy: a one-shot diagnostic scan over recent block_ids is advisable to catch drift before flipping the stricter check on across a cluster.
Expression-mode coverage synthesis is sound under the current filter condition (count(blockID in blockIDs) == 0). The inline comment ties the assumption to the filter, and the end-of-loop coverage check would catch a regression if the filter expression ever changes.
Per-blockhash Prometheus label cardinality is bounded by the number of historically-stuck blocks; once repaired, series fall out of recent windows.

…ache, and block validation Make it impossible for a block to reach SetBlockMinedSet without every non-coinbase transaction being durably tagged with the current block ID in the UTXO txmeta store. - aerospike UDF path: handleBatchRecordError no longer swallows KEY_NOT_FOUND_ERROR for normal set-mined; only the UnsetMined path tolerates a missing record. - aerospike expression path: FILTERED_OUT now synthesizes {hash: [blockID]} in the returned map (the filter's truth condition already proves the blockID is durably present), so downstream coverage checks see it. - both Aerospike paths: batch-coverage postcondition verifies every submitted hash is in the returned map and each slice contains minedBlockInfo.BlockID for non-UnsetMined calls. - UpdateTxMinedStatus: defensive caller-level postcondition mirrors the store invariant via verifyMinedBatchCoverage and existing setMinedErrorCount accounting. - TxMetaCache.SetMinedMulti: delegates to the underlying utxo.Store first; cache update failure after store success returns an error to force retry instead of silently masking missing durable state. - BlockValidation.processBlockMinedNotSet: the "processed block mined and set mined_set" log now only fires when setTxMinedStatus succeeded (previously emitted unconditionally even after requeue). - utxo.Store interface: SetMinedMulti docstring documents the new postcondition contract for all implementations. - MockUtxostore.SetMinedMulti accepts a function return so tests can synthesize realistic blockIDsMaps from submitted hashes. Tests cover store-level KEY_NOT_FOUND semantics, caller-level coverage gaps (missing hash and missing blockID), UnsetMined tolerance, and TxMetaCache delegation + error propagation.

github-actions · 2026-05-12T12:25:24Z

🤖 Claude Code Review

Status: Complete

All previously reported issues have been resolved in this version.

History:

✅ Fixed: Separate counters for I/O failures vs coverage gaps (model/update-tx-mined.go)
✅ Fixed: Postcondition enforcement in Aerospike expression path
✅ Fixed: Coverage validation in TxMetaCache wrapper
✅ Fixed: All store implementations now enforce the SetMinedMulti contract

Current Review:
No issues found. The implementation is well-structured and comprehensive:

Coverage invariant properly enforced across all layers (store, cache, model, block validation)
Bounded retry logic with exponential backoff prevents operational issues
Comprehensive test coverage includes edge cases for coverage gaps, mixed failures, and retry behavior
Documentation accurately reflects implementation behavior
Metrics and observability provide clear operator signals for intervention

The PR successfully closes the gap that could allow blocks to reach MinedSet=true without proper transaction tagging.

github-actions · 2026-05-12T12:38:44Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-850 (202cfe5)

Summary

Regressions: 0
Improvements: 0
Unchanged: 142
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.612µ	1.554µ	~	0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	71.25n	71.11n	~	0.700
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	71.15n	71.25n	~	0.200
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	71.01n	71.01n	~	1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	31.86n	32.19n	~	0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	53.51n	54.43n	~	0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	129.6n	129.5n	~	1.000
MiningCandidate_Stringify_Short-4	223.2n	216.7n	~	0.100
MiningCandidate_Stringify_Long-4	1.595µ	1.602µ	~	0.600
MiningSolution_Stringify-4	830.9n	840.1n	~	0.100
BlockInfo_MarshalJSON-4	1.704µ	1.701µ	~	0.300
NewFromBytes-4	127.2n	127.7n	~	0.600
Mine_EasyDifficulty-4	60.66µ	60.50µ	~	0.700
Mine_WithAddress-4	6.753µ	6.782µ	~	0.400
DirectSubtreeAdd/4_per_subtree-4	56.89n	61.09n	~	0.200
DirectSubtreeAdd/64_per_subtree-4	28.39n	28.42n	~	1.000
DirectSubtreeAdd/256_per_subtree-4	27.77n	27.41n	~	0.700
DirectSubtreeAdd/1024_per_subtree-4	26.10n	26.25n	~	0.100
DirectSubtreeAdd/2048_per_subtree-4	25.78n	26.31n	~	0.100
SubtreeProcessorAdd/4_per_subtree-4	279.2n	283.3n	~	1.000
SubtreeProcessorAdd/64_per_subtree-4	272.8n	272.2n	~	0.400
SubtreeProcessorAdd/256_per_subtree-4	275.5n	274.9n	~	1.000
SubtreeProcessorAdd/1024_per_subtree-4	267.1n	267.0n	~	1.000
SubtreeProcessorAdd/2048_per_subtree-4	271.1n	266.5n	~	0.200
SubtreeProcessorRotate/4_per_subtree-4	273.9n	282.7n	~	0.100
SubtreeProcessorRotate/64_per_subtree-4	271.2n	272.1n	~	0.500
SubtreeProcessorRotate/256_per_subtree-4	269.5n	271.5n	~	0.200
SubtreeProcessorRotate/1024_per_subtree-4	273.6n	270.6n	~	0.100
SubtreeNodeAddOnly/4_per_subtree-4	54.03n	54.64n	~	0.200
SubtreeNodeAddOnly/64_per_subtree-4	34.22n	34.35n	~	0.200
SubtreeNodeAddOnly/256_per_subtree-4	33.20n	33.52n	~	0.100
SubtreeNodeAddOnly/1024_per_subtree-4	32.66n	33.00n	~	0.100
SubtreeCreationOnly/4_per_subtree-4	112.2n	115.2n	~	0.400
SubtreeCreationOnly/64_per_subtree-4	401.4n	393.3n	~	0.700
SubtreeCreationOnly/256_per_subtree-4	1.309µ	1.322µ	~	0.400
SubtreeCreationOnly/1024_per_subtree-4	4.335µ	4.347µ	~	1.000
SubtreeCreationOnly/2048_per_subtree-4	7.832µ	7.816µ	~	1.000
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	270.1n	271.9n	~	0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	268.8n	269.6n	~	0.100
ParallelGetAndSetIfNotExists/1k_nodes-4	800.0µ	798.7µ	~	1.000
ParallelGetAndSetIfNotExists/10k_nodes-4	1.527m	1.531m	~	1.000
ParallelGetAndSetIfNotExists/50k_nodes-4	6.692m	6.536m	~	0.200
ParallelGetAndSetIfNotExists/100k_nodes-4	13.54m	13.78m	~	0.400
SequentialGetAndSetIfNotExists/1k_nodes-4	648.3µ	664.8µ	~	0.100
SequentialGetAndSetIfNotExists/10k_nodes-4	2.843m	2.844m	~	1.000
SequentialGetAndSetIfNotExists/50k_nodes-4	10.44m	10.54m	~	0.100
SequentialGetAndSetIfNotExists/100k_nodes-4	19.91m	20.23m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	617.4µ	622.8µ	~	0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	4.147m	4.117m	~	0.400
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	16.44m	16.40m	~	0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	683.3µ	688.5µ	~	0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	5.944m	5.685m	~	0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	39.36m	38.11m	~	0.100
BlockAssembler_AddTx-4	0.02856n	0.03087n	~	0.400
AddNode-4	11.58	11.83	~	1.000
AddNodeWithMap-4	11.82	12.28	~	0.400
DiskTxMap_SetIfNotExists-4	4.220µ	4.244µ	~	1.000
DiskTxMap_SetIfNotExists_Parallel-4	4.008µ	3.893µ	~	0.100
DiskTxMap_ExistenceOnly-4	449.1n	539.7n	~	0.700
Queue-4	205.9n	201.0n	~	0.100
AtomicPointer-4	8.179n	8.139n	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4	794.8µ	793.0µ	~	0.700
ReorgOptimizations/DedupFilterPipeline/New/10K-4	742.5µ	737.9µ	~	0.400
ReorgOptimizations/AllMarkFalse/Old/10K-4	111.7µ	112.1µ	~	1.000
ReorgOptimizations/AllMarkFalse/New/10K-4	58.33µ	58.13µ	~	0.700
ReorgOptimizations/HashSlicePool/Old/10K-4	58.59µ	56.98µ	~	0.100
ReorgOptimizations/HashSlicePool/New/10K-4	11.80µ	11.81µ	~	1.000
ReorgOptimizations/NodeFlags/Old/10K-4	5.230µ	5.031µ	~	0.100
ReorgOptimizations/NodeFlags/New/10K-4	1.836µ	1.728µ	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4	10.98m	11.39m	~	0.400
ReorgOptimizations/DedupFilterPipeline/New/100K-4	12.30m	12.03m	~	0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4	1.185m	1.155m	~	0.100
ReorgOptimizations/AllMarkFalse/New/100K-4	737.0µ	728.9µ	~	0.400
ReorgOptimizations/HashSlicePool/Old/100K-4	580.8µ	556.9µ	~	0.100
ReorgOptimizations/HashSlicePool/New/100K-4	332.0µ	331.6µ	~	0.400
ReorgOptimizations/NodeFlags/Old/100K-4	50.25µ	49.53µ	~	0.100
ReorgOptimizations/NodeFlags/New/100K-4	16.78µ	17.16µ	~	0.200
TxMapSetIfNotExists-4	51.77n	50.30n	~	0.200
TxMapSetIfNotExistsDuplicate-4	43.66n	43.27n	~	0.100
ChannelSendReceive-4	673.0n	694.3n	~	0.100
CalcBlockWork-4	503.6n	511.4n	~	0.100
CalculateWork-4	684.5n	681.8n	~	1.000
BuildBlockLocatorString_Helpers/Size_10-4	1.197µ	1.052µ	~	0.100
BuildBlockLocatorString_Helpers/Size_100-4	9.958µ	10.124µ	~	0.100
BuildBlockLocatorString_Helpers/Size_1000-4	98.71µ	110.63µ	~	0.100
CatchupWithHeaderCache-4	103.8m	103.8m	~	1.000
_BufferPoolAllocation/16KB-4	3.646µ	3.884µ	~	0.100
_BufferPoolAllocation/32KB-4	7.506µ	7.423µ	~	0.700
_BufferPoolAllocation/64KB-4	15.84µ	18.82µ	~	0.700
_BufferPoolAllocation/128KB-4	26.64µ	24.08µ	~	0.400
_BufferPoolAllocation/512KB-4	115.8µ	107.0µ	~	0.400
_BufferPoolConcurrent/32KB-4	18.52µ	17.88µ	~	0.200
_BufferPoolConcurrent/64KB-4	27.65µ	29.99µ	~	0.100
_BufferPoolConcurrent/512KB-4	142.1µ	154.1µ	~	0.100
_SubtreeDeserializationWithBufferSizes/16KB-4	609.3µ	652.9µ	~	0.100
_SubtreeDeserializationWithBufferSizes/32KB-4	611.1µ	647.3µ	~	0.100
_SubtreeDeserializationWithBufferSizes/64KB-4	616.0µ	657.8µ	~	0.100
_SubtreeDeserializationWithBufferSizes/128KB-4	595.3µ	655.1µ	~	0.100
_SubtreeDeserializationWithBufferSizes/512KB-4	612.0µ	645.8µ	~	0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4	36.74m	36.39m	~	0.200
_SubtreeDataDeserializationWithBufferSizes/32KB-4	36.45m	36.52m	~	1.000
_SubtreeDataDeserializationWithBufferSizes/64KB-4	36.46m	36.62m	~	0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4	36.32m	36.12m	~	0.700
_SubtreeDataDeserializationWithBufferSizes/512KB-4	36.47m	36.07m	~	0.100
_PooledVsNonPooled/Pooled-4	736.3n	742.7n	~	0.100
_PooledVsNonPooled/NonPooled-4	7.057µ	7.872µ	~	0.100
_MemoryFootprint/Current_512KB_32concurrent-4	6.528µ	6.800µ	~	0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4	9.371µ	11.356µ	~	0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4	9.143µ	10.202µ	~	0.100
_prepareTxsPerLevel-4	424.4m	428.3m	~	0.400
_prepareTxsPerLevelOrdered-4	4.109m	4.556m	~	0.700
_prepareTxsPerLevel_Comparison/Original-4	422.8m	428.0m	~	0.200
_prepareTxsPerLevel_Comparison/Optimized-4	3.930m	3.904m	~	0.400
SubtreeSizes/10k_tx_4_per_subtree-4	1.472m	1.402m	~	0.100
SubtreeSizes/10k_tx_16_per_subtree-4	334.4µ	326.9µ	~	0.700
SubtreeSizes/10k_tx_64_per_subtree-4	80.85µ	79.19µ	~	0.100
SubtreeSizes/10k_tx_256_per_subtree-4	19.94µ	19.39µ	~	0.100
SubtreeSizes/10k_tx_512_per_subtree-4	9.810µ	9.596µ	~	0.100
SubtreeSizes/10k_tx_1024_per_subtree-4	4.903µ	4.743µ	~	0.100
SubtreeSizes/10k_tx_2k_per_subtree-4	2.431µ	2.375µ	~	0.100
BlockSizeScaling/10k_tx_64_per_subtree-4	77.38µ	76.68µ	~	0.400
BlockSizeScaling/10k_tx_256_per_subtree-4	19.56µ	19.13µ	~	0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.826µ	4.729µ	~	0.100
BlockSizeScaling/50k_tx_64_per_subtree-4	402.7µ	399.4µ	~	0.400
BlockSizeScaling/50k_tx_256_per_subtree-4	97.27µ	94.03µ	~	0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4	23.98µ	23.56µ	~	0.100
SubtreeAllocations/small_subtrees_exists_check-4	164.6µ	163.6µ	~	0.700
SubtreeAllocations/small_subtrees_data_fetch-4	171.0µ	168.2µ	~	0.400
SubtreeAllocations/small_subtrees_full_validation-4	340.1µ	329.1µ	~	0.100
SubtreeAllocations/medium_subtrees_exists_check-4	9.876µ	9.554µ	~	0.100
SubtreeAllocations/medium_subtrees_data_fetch-4	10.064µ	9.766µ	~	0.100
SubtreeAllocations/medium_subtrees_full_validation-4	19.76µ	19.06µ	~	0.100
SubtreeAllocations/large_subtrees_exists_check-4	2.336µ	2.274µ	~	0.100
SubtreeAllocations/large_subtrees_data_fetch-4	2.459µ	2.395µ	~	0.100
SubtreeAllocations/large_subtrees_full_validation-4	4.877µ	4.785µ	~	0.100
StoreBlock_Sequential/BelowCSVHeight-4	322.5µ	329.1µ	~	1.000
StoreBlock_Sequential/AboveCSVHeight-4	321.2µ	332.3µ	~	0.200
GetUtxoHashes-4	254.9n	255.6n	~	0.800
GetUtxoHashes_ManyOutputs-4	42.24µ	42.19µ	~	0.700
_NewMetaDataFromBytes-4	234.3n	234.6n	~	1.000
_Bytes-4	606.0n	601.5n	~	0.100
_MetaBytes-4	547.5n	547.0n	~	0.400

Threshold: >10% with p < 0.05 | Generated: 2026-05-18 09:55 UTC

…odel, cache, and block validation

Copilot

Pull request overview

This PR tightens the “tx mined tagging” invariants so block validation fails closed unless every non-coinbase tx is durably marked as mined in the UTXO txmeta store for the current blockID, and aligns cache behavior/logging with that goal.

Changes:

Tighten Aerospike SetMinedMulti error handling (KEY_NOT_FOUND is only tolerated for UnsetMined) and improve expression-mode handling of FILTERED_OUT.
Add a model-layer postcondition check to detect coverage gaps in SetMinedMulti results and fail block processing closed.
Fix TxMetaCache.SetMinedMulti to delegate to the underlying store and evict from cache; adjust block-validation logging on failure.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`stores/utxo/mock.go`	Extends mock to allow function-based return values for realistic `blockIDsMap` synthesis in tests.
`stores/utxo/Interface.go`	Documents a stronger `SetMinedMulti` postcondition/contract.
`stores/utxo/aerospike/set_mined.go`	Makes KEY_NOT_FOUND a hard error for normal set-mined (no-op only for unset-mined).
`stores/utxo/aerospike/set_mined_expressions.go`	Treats `FILTERED_OUT` as covered by synthesizing a map entry.
`stores/utxo/aerospike/set_mined_expressions_test.go`	Adds coverage for parsing an explicitly empty `blockIDs` bin.
`stores/utxo/aerospike/refactored_methods_test.go`	Adds regression coverage for KEY_NOT_FOUND handling behavior.
`stores/txmetacache/txmetacache.go`	Ensures SetMinedMulti delegates to store then evicts cache entries.
`stores/txmetacache/txmetacache_test.go`	Adds tests proving delegation/eviction and store-error short-circuiting.
`services/blockvalidation/BlockValidation.go`	Prevents success log emission when `setTxMinedStatus` fails and block is requeued.
`model/update-tx-mined.go`	Adds coverage-gap verification logic after `SetMinedMulti`.
`model/update-tx-mined_test.go`	Updates existing tests and adds new subtests for coverage-gap erroring and unset-mined tolerance.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

oskarszoon

The 5-layer defense-in-depth shape is right and the fix correctly transforms a silent postcondition violation into a surfaced error. Store returns full blockIDs map → checkBatchResults walks coverage per submitted hash → cache pass-through (clean) → setTxMinedStatus surfaces as ProcessingError → setMinedChan retry loop. No peer-controllable DoS path; SQL and Lua backends produce equivalent semantics; cache is pure forwarding.

One operational blocker before merge:

Unbounded 1Hz retry loop on setMinedChan. BlockValidation.go:561-567 retries forever at 1-second intervals on any non-ErrBlockNotFound error. After this PR lands, a node carrying historical-corrupt block_ids data (records that lost a blockID via a prior Lua signal-handling crash, a pre-PR-4561-class bug, or an Aerospike eviction race) will hit the new coverage check on first restart and spin at ~86,400 ProcessingError log lines per day per affected block, with no way to drain the channel and no metric for operators to see "I'm stuck on this block" until log volume catches their eye. The fix is exactly the class of bug this PR is designed to surface, but the response shape is too aggressive for an unrecoverable error. Three required changes:

Add a max-retry counter on setMinedChan re-queues (e.g. 10 attempts with exponential backoff 1s→16s), then log WARN + drop with a manual_intervention_required metric/log marker.
Add block_validation_setmined_retry_total{blockhash} Prometheus counter so the stuck state is visible in dashboards.
Document the new failure mode in docs/topics/services/blockValidation.md so on-call understand "this loop is normal during corrupt-state recovery; here's how to escalate".

Pre-deploy operator coordination separately: worth a one-shot diagnostic scan across block_ids for recently-processed blocks before flipping this on for any environment with historical state. If drift exists, repair-then-deploy.

Non-gating follow-ups: differentiate setMinedErrorCount into typed buckets (storage / conversion / coverage) so the retry loop can apply policy; push the coverage invariant down into Aerospike Lua + SQL for atomic enforcement; add a fixture-driven table test for checkBatchResults exercising mixed-mode batch results.

Happy to re-approve once the bounded retry + Prometheus counter are in.

UpdateTxMinedStatus lumped two distinct failures under a single setMinedErrorCount, surfacing both as "failed to set mined status for N batches". An operator triaging a stalled block could not tell a store I/O blip from a postcondition violation (SetMinedMulti returned nil error but did not durably tag every submitted tx). Track them in separate atomics — setMinedErrorCount for I/O failures, coverageGapCount for postcondition violations — and switch on the two counts at the end to emit a cause-specific error. The combined-failure variant reports both counts so a mixed batch is still legible. Adds three subtests pinning each error variant (pure I/O, pure coverage, combined).

…_FOUND behavior processBatchResultsForSetMinedExpressions synthesizes the result-map entry on FILTERED_OUT (proof, from the filter condition, that the current blockID is already in the durable list) and treats it as a successful idempotent retry. There was no direct test covering that branch, so a regression that dropped the synthesis or changed it to an error would only have surfaced as a coverage-gap downstream. Adds a focused test that drives the function with a fake BatchWrite returning FILTERED_OUT and asserts the synthesized [blockID] slice. Adds a paired regression guard confirming KEY_NOT_FOUND on the same path remains a hard error and does not synthesize a map entry.

The SQL backend (postgres, sqlite, sqlitememory) silently dropped any submitted hash that was absent from the transactions table: existence was probed up front, the INSERT/SELECT path then operated only on the intersection, and the missing hashes never appeared in the returned map. The Interface.go postcondition added in PR-850 says the opposite — for normal set-mined every submitted hash MUST be in the result. The model layer's coverage check already caught this (the hash showed up as a coverage gap and the block failed closed on retry), so the visible behavior was correct. But the contract was wrong, and a caller outside UpdateTxMinedStatus (or a future refactor that thins the model-layer guard) would re-introduce the silent drop. Bring SQL into parity with Aerospike's KEY_NOT_FOUND handling: when !UnsetMined and any submitted hash is missing from transactions, return TxNotFoundError and let the deferred rollback fire. UnsetMined still tolerates missing rows (the tx is already gone). Updates TestStore_SetMinedMultiChunk_NoExistingTransactions (which previously pinned the silent-drop) to expect the new error, and adds two new tests: one for UnsetMined=true tolerance, one for the partial-miss case where some hashes exist and some don't.

github-actions · 2026-05-14T13:27:27Z

Reviewed the Aerospike expression path postcondition enforcement:

Issue: processBatchResultsForSetMinedExpressions (stores/utxo/aerospike/set_mined_expressions.go:328-464) lacks postcondition validation. When batchRec.Record == nil || batchRec.Record.Bins == nil (lines 383-384), the hash is silently dropped from the result map without error tracking.

Impact: Violates the SetMinedMulti interface contract (stores/utxo/Interface.go:261-264) which requires every submitted hash to appear in the returned map with the current blockID when \!UnsetMined.

Suggested fix: Add postcondition check before line 463 return:

if \!minedBlockInfo.UnsetMined {
  for _, h := range hashes {
    blockIDs, exists := blockIDs[*h]
    if \!exists || \!containsBlockID(blockIDs, minedBlockInfo.BlockID) {
      errs = errors.Join(errs, errors.NewTxNotFoundError("coverage gap", h.String()))
      nrErrors++
    }
  }
}

Note: The model layer defensive check (model/update-tx-mined.go:289-297) catches this, but store implementations should enforce their own contracts per the interface docstring.

ordishs

Approving on the fix correctness — the 5-layer defense-in-depth is sound, backend parity (SQL ↔ Aerospike) is in place, and the test matrix covers the important paths (empty map, partial omit, FILTERED_OUT synthesis, KEY_NOT_FOUND as hard error, mixed I/O + coverage error reporting). I built the branch, ran go vet, and ran the affected unit tests including -race on model — all green.

One operational concern worth a follow-up (not gating my approval, but flagging it before deploy):

The unbounded 1-second retry on setMinedChan (BlockValidation.go:553-567) is a pre-existing pattern, but this PR materially changes its risk profile. Before: failures were transient I/O. After: failures include "this block's historical state can never satisfy the new postcondition" — genuinely unrecoverable through retry alone. A node carrying any pre-existing corrupt block_ids data will spin at ~86,400 log lines/day per affected block on first restart, with no metric for operators to spot it.

Worth doing before this hits any environment with historical state:

Bounded retries on setMinedChan re-queue (e.g. exponential 1s→16s, drop after N attempts with a manual_intervention_required log marker).
block_validation_setmined_retry_total{blockhash} Prometheus counter for dashboard visibility.
A short note in docs/topics/services/blockValidation.md describing the new failure class for on-call.
Pre-deploy: a one-shot diagnostic scan over recent block_ids to catch any drift before flipping the stricter check on.

Happy to land this as-is if the retry hardening is queued as a follow-up PR with a clear owner.

Minor housekeeping for the description before merge: the TxMetaCache.SetMinedMulti blurb still says "cache failure after a successful store write returns an error" — the code now evicts via Delete, which can't fail. The "Batch-coverage postcondition (both Aerospike paths)" line is also slightly misleading; the postcondition is enforced at the model layer, with the Aerospike paths contributing the inputs (KEY_NOT_FOUND as hard error, FILTERED_OUT synthesis). Doesn't change correctness, but easier on future readers.

Align trailing comments on the new setMinedErrorCount / coverageGapCount declarations with the longer oldBlockIDs line so the gci formatter is happy. Pure whitespace, no behavior change.

…xMetaCache The SetMinedMulti interface contract added in this PR promises that, on a nil error with !UnsetMined, every submitted hash is present in the result map and every slice contains minedBlockInfo.BlockID. SQL already enforced this; Aerospike and TxMetaCache silently passed through coverage gaps. Aerospike: add an end-of-loop postcondition check in processBatchResultsForSetMinedExpressions that promotes uncovered hashes to a NewTxNotFoundError (or NewProcessingError when the slice exists but lacks the current blockID). Covers the nil-record/bins path and the empty-BlockIDs-bin path that previously returned nil. UnsetMined still tolerates gaps per the interface contract. TxMetaCache: re-verify coverage in SetMinedMulti before evicting cache entries so a regression in any wrapped backend cannot slip through the cache layer. Route SetMined through SetMinedMulti to share the check. Tests pin the new behaviour: - Aerospike: CoverageGap_NilRecord, CoverageGap_EmptyBlockIDs, UnsetMinedToleratesGap. - TxMetaCache: PostconditionMissingHash, PostconditionMissingBlockID, UnsetMinedToleratesGap. - Stale comment on TestParseSetMinedState_ExplicitEmptyList rewritten to reflect the new store-side enforcement.

…odel, cache, and block validation

sonarqubecloud · 2026-05-18T09:53:43Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
82.2% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

oskarszoon

All three blockers from the previous review are closed:

Bounded retry on setMinedChan: setMinedMaxRetries=10 with exponential backoff 1s→16s and a manual_intervention_required log marker on final drop (99098a1fc).
Prometheus counters: setmined_retry_total{blockhash} and setmined_drops_total{blockhash} in metrics.go.
Operator-runbook section in docs/topics/services/blockValidation.md covering the new failure class, backoff table, counters, and repair procedure.

TestSetMinedRetryBackoff pins the 111s total budget assertion correctly. PR description is updated. ordishs's typed-error-bucket split (e3d35e68f) and SQL postcondition (a941c7475) are clean; icellan's routing of SetMined through SetMinedMulti and the TxMetaCache defensive check in 05aed56ec are correct.

Two non-blocking follow-ups worth a separate ticket:

The non-expressions Aerospike path (processBatchResultsForSetMined) has no end-of-loop coverage check, unlike the expressions path. Covered downstream by the TxMetaCache and model-layer postcondition checks, so safe, but the asymmetry between the two batch paths is worth closing.
time.Sleep(backoff) blocks the single setMinedChan worker for up to 16s per retry. Operationally acceptable given retries should be rare, but worth a one-line note in the runbook ("during recovery the setMined worker is single-threaded; concurrent stuck blocks serialise").

Conflict: main's #850 (setMinedMulti coverage invariant) reworked handleBatchRecordError to branch on a new unsetMined flag, while the PR side had added a batchRecord parameter for richer diagnostic logging. Merged the two: signature is now handleBatchRecordError(batchRecord, err, hash, unsetMined) - KEY_NOT_FOUND_ERROR returns nil under unsetMined (main's invariant) or TxNotFoundError otherwise. - All other errors keep the PR's batchRecord-aware storage error message. Single call site in set_mined.go updated; refactored_methods_test.go calls updated to pass nil batchRecord (the unit-level tests don't need the diagnostic context). Also retargeted main's new test-file imports from upstream aerospike-client-go to the BSV fork, matching the rest of the PR.

icellan added 2 commits May 12, 2026 17:23

fixup! fix(setMinedMulti): enforce coverage invariant across store, m…

7d6027c

…odel, cache, and block validation

fixup! fix(setMinedMulti): enforce coverage invariant across store, m…

1f9a5ad

…odel, cache, and block validation

icellan requested review from Copilot, freemans13 and ordishs and removed request for ordishs May 12, 2026 16:57

Copilot started reviewing on behalf of icellan May 12, 2026 16:58 View session

Copilot AI reviewed May 12, 2026

View reviewed changes

Comment thread model/update-tx-mined.go

Comment thread stores/utxo/aerospike/set_mined_expressions.go

Comment thread stores/utxo/Interface.go

Comment thread stores/utxo/aerospike/set_mined_expressions_test.go Outdated

Comment thread stores/txmetacache/txmetacache.go

Merge branch 'main' into fix/set-tx-mined-invariant

84d57bf

freemans13 approved these changes May 14, 2026

View reviewed changes

oskarszoon requested changes May 14, 2026

View reviewed changes

ordishs added 4 commits May 14, 2026 15:19

Merge branch 'main' into fix/set-tx-mined-invariant

e541267

ordishs approved these changes May 14, 2026

View reviewed changes

icellan and others added 4 commits May 15, 2026 17:44

chore(model): fix gci comment alignment in update-tx-mined

a136e27

Align trailing comments on the new setMinedErrorCount / coverageGapCount declarations with the longer oldBlockIDs line so the gci formatter is happy. Pure whitespace, no behavior change.

Merge branch 'main' into fix/set-tx-mined-invariant

c524bd0

fixup! fix(setMinedMulti): enforce coverage invariant across store, m…

99098a1

…odel, cache, and block validation

oskarszoon approved these changes May 18, 2026

View reviewed changes

icellan merged commit fe3e73a into main May 18, 2026
30 checks passed

icellan deleted the fix/set-tx-mined-invariant branch May 18, 2026 10:12

icellan self-assigned this May 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(setMinedMulti): enforce coverage invariant across store, model, cache, and block validation#850

fix(setMinedMulti): enforce coverage invariant across store, model, cache, and block validation#850
icellan merged 12 commits into
mainfrom
fix/set-tx-mined-invariant

icellan commented May 12, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 12, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 12, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

oskarszoon left a comment

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

ordishs left a comment

Uh oh!

sonarqubecloud Bot commented May 18, 2026

Uh oh!

oskarszoon left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

icellan commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Coverage invariant — enforced at every layer

Bounded setMinedChan retry (operational gate per @oskarszoon)

Files changed

Tests

Verification

Risks

Uh oh!

github-actions Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Comparison Report

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

oskarszoon left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

ordishs left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented May 18, 2026

Quality Gate passed

Uh oh!

oskarszoon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

icellan commented May 12, 2026 •

edited

Loading

Bounded `setMinedChan` retry (operational gate per @oskarszoon)

github-actions Bot commented May 12, 2026 •

edited

Loading

github-actions Bot commented May 12, 2026 •

edited

Loading