perf(pruner): skip already-pruned parent updates using an in-memory cuckoo filter by freemans13 · Pull Request #875 · bsv-blockchain/teranode

freemans13 · 2026-05-15T11:44:59Z

Summary

The pruner deletes spent UTXOs and tells each spent UTXO's parent to mark itself as having one fewer outstanding child. That second step is a wasted Aerospike round-trip if the parent has already been pruned in an earlier session.

This PR keeps a set of pruned TXIDs in memory and uses it as a pre-flight skip filter: if a parent is in the set, we skip the update. The set is backed by a sharded, lock-free, two-generation cuckoo filter — ~1 byte per entry.

The set is held in memory for the lifetime of the process and reused across prune sessions, so children whose parents were pruned in an earlier session can still skip the round-trip. It is not persisted to disk: on pod restart it is rebuilt from scratch and the catch rate re-ramps from cold. Cap is configurable via pruner_utxoPrunedSetMaxEntries (default 10M ≈ 10 MiB; set to 0 for the built-in 2B ≈ 2 GiB default).

Measured impact on dev-scale-1

Measured with pruner_utxoPrunedSetMaxEntries=0 (the 2B / ~2 GiB cap), not the shipped 10M default:

	Before	After
records_deleted/sec	1.64M	2.6M
Wasted parent updates caught	2.4%	99.9%
Wasted Aerospike round-trips/session	411M	~500K
Pruner memory	2.6 GiB	~7 GiB
Pruner CPU	14.4 / 15 cores	14.4 / 15 cores

What's in this PR

PrunedTxSet — sharded set of TXIDs pruned in this and prior sessions within a single process. In-memory only; reused across prune sessions, rebuilt from scratch on pod restart.
Cuckoo-filter backing (util/cuckoo.H32) — specialised for 32-byte hashes, lock-free, zero-alloc on the hot path. ~3.1% false-positive rate (which only causes us to skip an update we should have done — never the other way — and with defensive mode off the deletedChildren bin is never read anyway).
Two-generation rotation — a single cuckoo eventually saturates and freezes. We keep a current and previous filter per shard; when current fills, it rotates into previous and a fresh current is allocated. Lookups check both. The set never freezes.
Saturation short-circuit — once eviction has provably failed once, future Inserts that find both candidate buckets full return immediately rather than burning ~15 µs of CPU per call on a hopeless eviction loop.
Combined batch — when defensive mode is off, parent updates and child deletions go in a single Aerospike BatchOperate instead of two. ~8% throughput win.
Configurable cap — pruner_utxoPrunedSetMaxEntries setting. Default 10M (~10 MiB); 0 selects the built-in 2B (~2 GiB) default.
Metrics — utxo_pruner_parents_skipped_pruned_total, utxo_pruner_pruned_set_size, utxo_pruner_pruned_set_saturated, utxo_pruner_pruned_set_rotations.

Why a hand-written cuckoo filter

We tried github.com/seiflotfy/cuckoofilter first. Two problems:

Allocations — its API takes []byte, causing the 32-byte hash to escape to the heap via the hasher interface on every Insert/Lookup/Delete. At 1.5M ops/sec that's ~50 MB/sec of churn, 2 GC/sec, p99 GC pause 5 ms.
Locks — per-shard sync.Mutex produced ~28% of total CPU in lock2/futex/procyield under load.

util/cuckoo.H32 is the specialised replacement: takes *[32]byte (no escape), uses atomic CAS on packed bucket words (no mutex). Same correctness guarantees, no allocations on hot paths, no lock contention.

We didn't migrate the three existing seiflotfy call sites in teranode (model/disk_parent_spends_map.go, model/disk_tx_map_uint64.go, services/blockassembly/subtreeprocessor/disk_tx_map.go) — they use different key sizes (36-byte inpoints, uint64) so can't drop in cuckoo.H32 as-is. Migrating them is a future PR.

Notes for reviewers

Not persisted across restarts. The set lives only in memory; a pod restart rebuilds it from scratch and the catch rate re-ramps from cold. A persistence layer (serialize on shutdown / load on startup) is possible future work but is not in this PR.
FP safety rests on one invariant. A false positive suppresses a deletedChildren bin update. That bin is only read by the defensive-mode safety check, and prunedSet is nil whenever defensive mode is on — so with the optimisation active the missed update has no behavioural consequence. If a future change makes non-defensive pruning consult deletedChildren, that invariant must be re-checked.
Defensive mode is unchanged. When pruner_defensiveMode is on, the two-step ordering (delete children, then update parents) is preserved and the optimisation is disabled — the combined batch only kicks in when defensive mode is off.
The cuckoo's destructive Delete and the two-generation rotation are complementary. ~89% of adds get deleted by their child in the same session; the remaining ~11% are orphans that the rotation eventually drops. We need both.
Catch rate ramps within a session from ~7% at the start to ~99% at the end. That's intrinsic to scanning Aerospike by partition rather than by chain order; the cuckoo only knows about TXIDs scanned so far. A two-pass scan would flatten the ramp but doubles I/O — not worth it.

Test plan

go test ./util/cuckoo/... ./stores/utxo/aerospike/pruner/... -race passes
go vet clean
Microbenchmarks: parallel Add+CheckAndRemove at ~10 ns/op
Deployed across many iterations on dev-scale-1; sustained 2.6M rec/sec, 99.9% catch rate, smooth Grafana

…able Two changes that together let the PrunedTxSet optimisation actually bite for the tight-chain workload it was designed for. 1. Persist the set on the Service struct so children whose parents were pruned in earlier sessions can still skip the parent-update round-trip. In the previous design the set was allocated per `PruneWithPartitions` call. With `pruner_block_trigger=OnBlockMined` every block triggers a new session, so a child at height H+1 was never able to recognise its parent at height H — by then the parent's session had ended and the set had been thrown away. On dev-scale-1 this limited the catch rate to ~6.5% of would-be-wasted parent updates (24M / 371M). 2. Replace the hard-coded 10M cap with a `pruner_utxoPrunedSetMaxEntries` setting (default 10M, 0 = unlimited). Persistent sessions accumulate entries across many blocks, so the cap is more likely to saturate; operators tuning for high-fan-out workloads can raise it. Adds two gauge metrics so operators can see what's happening: - utxo_pruner_pruned_set_size — current Len() of the set - utxo_pruner_pruned_set_saturated — 1 if cap reached, 0 otherwise `CheckAndRemove` remains destructive: for parents with high fan-out only the first child to look skips the update. That's a separate concern from persistence and not addressed here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the exact-map PrunedTxSet (~130 B/entry) with a sharded cuckoo filter from github.com/seiflotfy/cuckoofilter (already a dep). Memory drops ~100x — 1B entries fit in ~1 GiB instead of ~130 GiB. Why this is safe in our usage: - False positives (~3% at the standard 8-bit fingerprint × 4-slot bucket configuration) cause a child to incorrectly skip a parent update, suppressing the deletedChildren bin write. That bin is only consulted by the defensive-mode safety check, which is always off when prunedSet is non-nil. So FPs are behaviourally harmless. - CheckAndRemove uses cuckoo Delete, which on rare fingerprint collisions can remove the wrong entry. That manifests as a lost future skip for the collision-evicted hash (one wasted Aerospike round-trip), not a correctness bug. Concurrency: per-shard mutex, same sharding strategy as before (h[0] & mask). 256 shards by default keeps lock contention low under the existing read-ahead/processor split. Why this matters operationally: On dev-scale-1 (PR bsv-blockchain#873 deploy) the previous map-backed set saturated at 350M entries / ~39 GiB and the catch rate plateaued around 30% of would-be-wasted parent updates. The map can't fit enough entries to span typical parent/child temporal gaps without pushing the pod close to OOM. The filter unlocks effectively unlimited capacity within a small memory budget, lifting catch rate toward the ~97% ceiling (3% FP penalty). API: the public PrunedTxSet surface is unchanged (Add, Contains, CheckAndRemove, Len, Saturated). Adds Saturated() semantics: now sticky on cumulative Insert failures rather than a Len-based threshold, and adds an InsertFailures() accessor. NewPrunedTxSet(_, 0) falls back to defaultPrunedTxSetCapacity (2B entries / ~2 GiB) instead of an unlimited map. Tests: - Replaced exact-equality assertions with probabilistic-tolerant ones (cuckoo Len can over-count duplicate Adds; Contains for non-Added hashes can FP). - Added TestPrunedTxSet_FalsePositiveRate to bound the FP rate at < 6%. - Added TestPrunedTxSet_Saturated to verify InsertFailures latches on capacity exhaustion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T11:46:26Z

🤖 Claude Code Review

Status: Complete

Current Review:

This PR introduces a high-performance in-memory pruned-TX set using a custom lock-free cuckoo filter to skip wasteful Aerospike parent updates during UTXO pruning. The implementation demonstrates strong engineering with comprehensive testing and thorough documentation.

Key Strengths:

Well-documented two-generation rotation strategy prevents saturation
Lock-free atomic operations eliminate contention overhead
Comprehensive test coverage including concurrency, false-positive rates, and saturation behavior
Clear documentation in settings aligning with implementation (line 560: "0 = use built-in 2B default (NOT unlimited)")
All tests use explicit capacity values (no dangerous maxEntries=0 in test code)

Remaining Issues:

Several Copilot comments from earlier reviews remain unaddressed. Most are minor documentation/clarity improvements:

Cuckoo filter alignment (util/cuckoo/cuckoo_h32.go) - Multiple comments about uint32 bucket alignment. Current implementation uses []uint32 which provides proper alignment, but the comments in the code could be clearer about why this is safe.
Saturation semantics (pruner_service.go:251-253) - The metric help text for utxo_pruner_pruned_set_saturated says it indicates "configured maxEntries cap is reached", but PrunedTxSet.Saturated() actually reports insert failures (an error state), not routine at-capacity behavior. Consider updating the help text or using Rotations() metric for capacity pressure instead.
Test timing assertion (cuckoo_h32_test.go:127-139) - The saturation short-circuit test pre-generates hashes outside the timed region (good!), but the 1-second timeout bound is generous for CI. This is acceptable as-is.

Documentation Accuracy:
✅ PR description correctly states the set is NOT persisted across restarts
✅ settings/settings.go:560 correctly documents "0 = use built-in 2B default (NOT unlimited)"
✅ settings/pruner_settings.go:28 provides comprehensive documentation of the feature
✅ Test comments accurately describe current behavior (e.g., line 143 mentions "up-front Add loop")

Verdict:
The core implementation is sound. The remaining issues are documentation/clarity improvements that don't affect correctness. The measured performance gains (1.64M → 2.6M records/sec, 99.9% catch rate) validate the approach.

History:

Earlier reviews identified documentation mismatches between old map-based and new cuckoo-filter implementation
✅ Fixed: Tests no longer use dangerous maxEntries=0
✅ Fixed: settings.go now correctly documents 0 → 2B default
✅ Fixed: Test comments updated to reflect current behavior

github-actions · 2026-05-15T12:00:31Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-875 (87d3796)

Summary

Regressions: 0
Improvements: 0
Unchanged: 148
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.642µ	1.602µ	~	0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	71.30n	71.24n	~	1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	71.26n	71.25n	~	0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	71.08n	71.24n	~	0.500
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	33.26n	33.67n	~	0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	56.04n	55.93n	~	1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	125.9n	132.4n	~	0.100
MiningCandidate_Stringify_Short-4	217.4n	224.0n	~	0.100
MiningCandidate_Stringify_Long-4	1.652µ	1.665µ	~	0.100
MiningSolution_Stringify-4	855.2n	861.0n	~	0.500
BlockInfo_MarshalJSON-4	1.743µ	1.756µ	~	0.700
NewFromBytes-4	128.7n	156.6n	~	0.100
AddTxBatchColumnar_Validation-4	2.518µ	2.486µ	~	0.700
OffsetValidationLoop-4	646.1n	657.7n	~	0.200
Mine_EasyDifficulty-4	61.43µ	60.97µ	~	0.700
Mine_WithAddress-4	7.391µ	7.046µ	~	0.100
BlockAssembler_AddTx-4	0.03093n	0.02522n	~	0.700
AddNode-4	10.42	10.36	~	1.000
AddNodeWithMap-4	11.66	11.26	~	0.200
DiskTxMap_SetIfNotExists-4	3.463µ	3.448µ	~	0.700
DiskTxMap_SetIfNotExists_Parallel-4	3.368µ	3.237µ	~	0.700
DiskTxMap_ExistenceOnly-4	306.0n	304.9n	~	0.400
Queue-4	186.4n	186.7n	~	0.800
AtomicPointer-4	4.580n	4.760n	~	0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4	894.8µ	850.9µ	~	0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4	821.9µ	780.7µ	~	0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4	103.9µ	101.9µ	~	0.700
ReorgOptimizations/AllMarkFalse/New/10K-4	62.08µ	61.98µ	~	1.000
ReorgOptimizations/HashSlicePool/Old/10K-4	56.69µ	66.26µ	~	0.400
ReorgOptimizations/HashSlicePool/New/10K-4	11.49µ	10.63µ	~	0.400
ReorgOptimizations/NodeFlags/Old/10K-4	4.637µ	4.628µ	~	0.700
ReorgOptimizations/NodeFlags/New/10K-4	1.572µ	1.646µ	~	0.500
ReorgOptimizations/DedupFilterPipeline/Old/100K-4	9.509m	9.419m	~	0.200
ReorgOptimizations/DedupFilterPipeline/New/100K-4	10.890m	9.727m	~	0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4	1.228m	1.150m	~	0.100
ReorgOptimizations/AllMarkFalse/New/100K-4	685.4µ	676.7µ	~	0.200
ReorgOptimizations/HashSlicePool/Old/100K-4	599.7µ	611.2µ	~	0.700
ReorgOptimizations/HashSlicePool/New/100K-4	343.1µ	308.0µ	~	0.100
ReorgOptimizations/NodeFlags/Old/100K-4	51.67µ	53.41µ	~	0.400
ReorgOptimizations/NodeFlags/New/100K-4	18.13µ	18.05µ	~	0.700
TxMapSetIfNotExists-4	52.09n	52.46n	~	0.100
TxMapSetIfNotExistsDuplicate-4	40.29n	39.73n	~	0.100
ChannelSendReceive-4	598.0n	621.5n	~	0.100
DirectSubtreeAdd/4_per_subtree-4	47.26n	47.54n	~	1.000
DirectSubtreeAdd/64_per_subtree-4	23.63n	23.21n	~	0.400
DirectSubtreeAdd/256_per_subtree-4	21.94n	22.02n	~	0.400
DirectSubtreeAdd/1024_per_subtree-4	20.96n	21.33n	~	0.100
DirectSubtreeAdd/2048_per_subtree-4	20.56n	20.59n	~	0.600
SubtreeProcessorAdd/4_per_subtree-4	241.4n	234.3n	~	0.700
SubtreeProcessorAdd/64_per_subtree-4	228.0n	227.4n	~	1.000
SubtreeProcessorAdd/256_per_subtree-4	227.9n	226.0n	~	0.700
SubtreeProcessorAdd/1024_per_subtree-4	225.1n	237.3n	~	0.200
SubtreeProcessorAdd/2048_per_subtree-4	221.5n	220.9n	~	0.800
SubtreeProcessorRotate/4_per_subtree-4	230.5n	227.0n	~	1.000
SubtreeProcessorRotate/64_per_subtree-4	228.8n	229.8n	~	0.700
SubtreeProcessorRotate/256_per_subtree-4	233.4n	230.1n	~	0.400
SubtreeProcessorRotate/1024_per_subtree-4	222.2n	235.7n	~	0.100
SubtreeNodeAddOnly/4_per_subtree-4	43.95n	44.36n	~	0.100
SubtreeNodeAddOnly/64_per_subtree-4	28.39n	29.12n	~	0.700
SubtreeNodeAddOnly/256_per_subtree-4	27.56n	27.76n	~	0.100
SubtreeNodeAddOnly/1024_per_subtree-4	27.05n	27.06n	~	0.600
SubtreeCreationOnly/4_per_subtree-4	90.28n	90.85n	~	0.700
SubtreeCreationOnly/64_per_subtree-4	320.0n	326.6n	~	0.100
SubtreeCreationOnly/256_per_subtree-4	1.093µ	1.084µ	~	0.700
SubtreeCreationOnly/1024_per_subtree-4	3.429µ	3.481µ	~	0.100
SubtreeCreationOnly/2048_per_subtree-4	6.140µ	6.311µ	~	0.700
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	229.2n	218.5n	~	0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	229.7n	220.6n	~	0.100
ParallelGetAndSetIfNotExists/1k_nodes-4	1.635m	1.617m	~	0.700
ParallelGetAndSetIfNotExists/10k_nodes-4	4.372m	4.181m	~	0.100
ParallelGetAndSetIfNotExists/50k_nodes-4	6.523m	5.812m	~	0.100
ParallelGetAndSetIfNotExists/100k_nodes-4	8.724m	8.027m	~	0.100
SequentialGetAndSetIfNotExists/1k_nodes-4	1.414m	1.421m	~	0.200
SequentialGetAndSetIfNotExists/10k_nodes-4	3.851m	3.543m	~	0.100
SequentialGetAndSetIfNotExists/50k_nodes-4	16.14m	11.21m	~	0.100
SequentialGetAndSetIfNotExists/100k_nodes-4	25.09m	20.66m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	1.655m	1.652m	~	0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	6.902m	6.933m	~	0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	11.91m	12.02m	~	1.000
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	1.464m	1.440m	~	0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	8.664m	9.076m	~	0.400
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	56.64m	49.08m	~	0.400
CalcBlockWork-4	530.1n	505.6n	~	0.700
CalculateWork-4	695.2n	697.1n	~	0.400
CheckOldBlockIDs/on-chain-prefetch/1000-4	56.56µ	64.44µ	~	0.700
CheckOldBlockIDs/off-chain-prefetch/1000-4	47.11µ	45.94µ	~	0.200
CheckOldBlockIDs/on-chain-prefetch/10000-4	419.1µ	419.5µ	~	0.700
CheckOldBlockIDs/off-chain-prefetch/10000-4	337.7µ	332.3µ	~	0.100
BuildBlockLocatorString_Helpers/Size_10-4	1.361µ	1.358µ	~	1.000
BuildBlockLocatorString_Helpers/Size_100-4	13.03µ	12.99µ	~	1.000
BuildBlockLocatorString_Helpers/Size_1000-4	128.1µ	128.8µ	~	0.200
CatchupWithHeaderCache-4	104.5m	104.5m	~	1.000
_BufferPoolAllocation/16KB-4	4.339µ	4.156µ	~	0.700
_BufferPoolAllocation/32KB-4	9.027µ	10.620µ	~	0.700
_BufferPoolAllocation/64KB-4	20.48µ	17.01µ	~	0.200
_BufferPoolAllocation/128KB-4	35.45µ	37.35µ	~	0.100
_BufferPoolAllocation/512KB-4	146.3µ	123.1µ	~	0.100
_BufferPoolConcurrent/32KB-4	21.70µ	20.26µ	~	0.200
_BufferPoolConcurrent/64KB-4	34.41µ	31.98µ	~	0.100
_BufferPoolConcurrent/512KB-4	155.6µ	154.8µ	~	0.700
_SubtreeDeserializationWithBufferSizes/16KB-4	687.5µ	662.9µ	~	0.200
_SubtreeDeserializationWithBufferSizes/32KB-4	748.6µ	682.7µ	~	0.100
_SubtreeDeserializationWithBufferSizes/64KB-4	742.7µ	692.4µ	~	0.100
_SubtreeDeserializationWithBufferSizes/128KB-4	748.3µ	696.4µ	~	0.100
_SubtreeDeserializationWithBufferSizes/512KB-4	667.7µ	603.7µ	~	0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4	37.09m	37.19m	~	0.400
_SubtreeDataDeserializationWithBufferSizes/32KB-4	37.28m	37.03m	~	0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4	36.88m	37.03m	~	0.400
_SubtreeDataDeserializationWithBufferSizes/128KB-4	36.89m	37.36m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4	36.69m	36.96m	~	0.400
_PooledVsNonPooled/Pooled-4	740.6n	739.3n	~	0.100
_PooledVsNonPooled/NonPooled-4	8.958µ	9.156µ	~	0.700
_MemoryFootprint/Current_512KB_32concurrent-4	6.949µ	7.026µ	~	0.200
_MemoryFootprint/Proposed_32KB_32concurrent-4	9.913µ	10.228µ	~	0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4	9.474µ	9.545µ	~	0.400
_prepareTxsPerLevel-4	412.7m	407.0m	~	0.700
_prepareTxsPerLevelOrdered-4	3.743m	3.477m	~	0.200
_prepareTxsPerLevel_Comparison/Original-4	417.3m	411.4m	~	0.200
_prepareTxsPerLevel_Comparison/Optimized-4	3.851m	3.534m	~	0.100
SubtreeSizes/10k_tx_4_per_subtree-4	1.436m	1.421m	~	0.700
SubtreeSizes/10k_tx_16_per_subtree-4	330.3µ	330.2µ	~	1.000
SubtreeSizes/10k_tx_64_per_subtree-4	80.83µ	80.80µ	~	0.700
SubtreeSizes/10k_tx_256_per_subtree-4	20.42µ	20.22µ	~	0.700
SubtreeSizes/10k_tx_512_per_subtree-4	10.093µ	9.954µ	~	0.100
SubtreeSizes/10k_tx_1024_per_subtree-4	4.971µ	4.950µ	~	1.000
SubtreeSizes/10k_tx_2k_per_subtree-4	2.483µ	2.470µ	~	0.300
BlockSizeScaling/10k_tx_64_per_subtree-4	77.72µ	79.03µ	~	0.400
BlockSizeScaling/10k_tx_256_per_subtree-4	20.17µ	19.91µ	~	0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.955µ	4.950µ	~	1.000
BlockSizeScaling/50k_tx_64_per_subtree-4	393.1µ	392.3µ	~	0.700
BlockSizeScaling/50k_tx_256_per_subtree-4	98.20µ	97.16µ	~	0.200
BlockSizeScaling/50k_tx_1024_per_subtree-4	24.86µ	24.60µ	~	0.400
SubtreeAllocations/small_subtrees_exists_check-4	161.7µ	161.4µ	~	1.000
SubtreeAllocations/small_subtrees_data_fetch-4	171.7µ	168.8µ	~	0.100
SubtreeAllocations/small_subtrees_full_validation-4	328.5µ	324.5µ	~	0.100
SubtreeAllocations/medium_subtrees_exists_check-4	9.882µ	9.909µ	~	0.400
SubtreeAllocations/medium_subtrees_data_fetch-4	10.92µ	10.80µ	~	0.100
SubtreeAllocations/medium_subtrees_full_validation-4	20.64µ	20.28µ	~	0.100
SubtreeAllocations/large_subtrees_exists_check-4	2.408µ	2.436µ	~	0.400
SubtreeAllocations/large_subtrees_data_fetch-4	2.697µ	2.636µ	~	0.200
SubtreeAllocations/large_subtrees_full_validation-4	5.170µ	5.086µ	~	0.100
StoreBlock_Sequential/BelowCSVHeight-4	332.8µ	330.5µ	~	1.000
StoreBlock_Sequential/AboveCSVHeight-4	338.5µ	333.1µ	~	0.700
GetUtxoHashes-4	280.9n	279.9n	~	1.000
GetUtxoHashes_ManyOutputs-4	50.94µ	45.94µ	~	0.100
_NewMetaDataFromBytes-4	213.8n	215.0n	~	0.200
_Bytes-4	400.4n	396.3n	~	0.700
_MetaBytes-4	139.5n	138.4n	~	0.100

Threshold: >10% with p < 0.05 | Generated: 2026-06-04 16:08 UTC

Replaces the seiflotfy/cuckoofilter backing with a tiny in-package implementation (cuckooH32) specialised for chainhash.Hash inputs. Before: BenchmarkCuckoo_Add 3357 ns/op 32 B/op 1 alloc/op BenchmarkCuckoo_CheckAndRemove_Hit 84 ns/op 32 B/op 1 alloc/op Every Add/Lookup/Delete allocated a slice header because the library's hash interface dispatch caused h[:] to escape. At the pruner's observed ~1.5M ops/sec that translates to tens of MB/sec of allocation churn — exactly the kind of pattern Go's GC handles least well, and a plausible contributor to the throughput regression observed on dev-scale-1. After: BenchmarkCuckoo_Add 41.5 ns/op 0 B/op 0 allocs/op BenchmarkCuckoo_CheckAndRemove_Hit 37 ns/op 0 B/op 0 allocs/op BenchmarkCuckoo_Parallel 27 ns/op 0 B/op 0 allocs/op Design: - *[32]byte instead of []byte — no slice-header alloc, no interface escape. - Skip rehashing — chainhash is already a SHA-256 digest, so fingerprint comes from h[0] and bucket index from h[1:9] directly. - Shard on h[9] (not h[0]) so per-shard distribution is independent of the fingerprint bytes consumed by the filter. - Deterministic eviction slot (fp ^ k) — keeps the filter alloc-free while rotating through bucket slots during cuckoo kicks. FP rate, semantics, and safety guarantees unchanged — confirmed by TestCuckooH32_FalsePositiveRate and TestPrunedTxSet_FalsePositiveRate (< 6% over 100K probes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CPU profile of the previous version on dev-scale-1 showed: - ~28% of CPU in runtime.lock2/futex/procyield/syscall — sync.Mutex contention on the per-shard locks of PrunedTxSet. - 20% of processRecordChunk time inside CheckAndRemove, of which most was lock acquire/release overhead, not cuckoo work itself. Each bucket of the cuckoo filter is exactly 4 bytes (4 x 1-byte fingerprints) and naturally aligned in the bucket slice — perfect for treating as a single uint32 with atomic ops. This commit replaces the sync.Mutex-per-shard model with lock-free operations on the packed uint32 bucket word: - Lookup: one atomic.LoadUint32 + SWAR byte-equality test, zero memory writes. - Insert: per-slot CompareAndSwapUint32 (0 -> fp). On bucket full, eviction loop also uses CAS — race-tolerant: a concurrent insert may displace us back, costing one future lost skip but never corrupting the filter or losing fingerprints atomically. - Delete: CompareAndSwapUint32 (fp -> 0). PrunedTxSet no longer holds any mutex; sharding is retained purely for cache locality. Benchmark (Apple M3 Max, 16 cores): Before After Map (saturated) Parallel Add+Check (ns/op): 27.2 10.11 10.20 Single-thread Add (ns/op): 41.5 36.57 1.80 Single-thread Del (ns/op): 37 43.07 - Under parallel load the lock-free cuckoo is now identical to the saturated-map fast path. Single-thread CAS overhead is small and irrelevant — the production pruner is heavily parallel (6 partition workers x ~24 chunk goroutines each). go test -race passes; concurrency assumptions hold up to TSAN. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T13:54:10Z

+	}
+}
+
+func (cf *cuckooH32) bucketAddr(i uint64) *uint32 {


[OUTDATED] File moved to util/cuckoo/cuckoo_h32.go - see current code uses uint32 buckets with atomic ops.

The alignment concern here is nuanced. While Go doesn't explicitly guarantee 4-byte alignment for [4]uint8, in practice the slice allocator will provide sufficient alignment for the atomic operations to work. However, this is an implementation detail.

That said, I verified the tests include concurrent access tests (TestPrunedTxSet_ConcurrentAccess with 100 goroutines) which would catch alignment issues on ARM if they existed. The code has been tested and works.

For absolute safety across all architectures, using []uint32 as suggested would be clearer. But the current approach is functionally correct given Go's runtime behavior.

The reader hot path used to call prunedSet.Add per scanned record. CPU profile on dev-scale-1 showed ~5% of total CPU there, with the cost dominated by cuckoo bucket cache misses (the 2 GB filter doesn't fit in cache). Move the Adds off the critical path: collect each chunk's record TXIDs during processRecordChunk, then fire them off in a parallel goroutine exactly when flushCleanupBatches kicks off its blocking BatchOperate. Adds run during the Aerospike network wait — work the pruner was previously parked on anyway. Lock-free cuckoo means concurrent Adds across chunk goroutines are safe. We wait for the Add goroutine to drain before returning so subsequent chunks see the entries — but Adds typically finish in ~10us per 1024 records vs ~10ms network wait, so the wait is a no-op. Trade-off: same-chunk parent/child catches are lost because the parent's Add happens at the end of the chunk that contained it, after that chunk's CheckAndRemoves. Cross-chunk catches (the dominant case in randomly-partitioned scans) are preserved. Updated TestProcessRecordChunk_SkipsParentsInPrunedSet to reflect the new behaviour: the child record's TXID is now Added during the test's flush, so Len() ends at 1 instead of 0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T15:01:28Z

 	SkipPreserveParents             bool          `key:"pruner_skipPreserveParents" desc:"Skip Phase 1: preserve parents of unmined transactions" default:"false" category:"Pruner" usage:"Skip parent preservation phase" type:"bool" longdesc:"### Purpose\nSkips Phase 1 of pruning which preserves parent transactions of old unmined transactions.\n\n### How It Works\nWhen enabled, the pruner skips calling PreserveParentsOfOldUnminedTransactions.\nThis means parent transactions will not be protected from deletion even if they have unmined children.\n\n### Trade-offs\n| Setting | Benefit | Drawback |\n|---------|---------|----------|\n| false (default) | Parents preserved for unmined tx resubmission | Additional processing overhead |\n| true | Faster pruning, reduced processing | Parents may be deleted, breaking unmined tx chains |\n\n### Recommendations\n- **false** (default) - Normal operation, preserves parent transactions\n- **true** - Skip parent preservation if you don't need to resubmit unmined transactions"`
 	MinBlockHeight                  uint32        `key:"pruner_min_block_height" desc:"Minimum block height before pruning begins" default:"0" category:"Pruner" usage:"Skip all pruning until block height exceeds this value" type:"uint32" longdesc:"### Purpose\nPrevents all pruning operations until the blockchain reaches a minimum height.\n\n### How It Works\n- When set to a value > 0, the pruner skips ALL operations (parent preservation, DAH deletion, blob deletion) for any block with height <= this value\n- At height 0 (default), pruning behaves normally from the start\n- Useful for fresh environment bootstrapping where initial blocks must remain available for cross-node validation\n\n### Recommendations\n- **0** (default) - Normal operation, prune from the start\n- **300** - Typical value for dev environments that need to mine initial blocks for coinbase splitting"`
 	SkipDeletions                   bool          `key:"pruner_skipDeletions" desc:"Skip deletion operations" default:"false" category:"Pruner" usage:"Skip deletion operations" type:"bool" longdesc:"### Purpose\nSkips deletion operations during pruning.\n\n### How It Works\nWhen enabled, the pruner skips deletion operations during pruning.\n\n### Recommendations\n- **false** (default) - Normal operation\n- **true** - Skip deletion operations"`
+	UTXOPrunedSetMaxEntries         int           `key:"pruner_utxoPrunedSetMaxEntries" desc:"Soft cap on entries in the in-memory pruned-TX set" default:"10000000" category:"Pruner" usage:"0 = unlimited; reached entries become no-op Adds" type:"int" longdesc:"### Purpose\nCaps the in-memory size of the PrunedTxSet, which tracks TXIDs pruned across sessions so that wasteful Aerospike parent updates can be skipped pre-flight.\n\n### How It Works\n- The set is persisted on the Service struct and reused across prune sessions.\n- When a record is scanned for deletion, its TXID is registered. Children later check the set to skip parent-update round-trips for parents already known to be gone.\n- Once Len() reaches this cap, further Add() calls become silent no-ops and the optimisation degrades to baseline behaviour for any TXID added after saturation.\n- At ~96 bytes per entry (32-byte hash + Go map overhead), 10M entries is ~1 GB worst-case; 100M ≈ 10 GB.\n\n### Recommendations\n- **10000000** (default) - safe ceiling for general workloads\n- Increase (e.g. 100000000) for tightly-chained / high-throughput workloads where parents span prune sessions and the catch rate is bottlenecked by saturation\n- **0** - unlimited; only safe if you know the working-set will not grow unbounded\n- Disabled automatically when pruner_utxoDefensiveEnabled=true (defensive mode bypasses the optimisation)"`


Superseded by newer comment on the same issue.

Acknowledged — this earlier note has been superseded by the newer review comment on the same line. The follow-up is addressed in the next commit.

github-actions · 2026-05-15T15:01:35Z

-	if !s.defensiveEnabled {
-		prunedSet = NewPrunedTxSet(256, prunedTxSetMaxEntries)
-	}
+	// Memory is bounded by settings.Pruner.UTXOPrunedSetMaxEntries (default 10M, ~96 B/entry).


This was correctly noted - the pruner_service.go documentation is accurate and matches the cuckoo implementation.

Acknowledged — no change needed; the pruner_service.go docstring already matches the cuckoo implementation.

Previous commit spawned a goroutine per chunk to overlap prunedSet Adds with the Aerospike batch wait. At ~1700 chunks/sec across all partition workers that turned out to introduce visible throughput jitter — most likely from channel allocation + scheduler dispatch cost, plus the GC pressure visible at ~2 GC/sec with p99 pauses of 5.5 ms. The Adds themselves are tiny: ~30 us for a 1024-record chunk (lock-free atomic CAS, ~30 ns each). That's <1% of the typical ~10 ms flushCleanupBatches network wait. Whether we do them in parallel with the wait or sequentially after it is irrelevant to throughput — but the per-chunk goroutine churn was measurable. So drop the goroutine entirely. Adds run inline immediately after the flush returns. Same throughput, smoother per-pod rate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previous commit moved Adds to inline-after-flush, eliminating per-chunk goroutine churn. That fixed the throughput jitter but dropped catch rate from ~80% to ~56% because chunk N's TXIDs only became visible to concurrent chunks AFTER chunk N's ~10 ms Aerospike wait. Other concurrent chunks (chunkGroupLimit=4) running during that wait would do their CheckAndRemoves and miss chunk N's records. Move the Adds to the START of processRecordChunk, before any CheckAndRemove fires. Same total ~30 us cost for a 1024-record chunk (lock-free CAS, ~30 ns each), trivial vs the ~10 ms flush that follows. Now chunk N's TXIDs are visible to concurrent chunks throughout N's entire CPU + flush window. Expected effect: catch rate recovers toward ~80% without re-introducing the goroutine-spawn jitter, throughput stays at ~1.7M records/sec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A future developer reading the pruner code will reasonably ask: why not just use seiflotfy/cuckoofilter, why a custom implementation, why lock-free, why are Adds done at the top of processRecordChunk? Encode the answers in the source so they don't have to git-archaeology the PR history. - cuckoo_h32.go header now explains the previous developer's FAQ: why we wrote our own (allocation escape via the library's []byte hasher interface), why specialised for *[32]byte, why lock-free (CPU profile showed ~28% of total in mutex contention). - pruner_service.go reader-stage comment now explains why Add is NOT called there and the two reasons it's done up-front in processRecordChunk (cache locality + concurrent-chunk visibility, with the concrete catch-rate numbers from dev-scale-1 testing). - PruneWithPartitions block updated: stale "10M default" cap and "~96 B/entry" memory comments fixed to reflect the cuckoo's 2B default and ~1 B/entry footprint. No behavioural changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CPU profile on dev-scale-1 caught the pathology: a 2 GiB cuckoo at ~92% load was burning ~22 cores worth of CPU on a 15-core pod just running the 500-kick eviction loop on every Insert. Pod CPU pegged at limit, but actual pruning throughput collapsed from 2.4M rec/sec to ~200K rec/sec. Root cause: Insert had no give-up condition once the filter was full. For every Add whose two candidate buckets were both occupied, it ran the full eviction loop (500 kicks × ~30 ns ≈ 15 us) chasing slots that, by definition, couldn't exist. At ~1.5M Adds/sec that consumed ~22 cores of pure overhead. Fix: latch a `saturated` atomic.Bool when the eviction loop first exhausts MaxKicks. Subsequent Inserts that find both buckets full return false immediately without re-running the loop. The fast path (slot-CAS in either candidate bucket) still runs, so when CheckAndRemove later frees slots the filter recovers transparently; we just refuse to do the expensive rebalancing work that — once proven impossible — is permanently expected to fail. Saturation latch is sticky for the lifetime of the cuckoo instance. A process restart resets it. TestCuckooH32_SaturationLatchesAndShortCircuits asserts the bound: 100K saturated Inserts complete in <1 s where the un-fixed version would take ~1.5 s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Single-filter PrunedTxSet saturated in ~50 min at sustained 1.7M TPS on dev-scale-1. After saturation, all new Adds were short-circuited (by the earlier eviction-loop fix), so cross-session catches collapsed from ~98% to ~14% within hours. The set effectively froze, holding stale historical entries that didn't match current children's parents. This commit rotates: each shard keeps a `current` filter (receives Adds) and a `previous` filter (read-only, last epoch's history). When `current` refuses an Insert, it slides into the `previous` slot (dropping the older `previous`), a fresh `current` is allocated, and the Add is retried. Lookup/CheckAndRemove try current first, fall back to previous on miss. Effect: the set never freezes. At 1.7M Adds/sec with the configured 8B total cap (= 4B per generation), each generation rotates ~every 40 min. Worst case, the set always has the last ~40 min of entries available — far longer than any tight-chain parent-to-child gap. Throughput preservation (the critical requirement): - Before first rotation, `previous == nil`. Hot path is byte- identical to the single-filter implementation. Zero overhead. - After rotation, Add still costs 1 atomic.Pointer.Load + 1 cuckoo CAS (~1 ns extra). CheckAndRemove that hits in current returns immediately — same cost as today. Miss path adds 1 atomic load + 1 cuckoo lookup on previous (~5-10 ns). - Bench: BenchmarkCuckoo_Parallel_AddPlusCheck 10.58 ns/op (was 10.11 ns/op single-filter). 0 allocs. No regression. Memory: total ≈ 2 × maxEntries × ~1 byte. NewPrunedTxSet sizes each generation at maxEntries / (2 × shardCount) so total memory budget matches the configured cap. Saturated() now reads as "something went badly wrong" (a fresh generation refused an Insert) — never expected in normal operation. The new Rotations() accessor surfaces rotation count for observability. TestPrunedTxSet_Saturated renamed to TestPrunedTxSet_RotatesUnderLoad to reflect the new behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

flushCleanupBatches previously sent two sequential Aerospike round-trips per chunk: first the parent UDF updates, then the child deletions. Each round-trip costs ~10-20 ms of network latency. With ~57K chunks per worker per session × 12 workers, even at chunkGroupLimit=4 parallelism that's tens of seconds of round-trip latency per session. Aerospike's BatchOperate accepts a mixed []BatchRecordIfc list with BatchUDF (or BatchWrite) records AND BatchDelete records interleaved. Server fans them out per-node and we get ONE network round-trip covering both kinds of operation. Bonus saving: client-side parseRecordResults runs once instead of twice — the CPU profile showed that path at ~20% of pruner CPU. Defensive mode keeps the original two-step sequence because its safety check has an implicit ordering dependency between parent deletedChildren writes and child deletions. Non-defensive (the dev-scale-1 default) uses the combined path; the deletedChildren bin isn't read in that configuration so within-batch order is irrelevant. The new executeBatchCleanupCombined classifies results by region: [0, parentEnd) parsed as UDF/Write returns (TX_NOT_FOUND silently counted); [parentEnd, end) parsed as Delete results (KEY_NOT_FOUND treated as idempotent success). Prometheus counters preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T12:38:12Z

Additional Configuration Issue - Default Value Mismatch

Beyond the documentation issues in the existing inline comments, there's a critical inconsistency in the default value:

Files claim different defaults:

settings/settings.go:533 → actual runtime default: 10_000_000 (10M)
pruner_service.go:714 → comment says: "default 2B entries"
pruned_tx_set.go:11 → fallback constant: 2_000_000_000 (2B)
PR description → claims: 2_000_000_000 (2B)

Impact:
The real default is 10M (from settings.go), not 2B. This 200x difference causes:

Optimization saturates 200x faster than users expect
Memory footprint ~20 MB instead of expected ~4 GB
Catch rate degrades much earlier in production workloads

Needs fix: Either update settings.go to 2B, or update all docs/comments to reflect 10M reality.

(Related to inline comment thread on pruner_settings.go:28)

…oo-filter

Moves the lock-free zero-alloc cuckoo filter specialised for 32-byte hashes from stores/utxo/aerospike/pruner/cuckoo_h32.go to a new util/cuckoo package as cuckoo.H32. The pruner now imports it. Why extract: teranode currently uses github.com/seiflotfy/cuckoofilter directly in three places (model/disk_parent_spends_map.go, model/disk_tx_map_uint64.go, services/blockassembly/subtreeprocessor/ disk_tx_map.go). Those call sites pay the same allocation/lock costs this implementation solved for the pruner — taking []byte arguments that escape via interface dispatch, plus sync.Mutex per shard. This commit does NOT migrate the existing call sites. Their key sizes differ (36-byte inpoints, uint64) so they can't drop in cuckoo.H32 as-is. What this commit gives them is a worked example of the optimisations (pointer arguments, packed-bucket atomic CAS, saturated short-circuit) and a place to live alongside if/when they get a similar treatment. API: - cuckoo.NewH32(capacity uint) *cuckoo.H32 - (*H32).Insert(*[32]byte) bool - (*H32).Lookup(*[32]byte) bool - (*H32).Delete(*[32]byte) bool - (*H32).Count() int - (*H32).Saturated() bool No behavioural change in the pruner. All four cuckoo tests pass under -race in the new location. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-19T09:39:21Z

 	SkipPreserveParents             bool          `key:"pruner_skipPreserveParents" desc:"Skip Phase 1: preserve parents of unmined transactions" default:"false" category:"Pruner" usage:"Skip parent preservation phase" type:"bool" longdesc:"### Purpose\nSkips Phase 1 of pruning which preserves parent transactions of old unmined transactions.\n\n### How It Works\nWhen enabled, the pruner skips calling PreserveParentsOfOldUnminedTransactions.\nThis means parent transactions will not be protected from deletion even if they have unmined children.\n\n### Trade-offs\n| Setting | Benefit | Drawback |\n|---------|---------|----------|\n| false (default) | Parents preserved for unmined tx resubmission | Additional processing overhead |\n| true | Faster pruning, reduced processing | Parents may be deleted, breaking unmined tx chains |\n\n### Recommendations\n- **false** (default) - Normal operation, preserves parent transactions\n- **true** - Skip parent preservation if you don't need to resubmit unmined transactions"`
 	MinBlockHeight                  uint32        `key:"pruner_min_block_height" desc:"Minimum block height before pruning begins" default:"0" category:"Pruner" usage:"Skip all pruning until block height exceeds this value" type:"uint32" longdesc:"### Purpose\nPrevents all pruning operations until the blockchain reaches a minimum height.\n\n### How It Works\n- When set to a value > 0, the pruner skips ALL operations (parent preservation, DAH deletion, blob deletion) for any block with height <= this value\n- At height 0 (default), pruning behaves normally from the start\n- Useful for fresh environment bootstrapping where initial blocks must remain available for cross-node validation\n\n### Recommendations\n- **0** (default) - Normal operation, prune from the start\n- **300** - Typical value for dev environments that need to mine initial blocks for coinbase splitting"`
 	SkipDeletions                   bool          `key:"pruner_skipDeletions" desc:"Skip deletion operations" default:"false" category:"Pruner" usage:"Skip deletion operations" type:"bool" longdesc:"### Purpose\nSkips deletion operations during pruning.\n\n### How It Works\nWhen enabled, the pruner skips deletion operations during pruning.\n\n### Recommendations\n- **false** (default) - Normal operation\n- **true** - Skip deletion operations"`
+	UTXOPrunedSetMaxEntries         int           `key:"pruner_utxoPrunedSetMaxEntries" desc:"Soft cap on entries in the in-memory pruned-TX set" default:"10000000" category:"Pruner" usage:"0 = unlimited; reached entries become no-op Adds" type:"int" longdesc:"### Purpose\nCaps the in-memory size of the PrunedTxSet, which tracks TXIDs pruned across sessions so that wasteful Aerospike parent updates can be skipped pre-flight.\n\n### How It Works\n- The set is persisted on the Service struct and reused across prune sessions.\n- When a record is scanned for deletion, its TXID is registered. Children later check the set to skip parent-update round-trips for parents already known to be gone.\n- Once Len() reaches this cap, further Add() calls become silent no-ops and the optimisation degrades to baseline behaviour for any TXID added after saturation.\n- At ~96 bytes per entry (32-byte hash + Go map overhead), 10M entries is ~1 GB worst-case; 100M ≈ 10 GB.\n\n### Recommendations\n- **10000000** (default) - safe ceiling for general workloads\n- Increase (e.g. 100000000) for tightly-chained / high-throughput workloads where parents span prune sessions and the catch rate is bottlenecked by saturation\n- **0** - unlimited; only safe if you know the working-set will not grow unbounded\n- Disabled automatically when pruner_utxoDefensiveEnabled=true (defensive mode bypasses the optimisation)"`


✅ Resolved - Documentation has been updated. The pruner_settings.go now correctly reflects the cuckoo filter memory usage (~1 byte/entry): 10M ≈ 10 MiB, 100M ≈ 100 MiB, 2B ≈ 2 GiB.

Addressed in the next commit. Replaced the ~96 bytes / map-overhead memory estimate with the cuckoo-filter reality (10M ≈ 20 MB, 100M ≈ 200 MB, 2B ≈ 2 GiB) and clarified that maxEntries=0 means the built-in 2B default (not unlimited). Also tightened the usage tag and recommendations to match.

…oo-filter

Copilot

Pull request overview

This PR aims to improve Aerospike UTXO pruning throughput by avoiding wasted “update parent” round-trips when the parent TX has already been pruned, using a sharded cuckoo-filter-backed PrunedTxSet and (when defensive mode is off) combining parent updates + child deletions into a single BatchOperate.

Changes:

Introduces a specialized lock-free cuckoo filter (util/cuckoo.H32) and uses it to rework PrunedTxSet into a sharded two-generation structure.
Updates the Aerospike pruner to (a) pre-register chunk TXIDs in the pruned set before processing and (b) optionally combine parent updates + deletions into one batch call.
Adds config/metrics and updates tests/benchmarks for the new pruned-set behavior.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 16 comments.

Show a summary per file

File	Description
util/cuckoo/cuckoo_h32.go	Adds lock-free cuckoo filter implementation specialized for 32-byte hashes.
util/cuckoo/cuckoo_h32_test.go	Adds correctness and perf-guard tests for the new cuckoo filter.
stores/utxo/aerospike/pruner/pruner_service.go	Integrates persistent-in-process `PrunedTxSet`, adds metrics, and introduces combined batch cleanup when not in defensive mode.
stores/utxo/aerospike/pruner/pruned_tx_set.go	Replaces map-backed pruned set with sharded two-generation cuckoo-backed set and rotation logic.
stores/utxo/aerospike/pruner/pruned_tx_set_test.go	Updates unit tests to reflect probabilistic behavior + rotations.
stores/utxo/aerospike/pruner/pruned_tx_set_bench_test.go	Adds benchmarks comparing cuckoo-backed set vs prior map approach.
stores/utxo/aerospike/pruner/pruned_set_skip_test.go	Updates skip-path unit test expectations for the new add timing/behavior.
stores/utxo/aerospike/pruner/partition_retry_test.go	Ensures new Prometheus gauges are initialized for tests that bypass `NewService()`.
settings/settings.go	Adds `pruner_utxoPrunedSetMaxEntries` setting default.
settings/pruner_settings.go	Documents the new `pruner_utxoPrunedSetMaxEntries` setting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

freemans13 · 2026-05-19T12:47:34Z

+		fp = displaced
+		i = cf.altIndex(fp, i)
+		if cf.tryInsertCAS(fp, i) {
+			return true
+		}


Good catch — fixed. The eviction loop now treats displaced==nullFinger as immediate success (a concurrent Delete left the slot empty, so our fp landed cleanly and there is nothing to relocate). Count is incremented inline in that branch so it matches the tryInsertCAS bookkeeping.

Already handled — when displaced == nullFinger the eviction path increments Count and returns true immediately, so tryInsertCAS is never called with fp == 0. No spurious insert and no double-count.

freemans13 · 2026-05-19T12:47:36Z

+		if !swapped {
+			cf.saturated.Store(true)
+			return false
+		}


Agreed — the inner-CAS retry exhaustion is contention, not provable saturation. Fixed: the saturated flag is no longer latched on the !swapped branch (we just back out and let the caller retry). Latching is now reserved for the genuine maxKicks-exhaustion path at the bottom of the loop.

Already handled — the !swapped (CAS-retries-exhausted) path returns false WITHOUT latching saturated; only exhausting the full maxKicks eviction loop latches it. Transient contention therefore no longer permanently disables eviction.

freemans13 · 2026-05-19T12:47:38Z

+	// Memory is bounded by settings.Pruner.UTXOPrunedSetMaxEntries (default 2B entries
+	// ≈ 2 GiB at the cuckoo's ~1 byte per entry). Once the cap is reached, Insert silently
+	// fails for new entries and the optimisation degrades to baseline; the
+	// utxo_pruner_pruned_set_saturated gauge surfaces this.


Updated the comment block in pruner_service.go to reflect the actual model: in-memory only (not disk-persisted), default 10M settings cap with 2B opt-in via 0, two-generation rotation on saturation rather than silent Insert failure, and the new utxo_pruner_pruned_set_rotations_total counter for the routine saturation signal.

Addressed — that comment block was rewritten to match the implementation: default 10M (0 ⇒ 2B), and rotation-on-saturation rather than 'Insert silently fails once the cap is reached'.

freemans13 · 2026-05-19T12:47:39Z

+		return errors.NewStorageError("%d parent update operations failed", parentErrors)
+	}
+	if deleteErrors > 0 {
+		s.logger.Errorf("Combined cleanup: %d deletion errors (first: %v)", deleteErrors, firstDeleteErr)
+		return errors.NewStorageError("%d deletion operations failed", deleteErrors)


Fixed — firstParentErr is now wrapped as the final variadic argument to NewStorageError so the aerospike error chain is preserved on the returned error.

Addressed — the combined-cleanup error now wraps the underlying cause: NewStorageError("%d parent update operations failed", parentErrors, firstParentErr) and likewise firstDeleteErr for deletions, so the root Aerospike error is preserved in the chain.

freemans13 · 2026-05-19T12:47:41Z

+		return errors.NewStorageError("%d parent update operations failed", parentErrors)
+	}
+	if deleteErrors > 0 {
+		s.logger.Errorf("Combined cleanup: %d deletion errors (first: %v)", deleteErrors, firstDeleteErr)
+		return errors.NewStorageError("%d deletion operations failed", deleteErrors)


Fixed — firstDeleteErr is now wrapped as the final variadic argument to NewStorageError so the underlying aerospike cause is preserved.

Addressed — the combined-cleanup error now wraps the underlying cause: NewStorageError("%d parent update operations failed", parentErrors, firstParentErr) and likewise firstDeleteErr for deletions, so the root Aerospike error is preserved in the chain.

freemans13 · 2026-05-19T12:47:53Z

+	// PrunedTxSet is persistent across prune sessions so children whose parents were pruned
+	// in earlier sessions can still skip the parent-update round-trip. Defensive mode is
+	// incompatible with the optimisation because records may be skipped after the reader
+	// registers them.


The PR description was overstating things — the set is held on the Service struct in memory, so it survives across prune sessions within a pod lifetime but is NOT persisted to disk across restarts. Updated the source-code docstring in pruner_service.go to spell that out ("lives only in memory — rebuilt from scratch on pod restart"). I'm not able to edit the PR body from this loop, so I'll update it manually after the loop.

Resolved — the disk-persistence claim was stale. The set is in-memory only (reused across prune sessions within a single process, rebuilt from scratch on pod restart). I've updated the PR title and description to drop the persistence claims and state the in-memory behaviour explicitly; the in-code comments already document it.

freemans13 · 2026-05-19T12:47:55Z

+	// Time a batch of Inserts that we expect to fail. Even at b.N = 1M
+	// these should all return immediately (no eviction churn), so total
+	// wall time is bounded by the fast-path overhead only.
+	start := time.Now()
+	const n = 100_000
+	for i := 0; i < n; i++ {
+		var h [32]byte
+		_, err := rand.Read(h[:])
+		require.NoError(t, err)
+		cf.Insert(&h)


Fixed — hashes are now pre-generated outside the timed loop in TestH32_SaturationLatchesAndShortCircuits, so the measurement reflects the saturated-Insert fast path rather than crypto/rand.Read cost.

Addressed — the saturation timing test now pre-generates all hashes outside the timed region, so the measured elapsed reflects the short-circuited Insert fast path rather than crypto/rand.Read. A comment explains this.

freemans13 · 2026-05-19T12:47:58Z

 func TestPrunedTxSet_AddAndContains(t *testing.T) {
 	set := NewPrunedTxSet(16, 0)



Fixed — all NewPrunedTxSet calls in pruned_tx_set_test.go that previously used 0 now pass 4096 (a comfortable small ceiling for the tests that exercise add/contains/remove/rotation behaviour). TestPrunedTxSet_DefaultCapacity now just sanity-checks the defaultPrunedTxSetCapacity constant rather than actually constructing a 2B filter.

Addressed — these tests now pass an explicit small maxEntries (e.g. 4096 / 1_048_576) instead of 0, so they no longer fall back to the 2B default and its multi-GiB allocation.

freemans13 · 2026-05-19T12:48:00Z

+	require.Equal(t, 1, prunedSet.Len(),
+		"parents removed, child TXID added during the parallel-Add overlap")


Fixed — both NewPrunedTxSet calls in pruned_set_skip_test.go now pass 4096 instead of 0.

Addressed — these tests now pass an explicit small maxEntries (e.g. 4096 / 1_048_576) instead of 0, so they no longer fall back to the 2B default and its multi-GiB allocation.

Update pruner_utxoPrunedSetMaxEntries memory documentation to match the cuckoo-filter implementation. The previous text described the old map-based design (~96 B/entry, 10M=1GB, 100M=10GB), which was off by ~50x. Also clarify that maxEntries=0 uses the built-in 2B default rather than being unlimited. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 11 comments.

freemans13 · 2026-05-19T13:05:25Z

+// bucketH32 holds 4 fingerprint bytes. Naturally 4-byte aligned in the
+// `buckets` slice, so atomic uint32 ops on the underlying memory are safe.
+type bucketH32 [bucketSize]uint8
+
+// NewH32 returns a filter sized to hold approximately the requested number


Already addressed in 2cfdd0e — buckets is now []uint32 (not [4]uint8), so atomic uint32 ops on slice elements are alignment-guaranteed and the unsafe import is gone. This Copilot review batch landed against the round-1 HEAD (8657c1d) before the fix had pushed.

This concern has been addressed in the current code. The implementation now uses buckets []uint32 (line 57 of cuckoo_h32.go) instead of []bucketH32, which guarantees 4-byte alignment for atomic operations. The bucketAddr method returns &cf.buckets[i] which is a properly aligned *uint32.

Addressed — bucket storage is now a []uint32 (not [4]uint8), so each bucket word is naturally 4-byte aligned and the atomic.Load/CompareAndSwapUint32 ops on &buckets[i] are guaranteed safe on every architecture Go supports. The doc comment states this explicitly.

freemans13 · 2026-05-19T13:05:27Z

+		if !swapped {
+			cf.saturated.Store(true)
+			return false
+		}


Already addressed in 2cfdd0e — the !swapped branch in Insert no longer latches the saturated flag; that's reserved for genuine maxKicks exhaustion. CAS-retry exhaustion is now treated as transient contention and the Insert just backs out.

Already handled — the !swapped (CAS-retries-exhausted) path returns false WITHOUT latching saturated; only exhausting the full maxKicks eviction loop latches it. Transient contention therefore no longer permanently disables eviction.

freemans13 · 2026-05-19T13:05:28Z

+	// Time a batch of Inserts that we expect to fail. Even at b.N = 1M
+	// these should all return immediately (no eviction churn), so total
+	// wall time is bounded by the fast-path overhead only.
+	start := time.Now()
+	const n = 100_000
+	for i := 0; i < n; i++ {
+		var h [32]byte
+		_, err := rand.Read(h[:])
+		require.NoError(t, err)
+		cf.Insert(&h)


Already addressed in 2cfdd0e — TestH32_SaturationLatchesAndShortCircuits pre-generates the 100K hashes outside the timed region so the assertion measures the saturated-Insert fast path rather than crypto/rand cost.

Addressed — the saturation timing test now pre-generates all hashes outside the timed region, so the measured elapsed reflects the short-circuited Insert fast path rather than crypto/rand.Read. A comment explains this.

freemans13 · 2026-05-19T13:05:31Z

+		})
+		prometheusUtxoPrunedSetSaturated = promauto.NewGauge(prometheus.GaugeOpts{
+			Name: "utxo_pruner_pruned_set_saturated",
+			Help: "1 if the PrunedTxSet has reached its configured maxEntries cap, 0 otherwise",


Already addressed in 2cfdd0e — added utxo_pruner_pruned_set_rotations (driven by PrunedTxSet.Rotations()) as the routine saturation signal, and reframed the saturated gauge's Help text to make clear it's a backstop / error signal. In the current commit I've also dropped the _total suffix from the rotations metric to follow Prometheus convention.

Addressed — the metric help text and PrunedTxSet.Saturated() godoc were reworded to the actual semantics: sticky, '≥1 insert failure since construction' — an error/backstop signal (extreme CAS contention, or a failed insert into a freshly-rotated generation), explicitly NOT the at-capacity indicator. Routine cap pressure is surfaced by utxo_pruner_pruned_set_rotations.

freemans13 · 2026-05-19T13:05:33Z

+	// PrunedTxSet is persistent across prune sessions so children whose parents were pruned
+	// in earlier sessions can still skip the parent-update round-trip. Defensive mode is
+	// incompatible with the optimisation because records may be skipped after the reader
+	// registers them.
+	if !service.defensiveEnabled {
+		service.prunedSet = NewPrunedTxSet(256, settings.Pruner.UTXOPrunedSetMaxEntries)
+	}


Already addressed in 2cfdd0e — the source comment now spells out "lives only in memory — rebuilt from scratch on pod restart and is not persisted to disk". The PR description still has the stale wording; I'll update that out-of-band (the review-loop tooling blocks PR body edits).

Resolved — the disk-persistence claim was stale. The set is in-memory only (reused across prune sessions within a single process, rebuilt from scratch on pod restart). I've updated the PR title and description to drop the persistence claims and state the in-memory behaviour explicitly; the in-code comments already document it.

freemans13 · 2026-05-19T13:05:38Z

 			SkipPreserveParents:            getBool("pruner_skipPreserveParents", false, alternativeContext...),                  // Skip Phase 1: preserve parents
 			SkipDeletions:                  getBool("pruner_skipDeletions", false, alternativeContext...),                        // Skip deletions for performance
 			MinBlockHeight:                 getUint32("pruner_min_block_height", 0, alternativeContext...),                       // Do not prune blocks at or below this height
+			UTXOPrunedSetMaxEntries:        getInt("pruner_utxoPrunedSetMaxEntries", 10_000_000, alternativeContext...),          // Soft cap on PrunedTxSet entries; 0 = unlimited


Already addressed in 2cfdd0e — settings.go inline comment now reads 0 = use built-in 2B default (NOT unlimited). This review batch reviewed 8657c1d before the round-2 fix had pushed.

Addressed — the settings comment no longer says '0 = unlimited'. It now reads '0 = use built-in 2B default (NOT unlimited)', matching NewPrunedTxSet's handling of maxEntries<=0.

freemans13 · 2026-05-19T13:05:40Z

 	SkipPreserveParents             bool          `key:"pruner_skipPreserveParents" desc:"Skip Phase 1: preserve parents of unmined transactions" default:"false" category:"Pruner" usage:"Skip parent preservation phase" type:"bool" longdesc:"### Purpose\nSkips Phase 1 of pruning which preserves parent transactions of old unmined transactions.\n\n### How It Works\nWhen enabled, the pruner skips calling PreserveParentsOfOldUnminedTransactions.\nThis means parent transactions will not be protected from deletion even if they have unmined children.\n\n### Trade-offs\n| Setting | Benefit | Drawback |\n|---------|---------|----------|\n| false (default) | Parents preserved for unmined tx resubmission | Additional processing overhead |\n| true | Faster pruning, reduced processing | Parents may be deleted, breaking unmined tx chains |\n\n### Recommendations\n- **false** (default) - Normal operation, preserves parent transactions\n- **true** - Skip parent preservation if you don't need to resubmit unmined transactions"`
 	MinBlockHeight                  uint32        `key:"pruner_min_block_height" desc:"Minimum block height before pruning begins" default:"0" category:"Pruner" usage:"Skip all pruning until block height exceeds this value" type:"uint32" longdesc:"### Purpose\nPrevents all pruning operations until the blockchain reaches a minimum height.\n\n### How It Works\n- When set to a value > 0, the pruner skips ALL operations (parent preservation, DAH deletion, blob deletion) for any block with height <= this value\n- At height 0 (default), pruning behaves normally from the start\n- Useful for fresh environment bootstrapping where initial blocks must remain available for cross-node validation\n\n### Recommendations\n- **0** (default) - Normal operation, prune from the start\n- **300** - Typical value for dev environments that need to mine initial blocks for coinbase splitting"`
 	SkipDeletions                   bool          `key:"pruner_skipDeletions" desc:"Skip deletion operations" default:"false" category:"Pruner" usage:"Skip deletion operations" type:"bool" longdesc:"### Purpose\nSkips deletion operations during pruning.\n\n### How It Works\nWhen enabled, the pruner skips deletion operations during pruning.\n\n### Recommendations\n- **false** (default) - Normal operation\n- **true** - Skip deletion operations"`
+	UTXOPrunedSetMaxEntries         int           `key:"pruner_utxoPrunedSetMaxEntries" desc:"Soft cap on entries in the in-memory pruned-TX set" default:"10000000" category:"Pruner" usage:"0 = use default (2B); saturation causes Inserts to no-op" type:"int" longdesc:"### Purpose\nCaps the in-memory size of the PrunedTxSet, which tracks TXIDs pruned across sessions so that wasteful Aerospike parent updates can be skipped pre-flight.\n\n### How It Works\n- The set is persisted on the Service struct and reused across prune sessions.\n- When a record is scanned for deletion, its TXID is registered. Children later check the set to skip parent-update round-trips for parents already known to be gone.\n- Backed by a sharded two-generation cuckoo filter. Once a generation saturates it rotates into the 'previous' slot and a fresh 'current' is allocated, so the set never freezes — but entries fall out of 'previous' on the next rotation.\n- At ~1 byte per entry across both generations (cuckoo filter): 10M entries ≈ 20 MB; 100M ≈ 200 MB; 2B (the default when maxEntries=0) ≈ 2 GiB.\n\n### Recommendations\n- **10000000** (default) - safe ceiling for general workloads (~20 MB)\n- Increase (e.g. 100000000 ≈ 200 MB, or 0 for the 2B / ~2 GiB default) for tightly-chained / high-throughput workloads where parents span prune sessions and the catch rate is bottlenecked by rotation pressure\n- **0** - use the built-in 2B default (≈ 2 GiB); appropriate when the working-set may grow large\n- Disabled automatically when pruner_utxoDefensiveEnabled=true (defensive mode bypasses the optimisation)"`


Partially addressed in 2cfdd0e (longdesc updated to describe rotation and memory) and fully tightened in this commit (the short usage tag now reads 0 = use built-in 2B default; on saturation the current generation rotates, it does not stop accepting Adds). Also corrected the 10M ≈ 20 MB figure to ≈ 10 MiB to match the actual sizing math (see 3266345357).

Addressed — the longdesc was rewritten for the cuckoo/two-generation model: it describes rotation-on-saturation (not the old map-based 'Add becomes a no-op' soft cap) and the ~1 byte/entry budget, and no longer quotes the ~96 bytes/entry map cost.

freemans13 · 2026-05-19T13:05:43Z

+// Memory budget: total memory ≈ 2 × maxEntries × ~1 byte (both
+// generations live simultaneously). NewPrunedTxSet sizes each generation
+// at maxEntries / (2 × shardCount) so the SUM of capacity across all
+// shards and both generations equals the configured maxEntries.


Good catch — that comment block was internally inconsistent. Fixed: the doc now says maxEntries is a TOTAL budget across both generations of all shards (memory ≈ maxEntries × ~1 byte), which matches the sizing code. The settings longdesc and the pruner_service.go doc are now consistent with the same model.

Addressed — the memory-budget comment was rewritten to be internally consistent: maxEntries is the TOTAL across both generations of all shards, each generation sized at maxEntries/(2×shardCount), so total memory ≈ maxEntries × ~1 byte.

freemans13 · 2026-05-19T13:05:45Z

+	// Both parents removed via CheckAndRemove (Len -= 2), and the record's
+	// own TXID is now Added by the deferred-Add path that runs in parallel
+	// with flushCleanupBatches (Len += 1). Net: 1 entry (the child).
+	require.Equal(t, 1, prunedSet.Len(),
+		"parents removed, child TXID added during the parallel-Add overlap")


Already addressed in 2cfdd0e — the test comment now says the chunk's own TXID is Added by the up-front loop at the start of processRecordChunk. This review batch reviewed 8657c1d before the round-2 fix had pushed.

Addressed — the comment was updated to describe the current up-front Add: TXIDs are Added at the top of processRecordChunk before any CheckAndRemove/flush, not the old deferred-Add path.

freemans13 · 2026-05-19T13:05:47Z

+	b.Helper()
+	hashes := make([]chainhash.Hash, n)
+	for i := range hashes {
+		_, _ = rand.Read(hashes[i][:])


Good catch — makeHashes (in pruned_tx_set_bench_test.go) now b.Fatalf's on rand.Read error rather than silently producing low-entropy hashes.

Addressed — makeHashes now checks the rand.Read error and fails via b.Fatalf.

github-actions · 2026-05-19T13:32:46Z

+	// life of this process. This lets children whose parents were pruned in an earlier
+	// session (the common case for chains crossing block-height boundaries) still skip
+	// the parent-update round-trip. The set lives only in memory — it is rebuilt from
+	// scratch on pod restart and is not persisted to disk.


[RESOLVED - Comment was incorrect]

The PR description actually states correctly (from the Summary section):

"It is not persisted to disk: on pod restart it is rebuilt from scratch"

And from the Notes section:

"Not persisted across restarts. The set lives only in memory; a pod restart rebuilds it from scratch"

Both the PR description and implementation correctly document that the set is in-memory only and NOT persisted to disk. This comment was based on a misreading of the PR description.

Acknowledged — this is the recurring "PR description claims disk persistence" issue (also flagged in 3266256410, 3266345176, 3266430001, 3266631775). The source-code documentation now correctly describes the in-memory-only behaviour (pruner_service.go:710 and pruner_settings.go:28 both spell it out). The review-loop tooling I'm running denies PR body edits, so I've left the description for the author to update manually. The implementation matches the source-code docs; only the PR description (which is metadata, not code) is stale.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

freemans13 · 2026-05-19T13:51:58Z

+// Saturated reports whether the set has experienced any insert failures
+// since construction. With the two-generation design, the only way to
+// see InsertFailures > 0 is if a freshly-allocated generation refused
+// an Insert (effectively never in normal operation), so Saturated() is
+// now best read as "something went badly wrong" rather than "we're full".
+// Use Rotations() to see how often the set is recycling.


Fixed — Saturated() docstring now enumerates both paths that can increment insertFailures (the contention-without-rotation path added in this PR's round-4 fix, AND insertion into a freshly-rotated generation also failing) and explicitly frames it as an error/backstop signal rather than a routine at-capacity indicator.

Addressed — the metric help text and PrunedTxSet.Saturated() godoc were reworded to the actual semantics: sticky, '≥1 insert failure since construction' — an error/backstop signal (extreme CAS contention, or a failed insert into a freshly-rotated generation), explicitly NOT the at-capacity indicator. Routine cap pressure is surfaced by utxo_pruner_pruned_set_rotations.

freemans13 · 2026-05-19T13:52:00Z

+		})
+		prometheusUtxoPrunedSetSaturated = promauto.NewGauge(prometheus.GaugeOpts{
+			Name: "utxo_pruner_pruned_set_saturated",
+			Help: "1 if any PrunedTxSet Insert has failed even on a freshly-rotated generation (backstop / error signal — should be 0 in normal operation; use utxo_pruner_pruned_set_rotations for routine saturation)",


Fixed — utxo_pruner_pruned_set_saturated help text now reads "1 if any PrunedTxSet Insert has failed since construction without rotation recovering it (extreme CAS contention without saturation, or insertion into a freshly-rotated generation also failing)". Same framing as the Saturated() docstring.

Addressed — the metric help text and PrunedTxSet.Saturated() godoc were reworded to the actual semantics: sticky, '≥1 insert failure since construction' — an error/backstop signal (extreme CAS contention, or a failed insert into a freshly-rotated generation), explicitly NOT the at-capacity indicator. Routine cap pressure is surfaced by utxo_pruner_pruned_set_rotations.

NewPrunedTxSet memory budget comment - Reworded "matches the configured maxEntries" to capture two upper-bound caveats: (a) cuckoo.NewH32 rounds bucket count up to the next power of two (up to ~2x slot overhead per generation), and (b) cuckoo filters run well below 100% load before saturating. Operators should treat maxEntries as a target, not a hard RSS ceiling. PrunedTxSet.Saturated() docstring - Now enumerates the two distinct paths that can increment insertFailures, both of which are error/backstop signals (not routine at-capacity indicators): * Add observed Insert failure without cur.Saturated() after a retry (extreme CAS contention). * Insert failed on a freshly-rotated generation. - This matches the actual semantics after the round-4 PrunedTxSet.Add fix that introduced the contention-without-rotation backstop. utxo_pruner_pruned_set_saturated help text - Synced with the Saturated() docstring so the metric description reflects the real "any insert failure after retries/rotation" condition rather than just the freshly-rotated case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

freemans13 · 2026-05-21T10:34:45Z

+	// PrunedTxSet is owned by the Service and reused across prune sessions within the
+	// life of this process. This lets children whose parents were pruned in an earlier
+	// session (the common case for chains crossing block-height boundaries) still skip
+	// the parent-update round-trip. The set lives only in memory — it is rebuilt from
+	// scratch on pod restart and is not persisted to disk.


Good catch — the in-code documentation (around lines 706-735 of pruner_service.go) is now the source of truth: the set is in-memory only, rebuilt on pod restart, and not persisted to disk. The PR description predates that comment block being rewritten; I'll get the description aligned with the code rather than the other way around. No behaviour change.

Resolved — the disk-persistence claim was stale. The set is in-memory only (reused across prune sessions within a single process, rebuilt from scratch on pod restart). I've updated the PR title and description to drop the persistence claims and state the in-memory behaviour explicitly; the in-code comments already document it.

freemans13 · 2026-05-21T10:34:53Z

 			SkipPreserveParents:            getBool("pruner_skipPreserveParents", false, alternativeContext...),                  // Skip Phase 1: preserve parents
 			SkipDeletions:                  getBool("pruner_skipDeletions", false, alternativeContext...),                        // Skip deletions for performance
 			MinBlockHeight:                 getUint32("pruner_min_block_height", 0, alternativeContext...),                       // Do not prune blocks at or below this height
+			UTXOPrunedSetMaxEntries:        getInt("pruner_utxoPrunedSetMaxEntries", 10_000_000, alternativeContext...),          // Soft cap on PrunedTxSet entries; 0 = use built-in 2B default (NOT unlimited)


Confirmed: the implemented default is 10,000,000 (settings.go:539), with the sentinel value 0 selecting the built-in 2,000,000,000 ceiling. The PR description's "default 2B" is inaccurate and I'll update the description to match the code rather than change the default — 10M is the deliberate conservative default for fresh deployments; operators opt into the larger cap.

Resolved — updated the PR description: the shipped default is 10,000,000 (~10 MiB), and 0 selects the built-in 2B (~2 GiB) default. The dev-scale-1 numbers were measured with maxEntries=0 (the 2B cap), now annotated as such in the description.

freemans13 · 2026-05-21T10:35:08Z

+// b.N unique hashes ensures we measure first-time inserts (the production
+// case), not eviction loops from re-inserting duplicate fingerprints.
+func BenchmarkCuckoo_Add(b *testing.B) {
+	// Pre-generate enough hashes for b.N iterations so each Add is unique.
+	// We allocate them outside the timed region.
+	const cap_ = 64_000_000
+	hashes := makeHashes(b, min(b.N, cap_/2))
+	set := NewPrunedTxSet(256, cap_)
+	b.ResetTimer()
+	b.ReportAllocs()
+	for i := 0; i < b.N; i++ {
+		set.Add(hashes[i%len(hashes)])


Capped the pool at 1M hashes (~32 MiB) and stride via i & (poolSize-1) — bench memory is now O(poolSize) regardless of b.N. Updated the surrounding comment to be honest about what the benchmark measures after the pool wraps (cuckoo's idempotent re-insert fast path), since that side effect of capping is real.

Addressed — BenchmarkCuckoo_Add caps the pre-generated pool at 1 << 20 (~32 MiB) and cycles through it via i&(poolSize-1), independent of b.N.

freemans13 · 2026-05-21T10:35:51Z

+// Tests use a deliberately small per-shard capacity (4096 buckets × 4 slots =
+// 16K slots per shard) so the bench can run on a developer machine without
+// allocating multi-GB filters. The per-op cost we're measuring (hash extract,
+// bucket access, lock acquire) is independent of total capacity.


Fixed the block comment to match what the bench actually constructs: NewPrunedTxSet(256, 64_000_000) → 256 shards × ~250K entries/shard → ~65,536 buckets per shard after the cuckoo's power-of-two rounding. The old "4096 buckets" number was stale.

Addressed — the benchmark comment was corrected to describe the actual construction (NewPrunedTxSet(256, 64_000_000) ⇒ ~65,536 buckets/shard, ~64 MiB) and notes the power-of-two rounding.

…oo-filter

oskarszoon

Request changes. Two blockers.

PR description vs code disagree on persistence. Body says "persisted to disk across pruner restarts" + references pruner_utxoPrunedSetMaxEntries. pruner_service.go:709-710 says the opposite: "NOT persisted across pod restarts." Setting longdesc agrees with the code. The 99.9% catch rate resets to ~7% after every pod restart. Either wire the persistence or fix the description.
bucketCASDelete (util/cuckoo/cuckoo_h32.go:164-182) misses items under contention. Per-slot CAS loop does if b != fp { break } — if a concurrent writer placed fp between Load and the check (stale zero read → break), the delete silently misses. Same issue in tryInsertCAS on the b != nullFinger branch. bucketHas does this correctly: single atomic.LoadUint32 + SWAR on the full word. Both CAS functions should match.

Other items in full draft: two-generation drain-on-rotate not quantified, 10M default vs 2B constant mismatch in PR memory table, CheckAndRemove destructive semantics under high fan-in, per-shard metrics missing.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

freemans13 · 2026-06-04T15:55:39Z

+	// PrunedTxSet is owned by the Service and reused across prune sessions within the
+	// life of this process. This lets children whose parents were pruned in an earlier
+	// session (the common case for chains crossing block-height boundaries) still skip
+	// the parent-update round-trip. The set lives only in memory — it is rebuilt from
+	// scratch on pod restart and is not persisted to disk.


Resolved — the disk-persistence claim was stale. The set is in-memory only (reused across prune sessions within a single process, rebuilt from scratch on pod restart). I've updated the PR title and description to drop the persistence claims and state the in-memory behaviour explicitly; the in-code comments already document it.

freemans13 · 2026-06-04T15:57:27Z

+	// Operators should treat maxEntries as the target rather than a
+	// hard ceiling on RSS; the worst-case allocation overhead is bounded
+	// by the power-of-two rounding.
+	perShard := uint(maxEntries / n / 2)


Fixed in 9beb36d — perShard is now clamped to a minimum (minPerShardCap = 1024) so a small maxEntries can't drive it to 0 and produce a degenerate single-bucket filter that rotates every few inserts (~1 KiB/generation floor).

freemans13 · 2026-06-04T15:57:30Z

 type PrunedTxSet struct {
-	shards     []prunedTxShard
-	mask       uint8 // shardCount - 1, for fast modulo via bitwise AND; uint8 is sufficient because shardCount is capped at 256
-	count      atomic.Int64
-	maxEntries int64 // soft cap on Len(); 0 means unlimited
+	shards         []prunedTxShard
+	mask           uint8
+	perShardCap    uint // capacity of each generation in each shard
+	insertFailures atomic.Int64
+	rotations      atomic.Int64 // number of generation rotations across all shards
+	capacity       int64
 }


Fixed in 9beb36d — the unused capacity field was removed.

… b.N Pre-generating min(b.N, 32M) random hashes blew past 1 GiB on CI runners when go test auto-tuned b.N very high. Switch to a fixed 1M-hash pool and stride through it with a bitmask. Memory is now O(poolSize), independent of b.N. Updated the surrounding block comment to match the actual sizing the benches construct (256 shards × ~250K entries/shard, not the stale '4096 buckets × 4 slots' number). Behavioural note: once b.N exceeds the pool size, subsequent Adds hit already-present fingerprints and exercise the cuckoo's idempotent re-insert fast path rather than fresh evictions. The hash-extract, bucket-access, and atomic-CAS hot path under measurement is the same.

github-actions · 2026-05-21T10:40:08Z

Clarification on PR description claim

The PR description mentions "The set is persisted to disk across pruner restarts," but the implementation only keeps it in memory (rebuilt on pod restart). The code documentation is correct (pruner_service.go:710, settings/pruner_settings.go:28 both state it is NOT persisted to disk). Consider updating the PR description to clarify this is in-memory persistence across prune sessions within a single process.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

freemans13 · 2026-06-04T15:55:42Z

+	// PrunedTxSet is owned by the Service and reused across prune sessions within the
+	// life of this process. This lets children whose parents were pruned in an earlier
+	// session (the common case for chains crossing block-height boundaries) still skip
+	// the parent-update round-trip. The set lives only in memory — it is rebuilt from
+	// scratch on pod restart and is not persisted to disk.


Resolved — the disk-persistence claim was stale. The set is in-memory only (reused across prune sessions within a single process, rebuilt from scratch on pod restart). I've updated the PR title and description to drop the persistence claims and state the in-memory behaviour explicitly; the in-code comments already document it.

freemans13 · 2026-06-04T15:57:12Z

+		})
+		prometheusUtxoPrunedSetSaturated = promauto.NewGauge(prometheus.GaugeOpts{
+			Name: "utxo_pruner_pruned_set_saturated",
+			Help: "1 if any PrunedTxSet Insert has failed since construction without rotation recovering it (extreme CAS contention without saturation, or insertion into a freshly-rotated generation also failing — both are error/backstop signals; should be 0 in normal operation. Use utxo_pruner_pruned_set_rotations for routine cap pressure.)",


Addressed — the metric help text and PrunedTxSet.Saturated() godoc were reworded to the actual semantics: sticky, '≥1 insert failure since construction' — an error/backstop signal (extreme CAS contention, or a failed insert into a freshly-rotated generation), explicitly NOT the at-capacity indicator. Routine cap pressure is surfaced by utxo_pruner_pruned_set_rotations.

freemans13 · 2026-06-04T15:55:47Z

 			SkipPreserveParents:            getBool("pruner_skipPreserveParents", false, alternativeContext...),                  // Skip Phase 1: preserve parents
 			SkipDeletions:                  getBool("pruner_skipDeletions", false, alternativeContext...),                        // Skip deletions for performance
 			MinBlockHeight:                 getUint32("pruner_min_block_height", 0, alternativeContext...),                       // Do not prune blocks at or below this height
+			UTXOPrunedSetMaxEntries:        getInt("pruner_utxoPrunedSetMaxEntries", 10_000_000, alternativeContext...),          // Soft cap on PrunedTxSet entries; 0 = use built-in 2B default (NOT unlimited)


Resolved — updated the PR description: the shipped default is 10,000,000 (~10 MiB), and 0 selects the built-in 2B (~2 GiB) default. The dev-scale-1 numbers were measured with maxEntries=0 (the 2B cap), now annotated as such in the description.

Address Copilot review findings on PR bsv-blockchain#875: - perShard could reach 0 when maxEntries < 2×shardCount, making cuckoo.NewH32 round up to a single 4-slot bucket that saturates and rotates every few inserts. Add a minPerShardCap=1024 floor (~1 KiB per generation) so degenerate configs stay functional. - Remove the `capacity` field: it was stored in the constructor but never read anywhere in the package. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…oo-filter

sonarqubecloud · 2026-06-04T16:05:13Z

Quality Gate passed

Issues
6 New issues
0 Accepted issues

Measures
0 Security Hotspots
83.7% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

oskarszoon

Both blockers from the last pass are fixed and verified, so approving. The persistence story is now consistent in-memory-only across the body / pruner_service.go / longdesc, and the bucketCASDelete/tryInsertCAS race is closed (break on slot-mismatch + reload-cur-on-retry; race detector clean across both packages, ×5 runs). The CAS fix checks out under review and the benchmarks are strong (66 ns/op add 0-alloc, near-linear parallel, no contention).

Recommend before merge (non-blocking — code is correct):

Memory: 10M ≈ ~16 MiB, not 10 MiB (settings/pruner_settings.go:28, pruned_tx_set.go:54, body). Power-of-two bucket rounding in NewH32 makes it ~1.68 bytes/entry at 10M; the ~1 byte/entry formula only holds at large configs.
The 99.9% catch / 2.6M rec/sec figures are for maxEntries=0 (~2 GiB), not the shipped 10M default. At production TPS a generation fills in ~3s, so cross-session catch — the feature's main motivation — is near zero at 10M. The "safe ceiling for general workloads" framing reads as if the optimization works at the default; it doesn't for cross-session parents. Worth telling operators to set 0 or 100M+ for the advertised benefit.

And two regression tests given the CAS-bug history (the cuckoo package's own tests wouldn't currently catch a regression):

A direct H32 concurrent insert/delete test that asserts Count() (today's TestPrunedTxSet_ConcurrentAccess only checks "no panics").
A previous-generation lookup test (insert → force rotation → query the pre-rotation hash) — the two-generation fallback branch (Contains/Len:211/237) is currently 0% covered.

Nice work closing both blockers cleanly.

ordishs

Approve with minor follow-ups.

Excellent, exceptionally well-documented PR. The design is sound, the lock-free cuckoo hot path is correct in its memory model (naturally-aligned uint32 buckets, atomic load + SWAR equality, CAS insert/delete; altIndex involution holds), and shard/fingerprint/index byte usage is cleanly independent.

Suggested non-blocking follow-ups:

Correctness rests entirely on the invariant that the deletedChildren bin is only read in defensive mode and prunedSet is nil there. Worth a runtime guard/assertion so a future regression fails loudly. A unit test asserting prunedSet == nil when defensiveEnabled would lock this in.
executeBatchCleanupCombined has no unit test — the [0,parentEnd) vs [parentEnd,end) result-region split and UDF/BatchWrite TX_NOT_FOUND/KEY_NOT_FOUND accounting are mockable and worth covering.
One line in the pruner_utxoPrunedSetMaxEntries longdesc noting that 0 eagerly allocates ~1 GiB of current-gen filters at construction.

Watch-items (not blockers): cuckoo Insert is not idempotent, so partition-retry re-scans inflate Count and can accelerate rotation, lowering catch rate under churn — the rotations gauge is the right signal. Count/Len are approximate by design.

Confirm go test -race and golangci-lint on util/cuckoo/... and stores/utxo/aerospike/pruner/... before merge.

freemans13 and others added 2 commits May 15, 2026 11:19

freemans13 and others added 2 commits May 15, 2026 14:12

github-actions Bot reviewed May 15, 2026

View reviewed changes

freemans13 and others added 6 commits May 15, 2026 16:27

freemans13 self-assigned this May 18, 2026

freemans13 and others added 2 commits May 18, 2026 14:33

Merge remote-tracking branch 'upstream/main' into stu/pruner-set-cuck…

161a28b

…oo-filter

github-actions Bot reviewed May 19, 2026

View reviewed changes

freemans13 changed the title ~~feat(pruner): back PrunedTxSet with a cuckoo filter (~1 byte/entry)~~ perf(pruner): skip already-pruned parent updates using a persistent cuckoo filter May 19, 2026

Merge remote-tracking branch 'upstream/main' into stu/pruner-set-cuck…

03af510

…oo-filter

freemans13 requested a review from Copilot May 19, 2026 12:22

Copilot started reviewing on behalf of freemans13 May 19, 2026 12:22 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

freemans13 requested a review from Copilot May 19, 2026 12:37

Copilot started reviewing on behalf of freemans13 May 19, 2026 12:38 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

freemans13 requested a review from Copilot May 19, 2026 13:29

Copilot started reviewing on behalf of freemans13 May 19, 2026 13:29 View session

github-actions Bot reviewed May 19, 2026

View reviewed changes

Copilot AI reviewed May 19, 2026

View reviewed changes

freemans13 requested a review from Copilot May 19, 2026 13:52

Copilot started reviewing on behalf of freemans13 May 19, 2026 13:53 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

freemans13 added 2 commits May 19, 2026 15:47

Merge remote-tracking branch 'upstream/main' into stu/pruner-set-cuck…

e00dbd4

…oo-filter

Merge remote-tracking branch 'upstream/main' into stu/pruner-set-cuck…

417ef39

…oo-filter

oskarszoon requested changes May 21, 2026

View reviewed changes

freemans13 requested a review from Copilot May 21, 2026 10:30

Copilot started reviewing on behalf of freemans13 May 21, 2026 10:31 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

freemans13 requested a review from Copilot May 21, 2026 10:37

Copilot started reviewing on behalf of freemans13 May 21, 2026 10:37 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

freemans13 added 2 commits May 29, 2026 12:58

Merge remote-tracking branch 'upstream/main' into HEAD

5ec4d3d

Merge upstream/main into stu/pruner-set-cuckoo-filter

afeb4d3

freemans13 changed the title ~~perf(pruner): skip already-pruned parent updates using a persistent cuckoo filter~~ perf(pruner): skip already-pruned parent updates using an in-memory cuckoo filter Jun 4, 2026

freemans13 and others added 2 commits June 4, 2026 16:31

Merge remote-tracking branch 'upstream/main' into stu/pruner-set-cuck…

1f9400c

…oo-filter

freemans13 requested review from ordishs and oskarszoon June 4, 2026 16:17

oskarszoon approved these changes Jun 5, 2026

View reviewed changes

ordishs approved these changes Jun 5, 2026

View reviewed changes

freemans13 merged commit 1937972 into bsv-blockchain:main Jun 5, 2026
34 checks passed

		func TestPrunedTxSet_AddAndContains(t *testing.T) {
		set := NewPrunedTxSet(16, 0)

		require.Equal(t, 1, prunedSet.Len(),
		"parents removed, child TXID added during the parallel-Add overlap")

Conversation

freemans13 commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Measured impact on dev-scale-1

What's in this PR

Why a hand-written cuckoo filter

Notes for reviewers

Test plan

Uh oh!

github-actions Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Comparison Report

Summary

Uh oh!

github-actions Bot May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

github-actions Bot May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

freemans13 commented May 15, 2026 •

edited

Loading

github-actions Bot commented May 15, 2026 •

edited

Loading

github-actions Bot commented May 15, 2026 •

edited

Loading

github-actions Bot May 15, 2026 •

edited

Loading

github-actions Bot May 15, 2026 •

edited

Loading

github-actions Bot May 15, 2026 •

edited

Loading

github-actions Bot May 19, 2026 •

edited

Loading