Skip to content

perf(pruner): skip already-pruned parent updates using an in-memory cuckoo filter#875

Merged
freemans13 merged 27 commits into
bsv-blockchain:mainfrom
freemans13:stu/pruner-set-cuckoo-filter
Jun 5, 2026
Merged

perf(pruner): skip already-pruned parent updates using an in-memory cuckoo filter#875
freemans13 merged 27 commits into
bsv-blockchain:mainfrom
freemans13:stu/pruner-set-cuckoo-filter

Conversation

@freemans13

@freemans13 freemans13 commented May 15, 2026

Copy link
Copy Markdown
Collaborator

Summary

The pruner deletes spent UTXOs and tells each spent UTXO's parent to mark itself as having one fewer outstanding child. That second step is a wasted Aerospike round-trip if the parent has already been pruned in an earlier session.

This PR keeps a set of pruned TXIDs in memory and uses it as a pre-flight skip filter: if a parent is in the set, we skip the update. The set is backed by a sharded, lock-free, two-generation cuckoo filter — ~1 byte per entry.

The set is held in memory for the lifetime of the process and reused across prune sessions, so children whose parents were pruned in an earlier session can still skip the round-trip. It is not persisted to disk: on pod restart it is rebuilt from scratch and the catch rate re-ramps from cold. Cap is configurable via pruner_utxoPrunedSetMaxEntries (default 10M ≈ 10 MiB; set to 0 for the built-in 2B ≈ 2 GiB default).

Measured impact on dev-scale-1

Measured with pruner_utxoPrunedSetMaxEntries=0 (the 2B / ~2 GiB cap), not the shipped 10M default:

Before After
records_deleted/sec 1.64M 2.6M
Wasted parent updates caught 2.4% 99.9%
Wasted Aerospike round-trips/session 411M ~500K
Pruner memory 2.6 GiB ~7 GiB
Pruner CPU 14.4 / 15 cores 14.4 / 15 cores

What's in this PR

  1. PrunedTxSet — sharded set of TXIDs pruned in this and prior sessions within a single process. In-memory only; reused across prune sessions, rebuilt from scratch on pod restart.
  2. Cuckoo-filter backing (util/cuckoo.H32) — specialised for 32-byte hashes, lock-free, zero-alloc on the hot path. ~3.1% false-positive rate (which only causes us to skip an update we should have done — never the other way — and with defensive mode off the deletedChildren bin is never read anyway).
  3. Two-generation rotation — a single cuckoo eventually saturates and freezes. We keep a current and previous filter per shard; when current fills, it rotates into previous and a fresh current is allocated. Lookups check both. The set never freezes.
  4. Saturation short-circuit — once eviction has provably failed once, future Inserts that find both candidate buckets full return immediately rather than burning ~15 µs of CPU per call on a hopeless eviction loop.
  5. Combined batch — when defensive mode is off, parent updates and child deletions go in a single Aerospike BatchOperate instead of two. ~8% throughput win.
  6. Configurable cappruner_utxoPrunedSetMaxEntries setting. Default 10M (~10 MiB); 0 selects the built-in 2B (~2 GiB) default.
  7. Metricsutxo_pruner_parents_skipped_pruned_total, utxo_pruner_pruned_set_size, utxo_pruner_pruned_set_saturated, utxo_pruner_pruned_set_rotations.

Why a hand-written cuckoo filter

We tried github.com/seiflotfy/cuckoofilter first. Two problems:

  • Allocations — its API takes []byte, causing the 32-byte hash to escape to the heap via the hasher interface on every Insert/Lookup/Delete. At 1.5M ops/sec that's ~50 MB/sec of churn, 2 GC/sec, p99 GC pause 5 ms.
  • Locks — per-shard sync.Mutex produced ~28% of total CPU in lock2/futex/procyield under load.

util/cuckoo.H32 is the specialised replacement: takes *[32]byte (no escape), uses atomic CAS on packed bucket words (no mutex). Same correctness guarantees, no allocations on hot paths, no lock contention.

We didn't migrate the three existing seiflotfy call sites in teranode (model/disk_parent_spends_map.go, model/disk_tx_map_uint64.go, services/blockassembly/subtreeprocessor/disk_tx_map.go) — they use different key sizes (36-byte inpoints, uint64) so can't drop in cuckoo.H32 as-is. Migrating them is a future PR.

Notes for reviewers

  • Not persisted across restarts. The set lives only in memory; a pod restart rebuilds it from scratch and the catch rate re-ramps from cold. A persistence layer (serialize on shutdown / load on startup) is possible future work but is not in this PR.
  • FP safety rests on one invariant. A false positive suppresses a deletedChildren bin update. That bin is only read by the defensive-mode safety check, and prunedSet is nil whenever defensive mode is on — so with the optimisation active the missed update has no behavioural consequence. If a future change makes non-defensive pruning consult deletedChildren, that invariant must be re-checked.
  • Defensive mode is unchanged. When pruner_defensiveMode is on, the two-step ordering (delete children, then update parents) is preserved and the optimisation is disabled — the combined batch only kicks in when defensive mode is off.
  • The cuckoo's destructive Delete and the two-generation rotation are complementary. ~89% of adds get deleted by their child in the same session; the remaining ~11% are orphans that the rotation eventually drops. We need both.
  • Catch rate ramps within a session from ~7% at the start to ~99% at the end. That's intrinsic to scanning Aerospike by partition rather than by chain order; the cuckoo only knows about TXIDs scanned so far. A two-pass scan would flatten the ramp but doubles I/O — not worth it.

Test plan

  • go test ./util/cuckoo/... ./stores/utxo/aerospike/pruner/... -race passes
  • go vet clean
  • Microbenchmarks: parallel Add+CheckAndRemove at ~10 ns/op
  • Deployed across many iterations on dev-scale-1; sustained 2.6M rec/sec, 99.9% catch rate, smooth Grafana

freemans13 and others added 2 commits May 15, 2026 11:19
…able

Two changes that together let the PrunedTxSet optimisation actually bite for
the tight-chain workload it was designed for.

1. Persist the set on the Service struct so children whose parents were
   pruned in earlier sessions can still skip the parent-update round-trip.
   In the previous design the set was allocated per `PruneWithPartitions`
   call. With `pruner_block_trigger=OnBlockMined` every block triggers a
   new session, so a child at height H+1 was never able to recognise its
   parent at height H — by then the parent's session had ended and the
   set had been thrown away. On dev-scale-1 this limited the catch rate
   to ~6.5% of would-be-wasted parent updates (24M / 371M).

2. Replace the hard-coded 10M cap with a `pruner_utxoPrunedSetMaxEntries`
   setting (default 10M, 0 = unlimited). Persistent sessions accumulate
   entries across many blocks, so the cap is more likely to saturate;
   operators tuning for high-fan-out workloads can raise it.

Adds two gauge metrics so operators can see what's happening:
  - utxo_pruner_pruned_set_size        — current Len() of the set
  - utxo_pruner_pruned_set_saturated   — 1 if cap reached, 0 otherwise

`CheckAndRemove` remains destructive: for parents with high fan-out only
the first child to look skips the update. That's a separate concern from
persistence and not addressed here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the exact-map PrunedTxSet (~130 B/entry) with a sharded cuckoo
filter from github.com/seiflotfy/cuckoofilter (already a dep). Memory drops
~100x — 1B entries fit in ~1 GiB instead of ~130 GiB.

Why this is safe in our usage:
- False positives (~3% at the standard 8-bit fingerprint × 4-slot bucket
  configuration) cause a child to incorrectly skip a parent update,
  suppressing the deletedChildren bin write. That bin is only consulted
  by the defensive-mode safety check, which is always off when
  prunedSet is non-nil. So FPs are behaviourally harmless.
- CheckAndRemove uses cuckoo Delete, which on rare fingerprint collisions
  can remove the wrong entry. That manifests as a lost future skip for
  the collision-evicted hash (one wasted Aerospike round-trip), not a
  correctness bug.

Concurrency: per-shard mutex, same sharding strategy as before
(h[0] & mask). 256 shards by default keeps lock contention low under
the existing read-ahead/processor split.

Why this matters operationally:
On dev-scale-1 (PR bsv-blockchain#873 deploy) the previous map-backed set saturated at
350M entries / ~39 GiB and the catch rate plateaued around 30% of
would-be-wasted parent updates. The map can't fit enough entries to span
typical parent/child temporal gaps without pushing the pod close to OOM.
The filter unlocks effectively unlimited capacity within a small memory
budget, lifting catch rate toward the ~97% ceiling (3% FP penalty).

API: the public PrunedTxSet surface is unchanged (Add, Contains,
CheckAndRemove, Len, Saturated). Adds Saturated() semantics: now sticky
on cumulative Insert failures rather than a Len-based threshold, and
adds an InsertFailures() accessor. NewPrunedTxSet(_, 0) falls back to
defaultPrunedTxSetCapacity (2B entries / ~2 GiB) instead of an
unlimited map.

Tests:
- Replaced exact-equality assertions with probabilistic-tolerant ones
  (cuckoo Len can over-count duplicate Adds; Contains for non-Added
  hashes can FP).
- Added TestPrunedTxSet_FalsePositiveRate to bound the FP rate at < 6%.
- Added TestPrunedTxSet_Saturated to verify InsertFailures latches on
  capacity exhaustion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Current Review:

This PR introduces a high-performance in-memory pruned-TX set using a custom lock-free cuckoo filter to skip wasteful Aerospike parent updates during UTXO pruning. The implementation demonstrates strong engineering with comprehensive testing and thorough documentation.

Key Strengths:

  • Well-documented two-generation rotation strategy prevents saturation
  • Lock-free atomic operations eliminate contention overhead
  • Comprehensive test coverage including concurrency, false-positive rates, and saturation behavior
  • Clear documentation in settings aligning with implementation (line 560: "0 = use built-in 2B default (NOT unlimited)")
  • All tests use explicit capacity values (no dangerous maxEntries=0 in test code)

Remaining Issues:

Several Copilot comments from earlier reviews remain unaddressed. Most are minor documentation/clarity improvements:

  1. Cuckoo filter alignment (util/cuckoo/cuckoo_h32.go) - Multiple comments about uint32 bucket alignment. Current implementation uses []uint32 which provides proper alignment, but the comments in the code could be clearer about why this is safe.

  2. Saturation semantics (pruner_service.go:251-253) - The metric help text for utxo_pruner_pruned_set_saturated says it indicates "configured maxEntries cap is reached", but PrunedTxSet.Saturated() actually reports insert failures (an error state), not routine at-capacity behavior. Consider updating the help text or using Rotations() metric for capacity pressure instead.

  3. Test timing assertion (cuckoo_h32_test.go:127-139) - The saturation short-circuit test pre-generates hashes outside the timed region (good!), but the 1-second timeout bound is generous for CI. This is acceptable as-is.

Documentation Accuracy:
✅ PR description correctly states the set is NOT persisted across restarts
✅ settings/settings.go:560 correctly documents "0 = use built-in 2B default (NOT unlimited)"
✅ settings/pruner_settings.go:28 provides comprehensive documentation of the feature
✅ Test comments accurately describe current behavior (e.g., line 143 mentions "up-front Add loop")

Verdict:
The core implementation is sound. The remaining issues are documentation/clarity improvements that don't affect correctness. The measured performance gains (1.64M → 2.6M records/sec, 99.9% catch rate) validate the approach.


History:

  • Earlier reviews identified documentation mismatches between old map-based and new cuckoo-filter implementation
  • ✅ Fixed: Tests no longer use dangerous maxEntries=0
  • ✅ Fixed: settings.go now correctly documents 0 → 2B default
  • ✅ Fixed: Test comments updated to reflect current behavior

@github-actions

github-actions Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-875 (87d3796)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 148
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.642µ 1.602µ ~ 0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.30n 71.24n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.26n 71.25n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.08n 71.24n ~ 0.500
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 33.26n 33.67n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 56.04n 55.93n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 125.9n 132.4n ~ 0.100
MiningCandidate_Stringify_Short-4 217.4n 224.0n ~ 0.100
MiningCandidate_Stringify_Long-4 1.652µ 1.665µ ~ 0.100
MiningSolution_Stringify-4 855.2n 861.0n ~ 0.500
BlockInfo_MarshalJSON-4 1.743µ 1.756µ ~ 0.700
NewFromBytes-4 128.7n 156.6n ~ 0.100
AddTxBatchColumnar_Validation-4 2.518µ 2.486µ ~ 0.700
OffsetValidationLoop-4 646.1n 657.7n ~ 0.200
Mine_EasyDifficulty-4 61.43µ 60.97µ ~ 0.700
Mine_WithAddress-4 7.391µ 7.046µ ~ 0.100
BlockAssembler_AddTx-4 0.03093n 0.02522n ~ 0.700
AddNode-4 10.42 10.36 ~ 1.000
AddNodeWithMap-4 11.66 11.26 ~ 0.200
DiskTxMap_SetIfNotExists-4 3.463µ 3.448µ ~ 0.700
DiskTxMap_SetIfNotExists_Parallel-4 3.368µ 3.237µ ~ 0.700
DiskTxMap_ExistenceOnly-4 306.0n 304.9n ~ 0.400
Queue-4 186.4n 186.7n ~ 0.800
AtomicPointer-4 4.580n 4.760n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 894.8µ 850.9µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 821.9µ 780.7µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 103.9µ 101.9µ ~ 0.700
ReorgOptimizations/AllMarkFalse/New/10K-4 62.08µ 61.98µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/10K-4 56.69µ 66.26µ ~ 0.400
ReorgOptimizations/HashSlicePool/New/10K-4 11.49µ 10.63µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/10K-4 4.637µ 4.628µ ~ 0.700
ReorgOptimizations/NodeFlags/New/10K-4 1.572µ 1.646µ ~ 0.500
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.509m 9.419m ~ 0.200
ReorgOptimizations/DedupFilterPipeline/New/100K-4 10.890m 9.727m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.228m 1.150m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 685.4µ 676.7µ ~ 0.200
ReorgOptimizations/HashSlicePool/Old/100K-4 599.7µ 611.2µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/100K-4 343.1µ 308.0µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/100K-4 51.67µ 53.41µ ~ 0.400
ReorgOptimizations/NodeFlags/New/100K-4 18.13µ 18.05µ ~ 0.700
TxMapSetIfNotExists-4 52.09n 52.46n ~ 0.100
TxMapSetIfNotExistsDuplicate-4 40.29n 39.73n ~ 0.100
ChannelSendReceive-4 598.0n 621.5n ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 47.26n 47.54n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 23.63n 23.21n ~ 0.400
DirectSubtreeAdd/256_per_subtree-4 21.94n 22.02n ~ 0.400
DirectSubtreeAdd/1024_per_subtree-4 20.96n 21.33n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 20.56n 20.59n ~ 0.600
SubtreeProcessorAdd/4_per_subtree-4 241.4n 234.3n ~ 0.700
SubtreeProcessorAdd/64_per_subtree-4 228.0n 227.4n ~ 1.000
SubtreeProcessorAdd/256_per_subtree-4 227.9n 226.0n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 225.1n 237.3n ~ 0.200
SubtreeProcessorAdd/2048_per_subtree-4 221.5n 220.9n ~ 0.800
SubtreeProcessorRotate/4_per_subtree-4 230.5n 227.0n ~ 1.000
SubtreeProcessorRotate/64_per_subtree-4 228.8n 229.8n ~ 0.700
SubtreeProcessorRotate/256_per_subtree-4 233.4n 230.1n ~ 0.400
SubtreeProcessorRotate/1024_per_subtree-4 222.2n 235.7n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 43.95n 44.36n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 28.39n 29.12n ~ 0.700
SubtreeNodeAddOnly/256_per_subtree-4 27.56n 27.76n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 27.05n 27.06n ~ 0.600
SubtreeCreationOnly/4_per_subtree-4 90.28n 90.85n ~ 0.700
SubtreeCreationOnly/64_per_subtree-4 320.0n 326.6n ~ 0.100
SubtreeCreationOnly/256_per_subtree-4 1.093µ 1.084µ ~ 0.700
SubtreeCreationOnly/1024_per_subtree-4 3.429µ 3.481µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 6.140µ 6.311µ ~ 0.700
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 229.2n 218.5n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 229.7n 220.6n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 1.635m 1.617m ~ 0.700
ParallelGetAndSetIfNotExists/10k_nodes-4 4.372m 4.181m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 6.523m 5.812m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 8.724m 8.027m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 1.414m 1.421m ~ 0.200
SequentialGetAndSetIfNotExists/10k_nodes-4 3.851m 3.543m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 16.14m 11.21m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 25.09m 20.66m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 1.655m 1.652m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 6.902m 6.933m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 11.91m 12.02m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.464m 1.440m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 8.664m 9.076m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 56.64m 49.08m ~ 0.400
CalcBlockWork-4 530.1n 505.6n ~ 0.700
CalculateWork-4 695.2n 697.1n ~ 0.400
CheckOldBlockIDs/on-chain-prefetch/1000-4 56.56µ 64.44µ ~ 0.700
CheckOldBlockIDs/off-chain-prefetch/1000-4 47.11µ 45.94µ ~ 0.200
CheckOldBlockIDs/on-chain-prefetch/10000-4 419.1µ 419.5µ ~ 0.700
CheckOldBlockIDs/off-chain-prefetch/10000-4 337.7µ 332.3µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_10-4 1.361µ 1.358µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_100-4 13.03µ 12.99µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 128.1µ 128.8µ ~ 0.200
CatchupWithHeaderCache-4 104.5m 104.5m ~ 1.000
_BufferPoolAllocation/16KB-4 4.339µ 4.156µ ~ 0.700
_BufferPoolAllocation/32KB-4 9.027µ 10.620µ ~ 0.700
_BufferPoolAllocation/64KB-4 20.48µ 17.01µ ~ 0.200
_BufferPoolAllocation/128KB-4 35.45µ 37.35µ ~ 0.100
_BufferPoolAllocation/512KB-4 146.3µ 123.1µ ~ 0.100
_BufferPoolConcurrent/32KB-4 21.70µ 20.26µ ~ 0.200
_BufferPoolConcurrent/64KB-4 34.41µ 31.98µ ~ 0.100
_BufferPoolConcurrent/512KB-4 155.6µ 154.8µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/16KB-4 687.5µ 662.9µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/32KB-4 748.6µ 682.7µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 742.7µ 692.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 748.3µ 696.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 667.7µ 603.7µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.09m 37.19m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.28m 37.03m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.88m 37.03m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.89m 37.36m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.69m 36.96m ~ 0.400
_PooledVsNonPooled/Pooled-4 740.6n 739.3n ~ 0.100
_PooledVsNonPooled/NonPooled-4 8.958µ 9.156µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 6.949µ 7.026µ ~ 0.200
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.913µ 10.228µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.474µ 9.545µ ~ 0.400
_prepareTxsPerLevel-4 412.7m 407.0m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.743m 3.477m ~ 0.200
_prepareTxsPerLevel_Comparison/Original-4 417.3m 411.4m ~ 0.200
_prepareTxsPerLevel_Comparison/Optimized-4 3.851m 3.534m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.436m 1.421m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 330.3µ 330.2µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 80.83µ 80.80µ ~ 0.700
SubtreeSizes/10k_tx_256_per_subtree-4 20.42µ 20.22µ ~ 0.700
SubtreeSizes/10k_tx_512_per_subtree-4 10.093µ 9.954µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.971µ 4.950µ ~ 1.000
SubtreeSizes/10k_tx_2k_per_subtree-4 2.483µ 2.470µ ~ 0.300
BlockSizeScaling/10k_tx_64_per_subtree-4 77.72µ 79.03µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 20.17µ 19.91µ ~ 0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.955µ 4.950µ ~ 1.000
BlockSizeScaling/50k_tx_64_per_subtree-4 393.1µ 392.3µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 98.20µ 97.16µ ~ 0.200
BlockSizeScaling/50k_tx_1024_per_subtree-4 24.86µ 24.60µ ~ 0.400
SubtreeAllocations/small_subtrees_exists_check-4 161.7µ 161.4µ ~ 1.000
SubtreeAllocations/small_subtrees_data_fetch-4 171.7µ 168.8µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 328.5µ 324.5µ ~ 0.100
SubtreeAllocations/medium_subtrees_exists_check-4 9.882µ 9.909µ ~ 0.400
SubtreeAllocations/medium_subtrees_data_fetch-4 10.92µ 10.80µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 20.64µ 20.28µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.408µ 2.436µ ~ 0.400
SubtreeAllocations/large_subtrees_data_fetch-4 2.697µ 2.636µ ~ 0.200
SubtreeAllocations/large_subtrees_full_validation-4 5.170µ 5.086µ ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 332.8µ 330.5µ ~ 1.000
StoreBlock_Sequential/AboveCSVHeight-4 338.5µ 333.1µ ~ 0.700
GetUtxoHashes-4 280.9n 279.9n ~ 1.000
GetUtxoHashes_ManyOutputs-4 50.94µ 45.94µ ~ 0.100
_NewMetaDataFromBytes-4 213.8n 215.0n ~ 0.200
_Bytes-4 400.4n 396.3n ~ 0.700
_MetaBytes-4 139.5n 138.4n ~ 0.100

Threshold: >10% with p < 0.05 | Generated: 2026-06-04 16:08 UTC

freemans13 and others added 2 commits May 15, 2026 14:12
Replaces the seiflotfy/cuckoofilter backing with a tiny in-package
implementation (cuckooH32) specialised for chainhash.Hash inputs.

Before:
  BenchmarkCuckoo_Add                3357 ns/op  32 B/op  1 alloc/op
  BenchmarkCuckoo_CheckAndRemove_Hit    84 ns/op  32 B/op  1 alloc/op

Every Add/Lookup/Delete allocated a slice header because the library's
hash interface dispatch caused h[:] to escape. At the pruner's observed
~1.5M ops/sec that translates to tens of MB/sec of allocation churn —
exactly the kind of pattern Go's GC handles least well, and a plausible
contributor to the throughput regression observed on dev-scale-1.

After:
  BenchmarkCuckoo_Add                  41.5 ns/op  0 B/op  0 allocs/op
  BenchmarkCuckoo_CheckAndRemove_Hit     37 ns/op  0 B/op  0 allocs/op
  BenchmarkCuckoo_Parallel               27 ns/op  0 B/op  0 allocs/op

Design:
- *[32]byte instead of []byte — no slice-header alloc, no interface escape.
- Skip rehashing — chainhash is already a SHA-256 digest, so fingerprint
  comes from h[0] and bucket index from h[1:9] directly.
- Shard on h[9] (not h[0]) so per-shard distribution is independent of
  the fingerprint bytes consumed by the filter.
- Deterministic eviction slot (fp ^ k) — keeps the filter alloc-free
  while rotating through bucket slots during cuckoo kicks.

FP rate, semantics, and safety guarantees unchanged — confirmed by
TestCuckooH32_FalsePositiveRate and TestPrunedTxSet_FalsePositiveRate
(< 6% over 100K probes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CPU profile of the previous version on dev-scale-1 showed:
  - ~28% of CPU in runtime.lock2/futex/procyield/syscall — sync.Mutex
    contention on the per-shard locks of PrunedTxSet.
  - 20% of processRecordChunk time inside CheckAndRemove, of which most
    was lock acquire/release overhead, not cuckoo work itself.

Each bucket of the cuckoo filter is exactly 4 bytes (4 x 1-byte
fingerprints) and naturally aligned in the bucket slice — perfect for
treating as a single uint32 with atomic ops.

This commit replaces the sync.Mutex-per-shard model with lock-free
operations on the packed uint32 bucket word:

  - Lookup: one atomic.LoadUint32 + SWAR byte-equality test, zero
    memory writes.
  - Insert: per-slot CompareAndSwapUint32 (0 -> fp). On bucket full,
    eviction loop also uses CAS — race-tolerant: a concurrent insert
    may displace us back, costing one future lost skip but never
    corrupting the filter or losing fingerprints atomically.
  - Delete: CompareAndSwapUint32 (fp -> 0).

PrunedTxSet no longer holds any mutex; sharding is retained purely for
cache locality.

Benchmark (Apple M3 Max, 16 cores):

                              Before  After    Map (saturated)
  Parallel Add+Check (ns/op):   27.2   10.11   10.20
  Single-thread Add  (ns/op):   41.5   36.57    1.80
  Single-thread Del  (ns/op):   37     43.07    -

Under parallel load the lock-free cuckoo is now identical to the
saturated-map fast path. Single-thread CAS overhead is small and
irrelevant — the production pruner is heavily parallel (6 partition
workers x ~24 chunk goroutines each).

go test -race passes; concurrency assumptions hold up to TSAN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
}
}

func (cf *cuckooH32) bucketAddr(i uint64) *uint32 {

@github-actions github-actions Bot May 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[OUTDATED] File moved to util/cuckoo/cuckoo_h32.go - see current code uses uint32 buckets with atomic ops.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alignment concern here is nuanced. While Go doesn't explicitly guarantee 4-byte alignment for [4]uint8, in practice the slice allocator will provide sufficient alignment for the atomic operations to work. However, this is an implementation detail.

That said, I verified the tests include concurrent access tests (TestPrunedTxSet_ConcurrentAccess with 100 goroutines) which would catch alignment issues on ARM if they existed. The code has been tested and works.

For absolute safety across all architectures, using []uint32 as suggested would be clearer. But the current approach is functionally correct given Go's runtime behavior.

The reader hot path used to call prunedSet.Add per scanned record. CPU
profile on dev-scale-1 showed ~5% of total CPU there, with the cost
dominated by cuckoo bucket cache misses (the 2 GB filter doesn't fit
in cache).

Move the Adds off the critical path: collect each chunk's record TXIDs
during processRecordChunk, then fire them off in a parallel goroutine
exactly when flushCleanupBatches kicks off its blocking BatchOperate.
Adds run during the Aerospike network wait — work the pruner was
previously parked on anyway.

Lock-free cuckoo means concurrent Adds across chunk goroutines are
safe. We wait for the Add goroutine to drain before returning so
subsequent chunks see the entries — but Adds typically finish in
~10us per 1024 records vs ~10ms network wait, so the wait is a no-op.

Trade-off: same-chunk parent/child catches are lost because the
parent's Add happens at the end of the chunk that contained it,
after that chunk's CheckAndRemoves. Cross-chunk catches (the dominant
case in randomly-partitioned scans) are preserved.

Updated TestProcessRecordChunk_SkipsParentsInPrunedSet to reflect the
new behaviour: the child record's TXID is now Added during the test's
flush, so Len() ends at 1 instead of 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread settings/pruner_settings.go Outdated
SkipPreserveParents bool `key:"pruner_skipPreserveParents" desc:"Skip Phase 1: preserve parents of unmined transactions" default:"false" category:"Pruner" usage:"Skip parent preservation phase" type:"bool" longdesc:"### Purpose\nSkips Phase 1 of pruning which preserves parent transactions of old unmined transactions.\n\n### How It Works\nWhen enabled, the pruner skips calling PreserveParentsOfOldUnminedTransactions.\nThis means parent transactions will not be protected from deletion even if they have unmined children.\n\n### Trade-offs\n| Setting | Benefit | Drawback |\n|---------|---------|----------|\n| false (default) | Parents preserved for unmined tx resubmission | Additional processing overhead |\n| true | Faster pruning, reduced processing | Parents may be deleted, breaking unmined tx chains |\n\n### Recommendations\n- **false** (default) - Normal operation, preserves parent transactions\n- **true** - Skip parent preservation if you don't need to resubmit unmined transactions"`
MinBlockHeight uint32 `key:"pruner_min_block_height" desc:"Minimum block height before pruning begins" default:"0" category:"Pruner" usage:"Skip all pruning until block height exceeds this value" type:"uint32" longdesc:"### Purpose\nPrevents all pruning operations until the blockchain reaches a minimum height.\n\n### How It Works\n- When set to a value > 0, the pruner skips ALL operations (parent preservation, DAH deletion, blob deletion) for any block with height <= this value\n- At height 0 (default), pruning behaves normally from the start\n- Useful for fresh environment bootstrapping where initial blocks must remain available for cross-node validation\n\n### Recommendations\n- **0** (default) - Normal operation, prune from the start\n- **300** - Typical value for dev environments that need to mine initial blocks for coinbase splitting"`
SkipDeletions bool `key:"pruner_skipDeletions" desc:"Skip deletion operations" default:"false" category:"Pruner" usage:"Skip deletion operations" type:"bool" longdesc:"### Purpose\nSkips deletion operations during pruning.\n\n### How It Works\nWhen enabled, the pruner skips deletion operations during pruning.\n\n### Recommendations\n- **false** (default) - Normal operation\n- **true** - Skip deletion operations"`
UTXOPrunedSetMaxEntries int `key:"pruner_utxoPrunedSetMaxEntries" desc:"Soft cap on entries in the in-memory pruned-TX set" default:"10000000" category:"Pruner" usage:"0 = unlimited; reached entries become no-op Adds" type:"int" longdesc:"### Purpose\nCaps the in-memory size of the PrunedTxSet, which tracks TXIDs pruned across sessions so that wasteful Aerospike parent updates can be skipped pre-flight.\n\n### How It Works\n- The set is persisted on the Service struct and reused across prune sessions.\n- When a record is scanned for deletion, its TXID is registered. Children later check the set to skip parent-update round-trips for parents already known to be gone.\n- Once Len() reaches this cap, further Add() calls become silent no-ops and the optimisation degrades to baseline behaviour for any TXID added after saturation.\n- At ~96 bytes per entry (32-byte hash + Go map overhead), 10M entries is ~1 GB worst-case; 100M ≈ 10 GB.\n\n### Recommendations\n- **10000000** (default) - safe ceiling for general workloads\n- Increase (e.g. 100000000) for tightly-chained / high-throughput workloads where parents span prune sessions and the catch rate is bottlenecked by saturation\n- **0** - unlimited; only safe if you know the working-set will not grow unbounded\n- Disabled automatically when pruner_utxoDefensiveEnabled=true (defensive mode bypasses the optimisation)"`

@github-actions github-actions Bot May 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superseded by newer comment on the same issue.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — this earlier note has been superseded by the newer review comment on the same line. The follow-up is addressed in the next commit.

if !s.defensiveEnabled {
prunedSet = NewPrunedTxSet(256, prunedTxSetMaxEntries)
}
// Memory is bounded by settings.Pruner.UTXOPrunedSetMaxEntries (default 10M, ~96 B/entry).

@github-actions github-actions Bot May 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was correctly noted - the pruner_service.go documentation is accurate and matches the cuckoo implementation.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — no change needed; the pruner_service.go docstring already matches the cuckoo implementation.

freemans13 and others added 6 commits May 15, 2026 16:27
Previous commit spawned a goroutine per chunk to overlap prunedSet Adds
with the Aerospike batch wait. At ~1700 chunks/sec across all partition
workers that turned out to introduce visible throughput jitter — most
likely from channel allocation + scheduler dispatch cost, plus the
GC pressure visible at ~2 GC/sec with p99 pauses of 5.5 ms.

The Adds themselves are tiny: ~30 us for a 1024-record chunk (lock-free
atomic CAS, ~30 ns each). That's <1% of the typical ~10 ms
flushCleanupBatches network wait. Whether we do them in parallel with
the wait or sequentially after it is irrelevant to throughput — but
the per-chunk goroutine churn was measurable.

So drop the goroutine entirely. Adds run inline immediately after the
flush returns. Same throughput, smoother per-pod rate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit moved Adds to inline-after-flush, eliminating per-chunk
goroutine churn. That fixed the throughput jitter but dropped catch
rate from ~80% to ~56% because chunk N's TXIDs only became visible to
concurrent chunks AFTER chunk N's ~10 ms Aerospike wait. Other
concurrent chunks (chunkGroupLimit=4) running during that wait would
do their CheckAndRemoves and miss chunk N's records.

Move the Adds to the START of processRecordChunk, before any
CheckAndRemove fires. Same total ~30 us cost for a 1024-record chunk
(lock-free CAS, ~30 ns each), trivial vs the ~10 ms flush that
follows. Now chunk N's TXIDs are visible to concurrent chunks
throughout N's entire CPU + flush window.

Expected effect: catch rate recovers toward ~80% without re-introducing
the goroutine-spawn jitter, throughput stays at ~1.7M records/sec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A future developer reading the pruner code will reasonably ask: why
not just use seiflotfy/cuckoofilter, why a custom implementation,
why lock-free, why are Adds done at the top of processRecordChunk?
Encode the answers in the source so they don't have to git-archaeology
the PR history.

- cuckoo_h32.go header now explains the previous developer's FAQ:
  why we wrote our own (allocation escape via the library's []byte
  hasher interface), why specialised for *[32]byte, why lock-free
  (CPU profile showed ~28% of total in mutex contention).
- pruner_service.go reader-stage comment now explains why Add is
  NOT called there and the two reasons it's done up-front in
  processRecordChunk (cache locality + concurrent-chunk visibility,
  with the concrete catch-rate numbers from dev-scale-1 testing).
- PruneWithPartitions block updated: stale "10M default" cap and
  "~96 B/entry" memory comments fixed to reflect the cuckoo's 2B
  default and ~1 B/entry footprint.

No behavioural changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CPU profile on dev-scale-1 caught the pathology: a 2 GiB cuckoo at
~92% load was burning ~22 cores worth of CPU on a 15-core pod just
running the 500-kick eviction loop on every Insert. Pod CPU pegged at
limit, but actual pruning throughput collapsed from 2.4M rec/sec to
~200K rec/sec.

Root cause: Insert had no give-up condition once the filter was full.
For every Add whose two candidate buckets were both occupied, it ran
the full eviction loop (500 kicks × ~30 ns ≈ 15 us) chasing slots
that, by definition, couldn't exist. At ~1.5M Adds/sec that consumed
~22 cores of pure overhead.

Fix: latch a `saturated` atomic.Bool when the eviction loop first
exhausts MaxKicks. Subsequent Inserts that find both buckets full
return false immediately without re-running the loop. The fast path
(slot-CAS in either candidate bucket) still runs, so when
CheckAndRemove later frees slots the filter recovers transparently;
we just refuse to do the expensive rebalancing work that — once
proven impossible — is permanently expected to fail.

Saturation latch is sticky for the lifetime of the cuckoo instance.
A process restart resets it.

TestCuckooH32_SaturationLatchesAndShortCircuits asserts the bound:
100K saturated Inserts complete in <1 s where the un-fixed version
would take ~1.5 s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-filter PrunedTxSet saturated in ~50 min at sustained 1.7M TPS
on dev-scale-1. After saturation, all new Adds were short-circuited
(by the earlier eviction-loop fix), so cross-session catches collapsed
from ~98% to ~14% within hours. The set effectively froze, holding
stale historical entries that didn't match current children's parents.

This commit rotates: each shard keeps a `current` filter (receives
Adds) and a `previous` filter (read-only, last epoch's history). When
`current` refuses an Insert, it slides into the `previous` slot
(dropping the older `previous`), a fresh `current` is allocated, and
the Add is retried. Lookup/CheckAndRemove try current first, fall back
to previous on miss.

Effect: the set never freezes. At 1.7M Adds/sec with the configured
8B total cap (= 4B per generation), each generation rotates ~every
40 min. Worst case, the set always has the last ~40 min of entries
available — far longer than any tight-chain parent-to-child gap.

Throughput preservation (the critical requirement):
  - Before first rotation, `previous == nil`. Hot path is byte-
    identical to the single-filter implementation. Zero overhead.
  - After rotation, Add still costs 1 atomic.Pointer.Load + 1 cuckoo
    CAS (~1 ns extra). CheckAndRemove that hits in current returns
    immediately — same cost as today. Miss path adds 1 atomic load
    + 1 cuckoo lookup on previous (~5-10 ns).
  - Bench:  BenchmarkCuckoo_Parallel_AddPlusCheck  10.58 ns/op (was
    10.11 ns/op single-filter). 0 allocs. No regression.

Memory: total ≈ 2 × maxEntries × ~1 byte. NewPrunedTxSet sizes each
generation at maxEntries / (2 × shardCount) so total memory budget
matches the configured cap.

Saturated() now reads as "something went badly wrong" (a fresh
generation refused an Insert) — never expected in normal operation.
The new Rotations() accessor surfaces rotation count for observability.

TestPrunedTxSet_Saturated renamed to TestPrunedTxSet_RotatesUnderLoad
to reflect the new behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
flushCleanupBatches previously sent two sequential Aerospike round-trips
per chunk: first the parent UDF updates, then the child deletions. Each
round-trip costs ~10-20 ms of network latency. With ~57K chunks per
worker per session × 12 workers, even at chunkGroupLimit=4 parallelism
that's tens of seconds of round-trip latency per session.

Aerospike's BatchOperate accepts a mixed []BatchRecordIfc list with
BatchUDF (or BatchWrite) records AND BatchDelete records interleaved.
Server fans them out per-node and we get ONE network round-trip
covering both kinds of operation.

Bonus saving: client-side parseRecordResults runs once instead of
twice — the CPU profile showed that path at ~20% of pruner CPU.

Defensive mode keeps the original two-step sequence because its
safety check has an implicit ordering dependency between parent
deletedChildren writes and child deletions. Non-defensive (the
dev-scale-1 default) uses the combined path; the deletedChildren bin
isn't read in that configuration so within-batch order is irrelevant.

The new executeBatchCleanupCombined classifies results by region:
[0, parentEnd) parsed as UDF/Write returns (TX_NOT_FOUND silently
counted); [parentEnd, end) parsed as Delete results (KEY_NOT_FOUND
treated as idempotent success). Prometheus counters preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

Additional Configuration Issue - Default Value Mismatch

Beyond the documentation issues in the existing inline comments, there's a critical inconsistency in the default value:

Files claim different defaults:

  • settings/settings.go:533 → actual runtime default: 10_000_000 (10M)
  • pruner_service.go:714 → comment says: "default 2B entries"
  • pruned_tx_set.go:11 → fallback constant: 2_000_000_000 (2B)
  • PR description → claims: 2_000_000_000 (2B)

Impact:
The real default is 10M (from settings.go), not 2B. This 200x difference causes:

  • Optimization saturates 200x faster than users expect
  • Memory footprint ~20 MB instead of expected ~4 GB
  • Catch rate degrades much earlier in production workloads

Needs fix: Either update settings.go to 2B, or update all docs/comments to reflect 10M reality.

(Related to inline comment thread on pruner_settings.go:28)

@freemans13 freemans13 self-assigned this May 18, 2026
freemans13 and others added 2 commits May 18, 2026 14:33
Moves the lock-free zero-alloc cuckoo filter specialised for 32-byte
hashes from stores/utxo/aerospike/pruner/cuckoo_h32.go to a new
util/cuckoo package as cuckoo.H32. The pruner now imports it.

Why extract: teranode currently uses github.com/seiflotfy/cuckoofilter
directly in three places (model/disk_parent_spends_map.go,
model/disk_tx_map_uint64.go, services/blockassembly/subtreeprocessor/
disk_tx_map.go). Those call sites pay the same allocation/lock costs
this implementation solved for the pruner — taking []byte arguments
that escape via interface dispatch, plus sync.Mutex per shard.

This commit does NOT migrate the existing call sites. Their key sizes
differ (36-byte inpoints, uint64) so they can't drop in cuckoo.H32 as-is.
What this commit gives them is a worked example of the optimisations
(pointer arguments, packed-bucket atomic CAS, saturated short-circuit)
and a place to live alongside if/when they get a similar treatment.

API:
  - cuckoo.NewH32(capacity uint) *cuckoo.H32
  - (*H32).Insert(*[32]byte) bool
  - (*H32).Lookup(*[32]byte) bool
  - (*H32).Delete(*[32]byte) bool
  - (*H32).Count() int
  - (*H32).Saturated() bool

No behavioural change in the pruner. All four cuckoo tests pass under
-race in the new location.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread settings/pruner_settings.go Outdated
SkipPreserveParents bool `key:"pruner_skipPreserveParents" desc:"Skip Phase 1: preserve parents of unmined transactions" default:"false" category:"Pruner" usage:"Skip parent preservation phase" type:"bool" longdesc:"### Purpose\nSkips Phase 1 of pruning which preserves parent transactions of old unmined transactions.\n\n### How It Works\nWhen enabled, the pruner skips calling PreserveParentsOfOldUnminedTransactions.\nThis means parent transactions will not be protected from deletion even if they have unmined children.\n\n### Trade-offs\n| Setting | Benefit | Drawback |\n|---------|---------|----------|\n| false (default) | Parents preserved for unmined tx resubmission | Additional processing overhead |\n| true | Faster pruning, reduced processing | Parents may be deleted, breaking unmined tx chains |\n\n### Recommendations\n- **false** (default) - Normal operation, preserves parent transactions\n- **true** - Skip parent preservation if you don't need to resubmit unmined transactions"`
MinBlockHeight uint32 `key:"pruner_min_block_height" desc:"Minimum block height before pruning begins" default:"0" category:"Pruner" usage:"Skip all pruning until block height exceeds this value" type:"uint32" longdesc:"### Purpose\nPrevents all pruning operations until the blockchain reaches a minimum height.\n\n### How It Works\n- When set to a value > 0, the pruner skips ALL operations (parent preservation, DAH deletion, blob deletion) for any block with height <= this value\n- At height 0 (default), pruning behaves normally from the start\n- Useful for fresh environment bootstrapping where initial blocks must remain available for cross-node validation\n\n### Recommendations\n- **0** (default) - Normal operation, prune from the start\n- **300** - Typical value for dev environments that need to mine initial blocks for coinbase splitting"`
SkipDeletions bool `key:"pruner_skipDeletions" desc:"Skip deletion operations" default:"false" category:"Pruner" usage:"Skip deletion operations" type:"bool" longdesc:"### Purpose\nSkips deletion operations during pruning.\n\n### How It Works\nWhen enabled, the pruner skips deletion operations during pruning.\n\n### Recommendations\n- **false** (default) - Normal operation\n- **true** - Skip deletion operations"`
UTXOPrunedSetMaxEntries int `key:"pruner_utxoPrunedSetMaxEntries" desc:"Soft cap on entries in the in-memory pruned-TX set" default:"10000000" category:"Pruner" usage:"0 = unlimited; reached entries become no-op Adds" type:"int" longdesc:"### Purpose\nCaps the in-memory size of the PrunedTxSet, which tracks TXIDs pruned across sessions so that wasteful Aerospike parent updates can be skipped pre-flight.\n\n### How It Works\n- The set is persisted on the Service struct and reused across prune sessions.\n- When a record is scanned for deletion, its TXID is registered. Children later check the set to skip parent-update round-trips for parents already known to be gone.\n- Once Len() reaches this cap, further Add() calls become silent no-ops and the optimisation degrades to baseline behaviour for any TXID added after saturation.\n- At ~96 bytes per entry (32-byte hash + Go map overhead), 10M entries is ~1 GB worst-case; 100M ≈ 10 GB.\n\n### Recommendations\n- **10000000** (default) - safe ceiling for general workloads\n- Increase (e.g. 100000000) for tightly-chained / high-throughput workloads where parents span prune sessions and the catch rate is bottlenecked by saturation\n- **0** - unlimited; only safe if you know the working-set will not grow unbounded\n- Disabled automatically when pruner_utxoDefensiveEnabled=true (defensive mode bypasses the optimisation)"`

@github-actions github-actions Bot May 19, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - Documentation has been updated. The pruner_settings.go now correctly reflects the cuckoo filter memory usage (~1 byte/entry): 10M ≈ 10 MiB, 100M ≈ 100 MiB, 2B ≈ 2 GiB.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the next commit. Replaced the ~96 bytes / map-overhead memory estimate with the cuckoo-filter reality (10M ≈ 20 MB, 100M ≈ 200 MB, 2B ≈ 2 GiB) and clarified that maxEntries=0 means the built-in 2B default (not unlimited). Also tightened the usage tag and recommendations to match.

@freemans13 freemans13 changed the title feat(pruner): back PrunedTxSet with a cuckoo filter (~1 byte/entry) perf(pruner): skip already-pruned parent updates using a persistent cuckoo filter May 19, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to improve Aerospike UTXO pruning throughput by avoiding wasted “update parent” round-trips when the parent TX has already been pruned, using a sharded cuckoo-filter-backed PrunedTxSet and (when defensive mode is off) combining parent updates + child deletions into a single BatchOperate.

Changes:

  • Introduces a specialized lock-free cuckoo filter (util/cuckoo.H32) and uses it to rework PrunedTxSet into a sharded two-generation structure.
  • Updates the Aerospike pruner to (a) pre-register chunk TXIDs in the pruned set before processing and (b) optionally combine parent updates + deletions into one batch call.
  • Adds config/metrics and updates tests/benchmarks for the new pruned-set behavior.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
util/cuckoo/cuckoo_h32.go Adds lock-free cuckoo filter implementation specialized for 32-byte hashes.
util/cuckoo/cuckoo_h32_test.go Adds correctness and perf-guard tests for the new cuckoo filter.
stores/utxo/aerospike/pruner/pruner_service.go Integrates persistent-in-process PrunedTxSet, adds metrics, and introduces combined batch cleanup when not in defensive mode.
stores/utxo/aerospike/pruner/pruned_tx_set.go Replaces map-backed pruned set with sharded two-generation cuckoo-backed set and rotation logic.
stores/utxo/aerospike/pruner/pruned_tx_set_test.go Updates unit tests to reflect probabilistic behavior + rotations.
stores/utxo/aerospike/pruner/pruned_tx_set_bench_test.go Adds benchmarks comparing cuckoo-backed set vs prior map approach.
stores/utxo/aerospike/pruner/pruned_set_skip_test.go Updates skip-path unit test expectations for the new add timing/behavior.
stores/utxo/aerospike/pruner/partition_retry_test.go Ensures new Prometheus gauges are initialized for tests that bypass NewService().
settings/settings.go Adds pruner_utxoPrunedSetMaxEntries setting default.
settings/pruner_settings.go Documents the new pruner_utxoPrunedSetMaxEntries setting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread util/cuckoo/cuckoo_h32.go
Comment on lines +237 to +241
fp = displaced
i = cf.altIndex(fp, i)
if cf.tryInsertCAS(fp, i) {
return true
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed. The eviction loop now treats displaced==nullFinger as immediate success (a concurrent Delete left the slot empty, so our fp landed cleanly and there is nothing to relocate). Count is incremented inline in that branch so it matches the tryInsertCAS bookkeeping.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already handled — when displaced == nullFinger the eviction path increments Count and returns true immediately, so tryInsertCAS is never called with fp == 0. No spurious insert and no double-count.

Comment thread util/cuckoo/cuckoo_h32.go
Comment on lines +233 to +236
if !swapped {
cf.saturated.Store(true)
return false
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — the inner-CAS retry exhaustion is contention, not provable saturation. Fixed: the saturated flag is no longer latched on the !swapped branch (we just back out and let the caller retry). Latching is now reserved for the genuine maxKicks-exhaustion path at the bottom of the loop.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already handled — the !swapped (CAS-retries-exhausted) path returns false WITHOUT latching saturated; only exhausting the full maxKicks eviction loop latches it. Transient contention therefore no longer permanently disables eviction.

Comment on lines +714 to +717
// Memory is bounded by settings.Pruner.UTXOPrunedSetMaxEntries (default 2B entries
// ≈ 2 GiB at the cuckoo's ~1 byte per entry). Once the cap is reached, Insert silently
// fails for new entries and the optimisation degrades to baseline; the
// utxo_pruner_pruned_set_saturated gauge surfaces this.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment block in pruner_service.go to reflect the actual model: in-memory only (not disk-persisted), default 10M settings cap with 2B opt-in via 0, two-generation rotation on saturation rather than silent Insert failure, and the new utxo_pruner_pruned_set_rotations_total counter for the routine saturation signal.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — that comment block was rewritten to match the implementation: default 10M (0 ⇒ 2B), and rotation-on-saturation rather than 'Insert silently fails once the cap is reached'.

Comment on lines +1678 to +1682
return errors.NewStorageError("%d parent update operations failed", parentErrors)
}
if deleteErrors > 0 {
s.logger.Errorf("Combined cleanup: %d deletion errors (first: %v)", deleteErrors, firstDeleteErr)
return errors.NewStorageError("%d deletion operations failed", deleteErrors)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — firstParentErr is now wrapped as the final variadic argument to NewStorageError so the aerospike error chain is preserved on the returned error.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the combined-cleanup error now wraps the underlying cause: NewStorageError("%d parent update operations failed", parentErrors, firstParentErr) and likewise firstDeleteErr for deletions, so the root Aerospike error is preserved in the chain.

Comment on lines +1678 to +1682
return errors.NewStorageError("%d parent update operations failed", parentErrors)
}
if deleteErrors > 0 {
s.logger.Errorf("Combined cleanup: %d deletion errors (first: %v)", deleteErrors, firstDeleteErr)
return errors.NewStorageError("%d deletion operations failed", deleteErrors)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — firstDeleteErr is now wrapped as the final variadic argument to NewStorageError so the underlying aerospike cause is preserved.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the combined-cleanup error now wraps the underlying cause: NewStorageError("%d parent update operations failed", parentErrors, firstParentErr) and likewise firstDeleteErr for deletions, so the root Aerospike error is preserved in the chain.

Comment on lines +306 to +309
// PrunedTxSet is persistent across prune sessions so children whose parents were pruned
// in earlier sessions can still skip the parent-update round-trip. Defensive mode is
// incompatible with the optimisation because records may be skipped after the reader
// registers them.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description was overstating things — the set is held on the Service struct in memory, so it survives across prune sessions within a pod lifetime but is NOT persisted to disk across restarts. Updated the source-code docstring in pruner_service.go to spell that out ("lives only in memory — rebuilt from scratch on pod restart"). I'm not able to edit the PR body from this loop, so I'll update it manually after the loop.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved — the disk-persistence claim was stale. The set is in-memory only (reused across prune sessions within a single process, rebuilt from scratch on pod restart). I've updated the PR title and description to drop the persistence claims and state the in-memory behaviour explicitly; the in-code comments already document it.

Comment thread util/cuckoo/cuckoo_h32_test.go Outdated
Comment on lines +118 to +127
// Time a batch of Inserts that we expect to fail. Even at b.N = 1M
// these should all return immediately (no eviction churn), so total
// wall time is bounded by the fast-path overhead only.
start := time.Now()
const n = 100_000
for i := 0; i < n; i++ {
var h [32]byte
_, err := rand.Read(h[:])
require.NoError(t, err)
cf.Insert(&h)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — hashes are now pre-generated outside the timed loop in TestH32_SaturationLatchesAndShortCircuits, so the measurement reflects the saturated-Insert fast path rather than crypto/rand.Read cost.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the saturation timing test now pre-generates all hashes outside the timed region, so the measured elapsed reflects the short-circuited Insert fast path rather than crypto/rand.Read. A comment explains this.

Comment thread stores/utxo/aerospike/pruner/pruned_tx_set.go
Comment on lines 35 to 37
func TestPrunedTxSet_AddAndContains(t *testing.T) {
set := NewPrunedTxSet(16, 0)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — all NewPrunedTxSet calls in pruned_tx_set_test.go that previously used 0 now pass 4096 (a comfortable small ceiling for the tests that exercise add/contains/remove/rotation behaviour). TestPrunedTxSet_DefaultCapacity now just sanity-checks the defaultPrunedTxSetCapacity constant rather than actually constructing a 2B filter.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — these tests now pass an explicit small maxEntries (e.g. 4096 / 1_048_576) instead of 0, so they no longer fall back to the 2B default and its multi-GiB allocation.

Comment on lines +142 to +143
require.Equal(t, 1, prunedSet.Len(),
"parents removed, child TXID added during the parallel-Add overlap")

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — both NewPrunedTxSet calls in pruned_set_skip_test.go now pass 4096 instead of 0.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — these tests now pass an explicit small maxEntries (e.g. 4096 / 1_048_576) instead of 0, so they no longer fall back to the 2B default and its multi-GiB allocation.

Update pruner_utxoPrunedSetMaxEntries memory documentation to match the
cuckoo-filter implementation. The previous text described the old map-based
design (~96 B/entry, 10M=1GB, 100M=10GB), which was off by ~50x. Also clarify
that maxEntries=0 uses the built-in 2B default rather than being unlimited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 11 comments.

Comment thread util/cuckoo/cuckoo_h32.go Outdated
Comment on lines +82 to +86
// bucketH32 holds 4 fingerprint bytes. Naturally 4-byte aligned in the
// `buckets` slice, so atomic uint32 ops on the underlying memory are safe.
type bucketH32 [bucketSize]uint8

// NewH32 returns a filter sized to hold approximately the requested number

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed in 2cfdd0e — buckets is now []uint32 (not [4]uint8), so atomic uint32 ops on slice elements are alignment-guaranteed and the unsafe import is gone. This Copilot review batch landed against the round-1 HEAD (8657c1d) before the fix had pushed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This concern has been addressed in the current code. The implementation now uses buckets []uint32 (line 57 of cuckoo_h32.go) instead of []bucketH32, which guarantees 4-byte alignment for atomic operations. The bucketAddr method returns &cf.buckets[i] which is a properly aligned *uint32.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — bucket storage is now a []uint32 (not [4]uint8), so each bucket word is naturally 4-byte aligned and the atomic.Load/CompareAndSwapUint32 ops on &buckets[i] are guaranteed safe on every architecture Go supports. The doc comment states this explicitly.

Comment thread util/cuckoo/cuckoo_h32.go
Comment on lines +233 to +236
if !swapped {
cf.saturated.Store(true)
return false
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed in 2cfdd0e — the !swapped branch in Insert no longer latches the saturated flag; that's reserved for genuine maxKicks exhaustion. CAS-retry exhaustion is now treated as transient contention and the Insert just backs out.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already handled — the !swapped (CAS-retries-exhausted) path returns false WITHOUT latching saturated; only exhausting the full maxKicks eviction loop latches it. Transient contention therefore no longer permanently disables eviction.

Comment thread util/cuckoo/cuckoo_h32_test.go Outdated
Comment on lines +118 to +127
// Time a batch of Inserts that we expect to fail. Even at b.N = 1M
// these should all return immediately (no eviction churn), so total
// wall time is bounded by the fast-path overhead only.
start := time.Now()
const n = 100_000
for i := 0; i < n; i++ {
var h [32]byte
_, err := rand.Read(h[:])
require.NoError(t, err)
cf.Insert(&h)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed in 2cfdd0e — TestH32_SaturationLatchesAndShortCircuits pre-generates the 100K hashes outside the timed region so the assertion measures the saturated-Insert fast path rather than crypto/rand cost.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the saturation timing test now pre-generates all hashes outside the timed region, so the measured elapsed reflects the short-circuited Insert fast path rather than crypto/rand.Read. A comment explains this.

})
prometheusUtxoPrunedSetSaturated = promauto.NewGauge(prometheus.GaugeOpts{
Name: "utxo_pruner_pruned_set_saturated",
Help: "1 if the PrunedTxSet has reached its configured maxEntries cap, 0 otherwise",

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed in 2cfdd0e — added utxo_pruner_pruned_set_rotations (driven by PrunedTxSet.Rotations()) as the routine saturation signal, and reframed the saturated gauge's Help text to make clear it's a backstop / error signal. In the current commit I've also dropped the _total suffix from the rotations metric to follow Prometheus convention.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the metric help text and PrunedTxSet.Saturated() godoc were reworded to the actual semantics: sticky, '≥1 insert failure since construction' — an error/backstop signal (extreme CAS contention, or a failed insert into a freshly-rotated generation), explicitly NOT the at-capacity indicator. Routine cap pressure is surfaced by utxo_pruner_pruned_set_rotations.

Comment on lines +306 to +312
// PrunedTxSet is persistent across prune sessions so children whose parents were pruned
// in earlier sessions can still skip the parent-update round-trip. Defensive mode is
// incompatible with the optimisation because records may be skipped after the reader
// registers them.
if !service.defensiveEnabled {
service.prunedSet = NewPrunedTxSet(256, settings.Pruner.UTXOPrunedSetMaxEntries)
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed in 2cfdd0e — the source comment now spells out "lives only in memory — rebuilt from scratch on pod restart and is not persisted to disk". The PR description still has the stale wording; I'll update that out-of-band (the review-loop tooling blocks PR body edits).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved — the disk-persistence claim was stale. The set is in-memory only (reused across prune sessions within a single process, rebuilt from scratch on pod restart). I've updated the PR title and description to drop the persistence claims and state the in-memory behaviour explicitly; the in-code comments already document it.

Comment thread settings/settings.go Outdated
SkipPreserveParents: getBool("pruner_skipPreserveParents", false, alternativeContext...), // Skip Phase 1: preserve parents
SkipDeletions: getBool("pruner_skipDeletions", false, alternativeContext...), // Skip deletions for performance
MinBlockHeight: getUint32("pruner_min_block_height", 0, alternativeContext...), // Do not prune blocks at or below this height
UTXOPrunedSetMaxEntries: getInt("pruner_utxoPrunedSetMaxEntries", 10_000_000, alternativeContext...), // Soft cap on PrunedTxSet entries; 0 = unlimited

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed in 2cfdd0e — settings.go inline comment now reads 0 = use built-in 2B default (NOT unlimited). This review batch reviewed 8657c1d before the round-2 fix had pushed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the settings comment no longer says '0 = unlimited'. It now reads '0 = use built-in 2B default (NOT unlimited)', matching NewPrunedTxSet's handling of maxEntries<=0.

Comment thread settings/pruner_settings.go Outdated
SkipPreserveParents bool `key:"pruner_skipPreserveParents" desc:"Skip Phase 1: preserve parents of unmined transactions" default:"false" category:"Pruner" usage:"Skip parent preservation phase" type:"bool" longdesc:"### Purpose\nSkips Phase 1 of pruning which preserves parent transactions of old unmined transactions.\n\n### How It Works\nWhen enabled, the pruner skips calling PreserveParentsOfOldUnminedTransactions.\nThis means parent transactions will not be protected from deletion even if they have unmined children.\n\n### Trade-offs\n| Setting | Benefit | Drawback |\n|---------|---------|----------|\n| false (default) | Parents preserved for unmined tx resubmission | Additional processing overhead |\n| true | Faster pruning, reduced processing | Parents may be deleted, breaking unmined tx chains |\n\n### Recommendations\n- **false** (default) - Normal operation, preserves parent transactions\n- **true** - Skip parent preservation if you don't need to resubmit unmined transactions"`
MinBlockHeight uint32 `key:"pruner_min_block_height" desc:"Minimum block height before pruning begins" default:"0" category:"Pruner" usage:"Skip all pruning until block height exceeds this value" type:"uint32" longdesc:"### Purpose\nPrevents all pruning operations until the blockchain reaches a minimum height.\n\n### How It Works\n- When set to a value > 0, the pruner skips ALL operations (parent preservation, DAH deletion, blob deletion) for any block with height <= this value\n- At height 0 (default), pruning behaves normally from the start\n- Useful for fresh environment bootstrapping where initial blocks must remain available for cross-node validation\n\n### Recommendations\n- **0** (default) - Normal operation, prune from the start\n- **300** - Typical value for dev environments that need to mine initial blocks for coinbase splitting"`
SkipDeletions bool `key:"pruner_skipDeletions" desc:"Skip deletion operations" default:"false" category:"Pruner" usage:"Skip deletion operations" type:"bool" longdesc:"### Purpose\nSkips deletion operations during pruning.\n\n### How It Works\nWhen enabled, the pruner skips deletion operations during pruning.\n\n### Recommendations\n- **false** (default) - Normal operation\n- **true** - Skip deletion operations"`
UTXOPrunedSetMaxEntries int `key:"pruner_utxoPrunedSetMaxEntries" desc:"Soft cap on entries in the in-memory pruned-TX set" default:"10000000" category:"Pruner" usage:"0 = use default (2B); saturation causes Inserts to no-op" type:"int" longdesc:"### Purpose\nCaps the in-memory size of the PrunedTxSet, which tracks TXIDs pruned across sessions so that wasteful Aerospike parent updates can be skipped pre-flight.\n\n### How It Works\n- The set is persisted on the Service struct and reused across prune sessions.\n- When a record is scanned for deletion, its TXID is registered. Children later check the set to skip parent-update round-trips for parents already known to be gone.\n- Backed by a sharded two-generation cuckoo filter. Once a generation saturates it rotates into the 'previous' slot and a fresh 'current' is allocated, so the set never freezes — but entries fall out of 'previous' on the next rotation.\n- At ~1 byte per entry across both generations (cuckoo filter): 10M entries ≈ 20 MB; 100M ≈ 200 MB; 2B (the default when maxEntries=0) ≈ 2 GiB.\n\n### Recommendations\n- **10000000** (default) - safe ceiling for general workloads (~20 MB)\n- Increase (e.g. 100000000 ≈ 200 MB, or 0 for the 2B / ~2 GiB default) for tightly-chained / high-throughput workloads where parents span prune sessions and the catch rate is bottlenecked by rotation pressure\n- **0** - use the built-in 2B default (≈ 2 GiB); appropriate when the working-set may grow large\n- Disabled automatically when pruner_utxoDefensiveEnabled=true (defensive mode bypasses the optimisation)"`

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially addressed in 2cfdd0e (longdesc updated to describe rotation and memory) and fully tightened in this commit (the short usage tag now reads 0 = use built-in 2B default; on saturation the current generation rotates, it does not stop accepting Adds). Also corrected the 10M ≈ 20 MB figure to ≈ 10 MiB to match the actual sizing math (see 3266345357).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the longdesc was rewritten for the cuckoo/two-generation model: it describes rotation-on-saturation (not the old map-based 'Add becomes a no-op' soft cap) and the ~1 byte/entry budget, and no longer quotes the ~96 bytes/entry map cost.

Comment on lines +49 to +52
// Memory budget: total memory ≈ 2 × maxEntries × ~1 byte (both
// generations live simultaneously). NewPrunedTxSet sizes each generation
// at maxEntries / (2 × shardCount) so the SUM of capacity across all
// shards and both generations equals the configured maxEntries.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — that comment block was internally inconsistent. Fixed: the doc now says maxEntries is a TOTAL budget across both generations of all shards (memory ≈ maxEntries × ~1 byte), which matches the sizing code. The settings longdesc and the pruner_service.go doc are now consistent with the same model.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the memory-budget comment was rewritten to be internally consistent: maxEntries is the TOTAL across both generations of all shards, each generation sized at maxEntries/(2×shardCount), so total memory ≈ maxEntries × ~1 byte.

Comment on lines +139 to +143
// Both parents removed via CheckAndRemove (Len -= 2), and the record's
// own TXID is now Added by the deferred-Add path that runs in parallel
// with flushCleanupBatches (Len += 1). Net: 1 entry (the child).
require.Equal(t, 1, prunedSet.Len(),
"parents removed, child TXID added during the parallel-Add overlap")

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed in 2cfdd0e — the test comment now says the chunk's own TXID is Added by the up-front loop at the start of processRecordChunk. This review batch reviewed 8657c1d before the round-2 fix had pushed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the comment was updated to describe the current up-front Add: TXIDs are Added at the top of processRecordChunk before any CheckAndRemove/flush, not the old deferred-Add path.

b.Helper()
hashes := make([]chainhash.Hash, n)
for i := range hashes {
_, _ = rand.Read(hashes[i][:])

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — makeHashes (in pruned_tx_set_bench_test.go) now b.Fatalf's on rand.Read error rather than silently producing low-entropy hashes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — makeHashes now checks the rand.Read error and fails via b.Fatalf.

// life of this process. This lets children whose parents were pruned in an earlier
// session (the common case for chains crossing block-height boundaries) still skip
// the parent-update round-trip. The set lives only in memory — it is rebuilt from
// scratch on pod restart and is not persisted to disk.

@github-actions github-actions Bot May 19, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[RESOLVED - Comment was incorrect]

The PR description actually states correctly (from the Summary section):

"It is not persisted to disk: on pod restart it is rebuilt from scratch"

And from the Notes section:

"Not persisted across restarts. The set lives only in memory; a pod restart rebuilds it from scratch"

Both the PR description and implementation correctly document that the set is in-memory only and NOT persisted to disk. This comment was based on a misreading of the PR description.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — this is the recurring "PR description claims disk persistence" issue (also flagged in 3266256410, 3266345176, 3266430001, 3266631775). The source-code documentation now correctly describes the in-memory-only behaviour (pruner_service.go:710 and pruner_settings.go:28 both spell it out). The review-loop tooling I'm running denies PR body edits, so I've left the description for the author to update manually. The implementation matches the source-code docs; only the PR description (which is metadata, not code) is stale.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comment thread stores/utxo/aerospike/pruner/pruned_tx_set.go
Comment on lines +225 to +230
// Saturated reports whether the set has experienced any insert failures
// since construction. With the two-generation design, the only way to
// see InsertFailures > 0 is if a freshly-allocated generation refused
// an Insert (effectively never in normal operation), so Saturated() is
// now best read as "something went badly wrong" rather than "we're full".
// Use Rotations() to see how often the set is recycling.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — Saturated() docstring now enumerates both paths that can increment insertFailures (the contention-without-rotation path added in this PR's round-4 fix, AND insertion into a freshly-rotated generation also failing) and explicitly frames it as an error/backstop signal rather than a routine at-capacity indicator.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the metric help text and PrunedTxSet.Saturated() godoc were reworded to the actual semantics: sticky, '≥1 insert failure since construction' — an error/backstop signal (extreme CAS contention, or a failed insert into a freshly-rotated generation), explicitly NOT the at-capacity indicator. Routine cap pressure is surfaced by utxo_pruner_pruned_set_rotations.

})
prometheusUtxoPrunedSetSaturated = promauto.NewGauge(prometheus.GaugeOpts{
Name: "utxo_pruner_pruned_set_saturated",
Help: "1 if any PrunedTxSet Insert has failed even on a freshly-rotated generation (backstop / error signal — should be 0 in normal operation; use utxo_pruner_pruned_set_rotations for routine saturation)",

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — utxo_pruner_pruned_set_saturated help text now reads "1 if any PrunedTxSet Insert has failed since construction without rotation recovering it (extreme CAS contention without saturation, or insertion into a freshly-rotated generation also failing)". Same framing as the Saturated() docstring.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the metric help text and PrunedTxSet.Saturated() godoc were reworded to the actual semantics: sticky, '≥1 insert failure since construction' — an error/backstop signal (extreme CAS contention, or a failed insert into a freshly-rotated generation), explicitly NOT the at-capacity indicator. Routine cap pressure is surfaced by utxo_pruner_pruned_set_rotations.

NewPrunedTxSet memory budget comment
- Reworded "matches the configured maxEntries" to capture two
  upper-bound caveats: (a) cuckoo.NewH32 rounds bucket count up to
  the next power of two (up to ~2x slot overhead per generation),
  and (b) cuckoo filters run well below 100% load before saturating.
  Operators should treat maxEntries as a target, not a hard RSS
  ceiling.

PrunedTxSet.Saturated() docstring
- Now enumerates the two distinct paths that can increment
  insertFailures, both of which are error/backstop signals (not
  routine at-capacity indicators):
    * Add observed Insert failure without cur.Saturated() after a
      retry (extreme CAS contention).
    * Insert failed on a freshly-rotated generation.
- This matches the actual semantics after the round-4 PrunedTxSet.Add
  fix that introduced the contention-without-rotation backstop.

utxo_pruner_pruned_set_saturated help text
- Synced with the Saturated() docstring so the metric description
  reflects the real "any insert failure after retries/rotation"
  condition rather than just the freshly-rotated case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Comment on lines +706 to +710
// PrunedTxSet is owned by the Service and reused across prune sessions within the
// life of this process. This lets children whose parents were pruned in an earlier
// session (the common case for chains crossing block-height boundaries) still skip
// the parent-update round-trip. The set lives only in memory — it is rebuilt from
// scratch on pod restart and is not persisted to disk.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — the in-code documentation (around lines 706-735 of pruner_service.go) is now the source of truth: the set is in-memory only, rebuilt on pod restart, and not persisted to disk. The PR description predates that comment block being rewritten; I'll get the description aligned with the code rather than the other way around. No behaviour change.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved — the disk-persistence claim was stale. The set is in-memory only (reused across prune sessions within a single process, rebuilt from scratch on pod restart). I've updated the PR title and description to drop the persistence claims and state the in-memory behaviour explicitly; the in-code comments already document it.

Comment thread settings/settings.go
SkipPreserveParents: getBool("pruner_skipPreserveParents", false, alternativeContext...), // Skip Phase 1: preserve parents
SkipDeletions: getBool("pruner_skipDeletions", false, alternativeContext...), // Skip deletions for performance
MinBlockHeight: getUint32("pruner_min_block_height", 0, alternativeContext...), // Do not prune blocks at or below this height
UTXOPrunedSetMaxEntries: getInt("pruner_utxoPrunedSetMaxEntries", 10_000_000, alternativeContext...), // Soft cap on PrunedTxSet entries; 0 = use built-in 2B default (NOT unlimited)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed: the implemented default is 10,000,000 (settings.go:539), with the sentinel value 0 selecting the built-in 2,000,000,000 ceiling. The PR description's "default 2B" is inaccurate and I'll update the description to match the code rather than change the default — 10M is the deliberate conservative default for fresh deployments; operators opt into the larger cap.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved — updated the PR description: the shipped default is 10,000,000 (~10 MiB), and 0 selects the built-in 2B (~2 GiB) default. The dev-scale-1 numbers were measured with maxEntries=0 (the 2B cap), now annotated as such in the description.

Comment on lines +33 to +44
// b.N unique hashes ensures we measure first-time inserts (the production
// case), not eviction loops from re-inserting duplicate fingerprints.
func BenchmarkCuckoo_Add(b *testing.B) {
// Pre-generate enough hashes for b.N iterations so each Add is unique.
// We allocate them outside the timed region.
const cap_ = 64_000_000
hashes := makeHashes(b, min(b.N, cap_/2))
set := NewPrunedTxSet(256, cap_)
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
set.Add(hashes[i%len(hashes)])

@freemans13 freemans13 May 21, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capped the pool at 1M hashes (~32 MiB) and stride via i & (poolSize-1) — bench memory is now O(poolSize) regardless of b.N. Updated the surrounding comment to be honest about what the benchmark measures after the pool wraps (cuckoo's idempotent re-insert fast path), since that side effect of capping is real.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — BenchmarkCuckoo_Add caps the pre-generated pool at 1 << 20 (~32 MiB) and cycles through it via i&(poolSize-1), independent of b.N.

Comment on lines +27 to +30
// Tests use a deliberately small per-shard capacity (4096 buckets × 4 slots =
// 16K slots per shard) so the bench can run on a developer machine without
// allocating multi-GB filters. The per-op cost we're measuring (hash extract,
// bucket access, lock acquire) is independent of total capacity.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the block comment to match what the bench actually constructs: NewPrunedTxSet(256, 64_000_000) → 256 shards × ~250K entries/shard → ~65,536 buckets per shard after the cuckoo's power-of-two rounding. The old "4096 buckets" number was stale.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the benchmark comment was corrected to describe the actual construction (NewPrunedTxSet(256, 64_000_000) ⇒ ~65,536 buckets/shard, ~64 MiB) and notes the power-of-two rounding.

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request changes. Two blockers.

  1. PR description vs code disagree on persistence. Body says "persisted to disk across pruner restarts" + references pruner_utxoPrunedSetMaxEntries. pruner_service.go:709-710 says the opposite: "NOT persisted across pod restarts." Setting longdesc agrees with the code. The 99.9% catch rate resets to ~7% after every pod restart. Either wire the persistence or fix the description.

  2. bucketCASDelete (util/cuckoo/cuckoo_h32.go:164-182) misses items under contention. Per-slot CAS loop does if b != fp { break } — if a concurrent writer placed fp between Load and the check (stale zero read → break), the delete silently misses. Same issue in tryInsertCAS on the b != nullFinger branch. bucketHas does this correctly: single atomic.LoadUint32 + SWAR on the full word. Both CAS functions should match.

Other items in full draft: two-generation drain-on-rotate not quantified, 10M default vs 2B constant mismatch in PR memory table, CheckAndRemove destructive semantics under high fan-in, per-shard metrics missing.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comment on lines +706 to +710
// PrunedTxSet is owned by the Service and reused across prune sessions within the
// life of this process. This lets children whose parents were pruned in an earlier
// session (the common case for chains crossing block-height boundaries) still skip
// the parent-update round-trip. The set lives only in memory — it is rebuilt from
// scratch on pod restart and is not persisted to disk.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved — the disk-persistence claim was stale. The set is in-memory only (reused across prune sessions within a single process, rebuilt from scratch on pod restart). I've updated the PR title and description to drop the persistence claims and state the in-memory behaviour explicitly; the in-code comments already document it.

// Operators should treat maxEntries as the target rather than a
// hard ceiling on RSS; the worst-case allocation overhead is bounded
// by the power-of-two rounding.
perShard := uint(maxEntries / n / 2)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9beb36dperShard is now clamped to a minimum (minPerShardCap = 1024) so a small maxEntries can't drive it to 0 and produce a degenerate single-bucket filter that rotates every few inserts (~1 KiB/generation floor).

Comment on lines 60 to 67
type PrunedTxSet struct {
shards []prunedTxShard
mask uint8 // shardCount - 1, for fast modulo via bitwise AND; uint8 is sufficient because shardCount is capped at 256
count atomic.Int64
maxEntries int64 // soft cap on Len(); 0 means unlimited
shards []prunedTxShard
mask uint8
perShardCap uint // capacity of each generation in each shard
insertFailures atomic.Int64
rotations atomic.Int64 // number of generation rotations across all shards
capacity int64
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9beb36d — the unused capacity field was removed.

… b.N

Pre-generating min(b.N, 32M) random hashes blew past 1 GiB on CI runners
when go test auto-tuned b.N very high. Switch to a fixed 1M-hash pool
and stride through it with a bitmask. Memory is now O(poolSize),
independent of b.N. Updated the surrounding block comment to match the
actual sizing the benches construct (256 shards × ~250K entries/shard,
not the stale '4096 buckets × 4 slots' number).

Behavioural note: once b.N exceeds the pool size, subsequent Adds hit
already-present fingerprints and exercise the cuckoo's idempotent
re-insert fast path rather than fresh evictions. The hash-extract,
bucket-access, and atomic-CAS hot path under measurement is the same.
@github-actions

Copy link
Copy Markdown
Contributor

Clarification on PR description claim

The PR description mentions "The set is persisted to disk across pruner restarts," but the implementation only keeps it in memory (rebuilt on pod restart). The code documentation is correct (pruner_service.go:710, settings/pruner_settings.go:28 both state it is NOT persisted to disk). Consider updating the PR description to clarify this is in-memory persistence across prune sessions within a single process.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comment on lines +706 to +710
// PrunedTxSet is owned by the Service and reused across prune sessions within the
// life of this process. This lets children whose parents were pruned in an earlier
// session (the common case for chains crossing block-height boundaries) still skip
// the parent-update round-trip. The set lives only in memory — it is rebuilt from
// scratch on pod restart and is not persisted to disk.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved — the disk-persistence claim was stale. The set is in-memory only (reused across prune sessions within a single process, rebuilt from scratch on pod restart). I've updated the PR title and description to drop the persistence claims and state the in-memory behaviour explicitly; the in-code comments already document it.

})
prometheusUtxoPrunedSetSaturated = promauto.NewGauge(prometheus.GaugeOpts{
Name: "utxo_pruner_pruned_set_saturated",
Help: "1 if any PrunedTxSet Insert has failed since construction without rotation recovering it (extreme CAS contention without saturation, or insertion into a freshly-rotated generation also failing — both are error/backstop signals; should be 0 in normal operation. Use utxo_pruner_pruned_set_rotations for routine cap pressure.)",

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the metric help text and PrunedTxSet.Saturated() godoc were reworded to the actual semantics: sticky, '≥1 insert failure since construction' — an error/backstop signal (extreme CAS contention, or a failed insert into a freshly-rotated generation), explicitly NOT the at-capacity indicator. Routine cap pressure is surfaced by utxo_pruner_pruned_set_rotations.

Comment thread settings/settings.go
SkipPreserveParents: getBool("pruner_skipPreserveParents", false, alternativeContext...), // Skip Phase 1: preserve parents
SkipDeletions: getBool("pruner_skipDeletions", false, alternativeContext...), // Skip deletions for performance
MinBlockHeight: getUint32("pruner_min_block_height", 0, alternativeContext...), // Do not prune blocks at or below this height
UTXOPrunedSetMaxEntries: getInt("pruner_utxoPrunedSetMaxEntries", 10_000_000, alternativeContext...), // Soft cap on PrunedTxSet entries; 0 = use built-in 2B default (NOT unlimited)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved — updated the PR description: the shipped default is 10,000,000 (~10 MiB), and 0 selects the built-in 2B (~2 GiB) default. The dev-scale-1 numbers were measured with maxEntries=0 (the 2B cap), now annotated as such in the description.

@freemans13 freemans13 changed the title perf(pruner): skip already-pruned parent updates using a persistent cuckoo filter perf(pruner): skip already-pruned parent updates using an in-memory cuckoo filter Jun 4, 2026
freemans13 and others added 2 commits June 4, 2026 16:31
Address Copilot review findings on PR bsv-blockchain#875:

- perShard could reach 0 when maxEntries < 2×shardCount, making
  cuckoo.NewH32 round up to a single 4-slot bucket that saturates and
  rotates every few inserts. Add a minPerShardCap=1024 floor (~1 KiB per
  generation) so degenerate configs stay functional.
- Remove the `capacity` field: it was stored in the constructor but never
  read anywhere in the package.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@sonarqubecloud

sonarqubecloud Bot commented Jun 4, 2026

Copy link
Copy Markdown

@freemans13 freemans13 requested review from ordishs and oskarszoon June 4, 2026 16:17

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both blockers from the last pass are fixed and verified, so approving. The persistence story is now consistent in-memory-only across the body / pruner_service.go / longdesc, and the bucketCASDelete/tryInsertCAS race is closed (break on slot-mismatch + reload-cur-on-retry; race detector clean across both packages, ×5 runs). The CAS fix checks out under review and the benchmarks are strong (66 ns/op add 0-alloc, near-linear parallel, no contention).

Recommend before merge (non-blocking — code is correct):

  • Memory: 10M ≈ ~16 MiB, not 10 MiB (settings/pruner_settings.go:28, pruned_tx_set.go:54, body). Power-of-two bucket rounding in NewH32 makes it ~1.68 bytes/entry at 10M; the ~1 byte/entry formula only holds at large configs.
  • The 99.9% catch / 2.6M rec/sec figures are for maxEntries=0 (~2 GiB), not the shipped 10M default. At production TPS a generation fills in ~3s, so cross-session catch — the feature's main motivation — is near zero at 10M. The "safe ceiling for general workloads" framing reads as if the optimization works at the default; it doesn't for cross-session parents. Worth telling operators to set 0 or 100M+ for the advertised benefit.

And two regression tests given the CAS-bug history (the cuckoo package's own tests wouldn't currently catch a regression):

  • A direct H32 concurrent insert/delete test that asserts Count() (today's TestPrunedTxSet_ConcurrentAccess only checks "no panics").
  • A previous-generation lookup test (insert → force rotation → query the pre-rotation hash) — the two-generation fallback branch (Contains/Len:211/237) is currently 0% covered.

Nice work closing both blockers cleanly.

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve with minor follow-ups.

Excellent, exceptionally well-documented PR. The design is sound, the lock-free cuckoo hot path is correct in its memory model (naturally-aligned uint32 buckets, atomic load + SWAR equality, CAS insert/delete; altIndex involution holds), and shard/fingerprint/index byte usage is cleanly independent.

Suggested non-blocking follow-ups:

  1. Correctness rests entirely on the invariant that the deletedChildren bin is only read in defensive mode and prunedSet is nil there. Worth a runtime guard/assertion so a future regression fails loudly. A unit test asserting prunedSet == nil when defensiveEnabled would lock this in.
  2. executeBatchCleanupCombined has no unit test — the [0,parentEnd) vs [parentEnd,end) result-region split and UDF/BatchWrite TX_NOT_FOUND/KEY_NOT_FOUND accounting are mockable and worth covering.
  3. One line in the pruner_utxoPrunedSetMaxEntries longdesc noting that 0 eagerly allocates ~1 GiB of current-gen filters at construction.

Watch-items (not blockers): cuckoo Insert is not idempotent, so partition-retry re-scans inflate Count and can accelerate rotation, lowering catch rate under churn — the rotations gauge is the right signal. Count/Len are approximate by design.

Confirm go test -race and golangci-lint on util/cuckoo/... and stores/utxo/aerospike/pruner/... before merge.

@freemans13 freemans13 merged commit 1937972 into bsv-blockchain:main Jun 5, 2026
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants