Skip to content

Add opt-in drain mode for the outpoint batcher (off by default)#865

Merged
freemans13 merged 11 commits into
bsv-blockchain:mainfrom
freemans13:stu/aerospike-outpoint-drain-mode
Jun 5, 2026
Merged

Add opt-in drain mode for the outpoint batcher (off by default)#865
freemans13 merged 11 commits into
bsv-blockchain:mainfrom
freemans13:stu/aerospike-outpoint-drain-mode

Conversation

@freemans13

@freemans13 freemans13 commented May 14, 2026

Copy link
Copy Markdown
Collaborator

What this changes

Adds a new opt-in setting, utxostore_outpointBatcherDrainMode (default false).

The outpoint batcher looks up parent transaction data (amount + locking script) for the inputs on every tx. It's used by PreviousOutputsDecorate and BatchPreviousOutputsDecorate. Normally, when it has a small batch of inputs it waits up to its full flush timer (default 10ms) before sending the lookup to Aerospike, in case more inputs are coming.

When this setting is on, the batcher drains whatever is already queued (up to the size cap) and fires immediately, instead of waiting on the timer.

It's the same drain-mode mechanism the four existing batchers already expose (Get / Spend / Store / Locked), now wired up for the outpoint batcher too.

When to turn it on

The clean win is cmd/rewindblockchain. That's a single-producer, separate process that calls PreviousOutputsDecorate one tx at a time. Each call offers ~2 inputs — far below the batcher's size cap — so without drain mode every call sits idle for the full 10ms timer before its lookup goes out. Drain mode removes that wait. Benchmarked ~90%+ faster at these low, sequential tx counts.

Turn it on with utxostore_outpointBatcherDrainMode=true.

When to leave it off (the default)

Do not enable this inside a running node to try to speed up the per-tx decorate path. That batcher is shared store-wide, so the toggle hits the concurrent hot path too:

Benchmarking drain on vs off on that concurrent path (BenchmarkBatchPreviousOutputsDecorateDrainMode, added here) shows drain mode is bimodal / heavy-tailed at mid tx counts (~64–256): with drain off latency is rock-stable, with drain on the mean is unpredictable and some runs are several times slower. A node wants predictable decorate latency, so it stays off by default.

The batcher is store-wide — this toggle can't be applied to one caller without affecting the others in the same process. Measure per deployment before flipping it on.

Why off by default

  • Keeps existing behaviour identical on upgrade.
  • Small drained batches can hit Aerospike's per-item fallback path on some deployments.
  • The concurrent-path behaviour above is unpredictable, so opting in should be a deliberate, measured choice.

Files touched

  • settings/utxostore_settings.go — new setting, following the existing pattern
  • settings/settings.go — default value (false)
  • stores/utxo/aerospike/aerospike.go — conditional s.outpointBatcher.SetDrainMode(true) next to the other four, with a comment explaining the on/off tradeoff
  • stores/utxo/aerospike/aerospike_server_test.go — wiring smoke test (drain mode on, both decorate paths still decorate every input correctly)
  • stores/utxo/aerospike/outpoint_drain_bench_test.go — drain on-vs-off benchmark across tx counts, the evidence behind the "leave it off in a node" guidance

Testing

  • go build ./... clean
  • go vet ./stores/utxo/aerospike/... ./settings/... clean
  • go test -race -short ./stores/utxo/aerospike/ ./settings/ — passing (one pre-existing env-related failure in TestGetUTXOStoreURL that also fails on main)

The outpoint batcher (used by PreviousOutputsDecorate /
BatchPreviousOutputsDecorate to look up parent outputs for tx inputs)
was the only UtxoStore batcher missing a drain-mode toggle. The four
existing toggles (Get / Spend / Store / Locked) all default to false
and let operators opt in per batcher.

Add OutpointBatcherDrainMode following the same pattern:
- new bool field on UtxoStore settings with key/desc/default/longdesc
- default false in settings.go
- conditional s.outpointBatcher.SetDrainMode(true) alongside the others

Why this matters: BatchPreviousOutputsDecorate hands a whole block's
worth of inputs to PreviousOutputsDecorate in a tight loop, then
blocks waiting for results. Without drain mode the batcher waits its
full 10 ms timer before flushing because no further items arrive,
which makes the legacy / quick-validation decorate hot path pay 10 ms
per tx.

Default kept false for behavioural consistency with the existing
drain-mode toggles. Operators who want the speedup flip
utxostore_outpointBatcherDrainMode=true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

No issues found. The PR introduces an optional drain mode toggle for the outpoint batcher, following the established pattern of the existing four drain-mode settings. The implementation is clean, well-documented, and thoroughly tested.

Key Strengths:

  • Consistent Pattern: Follows existing drain-mode toggles (Get/Spend/Store/Locked) with identical structure and naming conventions
  • Safe Default: Defaults to false to maintain existing behavior and predictable latency on concurrent paths
  • Honest Documentation: The extensive longdesc accurately reflects post-perf(aerospike): fan out BatchPreviousOutputsDecorate per-tx #893 concurrent behavior and explicitly warns about bimodal/heavy-tailed performance on concurrent paths
  • Evidence-Based: Includes comprehensive benchmark (BenchmarkBatchPreviousOutputsDecorateDrainMode) that measured the actual behavior across transaction counts
  • Test Coverage: Happy-path smoke test (TestAerospikeOutpointBatcherDrainMode) verifies the toggle wiring for both single-tx and concurrent paths using require assertions per repo convention
  • Clear Use Case: Identifies the specific win scenario (cmd/rewindblockchain single-producer sequential caller) vs concurrent node paths where it should stay off

Previous Review Items (Resolved):

All inline comments from Copilot and the author have been addressed and resolved in commits e2757b8 and e86917b.

@github-actions

github-actions Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-865 (e5f32b9)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.604µ 1.582µ ~ 0.400
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.16n 71.59n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.19n 71.29n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.23n 71.16n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 33.03n 32.90n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 56.93n 55.63n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 129.9n 128.3n ~ 0.400
MiningCandidate_Stringify_Short-4 217.8n 220.5n ~ 0.100
MiningCandidate_Stringify_Long-4 1.646µ 1.639µ ~ 0.700
MiningSolution_Stringify-4 857.1n 850.8n ~ 0.200
BlockInfo_MarshalJSON-4 1.726µ 1.727µ ~ 1.000
NewFromBytes-4 130.8n 132.1n ~ 0.100
AddTxBatchColumnar_Validation-4 2.604µ 2.653µ ~ 0.100
OffsetValidationLoop-4 594.8n 595.4n ~ 1.000
Mine_EasyDifficulty-4 60.79µ 61.68µ ~ 0.700
Mine_WithAddress-4 7.075µ 7.000µ ~ 0.200
BlockAssembler_AddTx-4 0.02742n 0.02914n ~ 1.000
AddNode-4 11.23 11.13 ~ 1.000
AddNodeWithMap-4 12.08 12.17 ~ 0.400
DirectSubtreeAdd/4_per_subtree-4 58.05n 56.78n ~ 0.200
DirectSubtreeAdd/64_per_subtree-4 29.13n 29.15n ~ 1.000
DirectSubtreeAdd/256_per_subtree-4 28.08n 27.95n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 26.65n 26.61n ~ 0.400
DirectSubtreeAdd/2048_per_subtree-4 26.27n 26.24n ~ 0.300
SubtreeProcessorAdd/4_per_subtree-4 299.7n 297.3n ~ 0.700
SubtreeProcessorAdd/64_per_subtree-4 301.4n 289.0n ~ 0.100
SubtreeProcessorAdd/256_per_subtree-4 303.8n 289.0n ~ 0.100
SubtreeProcessorAdd/1024_per_subtree-4 294.9n 279.3n ~ 0.100
SubtreeProcessorAdd/2048_per_subtree-4 299.9n 280.3n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 283.3n 285.3n ~ 0.400
SubtreeProcessorRotate/64_per_subtree-4 285.3n 286.4n ~ 0.700
SubtreeProcessorRotate/256_per_subtree-4 281.8n 290.5n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 280.5n 295.9n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 55.26n 55.38n ~ 1.000
SubtreeNodeAddOnly/64_per_subtree-4 36.17n 36.15n ~ 1.000
SubtreeNodeAddOnly/256_per_subtree-4 35.28n 35.38n ~ 0.400
SubtreeNodeAddOnly/1024_per_subtree-4 34.81n 34.50n ~ 0.200
SubtreeCreationOnly/4_per_subtree-4 111.7n 111.6n ~ 0.700
SubtreeCreationOnly/64_per_subtree-4 359.5n 363.9n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.255µ 1.261µ ~ 0.200
SubtreeCreationOnly/1024_per_subtree-4 3.931µ 3.909µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 7.127µ 7.121µ ~ 1.000
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 288.0n 282.0n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 287.6n 285.9n ~ 0.400
ParallelGetAndSetIfNotExists/1k_nodes-4 2.060m 2.005m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.258m 5.312m ~ 1.000
ParallelGetAndSetIfNotExists/50k_nodes-4 7.201m 7.348m ~ 0.400
ParallelGetAndSetIfNotExists/100k_nodes-4 9.886m 10.242m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 1.804m 1.832m ~ 0.400
SequentialGetAndSetIfNotExists/10k_nodes-4 4.618m 4.541m ~ 0.700
SequentialGetAndSetIfNotExists/50k_nodes-4 14.47m 13.82m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 26.09m 25.36m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.111m 2.048m ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.488m 8.417m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 13.75m 13.62m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.836m 1.800m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 9.305m 8.274m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 47.65m 45.88m ~ 0.100
DiskTxMap_SetIfNotExists-4 4.154µ 4.216µ ~ 0.400
DiskTxMap_SetIfNotExists_Parallel-4 3.886µ 3.786µ ~ 0.100
DiskTxMap_ExistenceOnly-4 338.3n 521.8n ~ 0.400
Queue-4 204.0n 200.5n ~ 0.700
AtomicPointer-4 8.134n 8.131n ~ 0.500
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 830.3µ 809.5µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 734.3µ 719.0µ ~ 0.200
ReorgOptimizations/AllMarkFalse/Old/10K-4 117.8µ 113.4µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 58.59µ 58.15µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/10K-4 58.88µ 55.18µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 11.81µ 11.79µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/10K-4 4.850µ 4.745µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.629µ 1.640µ ~ 0.400
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 10.61m 10.68m ~ 1.000
ReorgOptimizations/DedupFilterPipeline/New/100K-4 11.44m 10.92m ~ 0.200
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.139m 1.185m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 736.8µ 733.8µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/100K-4 586.2µ 582.6µ ~ 1.000
ReorgOptimizations/HashSlicePool/New/100K-4 302.0µ 328.4µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/100K-4 48.31µ 48.63µ ~ 1.000
ReorgOptimizations/NodeFlags/New/100K-4 17.55µ 16.14µ ~ 0.100
TxMapSetIfNotExists-4 52.57n 55.20n ~ 0.100
TxMapSetIfNotExistsDuplicate-4 48.24n 48.29n ~ 1.000
ChannelSendReceive-4 695.6n 686.0n ~ 0.100
CalcBlockWork-4 497.3n 477.2n ~ 0.700
CalculateWork-4 638.6n 735.9n ~ 0.100
BuildBlockLocatorString_Helpers/Size_10-4 1.334µ 1.349µ ~ 0.300
BuildBlockLocatorString_Helpers/Size_100-4 12.85µ 13.03µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_1000-4 135.5µ 128.4µ ~ 0.400
CatchupWithHeaderCache-4 104.3m 104.5m ~ 0.700
_prepareTxsPerLevel-4 423.7m 404.6m ~ 1.000
_prepareTxsPerLevelOrdered-4 5.357m 3.736m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 425.8m 417.4m ~ 0.400
_prepareTxsPerLevel_Comparison/Optimized-4 5.322m 4.518m ~ 0.700
SubtreeSizes/10k_tx_4_per_subtree-4 1.248m 1.250m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 301.7µ 298.4µ ~ 0.700
SubtreeSizes/10k_tx_64_per_subtree-4 70.74µ 70.24µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 17.57µ 17.54µ ~ 0.400
SubtreeSizes/10k_tx_512_per_subtree-4 8.761µ 8.634µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.359µ 4.306µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.146µ 2.150µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 69.11µ 68.66µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 17.44µ 17.43µ ~ 1.000
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.361µ 4.285µ ~ 0.400
BlockSizeScaling/50k_tx_64_per_subtree-4 363.9µ 361.7µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 87.17µ 85.82µ ~ 0.200
BlockSizeScaling/50k_tx_1024_per_subtree-4 21.68µ 21.28µ ~ 0.400
SubtreeAllocations/small_subtrees_exists_check-4 149.5µ 149.0µ ~ 1.000
SubtreeAllocations/small_subtrees_data_fetch-4 159.6µ 157.9µ ~ 0.700
SubtreeAllocations/small_subtrees_full_validation-4 305.3µ 306.2µ ~ 1.000
SubtreeAllocations/medium_subtrees_exists_check-4 8.806µ 8.764µ ~ 0.700
SubtreeAllocations/medium_subtrees_data_fetch-4 9.247µ 9.312µ ~ 0.700
SubtreeAllocations/medium_subtrees_full_validation-4 17.53µ 17.27µ ~ 0.200
SubtreeAllocations/large_subtrees_exists_check-4 2.096µ 2.056µ ~ 0.200
SubtreeAllocations/large_subtrees_data_fetch-4 2.215µ 2.254µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 4.325µ 4.338µ ~ 0.700
_BufferPoolAllocation/16KB-4 4.052µ 5.096µ ~ 0.400
_BufferPoolAllocation/32KB-4 11.30µ 10.43µ ~ 0.700
_BufferPoolAllocation/64KB-4 16.65µ 19.86µ ~ 0.100
_BufferPoolAllocation/128KB-4 33.18µ 33.32µ ~ 0.700
_BufferPoolAllocation/512KB-4 139.9µ 139.9µ ~ 1.000
_BufferPoolConcurrent/32KB-4 20.60µ 21.65µ ~ 0.700
_BufferPoolConcurrent/64KB-4 32.16µ 31.33µ ~ 0.100
_BufferPoolConcurrent/512KB-4 172.0µ 165.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 716.0µ 689.6µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 711.0µ 692.4µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/64KB-4 723.6µ 706.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 706.5µ 698.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 670.5µ 664.2µ ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/16KB-4 38.39m 38.08m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 38.38m 38.10m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 38.66m 38.16m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 38.54m 38.03m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 38.60m 38.22m ~ 0.100
_PooledVsNonPooled/Pooled-4 832.4n 825.6n ~ 0.100
_PooledVsNonPooled/NonPooled-4 8.455µ 7.688µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 8.791µ 7.935µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 12.86µ 11.56µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 11.78µ 10.83µ ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 324.9µ 322.3µ ~ 0.700
StoreBlock_Sequential/AboveCSVHeight-4 322.3µ 324.4µ ~ 0.100
GetUtxoHashes-4 265.9n 266.0n ~ 1.000
GetUtxoHashes_ManyOutputs-4 43.38µ 47.92µ ~ 0.200
_NewMetaDataFromBytes-4 230.2n 228.2n ~ 0.700
_Bytes-4 406.0n 399.5n ~ 0.100
_MetaBytes-4 141.4n 138.2n ~ 0.100

Threshold: >10% with p < 0.05 | Generated: 2026-06-03 09:46 UTC

@freemans13 freemans13 self-assigned this May 14, 2026
@freemans13 freemans13 changed the title feat(utxo/aerospike): optional drain mode for outpoint batcher Let the outpoint batcher flush immediately instead of waiting 10ms (opt-in) May 19, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in configuration flag to enable “drain mode” for the Aerospike outpoint batcher, allowing previous-output decoration lookups to flush immediately instead of waiting for the batching timer window—aimed at reducing per-tx latency in decorate-heavy paths.

Changes:

  • Introduces utxostore_outpointBatcherDrainMode to UtxoStoreSettings metadata.
  • Wires the new setting into NewSettings config loading (default false).
  • Applies the setting in the Aerospike UTXO store initialization via s.outpointBatcher.SetDrainMode(true).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
stores/utxo/aerospike/aerospike.go Enables drain mode on the outpoint batcher when the new setting is true.
settings/utxostore_settings.go Adds the new UTXO store setting metadata entry and long description.
settings/settings.go Loads the new boolean setting with a default of false.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread settings/utxostore_settings.go Outdated
SpendBatcherDrainMode bool `key:"utxostore_spendBatcherDrainMode" desc:"Enable drain mode for spend batcher" default:"false" category:"UtxoStore" usage:"Drain mode for UTXO spend operations" type:"bool" longdesc:"### Purpose\nEnables drain mode for the Spend batcher independently of other batchers.\n\n### When to Enable\nOnly if spend items arrive in large bursts and latency is critical.\n\n### When to Disable (Recommended)\nSpend items trickle in one-at-a-time as individual transactions complete Get+Validate. Drain mode fires single-item batches, causing Aerospike executeSingle fallback (one network round-trip per item instead of batched)."`
StoreBatcherDrainMode bool `key:"utxostore_storeBatcherDrainMode" desc:"Enable drain mode for store/create batcher" default:"false" category:"UtxoStore" usage:"Drain mode for UTXO create operations" type:"bool" longdesc:"### Purpose\nEnables drain mode for the Create/Store batcher independently of other batchers.\n\n### When to Enable\nCreate receives items in semi-bursts after Spend batches complete. Drain mode can be beneficial here.\n\n### When to Disable\nTimer-based batching accumulates items for better Aerospike batch efficiency."`
LockedBatcherDrainMode bool `key:"utxostore_lockedBatcherDrainMode" desc:"Enable drain mode for locked batcher" default:"false" category:"UtxoStore" usage:"Drain mode for UTXO lock operations" type:"bool" longdesc:"### Purpose\nEnables drain mode for the SetLocked batcher independently of other batchers.\n\n### When to Disable (Recommended)\nSetLocked is the last pipeline stage — items arrive one-at-a-time as individual transactions complete BA.Store. Drain mode fires single-item batches, causing Aerospike executeSingle fallback. This was measured at 21% CPU waste on propagation pods at 200k TPS."`
OutpointBatcherDrainMode bool `key:"utxostore_outpointBatcherDrainMode" desc:"Enable drain mode for outpoint batcher" default:"false" category:"UtxoStore" usage:"Drain mode for previous-outputs decorate operations" type:"bool" longdesc:"### Purpose\nEnables drain mode for the outpoint batcher independently of other batchers. The outpoint batcher is used by PreviousOutputsDecorate / BatchPreviousOutputsDecorate to look up parent outputs (amount + locking script) for transaction inputs.\n\n### When to Enable\nThe outpoint batcher receives items in bursts — BatchPreviousOutputsDecorate hands a whole block's worth of inputs to PreviousOutputsDecorate in a tight loop, then blocks waiting for results. Without drain mode the batcher waits OutpointBatcherDurationMillis (default 10ms) for the batch to fill before flushing, even though no more items are coming. Drain mode flushes the burst immediately and removes that per-tx 10ms timer wait.\n\n### When to Disable (Default)\nKept off by default for consistency with the other drain-mode toggles and to avoid behavioural changes on operators who haven't measured. Single-item burst sizes are rare on this batcher (a tx with 1 input still drains its 1 input as a 1-item batch — Aerospike executeSingle fallback applies), so enabling it is mostly safe but worth measuring per-deployment.\n\n### Related Settings\n- **OutpointBatcherSize** (default 100) — size cap; drain mode still respects this.\n- **OutpointBatcherDurationMillis** (default 10ms) — timer; drain mode bypasses it."`

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in e2757b8d0. The longdesc no longer describes BatchPreviousOutputsDecorate as a tight serial loop; it now reflects #893's per-tx errgroup fan-out and the resulting size-based batch fill. The "when to enable" guidance is reframed around the remaining single-producer call sites (cmd/rewindblockchain, the two per-tx fallbacks in legacy/netsync/handle_block.go), and "when to disable" calls out the executeSingle fallback risk in the concurrent path explicitly.

Comment on lines 392 to +396
s.lockedBatcher.SetDrainMode(true)
}
if tSettings.UtxoStore.OutpointBatcherDrainMode {
s.outpointBatcher.SetDrainMode(true)
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in e2757b8d0. The comment above the drain-mode toggles now covers the outpoint case explicitly and notes the post-#893 nuance: under BatchPreviousOutputsDecorate's concurrent producers the batcher fills by size (drain mode would shrink those wide batches), but the single-producer fallbacks still idle on the 10 ms timer (drain mode helps there).

@freemans13 freemans13 requested review from liam and ordishs May 21, 2026 10:32

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request changes — recommend closing in favour of #893.

Now that #893 is merged the underlying symptom (low per-tx producer concurrency → batcher idling on the 10ms timer) is fixed by fan-out at the producer side: per-tx PreviousOutputsDecorate calls run via errgroup, the batcher fills by size, and the timer rarely fires.

With concurrent producers, flipping this flag would invert the win — it would force many small BatchOperate round-trips instead of one wide one. The drain-mode pattern is right for batchers with a single producer, not after #893.

Suggest closing this PR.

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — small, surgical, follows the established per-batcher drain-mode pattern, and gated behind an opt-in default-false toggle so upgrade behaviour is unchanged. Two minor suggestions:

  1. Stale comment in stores/utxo/aerospike/aerospike.go:378-381. The block above the drain-mode toggles enumerates the four existing batchers ("beneficial for Get, Create — harmful for Spend, SetLocked"). With this PR landed, the comment no longer covers the full set. Worth a one-line tweak, e.g. "Outpoint receives input-burst traffic, similar to Get."

  2. Add a smoke test for the new toggle. The fixtures already exist in aerospike_server_test.go:1359 / :1429. A short subtest that constructs the store with OutpointBatcherDrainMode=true and re-runs PreviousOutputsDecorate would lock in that the toggle is wired correctly and that drain-mode doesn't break the happy path. Optional but cheap insurance — none of the four existing drain-mode toggles ship with a dedicated unit test either, so it'd raise the bar slightly for this whole family.

Quick sanity-check question: was the 19× number measured on top of #864's per-tx fanout (already visible at get.go:1191-1206), or on a pre-#864 base? The PR description suggests the former but worth confirming so the composition story is unambiguous.

@freemans13

Copy link
Copy Markdown
Collaborator Author

Thanks @oskarszoon — fair on the substance, but I'd push back on closing.

You're right that after #893 the BatchPreviousOutputsDecorate hot path no longer pays the per-tx 10 ms timer cost; the 10.70 s → 0.57 s number in the description was measured against the pre-#893 sequential producer (the bench harness in the closed #864) and shouldn't be read as a current claim.

But this PR doesn't enable drain mode for anyone. It only adds the toggle, default false, matching the existing per-batcher pattern (getBatcherDrainMode, spendBatcherDrainMode, storeBatcherDrainMode, lockedBatcherDrainMode). None of those four are universal wins either — their own longdesc blocks call out cases where they regress (SetLocked was measured at 21 % CPU waste at 200 k TPS, etc.). They exist as per-deployment tuning levers, not opinions about what's correct.

The outpoint batcher was the only one in that group without a corresponding toggle. The original motivation (sequential producer) is gone, but the symmetry argument still holds: an operator who wants to A/B drain-mode behaviour across the five batchers can now do that uniformly. And there are remaining single-producer call sites where the batcher still idles 10 ms — cmd/rewindblockchain and the two per-tx fallbacks in legacy/netsync/handle_block.go (lines 1362 and 1647). Not hot paths, but not non-existent either.

Default-off toggle that just fills out the per-batcher matrix — I'd rather land it than leave the next person to wonder why every batcher has the knob except this one.

Updates from @ordishs and Copilot reviews:

- Extend the comment above the drain-mode toggles in aerospike.go to
  cover the new outpoint case and the post-bsv-blockchain#893 nuance.
- Rewrite the OutpointBatcherDrainMode longdesc so the
  'when to enable / when to disable' guidance reflects the
  post-bsv-blockchain#893 fan-out path rather than the pre-bsv-blockchain#893 sequential producer.
- Add TestAerospikeOutpointBatcherDrainMode happy-path smoke test
  asserting the toggle stays wired (per-tx + concurrent decorate).
@freemans13

Copy link
Copy Markdown
Collaborator Author

Thanks @ordishs — all three points addressed in e2757b8d0:

  1. Stale comment (stores/utxo/aerospike/aerospike.go:378-381) — extended to mention the outpoint batcher and the post-perf(aerospike): fan out BatchPreviousOutputsDecorate per-tx #893 nuance (concurrent producers fill by size; single-producer fallbacks still idle on the timer).

  2. Smoke test — added TestAerospikeOutpointBatcherDrainMode in aerospike_server_test.go next to TestAerospikeWithBatchSize. Builds a store with OutpointBatcherDrainMode=true and exercises both the per-tx PreviousOutputsDecorate and the concurrent BatchPreviousOutputsDecorate paths. The toggle has no exported getter so the test is "decorate still works with the flag on" rather than "the flag is set" — its job is to catch the wiring being deleted or misnamed in aerospike.go.

  3. The 19× number — measured against the pre-Look up parent transactions in parallel instead of one at a time #864 / pre-perf(aerospike): fan out BatchPreviousOutputsDecorate per-tx #893 base (single sequential producer, ~5 ms timer per tx). After @oskarszoon's perf(aerospike): fan out BatchPreviousOutputsDecorate per-tx #893 the same workload (BatchPreviousOutputsDecorate) is already fast without this toggle, and flipping drain mode on in that path now hurts rather than helps (wide BatchOperate → many narrow ones). I've rewritten the OutpointBatcherDrainMode longdesc to reflect that — "when to enable" now points at the remaining single-producer call sites (cmd/rewindblockchain and the two per-tx fallbacks in legacy/netsync/handle_block.go), and "when to disable" explicitly names perf(aerospike): fan out BatchPreviousOutputsDecorate per-tx #893 and the fan-out path.

…fan-out; align docs

Add BenchmarkBatchPreviousOutputsDecorateDrainMode comparing the outpoint
batcher's drain mode on vs off across the concurrent BatchPreviousOutputsDecorate
path (bsv-blockchain#893 errgroup fan-out), using distinct parents per input so batch width
maps to real BatchOperate record counts.

Findings (count=10 + benchstat, distinct-parent inputs): drain mode is
neutral-to-faster in the mean at every tx count (-8% at 2856 tx to -94% at 16
tx), but bimodal/heavy-tailed at mid tx counts (~64-256). So the concurrent
node path keeps drain mode default off (predictable latency), and the clean win
is the single-producer, separate-process cmd/rewindblockchain caller.

Update the aerospike.go comment and the setting longdesc to match the measured
behaviour and to note the batcher is store-wide (the toggle cannot be applied
to one caller without affecting the other in the same process).
@freemans13

Copy link
Copy Markdown
Collaborator Author

@oskarszoon — took your "this inverts #893's win" seriously and benchmarked it rather than argue it. Added BenchmarkBatchPreviousOutputsDecorateDrainMode (in the latest push) — it runs the concurrent BatchPreviousOutputsDecorate fan-out with drain on vs off, using a distinct parent tx per input so batch width maps to real BatchOperate record counts (otherwise sendOutpointBatch's uniqueTxHashes dedup hides the effect). count=10 + benchstat, warmup per tier.

First, a correction to my own earlier framing: drain mode doesn't dispatch one outpoint per Put. go-batcher/v2 greedily drains everything currently queued (up to the size cap) and fires without the timer wait — so under your fan-out there genuinely is a pile to drain.

Median ns/op, drain=false → drain=true:

txs drain=false drain=true Δ mean drain=true stability
16 6.57 ms 0.40 ms −94% stable
64 7.40 ms 1.18 ms −84% bimodal (0.9→52 ms)
256 8.87 ms 4.97 ms −44% bimodal (~2.6 / ~7 ms)
1024 15.0 ms 8.80 ms −41% tight ±3%
2856 23.2 ms 21.4 ms −8% ±19%

So the mean doesn't regress on the concurrent path — it's neutral-to-faster everywhere (the drain=false low-tier cost is literally the 10 ms timer idling). But your instinct wasn't wrong: drain mode goes bimodal/heavy-tailed at mid tx counts (~64–256), occasionally 10–50× its own median, where drain=false is rock-stable. A node's concurrent decorate path wants predictable latency, so that instability is a good reason to keep it default-off — which I've now made the documented rationale, dropping the weak "legacy per-tx fallback" enable-case (it shares the process with the hot path anyway).

Caveat I can't measure locally: this is loopback Aerospike, so it under-weights the per-round-trip network RTT that's central to your "many small round-trips" point — on a real cluster the high tiers could move toward you.

Net: the only use I'm claiming is the single-producer, separate-process caller cmd/rewindblockchain, where it's the −94% timer-elimination win with no concurrent fan-out to fragment. Given it's default-off and scoped that way, are you OK to unblock — or do you still prefer we close it and leave rewindblockchain on the timer?

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed with a fresh benchmark run and a look at the actual go-batcher drain loop. Withdrawing my earlier close recommendation — it was based on a wrong model of drain mode. The drain loop greedily drains the channel up to the size cap before firing (go-batcher batcher.go:464-480), so under #893's concurrent fan-out the batches stay wide rather than collapsing to one-item round-trips. The bench bears this out (drain faster at the 64/256/2856 tx tiers), and cmd/rewindblockchain is a genuine serial single-producer caller where each PreviousOutputsDecorate otherwise eats the full 10ms timer — a real win no existing knob covers. The wiring is correct, default-off, race-free, and can't drop/dup/misorder lookups.

One required fix before merge:

  • The "neutral-to-faster in the mean at every tx count" claim (aerospike.go comment + longdesc) isn't supported by the benchmark. On the concurrent path drain=false is rock-stable but drain=true is bimodal with an unpredictable mean — at txs=16 and 64, two of three runs come in ~5× slower; txs=2856 had a 118ms spike in one run. The "bimodal/heavy-tailed" wording you already have is right; the "neutral-to-faster in the mean" next to it will lead operators to flip this on a node/concurrent path expecting a free win. Drop that phrase and keep the honest framing: bimodal/unpredictable on the concurrent path, clean win only for the single-producer (rewindblockchain) caller.

Minor: the longdesc warns about the legacy per-tx fallback but not the Validator's ValidateTransactionBatch errgroup path (also concurrent); and aerospike_server_test.go:1812+ uses assert where the repo wants require.

Fix the doc claim and I'll approve.

Address review feedback on PR bsv-blockchain#865:
- Drop the unsupported "neutral-to-faster in the mean" claim from the
  aerospike.go comment and the outpointBatcherDrainMode longdesc. The
  concurrent BatchPreviousOutputsDecorate path is bimodal/heavy-tailed
  with drain on (drain=false rock-stable, drain=true unpredictable mean);
  the clean win is the single-producer cmd/rewindblockchain caller.
- Name the Validator's ValidateTransactionBatch errgroup path alongside
  the BatchPreviousOutputsDecorate hot path as a concurrent caller that
  should keep drain off, not just the legacy per-tx fallback.
- Use require (not assert) in TestAerospikeOutpointBatcherDrainMode per
  repo test convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@freemans13

Copy link
Copy Markdown
Collaborator Author

Thanks @oskarszoon — and thanks for re-reviewing and withdrawing the close recommendation. Fixed in e86917b:

  • Dropped the "neutral-to-faster in the mean" claim from both the aerospike.go comment and the outpointBatcherDrainMode longdesc. Both now state the honest framing: on the concurrent BatchPreviousOutputsDecorate path drain mode is bimodal/heavy-tailed at mid tx counts (~64-256) — drain=false rock-stable, drain=true unpredictable mean (some runs several× slower) — so a node keeps it off; the clean win is the single-producer cmd/rewindblockchain caller.
  • Named the Validator's ValidateTransactionBatch errgroup path in the longdesc's "When to Disable" section alongside the BatchPreviousOutputsDecorate hot path, so the concurrent-caller warning isn't limited to the legacy per-tx fallback.
  • Switched assertrequire in TestAerospikeOutpointBatcherDrainMode per the repo convention.

Ready for another look when you have a moment.

@freemans13 freemans13 requested a review from oskarszoon June 3, 2026 09:32
@sonarqubecloud

sonarqubecloud Bot commented Jun 3, 2026

Copy link
Copy Markdown

@freemans13 freemans13 changed the title Let the outpoint batcher flush immediately instead of waiting 10ms (opt-in) Add opt-in drain mode for the outpoint batcher (off by default) Jun 4, 2026

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving on e86917b. The "neutral-to-faster in the mean" overclaim is gone; the aerospike.go comment + the longdesc now describe drain mode on the concurrent path honestly — bimodal/heavy-tailed at mid tx counts, drain=true with an unpredictable mean (some runs several× slower), default-off for node decorate paths, and the clean win scoped to the single-producer cmd/rewindblockchain caller. You also folded in the Validator ValidateTransactionBatch errgroup-path caveat and the store-wide-toggle note, and fixed the smoke-test require. Doc + test only; the drain wiring is unchanged.

Approving.

@freemans13 freemans13 merged commit 5c950aa into bsv-blockchain:main Jun 5, 2026
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants