Add opt-in drain mode for the outpoint batcher (off by default)#865
Conversation
The outpoint batcher (used by PreviousOutputsDecorate / BatchPreviousOutputsDecorate to look up parent outputs for tx inputs) was the only UtxoStore batcher missing a drain-mode toggle. The four existing toggles (Get / Spend / Store / Locked) all default to false and let operators opt in per batcher. Add OutpointBatcherDrainMode following the same pattern: - new bool field on UtxoStore settings with key/desc/default/longdesc - default false in settings.go - conditional s.outpointBatcher.SetDrainMode(true) alongside the others Why this matters: BatchPreviousOutputsDecorate hands a whole block's worth of inputs to PreviousOutputsDecorate in a tight loop, then blocks waiting for results. Without drain mode the batcher waits its full 10 ms timer before flushing because no further items arrive, which makes the legacy / quick-validation decorate hot path pay 10 ms per tx. Default kept false for behavioural consistency with the existing drain-mode toggles. Operators who want the speedup flip utxostore_outpointBatcherDrainMode=true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🤖 Claude Code Review Status: Complete Current Review: No issues found. The PR introduces an optional drain mode toggle for the outpoint batcher, following the established pattern of the existing four drain-mode settings. The implementation is clean, well-documented, and thoroughly tested. Key Strengths:
Previous Review Items (Resolved): All inline comments from Copilot and the author have been addressed and resolved in commits e2757b8 and e86917b. |
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-06-03 09:46 UTC |
There was a problem hiding this comment.
Pull request overview
Adds an opt-in configuration flag to enable “drain mode” for the Aerospike outpoint batcher, allowing previous-output decoration lookups to flush immediately instead of waiting for the batching timer window—aimed at reducing per-tx latency in decorate-heavy paths.
Changes:
- Introduces
utxostore_outpointBatcherDrainModetoUtxoStoreSettingsmetadata. - Wires the new setting into
NewSettingsconfig loading (defaultfalse). - Applies the setting in the Aerospike UTXO store initialization via
s.outpointBatcher.SetDrainMode(true).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
stores/utxo/aerospike/aerospike.go |
Enables drain mode on the outpoint batcher when the new setting is true. |
settings/utxostore_settings.go |
Adds the new UTXO store setting metadata entry and long description. |
settings/settings.go |
Loads the new boolean setting with a default of false. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| SpendBatcherDrainMode bool `key:"utxostore_spendBatcherDrainMode" desc:"Enable drain mode for spend batcher" default:"false" category:"UtxoStore" usage:"Drain mode for UTXO spend operations" type:"bool" longdesc:"### Purpose\nEnables drain mode for the Spend batcher independently of other batchers.\n\n### When to Enable\nOnly if spend items arrive in large bursts and latency is critical.\n\n### When to Disable (Recommended)\nSpend items trickle in one-at-a-time as individual transactions complete Get+Validate. Drain mode fires single-item batches, causing Aerospike executeSingle fallback (one network round-trip per item instead of batched)."` | ||
| StoreBatcherDrainMode bool `key:"utxostore_storeBatcherDrainMode" desc:"Enable drain mode for store/create batcher" default:"false" category:"UtxoStore" usage:"Drain mode for UTXO create operations" type:"bool" longdesc:"### Purpose\nEnables drain mode for the Create/Store batcher independently of other batchers.\n\n### When to Enable\nCreate receives items in semi-bursts after Spend batches complete. Drain mode can be beneficial here.\n\n### When to Disable\nTimer-based batching accumulates items for better Aerospike batch efficiency."` | ||
| LockedBatcherDrainMode bool `key:"utxostore_lockedBatcherDrainMode" desc:"Enable drain mode for locked batcher" default:"false" category:"UtxoStore" usage:"Drain mode for UTXO lock operations" type:"bool" longdesc:"### Purpose\nEnables drain mode for the SetLocked batcher independently of other batchers.\n\n### When to Disable (Recommended)\nSetLocked is the last pipeline stage — items arrive one-at-a-time as individual transactions complete BA.Store. Drain mode fires single-item batches, causing Aerospike executeSingle fallback. This was measured at 21% CPU waste on propagation pods at 200k TPS."` | ||
| OutpointBatcherDrainMode bool `key:"utxostore_outpointBatcherDrainMode" desc:"Enable drain mode for outpoint batcher" default:"false" category:"UtxoStore" usage:"Drain mode for previous-outputs decorate operations" type:"bool" longdesc:"### Purpose\nEnables drain mode for the outpoint batcher independently of other batchers. The outpoint batcher is used by PreviousOutputsDecorate / BatchPreviousOutputsDecorate to look up parent outputs (amount + locking script) for transaction inputs.\n\n### When to Enable\nThe outpoint batcher receives items in bursts — BatchPreviousOutputsDecorate hands a whole block's worth of inputs to PreviousOutputsDecorate in a tight loop, then blocks waiting for results. Without drain mode the batcher waits OutpointBatcherDurationMillis (default 10ms) for the batch to fill before flushing, even though no more items are coming. Drain mode flushes the burst immediately and removes that per-tx 10ms timer wait.\n\n### When to Disable (Default)\nKept off by default for consistency with the other drain-mode toggles and to avoid behavioural changes on operators who haven't measured. Single-item burst sizes are rare on this batcher (a tx with 1 input still drains its 1 input as a 1-item batch — Aerospike executeSingle fallback applies), so enabling it is mostly safe but worth measuring per-deployment.\n\n### Related Settings\n- **OutpointBatcherSize** (default 100) — size cap; drain mode still respects this.\n- **OutpointBatcherDurationMillis** (default 10ms) — timer; drain mode bypasses it."` |
There was a problem hiding this comment.
Addressed in e2757b8d0. The longdesc no longer describes BatchPreviousOutputsDecorate as a tight serial loop; it now reflects #893's per-tx errgroup fan-out and the resulting size-based batch fill. The "when to enable" guidance is reframed around the remaining single-producer call sites (cmd/rewindblockchain, the two per-tx fallbacks in legacy/netsync/handle_block.go), and "when to disable" calls out the executeSingle fallback risk in the concurrent path explicitly.
| s.lockedBatcher.SetDrainMode(true) | ||
| } | ||
| if tSettings.UtxoStore.OutpointBatcherDrainMode { | ||
| s.outpointBatcher.SetDrainMode(true) | ||
| } |
There was a problem hiding this comment.
Addressed in e2757b8d0. The comment above the drain-mode toggles now covers the outpoint case explicitly and notes the post-#893 nuance: under BatchPreviousOutputsDecorate's concurrent producers the batcher fills by size (drain mode would shrink those wide batches), but the single-producer fallbacks still idle on the 10 ms timer (drain mode helps there).
oskarszoon
left a comment
There was a problem hiding this comment.
Request changes — recommend closing in favour of #893.
Now that #893 is merged the underlying symptom (low per-tx producer concurrency → batcher idling on the 10ms timer) is fixed by fan-out at the producer side: per-tx PreviousOutputsDecorate calls run via errgroup, the batcher fills by size, and the timer rarely fires.
With concurrent producers, flipping this flag would invert the win — it would force many small BatchOperate round-trips instead of one wide one. The drain-mode pattern is right for batchers with a single producer, not after #893.
Suggest closing this PR.
ordishs
left a comment
There was a problem hiding this comment.
LGTM — small, surgical, follows the established per-batcher drain-mode pattern, and gated behind an opt-in default-false toggle so upgrade behaviour is unchanged. Two minor suggestions:
-
Stale comment in
stores/utxo/aerospike/aerospike.go:378-381. The block above the drain-mode toggles enumerates the four existing batchers ("beneficial for Get, Create — harmful for Spend, SetLocked"). With this PR landed, the comment no longer covers the full set. Worth a one-line tweak, e.g. "Outpoint receives input-burst traffic, similar to Get." -
Add a smoke test for the new toggle. The fixtures already exist in
aerospike_server_test.go:1359/:1429. A short subtest that constructs the store withOutpointBatcherDrainMode=trueand re-runsPreviousOutputsDecoratewould lock in that the toggle is wired correctly and that drain-mode doesn't break the happy path. Optional but cheap insurance — none of the four existing drain-mode toggles ship with a dedicated unit test either, so it'd raise the bar slightly for this whole family.
Quick sanity-check question: was the 19× number measured on top of #864's per-tx fanout (already visible at get.go:1191-1206), or on a pre-#864 base? The PR description suggests the former but worth confirming so the composition story is unambiguous.
|
Thanks @oskarszoon — fair on the substance, but I'd push back on closing. You're right that after #893 the But this PR doesn't enable drain mode for anyone. It only adds the toggle, default The outpoint batcher was the only one in that group without a corresponding toggle. The original motivation (sequential producer) is gone, but the symmetry argument still holds: an operator who wants to A/B drain-mode behaviour across the five batchers can now do that uniformly. And there are remaining single-producer call sites where the batcher still idles 10 ms — Default-off toggle that just fills out the per-batcher matrix — I'd rather land it than leave the next person to wonder why every batcher has the knob except this one. |
Updates from @ordishs and Copilot reviews: - Extend the comment above the drain-mode toggles in aerospike.go to cover the new outpoint case and the post-bsv-blockchain#893 nuance. - Rewrite the OutpointBatcherDrainMode longdesc so the 'when to enable / when to disable' guidance reflects the post-bsv-blockchain#893 fan-out path rather than the pre-bsv-blockchain#893 sequential producer. - Add TestAerospikeOutpointBatcherDrainMode happy-path smoke test asserting the toggle stays wired (per-tx + concurrent decorate).
|
Thanks @ordishs — all three points addressed in
|
…fan-out; align docs Add BenchmarkBatchPreviousOutputsDecorateDrainMode comparing the outpoint batcher's drain mode on vs off across the concurrent BatchPreviousOutputsDecorate path (bsv-blockchain#893 errgroup fan-out), using distinct parents per input so batch width maps to real BatchOperate record counts. Findings (count=10 + benchstat, distinct-parent inputs): drain mode is neutral-to-faster in the mean at every tx count (-8% at 2856 tx to -94% at 16 tx), but bimodal/heavy-tailed at mid tx counts (~64-256). So the concurrent node path keeps drain mode default off (predictable latency), and the clean win is the single-producer, separate-process cmd/rewindblockchain caller. Update the aerospike.go comment and the setting longdesc to match the measured behaviour and to note the batcher is store-wide (the toggle cannot be applied to one caller without affecting the other in the same process).
…e' into stu/aerospike-outpoint-drain-mode
|
@oskarszoon — took your "this inverts #893's win" seriously and benchmarked it rather than argue it. Added First, a correction to my own earlier framing: drain mode doesn't dispatch one outpoint per Median ns/op, drain=false → drain=true:
So the mean doesn't regress on the concurrent path — it's neutral-to-faster everywhere (the drain=false low-tier cost is literally the 10 ms timer idling). But your instinct wasn't wrong: drain mode goes bimodal/heavy-tailed at mid tx counts (~64–256), occasionally 10–50× its own median, where drain=false is rock-stable. A node's concurrent decorate path wants predictable latency, so that instability is a good reason to keep it default-off — which I've now made the documented rationale, dropping the weak "legacy per-tx fallback" enable-case (it shares the process with the hot path anyway). Caveat I can't measure locally: this is loopback Aerospike, so it under-weights the per-round-trip network RTT that's central to your "many small round-trips" point — on a real cluster the high tiers could move toward you. Net: the only use I'm claiming is the single-producer, separate-process caller |
oskarszoon
left a comment
There was a problem hiding this comment.
Re-reviewed with a fresh benchmark run and a look at the actual go-batcher drain loop. Withdrawing my earlier close recommendation — it was based on a wrong model of drain mode. The drain loop greedily drains the channel up to the size cap before firing (go-batcher batcher.go:464-480), so under #893's concurrent fan-out the batches stay wide rather than collapsing to one-item round-trips. The bench bears this out (drain faster at the 64/256/2856 tx tiers), and cmd/rewindblockchain is a genuine serial single-producer caller where each PreviousOutputsDecorate otherwise eats the full 10ms timer — a real win no existing knob covers. The wiring is correct, default-off, race-free, and can't drop/dup/misorder lookups.
One required fix before merge:
- The "neutral-to-faster in the mean at every tx count" claim (aerospike.go comment + longdesc) isn't supported by the benchmark. On the concurrent path
drain=falseis rock-stable butdrain=trueis bimodal with an unpredictable mean — at txs=16 and 64, two of three runs come in ~5× slower; txs=2856 had a 118ms spike in one run. The "bimodal/heavy-tailed" wording you already have is right; the "neutral-to-faster in the mean" next to it will lead operators to flip this on a node/concurrent path expecting a free win. Drop that phrase and keep the honest framing: bimodal/unpredictable on the concurrent path, clean win only for the single-producer (rewindblockchain) caller.
Minor: the longdesc warns about the legacy per-tx fallback but not the Validator's ValidateTransactionBatch errgroup path (also concurrent); and aerospike_server_test.go:1812+ uses assert where the repo wants require.
Fix the doc claim and I'll approve.
Address review feedback on PR bsv-blockchain#865: - Drop the unsupported "neutral-to-faster in the mean" claim from the aerospike.go comment and the outpointBatcherDrainMode longdesc. The concurrent BatchPreviousOutputsDecorate path is bimodal/heavy-tailed with drain on (drain=false rock-stable, drain=true unpredictable mean); the clean win is the single-producer cmd/rewindblockchain caller. - Name the Validator's ValidateTransactionBatch errgroup path alongside the BatchPreviousOutputsDecorate hot path as a concurrent caller that should keep drain off, not just the legacy per-tx fallback. - Use require (not assert) in TestAerospikeOutpointBatcherDrainMode per repo test convention. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks @oskarszoon — and thanks for re-reviewing and withdrawing the close recommendation. Fixed in e86917b:
Ready for another look when you have a moment. |
|
oskarszoon
left a comment
There was a problem hiding this comment.
Re-approving on e86917b. The "neutral-to-faster in the mean" overclaim is gone; the aerospike.go comment + the longdesc now describe drain mode on the concurrent path honestly — bimodal/heavy-tailed at mid tx counts, drain=true with an unpredictable mean (some runs several× slower), default-off for node decorate paths, and the clean win scoped to the single-producer cmd/rewindblockchain caller. You also folded in the Validator ValidateTransactionBatch errgroup-path caveat and the store-wide-toggle note, and fixed the smoke-test require. Doc + test only; the drain wiring is unchanged.
Approving.



What this changes
Adds a new opt-in setting,
utxostore_outpointBatcherDrainMode(defaultfalse).The outpoint batcher looks up parent transaction data (amount + locking script) for the inputs on every tx. It's used by
PreviousOutputsDecorateandBatchPreviousOutputsDecorate. Normally, when it has a small batch of inputs it waits up to its full flush timer (default 10ms) before sending the lookup to Aerospike, in case more inputs are coming.When this setting is on, the batcher drains whatever is already queued (up to the size cap) and fires immediately, instead of waiting on the timer.
It's the same drain-mode mechanism the four existing batchers already expose (Get / Spend / Store / Locked), now wired up for the outpoint batcher too.
When to turn it on
The clean win is
cmd/rewindblockchain. That's a single-producer, separate process that callsPreviousOutputsDecorateone tx at a time. Each call offers ~2 inputs — far below the batcher's size cap — so without drain mode every call sits idle for the full 10ms timer before its lookup goes out. Drain mode removes that wait. Benchmarked ~90%+ faster at these low, sequential tx counts.Turn it on with
utxostore_outpointBatcherDrainMode=true.When to leave it off (the default)
Do not enable this inside a running node to try to speed up the per-tx decorate path. That batcher is shared store-wide, so the toggle hits the concurrent hot path too:
BatchPreviousOutputsDecoratefan-out (since perf(aerospike): fan out BatchPreviousOutputsDecorate per-tx #893, per-tx calls run via errgroup), andValidateTransactionBatcherrgroup path.Benchmarking drain on vs off on that concurrent path (
BenchmarkBatchPreviousOutputsDecorateDrainMode, added here) shows drain mode is bimodal / heavy-tailed at mid tx counts (~64–256): with drain off latency is rock-stable, with drain on the mean is unpredictable and some runs are several times slower. A node wants predictable decorate latency, so it stays off by default.The batcher is store-wide — this toggle can't be applied to one caller without affecting the others in the same process. Measure per deployment before flipping it on.
Why off by default
Files touched
settings/utxostore_settings.go— new setting, following the existing patternsettings/settings.go— default value (false)stores/utxo/aerospike/aerospike.go— conditionals.outpointBatcher.SetDrainMode(true)next to the other four, with a comment explaining the on/off tradeoffstores/utxo/aerospike/aerospike_server_test.go— wiring smoke test (drain mode on, both decorate paths still decorate every input correctly)stores/utxo/aerospike/outpoint_drain_bench_test.go— drain on-vs-off benchmark across tx counts, the evidence behind the "leave it off in a node" guidanceTesting
go build ./...cleango vet ./stores/utxo/aerospike/... ./settings/...cleango test -race -short ./stores/utxo/aerospike/ ./settings/— passing (one pre-existing env-related failure inTestGetUTXOStoreURLthat also fails onmain)