perf(legacy): drop cumulative-stats replay; right-size getdata InvList#876
Conversation
The Stats() goroutine spawned by initStats was treating Aerospike's cumulative bucket counts and counters as if they were per-refresh samples. For each refresh (default 5s) it iterated every histogram bucket and called histogram.Observe(value) int(count) times, where count is monotonically growing since process start. On the legacy service after 28h uptime this dominated CPU. CPU profile from a probe captured mid-fat-block on testnet attributed ~42% of legacy CPU to prometheus.histogram.observe and friends along this path. A side-by-side benchmark (10M-warmup cumulative, 50k per-refresh delta, 11 histograms x 24 buckets) measures: BenchmarkProcessAerospikeStatsSteadyState 127 ms/op (delta) BenchmarkProcessAerospikeStatsLegacyReplay 25,612 ms/op (replay) i.e. ~200x faster per refresh; the previous path took longer than the configured refresh interval so the goroutine never caught up. Fix: - Track the previously reported cumulative value per counter and per histogram bucket. Emit only (current - previous) to Prometheus. - Tolerate cumulative drops by treating the new value as the delta (covers client restart). - Extract the stats-reflection body into a unit-testable processAerospikeStats helper. - Add a missing time.Sleep on the Stats() error path so transient errors do not produce a tight retry loop. Tests: three correctness tests (counter delta, histogram delta, reset handling) plus benchmarks demonstrating the saving.
fetchHeaderBlocks allocated the MsgGetData InvList with a size hint of sm.headerList.Len() (often 2000) but the loop below breaks at maxBlocks (1-20, scaled by dynamic in-flight tracker - down to 1 for blocks >2 GB). The over-sized backing slice was ~16 KB of pointer storage allocated repeatedly even when only one slot was ever used. Use maxBlocks as the size hint, which matches the loop's actual upper bound and shrinks the per-call allocation from ~16 KB to ~8-160 B.
|
🤖 Claude Code Review Status: Complete SummaryThis PR delivers two focused performance optimizations backed by profiling data and comprehensive testing. The Aerospike stats fix addresses a critical CPU bottleneck (200× improvement), and the InvList sizing change eliminates unnecessary allocations during legacy sync. No blocking issues found. The implementation is correct, well-tested, and follows the repository's verification standards (race detection, comprehensive unit tests, benchmarks preserving legacy behavior for regression checks). Code Quality✅ Correctness: Delta tracking correctly handles cumulative counters, including edge case of client restart/reset The tests align with AGENTS.md test-first approach (behavioral tests written before refactoring), and the benchmark preserves the legacy code path for ongoing regression verification. Notes
|
There was a problem hiding this comment.
Pull request overview
This PR addresses two performance issues found in legacy-sync profiling: (1) Aerospike stats reflection was re-observing cumulative bucket counts each 5s refresh (accounting for ~40% of legacy CPU), and (2) fetchHeaderBlocks was over-allocating the getdata InvList slice. The first fix tracks last-reported values per Prometheus key and emits deltas; the second right-sizes the InvList allocation to maxBlocks. The stats-processing loop is also extracted into a unit-testable helper, and a missing time.Sleep is added on the Stats() error path.
Changes:
- Track per-key/per-bucket last cumulative values in
util/aerospike.goand emit only deltas to Prometheus counters/histograms; extractprocessAerospikeStats; sleep on error. - Size
MsgGetData.InvListtomaxBlocksinstead ofsm.headerList.Len()infetchHeaderBlocks. - Add unit tests and benchmarks (delta vs. legacy replay) for
processAerospikeStats.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| util/aerospike.go | Refactors stats reflection into processAerospikeStats with per-key delta tracking via two new safemaps; adds error-path sleep. |
| util/aerospike_stats_test.go | New tests covering counter/histogram delta and reset handling, plus side-by-side benchmarks. |
| services/legacy/netsync/manager.go | Sizes getdata InvList to maxBlocks to avoid oversized allocations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-15 18:10 UTC |



Summary
Two perf fixes targeting legacy-sync CPU/alloc cost, found by analysing a probe captured on
bsva-ovh-teranode-ttn-eu-4mid-fat-block (height 1,276,417, 181,416 txs, v0.15.0-beta-13, 28h uptime). Captured profiles attributed ~42% of legacy CPU to a Prometheus histogram path that was replaying cumulative Aerospike bucket counts every 5s.1.
perf(aerospike): delta-record cumulative statsutil/aerospike.go:initStatspolledclient.Stats()every 5s and for each histogram bucket calledhistogram.Observe(value)int(count)times.countis Aerospike's cumulative-since-process-start bucket counter (seeaerospike-client-go/v8/cluster.go:aggregateNodeStats— per-nodegetAndResetresets node-local but accumulates into a persistentclstr.stats[h]), so every refresh re-observed the entire history. Same bug for scalar counters (counter.Add(cumulative)instead of delta).Fix tracks last reported value per Prometheus key (counter and per histogram bucket) and emits only the delta. Resets are tolerated (treat current as delta). The stats-reflection loop body was extracted into a unit-testable
processAerospikeStatshelper. Added a missingtime.Sleepon theStats()error path.Side-by-side benchmark in
util/aerospike_stats_test.go(10M-warmup cumulative, 50k per-refresh delta, 11 histograms × 24 buckets):BenchmarkProcessAerospikeStatsSteadyState(delta)BenchmarkProcessAerospikeStatsLegacyReplay(replay)~200× per refresh. With a 5s configured interval the replay path took longer than the interval itself, so the goroutine never caught up.
Note: the underlying
gocore.Stat.NewStatallocates&Stat{}beforesync.Map.LoadOrStorelookup even when the stat already exists. This contributes ~18 GB cumulative alloc totracing.UTracer.Startover a 28h run but is out of scope for this PR — would need an upstreamordishs/gocorepatch or a local per-spanName cache inutil/tracing.2.
perf(legacy): size getdata InvList tomaxBlocksfetchHeaderBlocksallocatedMsgGetData.InvListwithsm.headerList.Len()as the size hint (often 2000) but the loop already breaks atmaxBlocks(1–20 viablockSizeTracker.calculateMaxInFlightBlocks— drops to 1 for blocks >2 GB). Over-sized cap meant ~16 KB pointer storage allocated when only one slot would ever be used. Switched the size hint tomaxBlocks.Test plan
go test -race ./util/— 524 passgo test -race ./services/legacy/netsync/— 52 passgo vet ./util/ ./services/legacy/netsync/go build ./...TestProcessAerospikeStatsCumulativeCounterDelta,…HistogramBucketDelta,…HistogramResetHandling) verify counter and histogram deltas + reset toleranceBenchmarkProcessAerospikeStatsLegacyReplaypreserves a copy of the previous loop body for ongoing regression checks