Skip to content

fix(aerospike): stop enabling client metrics (#1001)#1003

Merged
oskarszoon merged 1 commit into
bsv-blockchain:mainfrom
oskarszoon:fix/aerospike-stats
Jun 1, 2026
Merged

fix(aerospike): stop enabling client metrics (#1001)#1003
oskarszoon merged 1 commit into
bsv-blockchain:mainfrom
oskarszoon:fix/aerospike-stats

Conversation

@oskarszoon

Copy link
Copy Markdown
Contributor

Stop calling client.EnableMetrics(nil) in initStats. The fork's per-record stats path (nodeStats.updateOrInsert) is quadratic per batch — batch commands have getNamespace()==nil, so updateOrInsert iterates every record's namespace via nsIter, and it is already called once per record. An N-record batch does ~N² increments against a sync.RWMutex map. This is a general cost on every batch command; mainnet IBD just exposes it most (44–60% of legacy CPU) because batches are largest, concurrency highest, and aerospike is local so there is no network latency to mask it.

The hot path is gated behind cluster.metricsEnabled, which only teranode flips on. Removing the call leaves it false and the path is never entered. Connection-pool and node-count metrics are recorded unconditionally, so the poller still exports them — only the per-command latency histograms and result-code counts go dark.

Stop-gap. Real fix is in the aerospike-client-go fork, tracked in #1001.

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

No issues found. The change is correct and well-documented.

Summary:
This PR comments out client.EnableMetrics(nil) to disable the aerospike client's per-record metrics tracking, which was consuming 44-60% of legacy CPU during mainnet IBD due to quadratic batch behavior. The change includes:

  1. Nil-check guard (if client == nil { return }) — prevents panic in test code that passes nil
  2. Detailed explanatory comment — documents the performance issue, why metrics are disabled, and references issue aerospike-client-go/v8 nodeStats.updateOrInsert consumes 44-60% of legacy CPU during mainnet IBD #1001
  3. Low-risk change — connection-pool and node-count metrics remain available; only per-command latency histograms go dark

The implementation aligns with AGENTS.md guidelines: minimal change, clear rationale, correctness over speed. The existing test at aerospike_test.go:876 confirms graceful nil-client handling.

@oskarszoon oskarszoon requested review from freemans13 and ordishs June 1, 2026 14:31

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Verified against the aerospike-client-go fork (v8.7.1-bsv3): the per-command metrics path is fully gated behind cluster.metricsEnabled, which only flips true via EnableMetrics — removing the call leaves it false so the quadratic updateOrInsert path is never entered. client.Stats() reads connection/node counts directly and is independent of EnableMetrics, so the poller still exports connection-pool and node-count series; only per-command latency histograms and result-code counters go dark, as described. Sound, low-risk stop-gap. Follow-up: ensure the re-enable is tracked against #1001 and give dashboard owners a heads-up about the zeroed series.

Stop calling client.EnableMetrics(nil) in initStats. The fork's per-record
stats path (nodeStats.updateOrInsert) is quadratic per batch: batch commands
have getNamespace()==nil, so updateOrInsert iterates every record's namespace
via nsIter, and it is already called once per record. An N-record batch does
~N^2 increments against a sync.RWMutex map. This is a general cost on every
batch command; mainnet IBD just exposes it most (44-60% of legacy CPU) because
batches are largest, concurrency highest, and aerospike is local so there is no
network latency to mask it.

The hot path is gated behind cluster.metricsEnabled, which only teranode flips
on. Removing the call leaves it false and the path is never entered.
Connection-pool and node-count metrics are recorded unconditionally, so the
poller still exports them; only the per-command latency histograms and
result-code counts go dark.

Stop-gap. Real fix is in the aerospike-client-go fork, tracked in bsv-blockchain#1001.

Guard initStats against a nil client: the old EnableMetrics call panicked
synchronously on nil, which TestInitStatsBasic relied on via recover().
With it gone the nil-deref moved into the background poller goroutine
(uncatchable), so add an explicit early return.
@oskarszoon oskarszoon force-pushed the fix/aerospike-stats branch from 2221c6b to d0b7d72 Compare June 1, 2026 14:38
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1003 (308b170)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.928µ 2.015µ ~ 1.000
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.56n 61.56n ~ 0.800
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.56n 61.62n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.49n 61.44n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 33.19n 32.07n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 54.02n 54.37n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 118.2n 137.1n ~ 0.200
MiningCandidate_Stringify_Short-4 262.2n 260.7n ~ 0.400
MiningCandidate_Stringify_Long-4 1.956µ 1.930µ ~ 0.100
MiningSolution_Stringify-4 978.0n 978.5n ~ 1.000
BlockInfo_MarshalJSON-4 1.806µ 1.814µ ~ 1.000
NewFromBytes-4 130.5n 161.8n ~ 0.700
AddTxBatchColumnar_Validation-4 2.455µ 2.455µ ~ 1.000
OffsetValidationLoop-4 637.2n 636.7n ~ 0.700
Mine_EasyDifficulty-4 68.60µ 67.86µ ~ 0.400
Mine_WithAddress-4 7.465µ 7.390µ ~ 0.700
DirectSubtreeAdd/4_per_subtree-4 63.73n 57.28n ~ 0.200
DirectSubtreeAdd/64_per_subtree-4 29.36n 29.93n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 28.07n 28.07n ~ 1.000
DirectSubtreeAdd/1024_per_subtree-4 26.70n 26.80n ~ 0.200
DirectSubtreeAdd/2048_per_subtree-4 26.36n 26.38n ~ 0.600
SubtreeProcessorAdd/4_per_subtree-4 299.2n 310.2n ~ 0.200
SubtreeProcessorAdd/64_per_subtree-4 294.2n 305.3n ~ 0.100
SubtreeProcessorAdd/256_per_subtree-4 296.6n 298.1n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 284.2n 285.9n ~ 0.400
SubtreeProcessorAdd/2048_per_subtree-4 283.6n 285.5n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 295.2n 299.4n ~ 0.400
SubtreeProcessorRotate/64_per_subtree-4 304.2n 294.9n ~ 0.400
SubtreeProcessorRotate/256_per_subtree-4 293.0n 287.1n ~ 0.700
SubtreeProcessorRotate/1024_per_subtree-4 281.8n 298.9n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 55.70n 72.35n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 36.47n 56.40n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 35.53n 56.32n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 34.81n 56.20n ~ 0.100
SubtreeCreationOnly/4_per_subtree-4 113.0n 111.6n ~ 0.300
SubtreeCreationOnly/64_per_subtree-4 378.7n 367.4n ~ 0.200
SubtreeCreationOnly/256_per_subtree-4 1.278µ 1.287µ ~ 0.300
SubtreeCreationOnly/1024_per_subtree-4 3.971µ 4.023µ ~ 0.700
SubtreeCreationOnly/2048_per_subtree-4 7.226µ 7.459µ ~ 0.200
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 288.2n 285.8n ~ 1.000
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 286.6n 287.6n ~ 0.700
ParallelGetAndSetIfNotExists/1k_nodes-4 2.011m 2.095m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.330m 5.622m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.349m 8.596m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 10.23m 11.17m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 1.767m 1.818m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 4.530m 4.726m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 13.90m 15.43m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 25.64m 28.14m ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.068m 2.116m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.547m 8.563m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 13.56m 14.31m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.799m 1.821m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 8.113m 8.649m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 43.95m 45.97m ~ 0.100
DiskTxMap_SetIfNotExists-4 4.045µ 3.987µ ~ 0.200
DiskTxMap_SetIfNotExists_Parallel-4 3.866µ 3.778µ ~ 1.000
DiskTxMap_ExistenceOnly-4 337.5n 357.9n ~ 0.100
Queue-4 188.6n 188.8n ~ 1.000
AtomicPointer-4 3.701n 3.623n ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 853.9µ 842.3µ ~ 0.400
ReorgOptimizations/DedupFilterPipeline/New/10K-4 781.9µ 769.8µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 125.2µ 107.1µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 64.72µ 64.92µ ~ 0.400
ReorgOptimizations/HashSlicePool/Old/10K-4 55.31µ 52.43µ ~ 0.400
ReorgOptimizations/HashSlicePool/New/10K-4 11.06µ 11.09µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/10K-4 4.496µ 4.580µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.513µ 1.542µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.630m 10.273m ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.828m 11.230m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.094m 1.156m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 709.3µ 707.5µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/100K-4 535.2µ 521.9µ ~ 0.200
ReorgOptimizations/HashSlicePool/New/100K-4 215.1µ 221.4µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/100K-4 47.95µ 50.85µ ~ 0.700
ReorgOptimizations/NodeFlags/New/100K-4 16.90µ 17.73µ ~ 0.100
TxMapSetIfNotExists-4 49.56n 49.77n ~ 0.200
TxMapSetIfNotExistsDuplicate-4 41.28n 41.45n ~ 0.200
ChannelSendReceive-4 595.0n 601.1n ~ 0.400
BlockAssembler_AddTx-4 0.02819n 0.02682n ~ 0.400
AddNode-4 12.59 12.31 ~ 0.200
AddNodeWithMap-4 13.05 13.05 ~ 1.000
CalcBlockWork-4 510.9n 514.7n ~ 0.400
CalculateWork-4 700.4n 707.7n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.350µ 1.355µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_100-4 13.08µ 13.85µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_1000-4 128.1µ 128.6µ ~ 1.000
CatchupWithHeaderCache-4 104.5m 104.4m ~ 0.700
SubtreeSizes/10k_tx_4_per_subtree-4 1.375m 1.335m ~ 0.400
SubtreeSizes/10k_tx_16_per_subtree-4 321.2µ 315.1µ ~ 0.400
SubtreeSizes/10k_tx_64_per_subtree-4 76.23µ 75.82µ ~ 0.700
SubtreeSizes/10k_tx_256_per_subtree-4 19.15µ 18.96µ ~ 0.700
SubtreeSizes/10k_tx_512_per_subtree-4 9.443µ 9.366µ ~ 0.400
SubtreeSizes/10k_tx_1024_per_subtree-4 4.762µ 4.631µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.365µ 2.358µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 75.01µ 75.00µ ~ 0.700
BlockSizeScaling/10k_tx_256_per_subtree-4 19.17µ 18.96µ ~ 0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.751µ 4.795µ ~ 0.800
BlockSizeScaling/50k_tx_64_per_subtree-4 395.9µ 397.0µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 95.37µ 94.32µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.61µ 23.58µ ~ 1.000
SubtreeAllocations/small_subtrees_exists_check-4 160.3µ 159.6µ ~ 0.400
SubtreeAllocations/small_subtrees_data_fetch-4 168.8µ 165.6µ ~ 0.200
SubtreeAllocations/small_subtrees_full_validation-4 333.0µ 333.3µ ~ 1.000
SubtreeAllocations/medium_subtrees_exists_check-4 9.545µ 9.562µ ~ 1.000
SubtreeAllocations/medium_subtrees_data_fetch-4 9.963µ 9.955µ ~ 1.000
SubtreeAllocations/medium_subtrees_full_validation-4 19.36µ 20.10µ ~ 0.200
SubtreeAllocations/large_subtrees_exists_check-4 2.284µ 2.287µ ~ 1.000
SubtreeAllocations/large_subtrees_data_fetch-4 2.433µ 2.438µ ~ 0.800
SubtreeAllocations/large_subtrees_full_validation-4 4.804µ 4.932µ ~ 0.100
_BufferPoolAllocation/16KB-4 4.105µ 4.134µ ~ 1.000
_BufferPoolAllocation/32KB-4 10.502µ 8.778µ ~ 0.100
_BufferPoolAllocation/64KB-4 17.24µ 19.33µ ~ 0.200
_BufferPoolAllocation/128KB-4 33.76µ 35.39µ ~ 0.100
_BufferPoolAllocation/512KB-4 143.5µ 145.4µ ~ 0.700
_BufferPoolConcurrent/32KB-4 22.50µ 22.79µ ~ 0.400
_BufferPoolConcurrent/64KB-4 31.62µ 35.18µ ~ 0.100
_BufferPoolConcurrent/512KB-4 176.1µ 171.7µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/16KB-4 741.6µ 712.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 724.6µ 703.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 698.0µ 711.1µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/128KB-4 699.0µ 710.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 696.7µ 685.0µ ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/16KB-4 39.32m 38.98m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 39.41m 38.65m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 39.46m 38.65m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 39.37m 39.01m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 39.45m 38.90m ~ 0.100
_PooledVsNonPooled/Pooled-4 823.2n 823.1n ~ 1.000
_PooledVsNonPooled/NonPooled-4 7.810µ 8.414µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 8.695µ 8.115µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 11.83µ 11.71µ ~ 1.000
_MemoryFootprint/Alternative_64KB_32concurrent-4 10.94µ 11.07µ ~ 0.700
_prepareTxsPerLevel-4 393.2m 389.8m ~ 0.700
_prepareTxsPerLevelOrdered-4 4.373m 4.430m ~ 1.000
_prepareTxsPerLevel_Comparison/Original-4 387.4m 388.0m ~ 0.700
_prepareTxsPerLevel_Comparison/Optimized-4 4.301m 4.360m ~ 0.400
StoreBlock_Sequential/BelowCSVHeight-4 310.6µ 315.3µ ~ 0.200
StoreBlock_Sequential/AboveCSVHeight-4 316.9µ 317.7µ ~ 1.000
GetUtxoHashes-4 256.3n 258.2n ~ 1.000
GetUtxoHashes_ManyOutputs-4 42.51µ 43.28µ ~ 0.700
_NewMetaDataFromBytes-4 227.9n 228.2n ~ 1.000
_Bytes-4 400.1n 397.0n ~ 0.100
_MetaBytes-4 140.6n 138.0n ~ 0.400

Threshold: >10% with p < 0.05 | Generated: 2026-06-01 14:53 UTC

@sonarqubecloud

sonarqubecloud Bot commented Jun 1, 2026

Copy link
Copy Markdown

@oskarszoon oskarszoon merged commit b22659f into bsv-blockchain:main Jun 1, 2026
34 checks passed
@oskarszoon oskarszoon self-assigned this Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants