chore(settings): raise aerospike client pool to 128 for docker.m by oskarszoon · Pull Request #941 · bsv-blockchain/teranode

oskarszoon · 2026-05-26T07:37:05Z

Summary

Raise ConnectionQueueSize from 16 to 128 (and MinConnectionsPerNode from 8 to 16) on the utxostore.docker.m URL. docker.m (docker-microservice) context only — used by teranode-quickstart and other docker-based deployments. Other contexts unchanged.

Why

ConnectionQueueSize=16 + LimitConnectionsToQueueSize=true is too tight for the default pruner partition-worker fanout, and for legacy block-processing batch ops under mainnet IBD load.

The pruner detects this and emits:

WARN | pruner/pruner_service.go:425 | utxos | 
Pruner concurrency would exhaust Aerospike connection pool. 
Max pruner connections: 64, ConnectionQueueSize: 16, Recommended max: 11. 
Auto-adjusting pruner_utxoChunkGroupLimit from 1 to 1 to prevent exhaustion.

It auto-throttles pruner_utxoChunkGroupLimit but not pruner_utxoPartitionQueries (the outer 32-worker fanout). Even at chunk_group_limit=1, 32 partition workers × 1 = 32 concurrent ops, which still exceeds the 16-conn pool. Every pruner batch then errors with NO_AVAILABLE_CONNECTIONS_TO_NODE followed by TIMEOUT.
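The oversubscription described above can be checked with a few lines of arithmetic. This is a minimal sketch, not the pruner's actual logic: it assumes peak concurrency is simply partition workers × chunk group limit, and the function names `peakPrunerOps` and `exceedsPool` are hypothetical.

```go
package main

import "fmt"

// peakPrunerOps estimates worst-case concurrent Aerospike operations as
// partitionWorkers × chunkGroupLimit. This is a simplified model taken from
// the PR text, not the formula used in pruner_service.go.
func peakPrunerOps(partitionWorkers, chunkGroupLimit int) int {
	return partitionWorkers * chunkGroupLimit
}

// exceedsPool reports whether the estimated peak is larger than the client
// pool (ConnectionQueueSize with LimitConnectionsToQueueSize=true).
func exceedsPool(partitionWorkers, chunkGroupLimit, connectionQueueSize int) bool {
	return peakPrunerOps(partitionWorkers, chunkGroupLimit) > connectionQueueSize
}

func main() {
	const workers = 32      // outer partition-worker fanout from the PR text
	const chunkGroupLimit = 1 // already throttled to its minimum
	for _, pool := range []int{16, 128} {
		fmt.Printf("pool=%d peak=%d exceeds=%v\n",
			pool,
			peakPrunerOps(workers, chunkGroupLimit),
			exceedsPool(workers, chunkGroupLimit, pool))
	}
}
```

Even with the chunk group limit at 1, the 32-worker fanout alone is larger than a 16-connection pool, which matches the NO_AVAILABLE_CONNECTIONS_TO_NODE behaviour reported here.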

Observed

bsva-ovh-teranode-eu-3, mainnet, v0.15.2-beta-1, height ~795,360.

Cascade with the old pool size:

Legacy retries on fat blocks (compounded by #936, the unbounded SetMinedMulti slice in createUtxos, before its fix) leak some client connections faster than they close
Pool saturates at 16
Pruner can't issue parent-update writes — perpetually NO_AVAILABLE_CONNECTIONS_TO_NODE
Spent UTXOs aren't deleted; objects climb to 832M
stop-writes-used-pct=70 (inherited from evict-used-pct=70) trips
Aerospike enters stop_writes=true + hwm_breached=true
Pruner needs to write to delete things, but stop_writes blocks writes → deadlock

The 16-conn ceiling was the bottleneck, not the aerospike server: proto-fd-max defaults to 15,000, and the actual server-side connection count peaked around 72 (across all teranode services combined) before the lockup, so there's plenty of server headroom.

Why 128 specifically

With the default pruner_utxoPartitionQueries=0 (auto-detect, ~32 workers on a typical host) and a chunk_group_limit that can rise to its default of 10 once the pool is comfortable, peak pruner concurrency is ~320 ops (32 × 10). That is 2.5× a 128-connection pool, which gives the auto-adjust room to land at 5–8 chunk groups rather than being pinned at 1. The utxostore consumers (legacy, blockvalidation, subtreevalidation, blockassembly, blockchain, propagation, asset, pruner) each get their own pool; 128 per pool × 8 clients = 1,024 total potential connections, well under proto-fd-max=15000.
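The sizing argument can be reproduced as a small calculation. The sketch below only restates the numbers quoted in this section; the `poolSizing` type and its methods are hypothetical names for illustration, and the model (workers × chunk group limit, pool × clients) is an assumption rather than code from teranode.

```go
package main

import "fmt"

// poolSizing holds the figures quoted in the PR description.
type poolSizing struct {
	PartitionWorkers int // ~32 auto-detected workers on a typical host
	ChunkGroupLimit  int // default of 10
	PoolSize         int // ConnectionQueueSize per client
	Clients          int // utxostore consumers, each with its own pool
	ProtoFdMax       int // Aerospike server connection limit
}

// PeakPrunerOps is the unthrottled pruner concurrency.
func (s poolSizing) PeakPrunerOps() int { return s.PartitionWorkers * s.ChunkGroupLimit }

// PeakToPoolRatio shows how far the unthrottled peak sits above one pool.
func (s poolSizing) PeakToPoolRatio() float64 {
	return float64(s.PeakPrunerOps()) / float64(s.PoolSize)
}

// TotalPotentialConns is the worst case if every client fills its pool.
func (s poolSizing) TotalPotentialConns() int { return s.PoolSize * s.Clients }

// FitsServer reports whether that worst case stays under proto-fd-max.
func (s poolSizing) FitsServer() bool { return s.TotalPotentialConns() < s.ProtoFdMax }

func main() {
	s := poolSizing{PartitionWorkers: 32, ChunkGroupLimit: 10, PoolSize: 128, Clients: 8, ProtoFdMax: 15000}
	fmt.Println("peak pruner ops:", s.PeakPrunerOps())
	fmt.Println("peak / pool:", s.PeakToPoolRatio())
	fmt.Println("total potential connections:", s.TotalPotentialConns())
	fmt.Println("fits under proto-fd-max:", s.FitsServer())
}
```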

MinConnectionsPerNode raised proportionally (8 → 16) so warm-up provisions a reasonable baseline.

Scope

Only utxostore.docker.m changes. The commented-out utxostore.docker template right below remains at ConnectionQueueSize=32 (it's a template, not active). Operator/k8s contexts unchanged.
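Since the pool settings travel as query parameters on the utxostore URL, a quick way to sanity-check a value is to parse it. This is a sketch under stated assumptions: the URL below (host, port, namespace) is a made-up example and not the real utxostore.docker.m value, `parsePoolSettings` is a hypothetical helper, and the check that MinConnectionsPerNode does not exceed ConnectionQueueSize is this sketch's own sanity rule.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// poolSettings mirrors the three parameters discussed in this PR.
type poolSettings struct {
	ConnectionQueueSize         int
	MinConnectionsPerNode       int
	LimitConnectionsToQueueSize bool
}

// parsePoolSettings reads the pool parameters from a store URL's query string.
func parsePoolSettings(raw string) (poolSettings, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return poolSettings{}, fmt.Errorf("parse url: %w", err)
	}
	q := u.Query()
	var s poolSettings
	if s.ConnectionQueueSize, err = strconv.Atoi(q.Get("ConnectionQueueSize")); err != nil {
		return s, fmt.Errorf("ConnectionQueueSize: %w", err)
	}
	if s.MinConnectionsPerNode, err = strconv.Atoi(q.Get("MinConnectionsPerNode")); err != nil {
		return s, fmt.Errorf("MinConnectionsPerNode: %w", err)
	}
	if s.LimitConnectionsToQueueSize, err = strconv.ParseBool(q.Get("LimitConnectionsToQueueSize")); err != nil {
		return s, fmt.Errorf("LimitConnectionsToQueueSize: %w", err)
	}
	// Sanity rule for this sketch: the warm-up baseline should fit in the pool.
	if s.MinConnectionsPerNode > s.ConnectionQueueSize {
		return s, fmt.Errorf("MinConnectionsPerNode %d exceeds ConnectionQueueSize %d",
			s.MinConnectionsPerNode, s.ConnectionQueueSize)
	}
	return s, nil
}

func main() {
	// Hypothetical URL for illustration only.
	raw := "aerospike://aerospike:3000/test?ConnectionQueueSize=128&MinConnectionsPerNode=16&LimitConnectionsToQueueSize=true"
	s, err := parsePoolSettings(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```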

Verification

Config diff is one line, ConnectionQueueSize=16 → 128, MinConnectionsPerNode=8 → 16
Operator validation: bring up a docker.m deployment (e.g. teranode-quickstart), confirm docker exec aerospike asinfo -v statistics | grep client_connections reflects the larger ceiling and the pruner WARN at pruner_service.go:425 no longer auto-throttles to chunk_group_limit=1
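For scripted validation, the `asinfo -v statistics` output can be parsed instead of grepped. The sketch below assumes the output is a single line of semicolon-separated name=value pairs; the sample string is illustrative (the 72 is the peak quoted in this PR, the other values are made up), and `parseStat` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStat extracts one integer statistic from a semicolon-separated
// name=value string such as the output of `asinfo -v statistics`.
func parseStat(stats, name string) (int, error) {
	for _, pair := range strings.Split(strings.TrimSpace(stats), ";") {
		key, value, ok := strings.Cut(pair, "=")
		if ok && key == name {
			return strconv.Atoi(value)
		}
	}
	return 0, fmt.Errorf("statistic %q not found", name)
}

func main() {
	// Illustrative sample, not captured from a real server.
	sample := "cluster_size=1;client_connections=72;uptime=3600"
	n, err := parseStat(sample, "client_connections")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("client_connections:", n)
}
```

In practice the input string would come from running the `docker exec aerospike asinfo -v statistics` command mentioned above and capturing its standard output.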

Related

#936 (legacy: createUtxos calls SetMinedMulti with unbounded slice, stalling aerospike on fat blocks; regression from #854): one of the connection-burner paths upstream of this issue
#920 (blockvalidation: OOM during ttn catch-up sync, 70% of heap in go-bt tx/output decode): go-bt arena fix, already merged in v0.15.2

Not in this PR

The underlying aerospike-client-go/v8 connection handling on timeout (InDoubt: true paths) appears to leak connections, albeit more slowly than steady-state churn returns them. Worth a separate investigation. A bigger pool buys time but doesn't eliminate the leak.
evict-used-pct=70 in config/aerospike.conf template silently anchors stop-writes-used-pct=70 even though eviction is a no-op for default-ttl 0. Separate concern; affects quickstart configs.
The pruner auto-adjust should also throttle pruner_utxoPartitionQueries, not just pruner_utxoChunkGroupLimit. Separate fix.
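The last item could look roughly like the following. This is a sketch of one possible policy, not the pruner's current behaviour or a proposed patch: `fitToPool` is a hypothetical function that lowers the chunk group limit first and, only if that is not enough, caps the partition workers so that workers × limit stays within a connection budget.

```go
package main

import "fmt"

// fitToPool returns a (workers, chunkGroupLimit) pair whose product does not
// exceed budget. It reduces chunkGroupLimit first and caps workers only when
// a limit of 1 still oversubscribes the budget. Inputs below 1 are raised to 1.
func fitToPool(workers, chunkGroupLimit, budget int) (int, int) {
	if workers < 1 {
		workers = 1
	}
	if chunkGroupLimit < 1 {
		chunkGroupLimit = 1
	}
	if budget < 1 {
		budget = 1
	}
	if workers*chunkGroupLimit <= budget {
		return workers, chunkGroupLimit // already fits, nothing to throttle
	}
	chunkGroupLimit = budget / workers
	if chunkGroupLimit < 1 {
		// Even one chunk group per worker is too many: throttle the outer fanout.
		chunkGroupLimit = 1
		workers = budget
	}
	return workers, chunkGroupLimit
}

func main() {
	// Budget 11 is the "Recommended max" from the WARN line quoted earlier.
	w, c := fitToPool(32, 10, 11)
	fmt.Printf("budget=11  -> workers=%d chunkGroupLimit=%d\n", w, c)
	w, c = fitToPool(32, 10, 128)
	fmt.Printf("budget=128 -> workers=%d chunkGroupLimit=%d\n", w, c)
}
```

Reducing the inner limit before the outer fanout keeps partition-level parallelism for as long as the budget allows; a real fix might weigh the two knobs differently.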

ConnectionQueueSize=16 + LimitConnectionsToQueueSize=true is too tight for the pruner's default partition-worker fanout (32 workers × chunk_group_limit) and for legacy block-processing batch ops under mainnet IBD load. Once the pool fills, requests time out, the pruner auto-adjusts pruner_utxoChunkGroupLimit to 1 (per the WARN at pruner_service.go:425) but its outer partition-worker count is not auto-throttled, so it still oversubscribes the pool and pruning stalls. Observed on bsva-ovh-teranode-eu-3 mainnet (v0.15.2-beta-1, mainnet height ~795360): pruner stuck with NO_AVAILABLE_CONNECTIONS_TO_NODE errors despite legacy stopped; aerospike namespace hwm-breached + stop_writes=true while 832M non-expirable records accumulated because the pruner couldn't write deletes through the saturated pool. Bump ConnectionQueueSize 16 -> 128 and MinConnectionsPerNode 8 -> 16 for the docker.m context only. Single-node aerospike with proto-fd-max default 15000 has plenty of server-side headroom; the 16-conn ceiling was the constraint, not the server.

github-actions · 2026-05-26T07:38:07Z

🤖 Claude Code Review

Status: Complete

Review Summary

This PR increases Aerospike connection pool settings for the docker.m context to prevent connection exhaustion under mainnet IBD load. The changes are minimal, well-scoped, and properly documented.

Configuration changes verified:

settings.conf: ConnectionQueueSize 16→128, MinConnectionsPerNode 8→16 for utxostore.docker.m
New context-specific override: pruner_utxoPartitionQueries.docker.m = 8
Documentation updated to reflect actual code default (128)

Scope correctly limited:

Only docker.m context affected as intended
Kubernetes operator config unchanged (still 16, which is appropriate per PR description)
Commented template unchanged

Documentation accuracy:
All documentation changes accurately reflect the code. The docs previously showed a default of 256 in the table, but the actual code default is 128 (util/uaerospike/client.go:18), so the doc update to 128 is a correction, not just cosmetic alignment.

No issues found. Changes align with AGENTS.md principles: minimal scope, properly verified plan, clear rationale in PR description.

github-actions · 2026-05-26T07:52:39Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-941 (0f98f93)

Summary

Regressions: 0
Improvements: 0
Unchanged: 144
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.974µ	1.722µ	~	0.200
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	61.67n	61.69n	~	1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	61.72n	61.90n	~	0.100
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	61.75n	61.73n	~	0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	29.77n	29.93n	~	1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	51.51n	50.12n	~	0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	110.6n	116.2n	~	0.700
MiningCandidate_Stringify_Short-4	262.5n	261.4n	~	0.400
MiningCandidate_Stringify_Long-4	1.892µ	1.855µ	~	0.100
MiningSolution_Stringify-4	984.4n	970.8n	~	0.100
BlockInfo_MarshalJSON-4	1.789µ	1.793µ	~	1.000
NewFromBytes-4	128.4n	128.4n	~	1.000
AddTxBatchColumnar_Validation-4	2.471µ	2.535µ	~	0.100
OffsetValidationLoop-4	635.3n	634.5n	~	1.000
Mine_EasyDifficulty-4	65.96µ	65.65µ	~	0.700
Mine_WithAddress-4	7.026µ	7.928µ	~	0.100
BlockAssembler_AddTx-4	0.02819n	0.02859n	~	1.000
AddNode-4	11.74	10.82	~	0.200
AddNodeWithMap-4	11.56	11.27	~	1.000
DiskTxMap_SetIfNotExists-4	3.647µ	3.738µ	~	1.000
DiskTxMap_SetIfNotExists_Parallel-4	4.226µ	18.282µ	~	0.700
DiskTxMap_ExistenceOnly-4	403.4n	341.6n	~	0.700
Queue-4	148.3n	150.5n	~	0.100
AtomicPointer-4	2.515n	2.492n	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4	629.6µ	632.5µ	~	1.000
ReorgOptimizations/DedupFilterPipeline/New/10K-4	616.3µ	601.0µ	~	0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4	80.74µ	81.28µ	~	0.400
ReorgOptimizations/AllMarkFalse/New/10K-4	49.96µ	49.51µ	~	0.700
ReorgOptimizations/HashSlicePool/Old/10K-4	39.09µ	41.40µ	~	0.700
ReorgOptimizations/HashSlicePool/New/10K-4	8.509µ	8.716µ	~	0.700
ReorgOptimizations/NodeFlags/Old/10K-4	3.317µ	3.253µ	~	0.700
ReorgOptimizations/NodeFlags/New/10K-4	1.120µ	1.133µ	~	0.400
ReorgOptimizations/DedupFilterPipeline/Old/100K-4	7.658m	7.585m	~	0.200
ReorgOptimizations/DedupFilterPipeline/New/100K-4	8.432m	7.783m	~	0.400
ReorgOptimizations/AllMarkFalse/Old/100K-4	869.0µ	863.3µ	~	0.400
ReorgOptimizations/AllMarkFalse/New/100K-4	547.2µ	545.5µ	~	0.100
ReorgOptimizations/HashSlicePool/Old/100K-4	408.4µ	378.8µ	~	0.100
ReorgOptimizations/HashSlicePool/New/100K-4	202.0µ	199.3µ	~	0.700
ReorgOptimizations/NodeFlags/Old/100K-4	33.71µ	36.27µ	~	0.700
ReorgOptimizations/NodeFlags/New/100K-4	12.73µ	11.75µ	~	0.100
TxMapSetIfNotExists-4	38.14n	38.88n	~	0.100
TxMapSetIfNotExistsDuplicate-4	31.86n	32.22n	~	0.100
ChannelSendReceive-4	447.3n	443.1n	~	1.000
DirectSubtreeAdd/4_per_subtree-4	76.36n	76.77n	~	0.400
DirectSubtreeAdd/64_per_subtree-4	40.96n	41.47n	~	0.200
DirectSubtreeAdd/256_per_subtree-4	40.38n	39.85n	~	0.200
DirectSubtreeAdd/1024_per_subtree-4	38.43n	38.46n	~	0.100
DirectSubtreeAdd/2048_per_subtree-4	38.12n	38.02n	~	0.400
SubtreeProcessorAdd/4_per_subtree-4	369.3n	358.2n	~	0.100
SubtreeProcessorAdd/64_per_subtree-4	357.2n	350.7n	~	0.100
SubtreeProcessorAdd/256_per_subtree-4	339.1n	336.8n	~	0.400
SubtreeProcessorAdd/1024_per_subtree-4	334.5n	336.3n	~	0.700
SubtreeProcessorAdd/2048_per_subtree-4	340.8n	346.4n	~	0.100
SubtreeProcessorRotate/4_per_subtree-4	340.7n	350.9n	~	0.100
SubtreeProcessorRotate/64_per_subtree-4	339.0n	349.0n	~	0.100
SubtreeProcessorRotate/256_per_subtree-4	336.0n	349.8n	~	0.100
SubtreeProcessorRotate/1024_per_subtree-4	339.0n	338.2n	~	0.700
SubtreeNodeAddOnly/4_per_subtree-4	88.15n	88.37n	~	0.700
SubtreeNodeAddOnly/64_per_subtree-4	65.05n	64.90n	~	0.100
SubtreeNodeAddOnly/256_per_subtree-4	64.37n	64.06n	~	0.100
SubtreeNodeAddOnly/1024_per_subtree-4	63.60n	63.65n	~	0.700
SubtreeCreationOnly/4_per_subtree-4	147.4n	147.8n	~	1.000
SubtreeCreationOnly/64_per_subtree-4	526.8n	538.4n	~	0.100
SubtreeCreationOnly/256_per_subtree-4	1.907µ	1.925µ	~	0.100
SubtreeCreationOnly/1024_per_subtree-4	6.203µ	6.254µ	~	0.100
SubtreeCreationOnly/2048_per_subtree-4	11.24µ	11.18µ	~	0.700
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	342.5n	341.2n	~	1.000
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	341.6n	337.2n	~	0.100
ParallelGetAndSetIfNotExists/1k_nodes-4	2.389m	2.342m	~	0.100
ParallelGetAndSetIfNotExists/10k_nodes-4	6.675m	6.480m	~	0.100
ParallelGetAndSetIfNotExists/50k_nodes-4	8.497m	8.139m	~	0.100
ParallelGetAndSetIfNotExists/100k_nodes-4	11.72m	11.24m	~	0.100
SequentialGetAndSetIfNotExists/1k_nodes-4	1.977m	1.955m	~	0.700
SequentialGetAndSetIfNotExists/10k_nodes-4	5.586m	5.477m	~	0.400
SequentialGetAndSetIfNotExists/50k_nodes-4	17.02m	16.21m	~	0.100
SequentialGetAndSetIfNotExists/100k_nodes-4	29.90m	31.45m	~	0.200
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	2.399m	2.417m	~	0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	9.500m	9.500m	~	1.000
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	14.68m	14.51m	~	0.400
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	2.057m	2.028m	~	0.400
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	9.259m	8.819m	~	0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	58.03m	55.00m	~	0.100
CalcBlockWork-4	357.0n	364.1n	~	1.000
CalculateWork-4	480.1n	500.0n	~	0.100
BuildBlockLocatorString_Helpers/Size_10-4	1.342µ	1.356µ	~	0.100
BuildBlockLocatorString_Helpers/Size_100-4	13.10µ	13.18µ	~	0.100
BuildBlockLocatorString_Helpers/Size_1000-4	156.4µ	160.3µ	~	0.700
CatchupWithHeaderCache-4	104.5m	104.5m	~	1.000
SubtreeSizes/10k_tx_4_per_subtree-4	1.341m	1.320m	~	0.700
SubtreeSizes/10k_tx_16_per_subtree-4	313.6µ	313.8µ	~	0.700
SubtreeSizes/10k_tx_64_per_subtree-4	75.09µ	75.11µ	~	1.000
SubtreeSizes/10k_tx_256_per_subtree-4	18.71µ	18.92µ	~	0.200
SubtreeSizes/10k_tx_512_per_subtree-4	9.377µ	9.397µ	~	0.200
SubtreeSizes/10k_tx_1024_per_subtree-4	4.696µ	4.664µ	~	0.700
SubtreeSizes/10k_tx_2k_per_subtree-4	2.335µ	2.325µ	~	0.700
BlockSizeScaling/10k_tx_64_per_subtree-4	73.66µ	74.49µ	~	1.000
BlockSizeScaling/10k_tx_256_per_subtree-4	18.65µ	18.72µ	~	0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.672µ	4.683µ	~	0.500
BlockSizeScaling/50k_tx_64_per_subtree-4	385.2µ	386.0µ	~	1.000
BlockSizeScaling/50k_tx_256_per_subtree-4	91.94µ	92.84µ	~	0.700
BlockSizeScaling/50k_tx_1024_per_subtree-4	22.87µ	23.03µ	~	0.700
SubtreeAllocations/small_subtrees_exists_check-4	160.3µ	161.4µ	~	0.700
SubtreeAllocations/small_subtrees_data_fetch-4	159.6µ	160.1µ	~	1.000
SubtreeAllocations/small_subtrees_full_validation-4	321.5µ	321.6µ	~	1.000
SubtreeAllocations/medium_subtrees_exists_check-4	9.504µ	9.630µ	~	0.700
SubtreeAllocations/medium_subtrees_data_fetch-4	9.353µ	9.435µ	~	0.700
SubtreeAllocations/medium_subtrees_full_validation-4	18.75µ	18.81µ	~	0.100
SubtreeAllocations/large_subtrees_exists_check-4	2.296µ	2.265µ	~	0.100
SubtreeAllocations/large_subtrees_data_fetch-4	2.253µ	2.268µ	~	0.700
SubtreeAllocations/large_subtrees_full_validation-4	4.683µ	4.717µ	~	0.700
_BufferPoolAllocation/16KB-4	4.980µ	3.681µ	~	0.100
_BufferPoolAllocation/32KB-4	7.415µ	7.184µ	~	0.100
_BufferPoolAllocation/64KB-4	14.31µ	16.94µ	~	0.400
_BufferPoolAllocation/128KB-4	25.05µ	27.02µ	~	0.200
_BufferPoolAllocation/512KB-4	113.4µ	106.3µ	~	0.700
_BufferPoolConcurrent/32KB-4	18.28µ	18.22µ	~	1.000
_BufferPoolConcurrent/64KB-4	29.01µ	29.00µ	~	1.000
_BufferPoolConcurrent/512KB-4	142.8µ	149.6µ	~	0.100
_SubtreeDeserializationWithBufferSizes/16KB-4	611.9µ	607.6µ	~	0.700
_SubtreeDeserializationWithBufferSizes/32KB-4	614.9µ	606.1µ	~	0.100
_SubtreeDeserializationWithBufferSizes/64KB-4	613.7µ	613.1µ	~	0.400
_SubtreeDeserializationWithBufferSizes/128KB-4	606.8µ	591.9µ	~	0.100
_SubtreeDeserializationWithBufferSizes/512KB-4	595.6µ	586.5µ	~	0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4	36.60m	37.19m	~	0.200
_SubtreeDataDeserializationWithBufferSizes/32KB-4	36.58m	37.06m	~	0.400
_SubtreeDataDeserializationWithBufferSizes/64KB-4	36.44m	37.00m	~	0.400
_SubtreeDataDeserializationWithBufferSizes/128KB-4	36.15m	36.74m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4	36.45m	36.89m	~	0.700
_PooledVsNonPooled/Pooled-4	833.9n	834.8n	~	0.400
_PooledVsNonPooled/NonPooled-4	7.072µ	7.292µ	~	0.100
_MemoryFootprint/Current_512KB_32concurrent-4	7.493µ	7.045µ	~	0.700
_MemoryFootprint/Proposed_32KB_32concurrent-4	9.650µ	9.808µ	~	0.700
_MemoryFootprint/Alternative_64KB_32concurrent-4	9.406µ	10.110µ	~	0.100
_prepareTxsPerLevel-4	427.7m	428.2m	~	1.000
_prepareTxsPerLevelOrdered-4	4.093m	5.300m	~	0.200
_prepareTxsPerLevel_Comparison/Original-4	429.1m	429.5m	~	1.000
_prepareTxsPerLevel_Comparison/Optimized-4	4.319m	4.942m	~	0.400
StoreBlock_Sequential/BelowCSVHeight-4	335.3µ	334.4µ	~	0.700
StoreBlock_Sequential/AboveCSVHeight-4	335.9µ	335.9µ	~	1.000
GetUtxoHashes-4	255.5n	252.8n	~	1.000
GetUtxoHashes_ManyOutputs-4	48.77µ	49.14µ	~	0.400
_NewMetaDataFromBytes-4	225.0n	228.7n	~	0.200
_Bytes-4	410.0n	413.9n	~	0.100
_MetaBytes-4	137.4n	137.4n	~	1.000

Threshold: >10% with p < 0.05 | Generated: 2026-05-26 08:33 UTC
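The threshold line explains why every row shows `~`, including rows with large raw differences. A minimal sketch of that rule, assuming a row counts as changed only when the relative change exceeds 10% and p < 0.05 (the `classify` function is hypothetical and not the report generator's code):

```go
package main

import (
	"fmt"
	"math"
)

// classify applies the report's stated threshold: a result is a regression or
// improvement only if |change| > 10% AND p < 0.05; otherwise it is unchanged.
// Values are sec/op, so a higher current value is a regression.
func classify(baseline, current, p float64) string {
	if baseline <= 0 {
		return "unchanged" // cannot compute a relative change
	}
	change := (current - baseline) / baseline
	if p >= 0.05 || math.Abs(change) <= 0.10 {
		return "unchanged"
	}
	if change > 0 {
		return "regression"
	}
	return "improvement"
}

func main() {
	// DiskTxMap_SetIfNotExists_Parallel-4 from the table: 4.226µ -> 18.282µ, p = 0.700.
	fmt.Println(classify(4.226, 18.282, 0.700))
}
```

Under this rule the DiskTxMap_SetIfNotExists_Parallel row stays unchanged despite the large raw difference, because its p-value of 0.700 is far above the significance level.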

blockpusher

LGTM

sonarqubecloud · 2026-05-26T08:31:12Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


Merge branch 'main' into chore/aerospike-pool-128-docker-m

4c68289

oskarszoon enabled auto-merge (squash) May 26, 2026 07:39

Update default ConnectionQueueSize in docs

953351a

blockpusher approved these changes May 26, 2026


oskarszoon disabled auto-merge May 26, 2026 07:53

icellan approved these changes May 26, 2026


Limit pruner parallel partition pruning

f54bc3f

freemans13 approved these changes May 26, 2026


oskarszoon merged commit ba54431 into bsv-blockchain:main May 26, 2026
25 checks passed

oskarszoon mentioned this pull request May 27, 2026

spend circuit breaker counts KEY_NOT_FOUND as infrastructure failure, defeats orphanage during catch-up #953

Closed

3 tasks

icellan mentioned this pull request May 27, 2026

fix(utxo/aerospike): exclude data-state errors from spend circuit breaker #957

Merged

5 tasks

This was referenced Jun 1, 2026

aerospike-client-go/v8 nodeStats.updateOrInsert consumes 44-60% of legacy CPU during mainnet IBD #1001

Open

blob/file.SetFromReader is the new top CPU sink (23-44%) on legacy IBD after #1001/#1002 #1012

Open

oskarszoon merged 4 commits into bsv-blockchain:main from oskarszoon:chore/aerospike-pool-128-docker-m
