chore(settings): raise aerospike client pool to 128 for docker.m #941
ConnectionQueueSize=16 + LimitConnectionsToQueueSize=true is too tight for the pruner's default partition-worker fanout (32 workers × chunk_group_limit) and for legacy block-processing batch ops under mainnet IBD load. Once the pool fills, requests time out, the pruner auto-adjusts pruner_utxoChunkGroupLimit to 1 (per the WARN at pruner_service.go:425) but its outer partition-worker count is not auto-throttled, so it still oversubscribes the pool and pruning stalls. Observed on bsva-ovh-teranode-eu-3 mainnet (v0.15.2-beta-1, mainnet height ~795360): pruner stuck with NO_AVAILABLE_CONNECTIONS_TO_NODE errors despite legacy stopped; aerospike namespace hwm-breached + stop_writes=true while 832M non-expirable records accumulated because the pruner couldn't write deletes through the saturated pool. Bump ConnectionQueueSize 16 -> 128 and MinConnectionsPerNode 8 -> 16 for the docker.m context only. Single-node aerospike with proto-fd-max default 15000 has plenty of server-side headroom; the 16-conn ceiling was the constraint, not the server.
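The oversubscription described above comes down to one multiplication. The following is a minimal sketch, assuming peak pruner demand can be modelled as partition workers × chunk groups per worker; the function name `oversubscribed` is hypothetical and is not taken from the Teranode code.

```go
package main

import "fmt"

// oversubscribed models peak pruner demand as workers × chunkGroupLimit and
// reports whether that demand exceeds the client connection pool.
// This is an illustrative assumption, not Teranode's actual accounting.
func oversubscribed(workers, chunkGroupLimit, poolSize int) (demand int, over bool) {
	demand = workers * chunkGroupLimit
	return demand, demand > poolSize
}

func main() {
	// 32 partition workers with the chunk-group limit already throttled to 1,
	// checked against the old (16) and new (128) pool sizes.
	for _, pool := range []int{16, 128} {
		demand, over := oversubscribed(32, 1, pool)
		fmt.Printf("pool=%d demand=%d oversubscribed=%t\n", pool, demand, over)
	}
}
```

Under this model the throttled pruner still asks for 32 concurrent operations, which exceeds a 16-connection pool but fits in a 128-connection one.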
🤖 Claude Code Review
Status: Complete

Review Summary
This PR increases Aerospike connection pool settings for the

Configuration changes verified:

Scope correctly limited:

Documentation accuracy: No issues found. Changes align with AGENTS.md principles: minimal scope, properly verified plan, clear rationale in PR description.
Benchmark Comparison Report
Threshold: >10% with p < 0.05 | Generated: 2026-05-26 08:33 UTC



Summary
Raise ConnectionQueueSize from 16 to 128 (and MinConnectionsPerNode from 8 to 16) on the utxostore.docker.m URL. docker.m (docker-microservice) context only; used by teranode-quickstart and other docker-based deployments. Other contexts unchanged.

Why
ConnectionQueueSize=16 + LimitConnectionsToQueueSize=true is too tight for the default pruner partition-worker fanout, and for legacy block-processing batch ops under mainnet IBD load.

The pruner detects this and emits a WARN (at pruner_service.go:425).

It auto-throttles pruner_utxoChunkGroupLimit but not pruner_utxoPartitionQueries (the outer 32-worker fanout). Even at chunk_group_limit=1, 32 partition workers × 1 = 32 concurrent ops, which still exceeds the 16-conn pool. Every pruner batch then errors with NO_AVAILABLE_CONNECTIONS_TO_NODE followed by TIMEOUT.

Observed
bsva-ovh-teranode-eu-3, mainnet, v0.15.2-beta-1, height ~795,360.

Cascade with the old pool size:
- Pruner stuck with NO_AVAILABLE_CONNECTIONS_TO_NODE errors
- stop-writes-used-pct=70 (inherited from evict-used-pct=70) trips stop_writes=true + hwm_breached=true
- stop_writes blocks writes → deadlock

The 16-conn ceiling was the bottleneck, not the aerospike server: proto-fd-max defaults to 15,000, and the actual server-side connection count peaked around 72 (across all teranode services combined) before the lockup, so there's plenty of server headroom.

Why 128 specifically
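As a rough check of the figures quoted in this section, the sketch below recomputes them from the stated inputs (about 32 workers, a default chunk-group limit of 10, 128 connections per pool, 8 utxostore clients, proto-fd-max of 15,000). The helper names are hypothetical and the model is a simplification, not Teranode's own accounting.

```go
package main

import "fmt"

// Inputs as quoted in the PR description.
const (
	partitionWorkers       = 32    // auto-detected worker count on a typical host
	defaultChunkGroupLimit = 10    // chunk-group limit once the pool is comfortable
	poolSize               = 128   // proposed ConnectionQueueSize
	utxoClients            = 8     // services that each hold their own pool
	protoFdMax             = 15000 // aerospike server default
)

// peakPrunerOps is the pruner's worst-case number of concurrent operations.
func peakPrunerOps(workers, chunkLimit int) int { return workers * chunkLimit }

// totalClientConns is the worst-case connection count across all client pools.
func totalClientConns(clients, pool int) int { return clients * pool }

func main() {
	peak := peakPrunerOps(partitionWorkers, defaultChunkGroupLimit)
	total := totalClientConns(utxoClients, poolSize)
	fmt.Printf("peak pruner ops: %d (%.1fx the pool)\n", peak, float64(peak)/float64(poolSize))
	fmt.Printf("worst-case client connections: %d of %d\n", total, protoFdMax)
}
```

This reproduces the ~320 peak operations, the 2.5× ratio between that peak and the pool, and the 1,024 total connections, which stays well below the server-side limit.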
With the default pruner_utxoPartitionQueries=0 (auto-detect, ~32 workers on a typical host) and a chunk_group_limit that can rise to its default of 10 once the pool is comfortable, peak pruner concurrency is ~320 ops. 128 connections gives that 2.5× headroom for the auto-adjust to land at 5–8 chunk groups rather than being pinned at 1. Other utxostore consumers (legacy, blockvalidation, subtreevalidation, blockassembly, blockchain, propagation, asset, pruner) each get their own pool; at 128/pool × 8 clients = 1,024 total potential connections, well under proto-fd-max=15000.

MinConnectionsPerNode raised proportionally (8 → 16) so warm-up provisions a reasonable baseline.

Scope
Only utxostore.docker.m changes. The commented-out utxostore.docker template right below remains at ConnectionQueueSize=32 (it's a template, not active). Operator/k8s contexts unchanged.

Verification
- ConnectionQueueSize=16→128, MinConnectionsPerNode=8→16
- On a docker.m deployment (teranode-quickstart), confirm docker exec aerospike asinfo -v statistics | grep client_connections reflects the larger ceiling and the pruner WARN at pruner_service.go:425 no longer auto-throttles to chunk_group_limit=1

Related
- createUtxos
- unbounded SetMinedMulti (one of the connection-burner paths upstream of this issue)

Not in this PR
- aerospike-client-go/v8 connection-handling on timeout (InDoubt: true paths) leaks slower than steady-state churn returns them. Worth a separate investigation. Bigger pool buys time, doesn't eliminate the leak.
- evict-used-pct=70 in config/aerospike.conf template silently anchors stop-writes-used-pct=70 even though eviction is a no-op for default-ttl 0. Separate concern; affects quickstart configs.
- Pruner auto-throttle should cover pruner_utxoPartitionQueries, not just pruner_utxoChunkGroupLimit. Separate fix.
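
The last follow-up item (throttling the outer worker count as well as the chunk-group limit) could look roughly like the sketch below. It is a hypothetical illustration only: `fitToPool` is not an existing Teranode function, and the real pruner may adjust its limits differently. The sketch lowers the chunk-group limit first and caps the worker count only if the demand still exceeds the pool.

```go
package main

import "fmt"

// fitToPool returns a (workers, chunkLimit) pair whose product does not
// exceed pool. It shrinks chunkLimit first, then caps workers as a last
// resort. Hypothetical helper, for illustration only.
func fitToPool(workers, chunkLimit, pool int) (int, int) {
	if workers < 1 {
		workers = 1
	}
	if chunkLimit < 1 {
		chunkLimit = 1
	}
	if pool < 1 {
		pool = 1
	}
	// Step 1: reduce the per-worker chunk-group limit, never below 1.
	if workers*chunkLimit > pool {
		chunkLimit = pool / workers
		if chunkLimit < 1 {
			chunkLimit = 1
		}
	}
	// Step 2: still too many ops (only possible when chunkLimit is 1),
	// so cap the outer worker count as well.
	if workers*chunkLimit > pool {
		workers = pool / chunkLimit
	}
	return workers, chunkLimit
}

func main() {
	for _, pool := range []int{16, 128, 512} {
		w, c := fitToPool(32, 10, pool)
		fmt.Printf("pool=%d -> workers=%d chunkLimit=%d (ops=%d)\n", pool, w, c, w*c)
	}
}
```

With a 16-connection pool this lands on 16 workers at a chunk-group limit of 1, where throttling the chunk-group limit alone would leave 32 workers competing for 16 connections.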