Skip to content

feat(uaerospike): expose connection-semaphore multiplier as a setting#933

Merged
icellan merged 4 commits into
mainfrom
perf/aerospike-semaphore-multiplier
May 26, 2026
Merged

feat(uaerospike): expose connection-semaphore multiplier as a setting#933
icellan merged 4 commits into
mainfrom
perf/aerospike-semaphore-multiplier

Conversation

@icellan

@icellan icellan commented May 22, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a knob to scale (or disable) the in-process connection-semaphore inside util/uaerospike.Client. The semaphore exists to cap concurrent Aerospike operations so over-parallel callers don't oversubscribe the underlying aerospike-client-go connection pool — but under heavy parallel workloads it can flip from "helpful throttle" to "memory tax": each parked waiter pins its per-call closure state (NewKey, NewBatchRead, addAbstractedBins, FieldNamesToStrings clones) until a permit frees.

On scaling-2 the subtree-validator under cold-cache load was parking ~21 K goroutines on acquirePermit, holding ~14 GB of closure state, and OOM-killing at the 180 GiB pod limit. This setting lets that throttle be turned down — or off — without rebuilding the binary.

Change

New setting:

aerospike_semaphore_multiplier (float64, default 1.0)

Value Effect
0 Disables the semaphore entirely. All acquirePermit / releasePermit calls become no-ops; only the underlying aerospike client's own connection pool governs concurrency. Use when the caller already bounds concurrency upstream.
1.0 Default — preserves prior behavior (one semaphore slot per ConnectionQueueSize-derived permit).
<1 Over-restricts. Useful when sharing the aerospike server with other clients.
>1 Raises the cap. Only safe when the deployment has been verified to handle more concurrent operations than the default queue size implies.

Negative values are treated as 0 (semaphore disabled).

Implementation

  • util/uaerospike.WithSemaphoreMultiplier(float64) — a ClientOption applied via:
    • NewClient(host, port, opts ...ClientOption) — variadic, backwards-compatible.
    • NewClientWithPolicyAndHostOpts(policy, hosts, opts) — slice-args variant (the two variadic slice arguments couldn't coexist on the existing function).
    • NewClientWithPolicyAndHost(policy, hosts...) keeps its existing signature verbatim and now forwards to the opts-aware variant with nil options.
  • buildConnSemaphore(queueSize, multiplier) — returns make(chan struct{}, max(1, round(queueSize * multiplier))) or nil when multiplier <= 0.
  • acquirePermit / releasePermit no-op when the underlying channel is nil.
  • settings/aerospike_settings.go adds the float64 SemaphoreMultiplier field with longdesc covering all four value bands.
  • util/aerospike.go threads the setting through at the production NewClientWithPolicyAndHostOpts call site.

Tests

util/uaerospike/client_test.go adds:

  • TestBuildConnSemaphore — covers default 1.0 (passthrough), 0/-1 (disabled), fractional with round-half-up, double, tiny positive (clamps to 1), and large (8×) multipliers.
  • TestWithSemaphoreMultiplier_DisableMakesAcquireNoOp — proves 1000 sequential acquire/release cycles complete without blocking when the semaphore is disabled.
  • TestWithSemaphoreMultiplier_Scaling — confirms newClientConfig applies the option and tolerates nil entries in the option slice.

All existing tests in util/uaerospike and settings continue to pass under -race.

Test plan

  • go build ./... clean
  • go vet ./util/uaerospike/... ./util/ ./settings/ clean
  • go test -race ./util/uaerospike/... ./settings/ pass
  • Deploy to scaling-2 with aerospike_semaphore_multiplier=0 and verify the goroutine count on acquirePermit drops to zero and the cold-cache OOM cycle breaks.

Risk

Low. Default value preserves prior behavior exactly. Callers wanting the new behavior opt in via the setting. The existing NewClientWithPolicyAndHost signature is preserved (it forwards to the opts-aware variant with nil options), so no caller break.

The util/uaerospike.Client wraps the aerospike-client-go client with an
in-process semaphore sized at the policy's ConnectionQueueSize (or
DefaultConnectionQueueSize = 128 when no policy supplies one). Every
Put / PutBins / Delete / Get / Operate / BatchOperate call must acquire
a slot before proceeding; release on return.

Under heavy parallel workloads — especially batch-decorate paths that
fan out hundreds-to-thousands of concurrent goroutines into Aerospike —
the parked-goroutine memory tax dominates: each waiter pins its
NewKey, NewBatchRead, addAbstractedBins, and FieldNamesToStrings
allocations until a permit frees. On scaling-2 a single subtree-validator
pod parked ~21 K goroutines on acquirePermit, holding ~14 GB of closure
state, and OOM-killed at the 180 GiB memory limit.

This change adds an option-pattern knob so the in-process throttle can
be tuned (or removed) without rebuilding the binary:

  aerospike_semaphore_multiplier (float64, default 1.0)
    0   disables the semaphore entirely. All acquirePermit /
        releasePermit calls become no-ops; only the underlying
        aerospike client's own connection pool governs concurrency.
        Use when the caller already bounds concurrency upstream
        (e.g. via a batcher) and the parking overhead is undesirable.
    1.0 preserves prior behavior (one semaphore slot per
        ConnectionQueueSize-derived permit).
    <1  over-restricts — useful when sharing the aerospike server
        with other clients.
    >1  raises the cap — only safe when the deployment has been
        verified to handle more concurrent operations than the
        default queue size implies.

Negative values are treated as 0 (semaphore disabled).

Implementation:
  * util/uaerospike.WithSemaphoreMultiplier(float64) — a ClientOption
    applied via the new variadic-options NewClient(host, port, opts...)
    and the slice-args NewClientWithPolicyAndHostOpts(policy, hosts, opts).
    The existing NewClientWithPolicyAndHost(policy, hosts...) signature
    is preserved verbatim — it now forwards to the opts-aware variant
    with a nil options slice.
  * buildConnSemaphore(queueSize, multiplier) computes
    max(1, round(queueSize * multiplier)) or returns nil when the
    multiplier <= 0. acquirePermit / releasePermit no-op when the
    underlying channel is nil.
  * settings/aerospike_settings.go adds the float64 SemaphoreMultiplier
    field with longdesc covering all four value bands.
  * util/aerospike.go threads the setting through at the production
    NewClientWithPolicyAndHostOpts call site.

Unit tests in util/uaerospike/client_test.go cover:
  * buildConnSemaphore capacity math across default / disabled /
    fractional / huge / tiny-positive-clamps-to-1 multipliers.
  * acquirePermit / releasePermit no-op behavior on a Client whose
    semaphore was disabled (1000 sequential cycles).
  * newClientConfig option application + nil-option tolerance.
@icellan icellan requested review from Copilot, ordishs and teracoder May 22, 2026 10:22
@icellan icellan requested review from oskarszoon and removed request for ordishs May 22, 2026 10:22
@github-actions

github-actions Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

No critical issues found. The PR implements a well-designed semaphore multiplier feature with comprehensive test coverage and proper handling of edge cases.

Copilot Comment Addressed:

  • ✅ NaN/Inf handling issue resolved (see inline comment thread)

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a configurable multiplier (aerospike_semaphore_multiplier) to scale or disable the in-process connection semaphore used by util/uaerospike.Client, enabling operators to reduce goroutine parking/memory pressure under highly parallel workloads without rebuilding.

Changes:

  • Introduces ClientOption plumbing in util/uaerospike and implements WithSemaphoreMultiplier, buildConnSemaphore, and nil-semaphore no-op behavior in acquirePermit/releasePermit.
  • Threads the new setting through the production Aerospike client construction path in util/aerospike.go.
  • Adds the new setting to settings/AerospikeSettings and adds unit tests for the sizing/disable behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
util/uaerospike/client.go Adds option-driven semaphore sizing/disable behavior and nil-safe permit acquire/release.
util/uaerospike/client_test.go Adds tests covering semaphore sizing, disable/no-op semantics, and option application.
util/aerospike.go Applies the new setting by passing WithSemaphoreMultiplier(...) into client construction.
settings/aerospike_settings.go Exposes aerospike_semaphore_multiplier as a documented float64 setting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread util/uaerospike/client.go
@github-actions

github-actions Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-933 (d785dab)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.737µ 1.725µ ~ 0.400
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.73n 61.65n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.77n 61.68n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.72n 61.67n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 29.90n 30.86n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 50.53n 52.07n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 109.3n 113.0n ~ 0.100
MiningCandidate_Stringify_Short-4 261.4n 267.7n ~ 0.100
MiningCandidate_Stringify_Long-4 1.872µ 1.903µ ~ 0.100
MiningSolution_Stringify-4 978.7n 974.8n ~ 0.100
BlockInfo_MarshalJSON-4 1.791µ 1.779µ ~ 0.200
NewFromBytes-4 123.8n 122.8n ~ 0.100
AddTxBatchColumnar_Validation-4 2.765µ 2.363µ ~ 0.100
OffsetValidationLoop-4 544.0n 718.2n ~ 0.100
Mine_EasyDifficulty-4 67.19µ 67.71µ ~ 0.200
Mine_WithAddress-4 7.759µ 7.036µ ~ 0.100
BlockAssembler_AddTx-4 0.02575n 0.02609n ~ 1.000
AddNode-4 10.73 10.63 ~ 0.700
AddNodeWithMap-4 11.34 10.68 ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 57.86n 57.96n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 31.52n 28.61n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 30.38n 27.37n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 29.15n 26.43n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 28.72n 25.93n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 282.4n 281.9n ~ 1.000
SubtreeProcessorAdd/64_per_subtree-4 278.5n 280.7n ~ 0.400
SubtreeProcessorAdd/256_per_subtree-4 278.6n 280.4n ~ 0.400
SubtreeProcessorAdd/1024_per_subtree-4 272.5n 269.5n ~ 0.100
SubtreeProcessorAdd/2048_per_subtree-4 272.5n 268.6n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 275.6n 274.3n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 278.4n 270.5n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 274.4n 272.1n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 274.7n 272.7n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 54.53n 54.86n ~ 0.200
SubtreeNodeAddOnly/64_per_subtree-4 34.46n 34.42n ~ 0.700
SubtreeNodeAddOnly/256_per_subtree-4 33.50n 33.58n ~ 0.300
SubtreeNodeAddOnly/1024_per_subtree-4 32.64n 32.77n ~ 0.200
SubtreeCreationOnly/4_per_subtree-4 113.8n 115.3n ~ 0.100
SubtreeCreationOnly/64_per_subtree-4 399.8n 409.7n ~ 0.200
SubtreeCreationOnly/256_per_subtree-4 1.372µ 1.396µ ~ 0.200
SubtreeCreationOnly/1024_per_subtree-4 4.433µ 4.534µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 8.205µ 8.670µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 272.4n 273.0n ~ 0.700
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 271.9n 271.6n ~ 1.000
ParallelGetAndSetIfNotExists/1k_nodes-4 2.176m 2.199m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.257m 5.354m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.085m 7.191m ~ 0.200
ParallelGetAndSetIfNotExists/100k_nodes-4 9.586m 9.730m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 1.950m 1.938m ~ 0.400
SequentialGetAndSetIfNotExists/10k_nodes-4 4.314m 4.334m ~ 0.400
SequentialGetAndSetIfNotExists/50k_nodes-4 12.07m 12.34m ~ 0.200
SequentialGetAndSetIfNotExists/100k_nodes-4 22.05m 22.75m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.218m 2.225m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.085m 8.120m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 12.80m 12.94m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.949m 1.955m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.335m 7.598m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 39.59m 39.85m ~ 0.700
DiskTxMap_SetIfNotExists-4 3.636µ 3.692µ ~ 0.400
DiskTxMap_SetIfNotExists_Parallel-4 3.472µ 3.430µ ~ 0.700
DiskTxMap_ExistenceOnly-4 320.1n 326.0n ~ 0.200
Queue-4 184.9n 187.3n ~ 0.200
AtomicPointer-4 3.267n 3.651n ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 810.4µ 817.2µ ~ 0.200
ReorgOptimizations/DedupFilterPipeline/New/10K-4 766.2µ 771.6µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 102.7µ 102.6µ ~ 0.700
ReorgOptimizations/AllMarkFalse/New/10K-4 64.17µ 63.99µ ~ 0.200
ReorgOptimizations/HashSlicePool/Old/10K-4 51.26µ 50.20µ ~ 0.200
ReorgOptimizations/HashSlicePool/New/10K-4 11.13µ 10.91µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/10K-4 4.296µ 4.357µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.455µ 1.448µ ~ 1.000
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.541m 9.212m ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/100K-4 10.234m 9.675m ~ 0.200
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.195m 1.069m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 705.3µ 705.8µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/100K-4 496.3µ 463.9µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/100K-4 202.5µ 201.8µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/100K-4 45.42µ 48.34µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 14.53µ 16.69µ ~ 0.100
TxMapSetIfNotExists-4 49.50n 49.56n ~ 1.000
TxMapSetIfNotExistsDuplicate-4 41.34n 41.92n ~ 0.400
ChannelSendReceive-4 597.6n 591.4n ~ 0.400
CalcBlockWork-4 463.7n 473.4n ~ 0.100
CalculateWork-4 627.2n 644.3n ~ 0.400
BuildBlockLocatorString_Helpers/Size_10-4 1.041µ 1.073µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 10.26µ 11.99µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_1000-4 100.5µ 100.9µ ~ 0.700
CatchupWithHeaderCache-4 103.9m 104.0m ~ 0.400
_prepareTxsPerLevel-4 411.0m 403.6m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.777m 3.384m ~ 1.000
_prepareTxsPerLevel_Comparison/Original-4 407.5m 410.8m ~ 0.700
_prepareTxsPerLevel_Comparison/Optimized-4 3.597m 3.414m ~ 0.400
_BufferPoolAllocation/16KB-4 3.810µ 3.644µ ~ 0.100
_BufferPoolAllocation/32KB-4 9.791µ 9.266µ ~ 1.000
_BufferPoolAllocation/64KB-4 14.99µ 16.50µ ~ 0.100
_BufferPoolAllocation/128KB-4 26.69µ 34.34µ ~ 0.100
_BufferPoolAllocation/512KB-4 115.84µ 87.97µ ~ 0.100
_BufferPoolConcurrent/32KB-4 18.74µ 17.86µ ~ 0.100
_BufferPoolConcurrent/64KB-4 29.17µ 27.28µ ~ 0.100
_BufferPoolConcurrent/512KB-4 158.4µ 149.1µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/16KB-4 631.2µ 702.7µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 632.1µ 691.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 609.6µ 669.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 635.7µ 650.0µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/512KB-4 614.9µ 662.7µ ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.32m 36.68m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.67m 36.64m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.61m 36.26m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 37.08m 36.05m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.22m 36.36m ~ 0.700
_PooledVsNonPooled/Pooled-4 832.1n 830.8n ~ 0.100
_PooledVsNonPooled/NonPooled-4 7.477µ 7.204µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.766µ 7.049µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.381µ 9.308µ ~ 0.300
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.550µ 8.899µ ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 985.3µ 1003.1µ ~ 0.100
SubtreeSizes/10k_tx_16_per_subtree-4 231.1µ 235.8µ ~ 0.400
SubtreeSizes/10k_tx_64_per_subtree-4 55.29µ 56.43µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 13.86µ 14.25µ ~ 0.200
SubtreeSizes/10k_tx_512_per_subtree-4 6.900µ 6.946µ ~ 0.400
SubtreeSizes/10k_tx_1024_per_subtree-4 3.412µ 3.537µ ~ 0.200
SubtreeSizes/10k_tx_2k_per_subtree-4 1.715µ 1.715µ ~ 0.800
BlockSizeScaling/10k_tx_64_per_subtree-4 54.49µ 55.02µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 13.93µ 13.81µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 3.418µ 3.452µ ~ 0.700
BlockSizeScaling/50k_tx_64_per_subtree-4 291.9µ 296.3µ ~ 0.400
BlockSizeScaling/50k_tx_256_per_subtree-4 68.88µ 69.61µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 16.94µ 16.99µ ~ 0.400
SubtreeAllocations/small_subtrees_exists_check-4 116.3µ 117.8µ ~ 0.400
SubtreeAllocations/small_subtrees_data_fetch-4 124.0µ 124.6µ ~ 0.400
SubtreeAllocations/small_subtrees_full_validation-4 236.7µ 239.3µ ~ 0.700
SubtreeAllocations/medium_subtrees_exists_check-4 6.946µ 7.049µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 7.326µ 7.378µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 13.78µ 13.83µ ~ 1.000
SubtreeAllocations/large_subtrees_exists_check-4 1.696µ 1.692µ ~ 0.700
SubtreeAllocations/large_subtrees_data_fetch-4 1.763µ 1.768µ ~ 0.500
SubtreeAllocations/large_subtrees_full_validation-4 3.446µ 3.465µ ~ 0.400
StoreBlock_Sequential/BelowCSVHeight-4 334.5µ 337.2µ ~ 0.100
StoreBlock_Sequential/AboveCSVHeight-4 344.3µ 337.3µ ~ 1.000
GetUtxoHashes-4 257.5n 257.7n ~ 1.000
GetUtxoHashes_ManyOutputs-4 42.72µ 43.63µ ~ 0.400
_NewMetaDataFromBytes-4 227.8n 229.4n ~ 0.100
_Bytes-4 399.7n 404.5n ~ 0.700
_MetaBytes-4 136.1n 137.4n ~ 0.400

Threshold: >10% with p < 0.05 | Generated: 2026-05-26 09:27 UTC

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request changes — default is broken.

Critical: SemaphoreMultiplier float64 has default:"1.0" in the struct tag, but the imperative settings loader at settings/settings.go:158-175 doesn't include a getFloat64("aerospike_semaphore_multiplier", 1.0, ...) entry. Struct tag is documentation-only. Empirically settings.NewSettings() returns SemaphoreMultiplier=0 — every prod deployment without an explicit override gets the semaphore disabled. Opposite of the documented "default preserves prior behavior".

Cascading regression: Client.GetConnectionQueueSize() returns cap(nil chan) = 0. The pruner at stores/utxo/aerospike/pruner/pruner_service.go:399 computes recommendedMax = 0 × threshold = 0 and auto-adjusts chunkGroupLimit to 1 — pruner concurrency collapses everywhere.

One-line fix in the loader block:

SemaphoreMultiplier: getFloat64("aerospike_semaphore_multiplier", 1.0, alternativeContext...),

Plus a regression test that asserts the runtime default via settings.NewSettings() (current tests bypass the broken seam by calling newClientConfig / buildConnSemaphore directly).

Worth fixing in the same PR:

  • GetConnectionQueueSize returning 0 when semaphore is disabled is conceptually wrong — should return the underlying aerospike client's queue size so the pruner heuristic keeps working when an operator deliberately opts out.
  • No upper clamp on buildConnSemaphore. 1.0e10 typo would make(chan, 1.28e12) ≈ 10 GB.
  • PR #927 (credential redaction in the same file) merges cleanly, no conflict.

… size, clamp NaN/Inf

Three correctness fixes from review of the original PR:

1. settings/settings.go was missing the imperative loader entry for
   aerospike_semaphore_multiplier, so the struct-tag default of "1.0"
   never took effect — settings.NewSettings() returned 0.0 and every
   deployment without an explicit override silently ran with the
   in-process semaphore disabled. Add the getFloat64 call so the
   documented 1.0 default actually applies.

2. Client.GetConnectionQueueSize returned cap(nil) = 0 when the
   semaphore was deliberately disabled, which made the pruner's
   recommendedMax = 0 × threshold and collapsed chunkGroupLimit to 1.
   Store the resolved underlying connection-queue size on Client and
   report it when the semaphore is nil so external pool-capacity
   heuristics survive an opt-out.

3. buildConnSemaphore did not guard NaN/+Inf or runaway scaled values
   before make(chan, ...). NaN now disables the semaphore (treated as
   garbage input); +Inf and any scaled value above maxSemaphoreCapacity
   (1<<20) are clamped so a typo like 1.0e10 cannot allocate a multi-GB
   channel.

Adds a settings.NewSettings() regression test that asserts the runtime
default is 1.0 — the existing tests bypass the broken seam by calling
newClientConfig / buildConnSemaphore directly. Also extends
TestBuildConnSemaphore with NaN/+Inf/clamp cases and adds an explicit
fallback test for GetConnectionQueueSize.
@icellan

icellan commented May 22, 2026

Copy link
Copy Markdown
Contributor Author

@oskarszoon thanks for the catch — pushed 1665f77e6 addressing all three points from the review:

1. Broken default (the critical one)
Added the missing getFloat64("aerospike_semaphore_multiplier", 1.0, alternativeContext...) in the imperative loader at settings/settings.go. The struct-tag default:"1.0" was documentation-only; without the loader entry NewSettings() returned 0.0 and every deployment without an explicit override was silently running with the semaphore disabled.

2. GetConnectionQueueSize returning 0 when semaphore disabled
Client now stores the resolved underlying connection-queue size in a new connQueueSize field. GetConnectionQueueSize returns that when connSemaphore == nil, so the pruner heuristic at pruner_service.go:399 (and any other pool-capacity hint consumer) keeps seeing a meaningful value — same as it would see on main — when an operator deliberately sets multiplier=0.

3. No upper clamp / NaN-Inf guard in buildConnSemaphore

  • NaN multiplier now returns nil (treated as garbage input — same effect as <=0).
  • +Inf and any scaled value >= maxSemaphoreCapacity clamp to maxSemaphoreCapacity (1<<20, ≈1M slots) before make(chan, ...). Picked 1<<20 because it's far above any legit Aerospike ConnectionQueueSize (typical 128–1000) while bounding worst-case channel-buffer overhead from a typo like 1.0e10. Happy to tighten if you'd prefer e.g. 1<<16.

Tests added

  • settings.TestAerospikeSemaphoreMultiplier_Default — goes through NewSettings() and asserts the runtime default is 1.0. Would have caught the broken-default bug; the existing tests bypassed the broken seam by calling newClientConfig / buildConnSemaphore directly.
  • TestBuildConnSemaphore extended with NaN, +Inf, 1.0e10-typo, and exact-max cases.
  • TestGetConnectionQueueSize_FallsBackWhenDisabled exercises both the disabled-semaphore fallback and the active-semaphore path.
  • TestWithSemaphoreMultiplier_DisableMakesAcquireNoOp updated to assert the underlying-queue-size fallback.

Verification run locally

  • go build ./... clean
  • go vet ./util/uaerospike/... ./settings/... clean
  • go test -race ./util/uaerospike/... ./settings/... ./stores/utxo/aerospike/pruner/... PASS
  • staticcheck / golangci-lint: 0 issues

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve — all three prior concerns addressed in 1665f77e.

  • Loader default wired at settings/settings.go:173. Empirical default now 1.0 (was 0). New TestAerospikeSemaphoreMultiplier_Default calls NewSettings() directly so the runtime-default seam is now protected by a regression test.
  • GetConnectionQueueSize falls back to connQueueSize (captured at construction from underlying client policy) when the semaphore is nil. Right architectural shape — pruner heuristic keeps a meaningful pool-capacity hint when an operator opts out.
  • NaN treated as garbage-input (disabled), positive Inf and runaway scaled values clamped to maxSemaphoreCapacity = 1<<20. Switch catches the overflow cases before int(math.Round(...)). New tests cover NaN, +Inf, 1.0e10, exact-at-max.

No new issues. 191 tests pass with -race (+8 from new tests).

@icellan

icellan commented May 22, 2026

Copy link
Copy Markdown
Contributor Author

Closing briefly to force CI re-trigger on the netsync fixture flake fix (push at 13:09 didn't fire a PR-tests workflow run for some reason). Reopening immediately.

@icellan icellan closed this May 22, 2026
@icellan icellan reopened this May 22, 2026
buildOOBFixture and buildInRangeFixture created their SyncedMap with
limit=2 and then inserted exactly 2 entries. The _NilParentTx tests
subsequently Set on an existing key; once len == limit, go-tx-map's
setUnlocked evicts a random item from the map before writing — which
~50% of the time drops the child the SUT is about to look up, leaving
the test failing with "tx <hash> not found in txMap". Reproduced
locally with -count=20 under -race.

Use an unbounded SyncedMap in the fixtures (the production path sizes
by block tx count, not 2) and leave a comment explaining the eviction
trap so future fixtures don't fall into the same hole.

Duplicates the fix in PR #931 so this PR's CI is unblocked
independently; will fold into theirs if #931 lands first.

(Amended to force a fresh CI trigger — the prior push at 13:09 didn't
fire a pull_request event.)
@icellan icellan force-pushed the perf/aerospike-semaphore-multiplier branch from 1064267 to 1ddd8f5 Compare May 22, 2026 13:59
…ader

# Conflicts:
#	services/legacy/netsync/handle_block_test.go
@sonarqubecloud

Copy link
Copy Markdown

@icellan icellan merged commit 719eb8a into main May 26, 2026
30 checks passed
@icellan icellan deleted the perf/aerospike-semaphore-multiplier branch May 26, 2026 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants