fix(asset): admission control for subtree_data + 503 retry in peer callers #830

Closed
icellan wants to merge 1 commit into feat/teranode-native-ops from fix/asset-subtree-data-admission-control

icellan commented May 8, 2026

Contributor

Summary

Hotfix for production OOMKilled crashloops on the asset service caused by unbounded concurrent on-demand /subtree_data file creation.

The streaming work in #827 capped per-request memory but did not add admission control. Each dualStreamWithFileCreation call holds chunk-sized batches of full transaction metadata in memory; multiplied by however many clients arrive at once and by per-tx size variance (data carriers, multi-output txs), the asset process trivially exceeds even a 64Gi limit. This PR adds the missing admission cap, surfaces it as an HTTP 503 with Retry-After, and teaches the three known peer-side call sites (two in subtreevalidation, one in blockvalidation catchup) to back off and retry instead of failing through.

Targets feat/teranode-native-ops because the dualStreamWithFileCreation path being protected lives on that branch (added via #826's cherry-pick chain).

Production evidence

The trigger for this work was the nginx-cache error stream:

upstream prematurely closed connection while reading response header from upstream,
client: 10.224.5.49, server: , request: "GET /api/v1/subtree_data/746a4271...",
upstream: "http://10.96.115.8:8090/api/v1/subtree_data/746a4271..."

Investigation on dev-scale-1-scale-1:

| Signal | Observation |
| --- | --- |
| kubectl describe pod on all 4 asset replicas | Reason: OOMKilled, Exit Code: 137, restart counts 18-22 |
| Pod lifespan | 60-90s between restart and next OOM |
| Memory limit | 64Gi (already huge) |
| pprof goroutine on a freshly-started pod (~6 min uptime) | 66,355 goroutines; top groups: 3,616 in go-batcher.SetMaxConcurrent, 1,898 in errgroup.Group.Go, 103 active Repository.getTxs |
| Quorum lock dir | hundreds of stale *.lock files left by previous crashes |

The "upstream prematurely closed connection" line is the only externally visible symptom because SIGKILL drops the TCP connection before any response headers can flush. Fixing the OOM eliminates it.

Root cause

Per-request memory was bounded by #827, but four amplifiers compound:

  1. meta.Data.Tx is the full parsed bt.Tx. For OP_RETURN data carriers / multi-output txs it can run 10KB-500KB+. A 10000-tx chunk is 100MB-5GB, not the documented "~5MB".
  2. dualStreamWithFileCreation writes through io.MultiWriter(storer, httpWriter). Both must accept each byte before the next chunk can write. A slow client OR a slow file storer (the atomic-rename path from #826, "Ensure atomic file store publication", competing for the global write semaphore) makes the producer block, and chunks pile up in flight + resultsChan + pending.
  3. writeTransactionsViaSubtreeStoreStreaming is shared by GetSubtreeData, GetLegacyBlock, and mining_candidate_legacy_block — they fan out into the same getTxs machinery, multiplying goroutine pressure on Aerospike.
  4. ConcurrencyGetSubtreeDataReader only caps the reader slot — it doesn't distinguish the cheap file-exists path from the memory-heavy on-demand creation path. With four expensive creations in flight you've already scheduled 50+ chunks worth of tx data in heap.
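
The per-chunk figures in item 1 follow from simple multiplication. The sketch below reproduces them; the 500 B/tx row is an assumption chosen to match the documented "~5MB" figure, and chunkHeapBytes is a hypothetical helper, not a function from the codebase.

```go
package main

import "fmt"

// chunkHeapBytes estimates the heap held by one in-flight chunk,
// assuming every transaction in the chunk has the same parsed size.
// Hypothetical helper for illustration only.
func chunkHeapBytes(txPerChunk, bytesPerTx int64) int64 {
	return txPerChunk * bytesPerTx
}

func main() {
	const chunkSize = 10_000 // chunk size quoted in the description

	// 500 B/tx is an assumed "typical" size; 10 KB and 500 KB are the
	// bounds quoted for data carriers and multi-output transactions.
	for _, perTx := range []int64{500, 10_000, 500_000} {
		mb := chunkHeapBytes(chunkSize, perTx) / 1_000_000
		fmt.Printf("%7d B/tx -> %5d MB per chunk\n", perTx, mb)
	}
}
```

Multiply the per-chunk result by the number of chunks in flight and by the number of concurrent creations to get a rough process-level figure.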

Changes

Asset — admission control

| File | What |
| --- | --- |
| settings/asset_settings.go + settings/settings.go | New setting asset_concurrency_subtree_data_create (default 4). |
| services/asset/repository/repository.go | New semSubtreeDataCreate semaphore + tryAcquireSemaphorePermit helper. |
| services/asset/repository/GetSubtreeData.go | Restructured GetSubtreeDataReader: Exists check first (no permit held), file-exists fast path uses semGetSubtreeDataReader (bounded by FD count, can be raised), on-demand creation uses non-blocking TryAcquire on semSubtreeDataCreate and returns ErrServiceUnavailable immediately when at capacity. New fileAppearedReadback helper handles the "file appeared during setup" race cleanly. |
| services/asset/httpimpl/GetSubtreeData.go | Maps ErrServiceUnavailable → HTTP 503 with Retry-After: 1. 404 still 404; everything else still 500. |
| services/asset/repository/GetLegacyBlock.go | Defensive cap on pending chunk map (2 × concurrency); aborts with a clear error if a future scheduler regression grows it. ctx check every 256 txs in writeChunkToWriter so client disconnect releases chunkMetaSlice promptly. |

Util — typed 503 + retry helper

| File | What |
| --- | --- |
| util/http.go | buildHTTPError now produces typed errors.ErrServiceUnavailable on 503 (callers can errors.Is). New DoHTTPRequestBodyReaderWithRetry: exponential backoff (250ms → 5s, max 6 attempts), honors server Retry-After header, retries only on 503. Non-503 errors and ctx cancellation return immediately. |
| util/http_test.go | 7 unit tests / 15 sub-cases. Race-clean. |

Callers — retry on peer 503

Three call sites switched from DoHTTPRequestBodyReader to DoHTTPRequestBodyReaderWithRetry:

  • services/subtreevalidation/SubtreeValidation.go (getSubtreeMissingTxs)
  • services/subtreevalidation/check_block_subtrees.go (CheckBlockSubtrees)
  • services/blockvalidation/get_blocks.go (fetchSubtreeDataFromPeer, catchup)

Behavior changes — what clients should expect

Asset server

  • Under nominal load: no observable change.
  • Under create-path saturation: the 5th simultaneous on-demand creation gets HTTP 503 with Retry-After: 1 instead of waiting up to 30s for a permit (then 503'ing anyway via timeout). The pod stays up and serves all already-created files via the fast path.
  • For an unhealthy cluster: a client that hammers the server during a stuck Aerospike batch gets 503s and is expected to retry. Far better failure mode than crashing.

Peer validation services

  • subtreevalidation / blockvalidation: a peer's transient 503 is now retried (up to ~7.75s total worst case) instead of immediately falling through to "peer cannot provide subtree data" / next-peer attempt.
  • This may slightly slow down detection of genuinely broken peers (a peer that always 503s now takes ~8s to give up), but eliminates the spurious "peer is broken" classification when the peer is just temporarily admission-throttled. Net positive for sync stability.

Test plan

Unit tests (in this PR)

  • TestDoHTTPRequestBodyReaderWithRetry_SuccessOnFirstTry — no retry overhead when healthy
  • TestDoHTTPRequestBodyReaderWithRetry_RetriesOn503ThenSucceeds — returns body of successful attempt, not 503 body
  • TestDoHTTPRequestBodyReaderWithRetry_ExhaustsAttemptsOnPersistent503 — final error is typed ErrServiceUnavailable
  • TestDoHTTPRequestBodyReaderWithRetry_HonorsRetryAfter — server Retry-After: 1 overrides much-smaller initialDelay
  • TestDoHTTPRequestBodyReaderWithRetry_NoRetryOnNon503 (4 sub-cases) — 500/502/504/404 fail in 1 attempt
  • TestDoHTTPRequestBodyReaderWithRetry_ContextCancelAbortsRetries — ctx cancel short-circuits the loop
  • TestParseRetryAfter — empty/negative/non-numeric inputs return 0

Local verification done

  • go build clean across util/, services/asset/, services/subtreevalidation/, services/blockvalidation/
  • go vet clean (only pre-existing warnings in test/utils/ unrelated to this change)
  • go test ./util/ -race passes in ~7s
  • Pre-commit hooks: gci, gofmt, golangci-lint, etc. all green

Pending verification (cannot do locally)

  • CI test suite (deferring to CI run on this PR)
  • Deploy to dev-scale-1 and confirm:
    • asset pods stay up (no OOMKilled events)
    • 503s appear in metrics/logs under load instead of crashes
    • subtreevalidation / blockvalidation peers tolerate the 503s and complete catchup

Risks and rollout notes

  • Feature flag-able via setting: asset_concurrency_subtree_data_create=0 reverts to unlimited (the prior behavior). Keep a knob in case the cap turns out to be too aggressive.
  • The 503 path is new: clients of /subtree_data that don't go through the new retry helper will see 503s they didn't see before. The three known internal callers are updated; any external/unknown caller falls back to existing peer-failure handling, which already treats network errors as transient.
  • Retry storm risk: up to 6 attempts per request (spread over ~7.75s worst case) × N peers could amplify load on a struggling asset server. Mitigated by the exponential backoff + Retry-After honoring + the 503-only filter (we don't retry on other 5xx responses).
  • No schema/wire changes: pure server-side and client-side error-handling change.
  • Rollback: revert this commit. The deployed feat/teranode-native-ops branch returns to the prior (crashlooping) behavior — only do this if the new behavior is worse than the OOM, which would be surprising.

Companion / follow-up work (not in this PR)

  • Lower default asset_subtreeDataStreamingChunkSize (currently 10000) — config-only change, can ship via Helm without a code change.
  • Background quorum lock cleanup on startup — current per-request lazy expiration is fine but creates surprising latency when many stale locks exist after a crash storm.
  • processTxMetaUsingStoreConcurrency review — getTxs fan-out is the largest goroutine multiplier; we may want a global cap rather than per-call.

Production asset pods were OOMKilling under load on /api/v1/subtree_data,
manifesting downstream as nginx "upstream prematurely closed connection
while reading response header from upstream". Goroutine profiles showed
60K+ goroutines accumulating in the chunk-fetch fan-out before SIGKILL.

The earlier streaming work bounded per-request memory but did nothing to
cap concurrent on-demand subtreeData file creations: each one holds
chunkSize tx-metadata batches in memory, multiplied by however many
clients arrive at once. With large transactions and slow clients the
process trivially exceeds even a 64Gi limit.

Asset side - admission control:
- New asset_concurrency_subtree_data_create setting (default 4) gates
  the dualStreamWithFileCreation path with non-blocking TryAcquire.
  When the cap is reached, requests get HTTP 503 with Retry-After: 1
  instead of waiting up to 30s for a permit.
- Restructured GetSubtreeDataReader to check Exists first without
  holding the reader semaphore. File-exists fast path uses the existing
  reader sem; on-demand creation uses the new create sem.
- Defensive cap on the pending chunk map in
  writeTransactionsViaSubtreeStoreStreaming (2 * concurrency); aborts
  with a clear error if a future scheduler regression grows it.
- ctx check every 256 txs in writeChunkToWriter so a client disconnect
  releases chunkMetaSlice promptly instead of waiting for the next pipe
  write to fail.

HTTP utility - typed 503 + retry helper:
- buildHTTPError now produces errors.ErrServiceUnavailable on 503 so
  callers can errors.Is on it.
- New DoHTTPRequestBodyReaderWithRetry: exponential backoff (250ms ->
  5s, max 6 attempts), honors Retry-After header, retries only on 503.
  Non-503 errors and ctx cancellation return immediately.

Callers - retry on peer 503:
- subtreevalidation/SubtreeValidation.go (getSubtreeMissingTxs)
- subtreevalidation/check_block_subtrees.go (CheckBlockSubtrees)
- blockvalidation/get_blocks.go (fetchSubtreeDataFromPeer)

Tests:
- 7 new unit tests for the retry helper covering success, retry-then-
  succeed, attempt exhaustion, Retry-After honoring, no-retry on non-
  503, ctx cancellation, and parseRetryAfter parsing. Race-clean.

Verified: go build, go vet, go test ./util/ -race all clean.

github-actions Bot commented May 8, 2026

Contributor

🤖 Claude Code Review

Status: Complete


Review Summary

This PR implements a critical production hotfix for OOMKilled crashes in the asset service by adding admission control for on-demand subtree data creation and retry logic for 503 responses.

Current Review: No issues found

The implementation is solid:

  • Admission control: Non-blocking TryAcquire pattern correctly fails fast with HTTP 503 when capacity is reached, preventing memory exhaustion
  • Semaphore lifecycle: All permits are properly acquired and released, including error paths and the fileAppearedReadback helper
  • Retry logic: Well-designed exponential backoff (250ms→5s, max 6 attempts) with Retry-After header support
  • Context handling: Proper cancellation checks (ctxCheckEvery=256) in writeChunkToWriter prevent holding memory when clients disconnect
  • Test coverage: Comprehensive unit tests (7 tests covering success, retry, exhaustion, Retry-After, non-503, context cancel, edge cases)
  • Documentation: Clear godoc explaining concurrency model, ownership semantics, and behavior changes
  • Defensive checks: pendingCap prevents scheduler regressions from causing unbounded memory growth

The PR description provides excellent production evidence, clear risk analysis, and detailed rollout notes.


github-actions Bot commented May 8, 2026

Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-830 (b4b32dc)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.659µ 1.660µ ~ 1.000
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.56n 61.92n ~ 0.100
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.60n 62.22n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.66n 61.86n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 30.38n 30.81n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 51.90n 52.33n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 106.3n 106.5n ~ 1.000
MiningCandidate_Stringify_Short-4 263.6n 268.4n ~ 0.100
MiningCandidate_Stringify_Long-4 1.884µ 1.877µ ~ 0.300
MiningSolution_Stringify-4 969.2n 972.3n ~ 0.400
BlockInfo_MarshalJSON-4 1.744µ 1.742µ ~ 1.000
NewFromBytes-4 126.9n 126.0n ~ 1.000
Mine_EasyDifficulty-4 67.62µ 67.18µ ~ 0.700
Mine_WithAddress-4 6.894µ 7.002µ ~ 0.100
BlockAssembler_AddTx-4 0.02833n 0.02798n ~ 1.000
AddNode-4 11.30 11.85 ~ 0.400
AddNodeWithMap-4 11.55 11.23 ~ 0.400
DirectSubtreeAdd/4_per_subtree-4 62.57n 58.13n ~ 0.200
DirectSubtreeAdd/64_per_subtree-4 31.64n 28.55n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 30.70n 27.22n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 29.17n 26.10n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 28.76n 25.82n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 288.2n 276.2n ~ 0.200
SubtreeProcessorAdd/64_per_subtree-4 272.8n 272.0n ~ 0.700
SubtreeProcessorAdd/256_per_subtree-4 275.1n 274.1n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 269.8n 267.9n ~ 1.000
SubtreeProcessorAdd/2048_per_subtree-4 266.0n 266.8n ~ 0.700
SubtreeProcessorRotate/4_per_subtree-4 272.2n 270.9n ~ 1.000
SubtreeProcessorRotate/64_per_subtree-4 272.4n 269.8n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 268.5n 272.6n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 268.4n 274.4n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 53.78n 53.85n ~ 1.000
SubtreeNodeAddOnly/64_per_subtree-4 34.16n 34.33n ~ 0.700
SubtreeNodeAddOnly/256_per_subtree-4 33.23n 33.47n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 32.54n 32.61n ~ 0.700
SubtreeCreationOnly/4_per_subtree-4 112.6n 112.2n ~ 1.000
SubtreeCreationOnly/64_per_subtree-4 393.8n 392.9n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.312µ 1.352µ ~ 0.200
SubtreeCreationOnly/1024_per_subtree-4 4.374µ 4.482µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 7.783µ 8.111µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 267.9n 271.4n ~ 0.700
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 269.5n 270.6n ~ 0.200
ParallelGetAndSetIfNotExists/1k_nodes-4 789.9µ 815.1µ ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 1.331m 1.572m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 6.686m 6.687m ~ 1.000
ParallelGetAndSetIfNotExists/100k_nodes-4 13.46m 13.40m ~ 1.000
SequentialGetAndSetIfNotExists/1k_nodes-4 661.4µ 649.3µ ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 2.804m 2.901m ~ 0.200
SequentialGetAndSetIfNotExists/50k_nodes-4 10.37m 10.50m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 20.03m 19.94m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 632.4µ 841.6µ ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 4.151m 4.329m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 16.71m 16.65m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 688.4µ 686.7µ ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 5.651m 5.664m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 37.88m 37.46m ~ 0.100
DiskTxMap_SetIfNotExists-4 4.167µ 3.935µ ~ 0.400
DiskTxMap_SetIfNotExists_Parallel-4 3.768µ 3.593µ ~ 0.100
DiskTxMap_ExistenceOnly-4 464.7n 339.7n ~ 0.700
Queue-4 194.5n 200.2n ~ 0.100
AtomicPointer-4 4.531n 4.581n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 856.3µ 897.9µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 833.9µ 860.6µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 111.0µ 112.6µ ~ 0.400
ReorgOptimizations/AllMarkFalse/New/10K-4 62.38µ 62.26µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/10K-4 68.88µ 65.66µ ~ 0.400
ReorgOptimizations/HashSlicePool/New/10K-4 11.27µ 11.66µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/10K-4 5.529µ 5.790µ ~ 0.700
ReorgOptimizations/NodeFlags/New/10K-4 1.862µ 1.912µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 10.17m 12.23m ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4 10.27m 10.58m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.141m 1.143m ~ 1.000
ReorgOptimizations/AllMarkFalse/New/100K-4 685.4µ 690.5µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/100K-4 641.1µ 779.6µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 313.6µ 309.9µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/100K-4 56.21µ 61.69µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 19.65µ 20.96µ ~ 0.100
TxMapSetIfNotExists-4 51.67n 51.98n ~ 0.600
TxMapSetIfNotExistsDuplicate-4 38.16n 38.64n ~ 0.700
ChannelSendReceive-4 609.6n 604.4n ~ 0.200
CalcBlockWork-4 472.3n 472.8n ~ 1.000
CalculateWork-4 624.9n 641.9n ~ 0.100
BuildBlockLocatorString_Helpers/Size_10-4 1.456µ 1.329µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_100-4 12.36µ 14.90µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_1000-4 123.0µ 124.6µ ~ 0.400
CatchupWithHeaderCache-4 104.4m 104.4m ~ 1.000
_BufferPoolAllocation/16KB-4 4.431µ 3.286µ ~ 0.100
_BufferPoolAllocation/32KB-4 8.310µ 8.574µ ~ 1.000
_BufferPoolAllocation/64KB-4 15.74µ 17.17µ ~ 0.700
_BufferPoolAllocation/128KB-4 32.27µ 27.51µ ~ 0.100
_BufferPoolAllocation/512KB-4 108.4µ 115.0µ ~ 0.200
_BufferPoolConcurrent/32KB-4 17.72µ 18.71µ ~ 0.100
_BufferPoolConcurrent/64KB-4 27.36µ 30.32µ ~ 0.100
_BufferPoolConcurrent/512KB-4 140.2µ 147.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 667.2µ 626.7µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 660.6µ 628.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 650.4µ 631.0µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 657.4µ 621.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 690.9µ 637.9µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 35.20m 35.70m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 35.29m 35.69m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4 35.23m 35.55m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 35.00m 35.14m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/512KB-4 34.57m 34.90m ~ 0.100
_PooledVsNonPooled/Pooled-4 738.8n 735.4n ~ 0.700
_PooledVsNonPooled/NonPooled-4 7.410µ 6.753µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.759µ 7.161µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.63µ 10.36µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 10.27µ 10.53µ ~ 0.100
_prepareTxsPerLevel-4 403.8m 400.0m ~ 0.400
_prepareTxsPerLevelOrdered-4 4.085m 4.056m ~ 1.000
_prepareTxsPerLevel_Comparison/Original-4 405.2m 401.2m ~ 1.000
_prepareTxsPerLevel_Comparison/Optimized-4 3.622m 4.006m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.253m 1.247m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 295.1µ 296.8µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 71.18µ 72.08µ ~ 0.200
SubtreeSizes/10k_tx_256_per_subtree-4 17.61µ 17.79µ ~ 0.600
SubtreeSizes/10k_tx_512_per_subtree-4 8.762µ 8.760µ ~ 1.000
SubtreeSizes/10k_tx_1024_per_subtree-4 4.350µ 4.361µ ~ 0.600
SubtreeSizes/10k_tx_2k_per_subtree-4 2.147µ 2.160µ ~ 0.400
BlockSizeScaling/10k_tx_64_per_subtree-4 69.48µ 69.47µ ~ 0.700
BlockSizeScaling/10k_tx_256_per_subtree-4 17.46µ 17.24µ ~ 0.700
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.342µ 4.294µ ~ 0.700
BlockSizeScaling/50k_tx_64_per_subtree-4 367.8µ 366.2µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 86.99µ 88.59µ ~ 0.400
BlockSizeScaling/50k_tx_1024_per_subtree-4 21.74µ 21.43µ ~ 0.400
SubtreeAllocations/small_subtrees_exists_check-4 149.5µ 148.3µ ~ 0.400
SubtreeAllocations/small_subtrees_data_fetch-4 160.4µ 158.2µ ~ 0.400
SubtreeAllocations/small_subtrees_full_validation-4 305.3µ 308.6µ ~ 0.700
SubtreeAllocations/medium_subtrees_exists_check-4 8.750µ 8.877µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 9.260µ 9.310µ ~ 0.400
SubtreeAllocations/medium_subtrees_full_validation-4 17.50µ 17.26µ ~ 0.200
SubtreeAllocations/large_subtrees_exists_check-4 2.088µ 2.077µ ~ 0.700
SubtreeAllocations/large_subtrees_data_fetch-4 2.209µ 2.195µ ~ 0.600
SubtreeAllocations/large_subtrees_full_validation-4 4.318µ 4.292µ ~ 0.200
StoreBlock_Sequential/BelowCSVHeight-4 335.1µ 325.2µ ~ 0.200
StoreBlock_Sequential/AboveCSVHeight-4 335.7µ 327.4µ ~ 0.700
GetUtxoHashes-4 255.3n 258.9n ~ 0.400
GetUtxoHashes_ManyOutputs-4 44.57µ 44.65µ ~ 1.000
_NewMetaDataFromBytes-4 240.1n 238.8n ~ 1.000
_Bytes-4 630.9n 624.2n ~ 0.100
_MetaBytes-4 576.6n 562.5n ~ 0.100

Threshold: >10% with p < 0.05 | Generated: 2026-05-08 08:21 UTC

oskarszoon left a comment

Contributor

Approve. Prod-driven design (66k goroutines at OOM, getTxs fan-out amplifier identified precisely), layered fix — non-blocking TryAcquire on a new dedicated semSubtreeDataCreate separate from the reader sem, 503+Retry-After on saturation, typed ErrServiceUnavailable for caller errors.Is, 503-only retry with exponential backoff + Retry-After honoring, pending chunk-map cap, ctx-every-256-txs in the writer. Each layer has a clear purpose.

Three confirmations before merge:

  • test CI job is failing on the latest run while everything else (smoketest, sequential-{sqlite,postgres,aerospike}, lint, 14 benches) is green. The PR adds 7 new tests in util/http_test.go so worth confirming this is the same flake hitting other PRs this week vs a real failure.
  • "Deploy to dev-scale-1 and verify peer-side recovers from saturation" is unchecked in the PR's verification list. Given retry-storm is correctly identified in the Risks section, the load test is the proof-point.
  • This targets feat/teranode-native-ops, not main. Worth a one-line note in the PR description so the next reader doesn't expect it on main and so the eventual main-port stays tracked.

icellan self-assigned this Jun 10, 2026

icellan commented Jun 12, 2026

Contributor Author

Closing as already integrated.

This PR's single commit (3567cc05d) has the identical patch-id as commit fdaeadb7e on the base branch feat/teranode-native-ops — it is the same change, already merged. The base has since evolved further with superseding fixes on top of it:

A rebase onto the current base produces zero unique commits (the base is a strict superset of this PR), so there is nothing left to merge.

icellan closed this Jun 12, 2026
