feat(asset): on-demand subtree-data admission control + 503/Retry-After#896
Conversation
/subtree_data on-demand creations are now gated by a non-blocking TryAcquire on a dedicated semaphore (asset_concurrency_subtree_data_create, default 4). At capacity the handler returns HTTP 503 immediately instead of queueing — each in-flight create holds chunk-sized tx metadata in memory plus pipe buffers, so an unbounded queue here is the difference between graceful 503 and OOM. Callers that fetch subtree_data over HTTP (subtree validation, block validation peer fetches) now use DoHTTPRequestBodyReaderWithRetry, which honors Retry-After on 503 with a bounded retry budget. Other errors fail through immediately. Surfaced as a follow-up split out of PR bsv-blockchain#828; reviewable on its own.
|
🤖 Claude Code Review Status: Complete Current Review: ✅ No critical issues found The PR implements admission control and retry logic correctly. The code is well-tested with comprehensive test coverage. One minor observation:
Previous inline comment resolved:
Summary:
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-19 13:42 UTC |
…c validation time.ParseDuration silently accepts fractional, signed and unit-suffixed inputs that aren't valid RFC 7231 delta-seconds (e.g. "1.5s", "1m", "-5s"). Switch to strconv.Atoi so anything that isn't a plain non-negative integer falls through to "no retry hint" instead of being silently reinterpreted. Extends TestParseRetryAfter with HTTP-date, fractional, unit-suffix and whitespace cases.
|
Addressed in af7fb36: |
oskarszoon
left a comment
There was a problem hiding this comment.
Clean split out of #828, ready to land.
Semaphore + 503 design verified: acquire/release balance correct across all exit paths in GetSubtreeDataReader, setting wired at construction (not per-request), nil semaphore (value=0) correctly treated as unlimited. parseRetryAfter strictly numeric — HTTP-date / fractions / units / whitespace all rejected and tested. Retry helper budget ≤30s wall with ctx cancellation honored on both sleep and in-flight request. All three callers (blockvalidation fetchSubtreeDataFromPeer, subtreevalidation getSubtreeMissingTxs, check_block_subtrees) use the retry correctly; the blockvalidation peer-fallback model is compatible. 1953 tests pass -race.
Memory math: writeTransactionsViaSubtreeStoreStreaming streams 10k-tx chunks (not the full block), so the real worst-case at default concurrency=4 is ~40 MB, not "block_size × 4". No operator concern at the default.
Two doc-only follow-ups worth noting in the PR body for operators:
Retry-After: 1is a hardcoded fixed value with no jitter. Under burst-503, all callers retry in lockstep 1s later; convergence under heavy contention takesceil(N/concurrency)retry windows. Bounded by the semaphore, but jitter would smooth it.- During catchup, up to 28 of blockvalidation's 32 concurrent subtree-fetch goroutines can stall up to ~5s when the asset is at create-capacity. The conscious OOM-vs-latency trade-off is the right call but isn't called out in the PR body.
Both are documentation items, not blocking.
|
Brings in fff6d3e..origin/main (4 commits since previous merge): - #897 blockassembly columnar->row fallback on Unimplemented - #896 asset on-demand subtree-data admission control + 503/Retry-After - #766 utxo: verify spend ownership in Unspend - #700 health: disk space check Conflict resolutions: - services/blockassembly/Client.go, util/http.go, util/http_test.go: take theirs. Main has the post-Copilot-fix versions of these files from #896 and #897 (parseRetryAfter strict-Atoi, debug log on columnar fallback, extended TestParseRetryAfter cases) — they supersede the pre-fix versions still living on pr-828. - stores/utxo/aerospike/un_spend.go: take ours (native-op call, no SpendingData arg). The ownership check #766 added to the UDF path requires a coordinated update to the BSV-forked aerospike- server's subOpUnspend dispatcher. Until that lands, the native-op path on this branch skips ownership verification. Tracking issue filed.


Split out of #828 so it can be reviewed independently.
Summary
asset_concurrency_subtree_data_create(default 4) caps simultaneous on-demand subtree-data file creations triggered by/subtree_datawhen the file does not yet exist.TryAcquireinGetSubtreeDataReader: at capacity the handler returns HTTP 503 immediately instead of queueing. Each in-flight create holds chunk-sized tx metadata in memory plus pipe buffers; an unbounded queue here trades graceful 503 for OOM.util.DoHTTPRequestBodyReaderWithRetry+parseRetryAfterhonor theRetry-Afterheader with a bounded retry budget. Callers in subtree validation (getSubtreeMissingTxs,CheckBlockSubtrees) and block validation (get_blocks.go) switched to the retry helper. Other errors fail through immediately — only 503 is retried.Test plan
go build ./...cleango vet ./services/asset/... ./services/subtreevalidation/... ./services/blockvalidation/... ./util/...cleanmake lint-newcleanutil/http_test.gocover the retry path and Retry-After parsing