Skip to content

Stream asset subtree responses to reduce memory use#827

Merged
icellan merged 3 commits into
mainfrom
fix/asset-subtree-streaming-memory
May 15, 2026
Merged

Stream asset subtree responses to reduce memory use#827
icellan merged 3 commits into
mainfrom
fix/asset-subtree-streaming-memory

Conversation

@icellan

@icellan icellan commented May 7, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Stream subtree binary/hex responses without buffering all node hashes.
  • Stream paginated subtree JSON and subtree transaction JSON from subtree files without loading full subtree data.
  • Rework subtree_data generation to keep bounded chunks in flight and write full serialized transaction bytes directly.
  • Add safer asset concurrency defaults and skip gzip for large binary asset responses.

Tests

  • go test ./services/asset/httpimpl -run TestGetSubtree
  • go test ./services/asset/repository -run 'TestSubtreeNodeHashesReader|TestGetSubtreeNodesPage|TestGetSubtreePage'
  • make lint

Notes

  • make test was not run locally; CI will run the full target.

@icellan icellan self-assigned this May 7, 2026
@icellan icellan requested review from oskarszoon and teracoder May 7, 2026 16:47
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Review Summary:

The streaming implementation effectively addresses memory usage for large subtrees. The approach is sound and test coverage is good. One minor issue found with gzip skipper logic.

Issue Found:

Previously Reported (Now Fixed):

  • ✅ Pagination memory issue - Fixed by skipping conflicting nodes for partial pages

Comment thread services/asset/repository/subtree_stream.go
Comment thread services/asset/repository/subtree_stream.go Dismissed
Comment thread services/asset/repository/subtree_stream.go Dismissed
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-827 (fd494a4)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.946µ 1.710µ ~ 0.700
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.47n 61.55n ~ 0.200
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.40n 61.73n ~ 0.100
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.48n 61.68n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 31.21n 31.22n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 52.15n 53.41n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 107.9n 107.3n ~ 1.000
MiningCandidate_Stringify_Short-4 263.5n 261.9n ~ 0.700
MiningCandidate_Stringify_Long-4 1.912µ 1.881µ ~ 0.100
MiningSolution_Stringify-4 986.1n 973.9n ~ 0.100
BlockInfo_MarshalJSON-4 1.783µ 1.750µ ~ 0.100
NewFromBytes-4 128.9n 129.3n ~ 1.000
Mine_EasyDifficulty-4 61.34µ 60.91µ ~ 0.700
Mine_WithAddress-4 6.728µ 6.694µ ~ 0.100
BlockAssembler_AddTx-4 0.02732n 0.02766n ~ 0.700
AddNode-4 10.58 10.62 ~ 1.000
AddNodeWithMap-4 10.38 10.37 ~ 1.000
DirectSubtreeAdd/4_per_subtree-4 62.41n 59.54n ~ 0.400
DirectSubtreeAdd/64_per_subtree-4 29.56n 29.45n ~ 0.400
DirectSubtreeAdd/256_per_subtree-4 27.85n 27.76n ~ 0.700
DirectSubtreeAdd/1024_per_subtree-4 26.48n 26.48n ~ 0.800
DirectSubtreeAdd/2048_per_subtree-4 26.12n 26.07n ~ 0.700
SubtreeProcessorAdd/4_per_subtree-4 282.5n 282.4n ~ 1.000
SubtreeProcessorAdd/64_per_subtree-4 276.3n 274.8n ~ 1.000
SubtreeProcessorAdd/256_per_subtree-4 274.9n 275.1n ~ 1.000
SubtreeProcessorAdd/1024_per_subtree-4 267.3n 268.0n ~ 0.300
SubtreeProcessorAdd/2048_per_subtree-4 265.2n 270.6n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 269.9n 269.9n ~ 0.800
SubtreeProcessorRotate/64_per_subtree-4 269.4n 268.0n ~ 0.400
SubtreeProcessorRotate/256_per_subtree-4 270.2n 269.2n ~ 0.200
SubtreeProcessorRotate/1024_per_subtree-4 271.8n 267.5n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 55.41n 55.36n ~ 1.000
SubtreeNodeAddOnly/64_per_subtree-4 36.19n 36.10n ~ 0.700
SubtreeNodeAddOnly/256_per_subtree-4 35.05n 35.08n ~ 1.000
SubtreeNodeAddOnly/1024_per_subtree-4 34.46n 34.51n ~ 0.500
SubtreeCreationOnly/4_per_subtree-4 110.4n 110.3n ~ 0.600
SubtreeCreationOnly/64_per_subtree-4 347.6n 344.2n ~ 0.400
SubtreeCreationOnly/256_per_subtree-4 1.216µ 1.302µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 3.755µ 3.993µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 6.907µ 7.202µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 269.0n 270.7n ~ 0.400
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 268.0n 270.5n ~ 0.300
ParallelGetAndSetIfNotExists/1k_nodes-4 547.5µ 542.5µ ~ 0.400
ParallelGetAndSetIfNotExists/10k_nodes-4 1.397m 1.320m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 6.783m 6.660m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 13.49m 13.41m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 629.6µ 620.4µ ~ 0.200
SequentialGetAndSetIfNotExists/10k_nodes-4 2.904m 2.951m ~ 1.000
SequentialGetAndSetIfNotExists/50k_nodes-4 11.21m 11.13m ~ 0.700
SequentialGetAndSetIfNotExists/100k_nodes-4 21.45m 21.40m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 599.5µ 589.5µ ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 4.571m 4.553m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 16.94m 16.62m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 672.6µ 663.5µ ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 6.627m 6.204m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 39.63m 40.08m ~ 0.100
DiskTxMap_SetIfNotExists-4 3.934µ 3.894µ ~ 0.400
DiskTxMap_SetIfNotExists_Parallel-4 3.740µ 3.738µ ~ 0.700
DiskTxMap_ExistenceOnly-4 311.2n 310.2n ~ 0.700
Queue-4 201.5n 200.7n ~ 0.800
AtomicPointer-4 8.120n 8.127n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 779.2µ 769.4µ ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/10K-4 757.8µ 754.8µ ~ 0.700
ReorgOptimizations/AllMarkFalse/Old/10K-4 112.9µ 118.4µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 58.17µ 58.20µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/10K-4 63.45µ 72.99µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 11.78µ 11.78µ ~ 0.300
ReorgOptimizations/NodeFlags/Old/10K-4 5.312µ 6.276µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.779µ 1.793µ ~ 0.400
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 10.19m 10.08m ~ 1.000
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.611m 10.336m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.131m 1.146m ~ 0.400
ReorgOptimizations/AllMarkFalse/New/100K-4 723.8µ 723.5µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/100K-4 584.5µ 579.2µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 307.4µ 307.7µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/100K-4 54.47µ 53.80µ ~ 1.000
ReorgOptimizations/NodeFlags/New/100K-4 18.97µ 17.92µ ~ 0.100
TxMapSetIfNotExists-4 49.56n 49.39n ~ 0.100
TxMapSetIfNotExistsDuplicate-4 43.34n 43.18n ~ 1.000
ChannelSendReceive-4 681.8n 663.2n ~ 0.100
CalcBlockWork-4 497.6n 500.4n ~ 0.100
CalculateWork-4 699.5n 676.5n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.308µ 1.312µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_100-4 12.48µ 12.43µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 152.3µ 158.5µ ~ 0.700
CatchupWithHeaderCache-4 104.7m 104.7m ~ 1.000
_prepareTxsPerLevel-4 404.9m 404.5m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.479m 3.681m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 408.8m 403.3m ~ 0.400
_prepareTxsPerLevel_Comparison/Optimized-4 3.651m 3.554m ~ 0.700
SubtreeSizes/10k_tx_4_per_subtree-4 1.290m 1.273m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 303.1µ 301.9µ ~ 0.700
SubtreeSizes/10k_tx_64_per_subtree-4 72.22µ 72.38µ ~ 0.700
SubtreeSizes/10k_tx_256_per_subtree-4 18.24µ 17.89µ ~ 0.200
SubtreeSizes/10k_tx_512_per_subtree-4 8.970µ 8.828µ ~ 0.200
SubtreeSizes/10k_tx_1024_per_subtree-4 4.402µ 4.387µ ~ 0.400
SubtreeSizes/10k_tx_2k_per_subtree-4 2.196µ 2.198µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 69.64µ 69.67µ ~ 1.000
BlockSizeScaling/10k_tx_256_per_subtree-4 17.60µ 17.43µ ~ 0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.352µ 4.337µ ~ 0.800
BlockSizeScaling/50k_tx_64_per_subtree-4 372.9µ 368.6µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 88.63µ 87.62µ ~ 0.200
BlockSizeScaling/50k_tx_1024_per_subtree-4 21.93µ 21.47µ ~ 0.700
SubtreeAllocations/small_subtrees_exists_check-4 148.9µ 151.7µ ~ 0.400
SubtreeAllocations/small_subtrees_data_fetch-4 161.5µ 158.3µ ~ 0.700
SubtreeAllocations/small_subtrees_full_validation-4 305.7µ 310.4µ ~ 0.100
SubtreeAllocations/medium_subtrees_exists_check-4 8.835µ 8.894µ ~ 1.000
SubtreeAllocations/medium_subtrees_data_fetch-4 9.413µ 9.258µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 17.69µ 17.45µ ~ 0.200
SubtreeAllocations/large_subtrees_exists_check-4 2.086µ 2.110µ ~ 1.000
SubtreeAllocations/large_subtrees_data_fetch-4 2.242µ 2.231µ ~ 1.000
SubtreeAllocations/large_subtrees_full_validation-4 4.386µ 4.339µ ~ 0.200
_BufferPoolAllocation/16KB-4 3.930µ 3.711µ ~ 0.700
_BufferPoolAllocation/32KB-4 7.486µ 8.633µ ~ 0.700
_BufferPoolAllocation/64KB-4 15.61µ 17.80µ ~ 0.100
_BufferPoolAllocation/128KB-4 32.77µ 30.22µ ~ 0.200
_BufferPoolAllocation/512KB-4 121.5µ 132.2µ ~ 0.100
_BufferPoolConcurrent/32KB-4 18.36µ 18.54µ ~ 0.100
_BufferPoolConcurrent/64KB-4 28.68µ 29.03µ ~ 0.200
_BufferPoolConcurrent/512KB-4 176.6µ 185.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 707.5µ 728.8µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/32KB-4 702.4µ 724.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 695.3µ 723.0µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 683.7µ 723.0µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 702.8µ 732.1µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 38.48m 38.32m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/32KB-4 38.31m 38.27m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/64KB-4 38.02m 38.05m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/128KB-4 38.30m 38.00m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 38.29m 38.19m ~ 0.100
_PooledVsNonPooled/Pooled-4 820.6n 822.4n ~ 0.100
_PooledVsNonPooled/NonPooled-4 7.167µ 7.200µ ~ 1.000
_MemoryFootprint/Current_512KB_32concurrent-4 8.967µ 9.295µ ~ 0.200
_MemoryFootprint/Proposed_32KB_32concurrent-4 11.85µ 12.48µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 11.57µ 11.60µ ~ 1.000
StoreBlock_Sequential/BelowCSVHeight-4 309.9µ 316.2µ ~ 0.400
StoreBlock_Sequential/AboveCSVHeight-4 312.5µ 315.9µ ~ 0.400
GetUtxoHashes-4 264.0n 253.0n ~ 0.400
GetUtxoHashes_ManyOutputs-4 42.79µ 42.67µ ~ 0.700
_NewMetaDataFromBytes-4 237.5n 236.3n ~ 0.500
_Bytes-4 618.6n 615.1n ~ 0.600
_MetaBytes-4 569.0n 561.0n ~ 0.700

Threshold: >10% with p < 0.05 | Generated: 2026-05-13 10:15 UTC

@icellan icellan requested review from ordishs and removed request for teracoder May 13, 2026 10:00
case "subtree_data":
return true
case "subtree":
tail := parts[i+1:]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential index out of bounds: When part == subtree, the code accesses parts[i+1:] without checking if i+1 < len(parts). If the path ends with /subtree, this could panic. Suggested bounds check: if i+1 >= len(parts) { return false }

@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
74.2% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Memory savings verified — binary/hex streaming drops peak from ~48 MB (full 1M-node deserialise) to ~32 KB (bufio buffer), and GetSubtreeData drops from ~250 MB to a 5–7 MB constant. Streaming impl is clean (pull-mode io.ReadCloser for binary/hex, io.Pipe + io.MultiWriter for GetSubtreeData; back-pressure correct, all goroutine spawn bounded). The pagination fix in b08a88e3 (only read conflicting nodes when pageCoversFullSubtree==true) addresses the original 1M-node-scan concern correctly. DoS posture is net-positive — four concurrency keys flip from unlimited (0) to bounded 2/2/4/2, with a 30 s fallback timeout on semaphore acquisition.

A few things worth confirming or following up on before merge:

  1. FeeHash regression in /subtree/:hash/json? The new readSubtreePageFromReader returns a Subtree without populating FeeHash, but the handler still serialises &subtree.FeeHash. JSON consumers will see zero-value FeeHash. If the old code also returned zero here, no regression; if not, this needs restoring. Could you confirm?

  2. Offset-wrap inconsistency between the two readers. readSubtreePageFromReader:158 silently wraps an out-of-range offset back to 0 (encoded by TestGetSubtreePage_ResetsOffsetPastEnd), while the companion readSubtreeNodesPageFromReader:101 returns an empty page for the same case. Pick one and document in OpenAPI — wrapping to 0 will surprise a paginated client.

  3. settings.conf doc mismatch (ops hazard). The conf documents these keys as commented-out at = 0 (unlimited), but the code defaults are now ConcurrencyGetSubtreeData=2, ConcurrencyGetSubtreeTransactions=2, ConcurrencyGetSubtreeDataReader=4, SubtreeDataStreamingConcurrency=2 (was 4). An operator reading settings.conf will believe the defaults are unlimited. Worth updating the comments + a release-note line about the silent flip for deployments that didn't explicitly set these.

  4. Benchmark gap. Asset streaming benchmarks are gated behind RUN_SUBTREE_BENCHMARK=true and t.Skip() in CI — so the memory savings claim isn't machine-verified by anything that runs on PR. The math checks out (manually traced) but a regression in 6 months would be silent. Recommend adding at least one un-gated testing.B for the streaming path with an AllocsPerOp check.

Two CodeQL findings (subtree_stream.go:118 and :178) are false positives — pageSize is provably bounded to ≤ 100 via two redundant clamps (helpers.go:107-109 HTTP layer + subtree_stream.go:86-88, 142-144 repository defense-in-depth), max allocation ~4.8 KB per request. Safe to dismiss in the Security tab with that rationale rather than adding dead defensive code.

One operational note worth surfacing: the gzip-skip middleware (correct decision — binary hash/tx payloads have near-zero compressibility and gzip would re-introduce full buffering) creates an uncompressed bandwidth amplification surface on these three endpoints, all unauthenticated GETs. Defense relies on (a) ban-list middleware, (b) the new per-method semaphores, (c) upstream reverse-proxy rate limiting. Worth documenting the reverse-proxy ACL/rate-limit requirement explicitly in deployment notes; per-IP rate limiting in the asset service itself would be a defense-in-depth follow-up.

Side note (cosmetic, not blocking): scheduleSubtreeChunkFetches at GetLegacyBlock.go:339-365 masks the real worker error with context.Canceled on return when a worker fails. The real error still reaches the caller via the resultsChan path (consumer loop at :295-300), so no behaviour bug — but confusing to read in isolation. Prefer waitErr over readErr when both are set.

Also worth refuting the claude[bot] inline finding on http.go:435: the parts[i+1:] slice access cannot panic — Go allows slicing to len, and the len(tail)==1/2 guards prevent tail[1] OOB. Safe as written.

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend addressing:

  1. Coverage gap (SonarQube quality gate).
  2. Run make test per repo guardrails.
  3. Document the ConflictingNodes paginated-response change in the PR / changelog.
  4. Trivial: clean up unreachable return nil in GetSubtree.go:190.

@icellan icellan merged commit 7eb1aa6 into main May 15, 2026
30 checks passed
@icellan icellan deleted the fix/asset-subtree-streaming-memory branch May 15, 2026 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants