fix(legacy/netsync): cap orphan tx pool size to prevent memory DoS#776
Conversation
|
🤖 Claude Code Review Status: Complete Current Review: This PR adds a hard cap to the orphan transaction pool to prevent memory exhaustion DoS attacks. The implementation is well-structured with comprehensive tests. Minor Observation: One design choice worth noting: When an existing key is updated via Impact: A repeatedly updated orphan transaction will never be evicted based on age, only via TTL expiry. For the orphan pool use case, this may be acceptable since orphan transactions are typically write-once. However, if other parts of the codebase use The existing test No action required unless the intended semantics are "evict by insertion time" rather than "evict by modification time." |
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-22 13:31 UTC |
oskarszoon
left a comment
There was a problem hiding this comment.
Follow-up after closer look — approval still stands, but flagging one concern that's worth addressing before merge, plus a couple of follow-ups.
Concern (worth fixing before merge): cap-eviction holds the write lock across a synchronous validationClient.Validate gRPC call per insert at cap. Under sustained orphan flood, the lock is held one gRPC round-trip per evict, which stalls:
sm.orphanTxs.Len()from the Prometheus metric goroutine (every 5s)processOrphanTransactions→sm.orphanTxs.Items()snapshot
TTL clean() has the same antipattern but runs every 10 min; cap-evict runs per insert at cap, potentially hundreds-to-thousands/sec under attack.
Fix is short: run the eviction function outside m.mu — find + delete oldest under the lock, capture the value, drop the lock, then call evictionFn on the captured value. Or make orphan eviction-fn push to a buffered channel + worker.
Worth a follow-up issue (not blocking):
MaxOrphanTxSizeis commented out insettings.go:96— 100 × 10 MB ≈ 1 GB worst case. Bitcoin Core's equivalent is ~10 MB (per-entry size also capped).- Shared cap = one peer flooding 100+ orphans can displace all honest peers' pending orphans (oldest-first eviction). No per-peer accounting.
- Negative
legacy_maxOrphanTxssilently disables the cap (WithMaxSizegates on> 0) — should warn or clamp. - Other unbounded
expiringmapcallers worth a separate audit:blockvalidation.blockExistsCache(120min TTL, peer-driven),subtreevalidation.invalidSubtreeDeDuplicateMap(peer-driven).
Test gaps: cap=1 boundary, deterministic-ordering test (current relies on time.Sleep(2ms) between inserts), assertion that the correct oldest key is evicted.
|



Closes #4575.
Summary
The orphan transaction pool (
SyncManager.orphanTxs) was anexpiringmapconfigured with only a TTL — no max-entry cap. A peer flooding unique orphan txs faster than the TTL eviction rate would cause unbounded memory growth.Fix
Two layers:
util/expiringmap: newWithMaxSize(n int)option enforces a hard cap with oldest-first eviction on insert. ExistingWithEvictionChannel/WithEvictionFunctionstill fire on cap eviction. Default (noWithMaxSize) preserves existing unbounded behaviour for backwards compatibility.services/legacy/netsync/manager.go: orphan pool wired with cap from newLegacySettings.MaxOrphanTxs(default 100, matches Bitcoin Core's historical default).Test plan
expiringmap.WithMaxSizeunit tests: cap enforced, update doesn't trigger eviction, eviction-channel fires, eviction-function fires (and can veto), cap=0 unbounded, no-cap default.SyncManagerwires it with cap=5, assertLen() == 5.tSettings.Legacy.MaxOrphanTxs > 0so the cap is enforced out-of-the-box.go test -race ./util/expiringmap/...andgo test -race ./services/legacy/netsync/...pass.expiringmap(blockvalidation,s3,subtreevalidation,blockpersister/utxoset/model) are unaffected — they pass-raceand continue to default to unbounded.