Summary
blockvalidation OOMs during testnet catch-up sync around block 5,000. A heap profile captured at 13.17 GiB RSS shows ~70% of in-use heap in go-bt tx/output decode hot paths. The container restarted (RestartCount=1) about two minutes after the peak snapshot, consistent with an OOM kill.
Environment
- bsva-ovh-teranode-ttn-eu-3
- v0.15.1-beta-3 (commit 857a1b18a; includes the #885 legacy streaming fix, "perf(legacy): stream-decode incoming block messages off the socket")
Evidence
Heap snapshot captured at 2026-05-21T15:07:57Z, RSS = 13.17 GiB at sample time. Container restart timestamp: 2026-05-21T15:09:53Z.
Top of 8.29 GB inuse:
| % | MB | Function |
|---|---|---|
| 27.3% | 2264 | go-bt/v2.(*Output).ReadFrom (output.go:51) |
| 20.9% | 1731 | go-bt/v2/bscript.NewFromBytes (script.go:43) |
| 13.7% | 1138 | go-bt/v2.(*Tx).ReadFrom (tx.go:198) |
| 8.6% | 716 | go-bt/v2.(*Tx).ReadFrom (tx.go:205) |
| 7.8% | 648 | runtime.mallocgc (uncategorized) |
| 7.8% | 642 | bytes.growSlice (bytes/buffer.go:267) |
| 6.3% | 524 | go-bt/v2.(*Tx).toBytesHelper (tx.go:540) |
(go-bt/v2@v2.6.3)
RSS trajectory (sawtooth)
| UTC | RSS (MiB) | Reason |
|---|---|---|
| 14:08:42 | 2704 | first-cross |
| 14:20:21 | 5620 | cooldown |
| 14:25:49 | 5030 | cooldown |
| 14:31:18 | 3219 | cooldown |
| 14:36:48 | 3164 | cooldown |
| 14:39:41 | 3746 | delta |
| 14:44:39 | 5274 | delta |
| 14:50:07 | 2939 | cooldown |
| 14:54:01 | 3481 | delta |
| 14:59:29 | 3126 | cooldown |
| 15:03:56 | 8022 | delta |
| 15:07:52 | 13168 | delta |
| ~15:09:53 | OOM kill | container restart |
The sawtooth (climb, GC reclaim, climb) indicates allocation churn, not a leak. But the final peak crosses the cgroup memory limit before GC can reclaim it.
Suspected cause
The Output.ReadFrom / bscript.NewFromBytes / Tx.ReadFrom triad allocates a fresh []byte per script, per output, per tx. On historical blocks containing very large outputs (OP_RETURN data, large script bodies), the per-decode allocation rate outpaces GC.
Looks structurally similar to the wire-side issue addressed by #885 (per-payload []byte allocation in go-wire.ReadMessageWithEncodingN), but on the validation side via go-bt. The streaming fix in #885 helps legacy ingestion; this hot path is hit during blockvalidation's parse of subtrees / txs received via gRPC + the catchup pipeline, so #885's win does not apply here.
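To make the suspected pattern concrete, here is a minimal sketch contrasting a decoder that copies every script into a fresh []byte with one that aliases the source buffer. It is not go-bt's code: the output type, the fixed 12-byte header (a 4-byte length standing in for Bitcoin's VarInt), and the appendOutput/decode helpers are all hypothetical and only illustrate the allocation behaviour.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"testing"
)

// output is a hypothetical, simplified decoded tx output (not the go-bt type).
type output struct {
	satoshis uint64
	script   []byte
}

// headerLen is 8 bytes of value plus a 4-byte script length. The fixed-width
// length is an assumption of this sketch; real transactions use a VarInt.
const headerLen = 12

// appendOutput serializes one output in the simplified layout used here.
func appendOutput(dst []byte, satoshis uint64, script []byte) []byte {
	var hdr [headerLen]byte
	binary.LittleEndian.PutUint64(hdr[:8], satoshis)
	binary.LittleEndian.PutUint32(hdr[8:], uint32(len(script)))
	return append(append(dst, hdr[:]...), script...)
}

// decode parses every output in buf. With copyScripts set it allocates a fresh
// []byte per script (the suspected pattern); otherwise each script aliases buf.
func decode(buf []byte, outs []output, copyScripts bool) []output {
	for len(buf) >= headerLen {
		n := int(binary.LittleEndian.Uint32(buf[8:headerLen]))
		if n < 0 || n > len(buf)-headerLen {
			break // truncated or invalid input
		}
		end := headerLen + n
		// Cap the capacity so a later append on the script cannot overwrite buf.
		script := buf[headerLen:end:end]
		if copyScripts {
			script = append(make([]byte, 0, n), script...)
		}
		outs = append(outs, output{binary.LittleEndian.Uint64(buf[:8]), script})
		buf = buf[end:]
	}
	return outs
}

func main() {
	var block []byte
	for i := 0; i < 1000; i++ {
		block = appendOutput(block, uint64(i), make([]byte, 10_000)) // large scripts
	}
	outs := make([]output, 0, 1000)
	for _, copyScripts := range []bool{true, false} {
		allocs := testing.AllocsPerRun(10, func() {
			outs = decode(block, outs[:0], copyScripts)
		})
		fmt.Printf("copyScripts=%v: %.0f allocations per decode\n", copyScripts, allocs)
	}
}
```

The aliasing variant is only safe when the caller owns the source buffer and keeps it alive and unmodified for as long as the decoded outputs are in use, which is the ownership question raised under the fix directions.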
Potential fix directions
Likely needs work in go-bt (similar to the go-wire follow-ups planned):
- Pool the per-output script []byte buffers across a block's tx decode.
- Replace the per-tx toBytesHelper re-serialization (524 MB at peak; needed only for hashing?) with a streaming hash or a reusable buffer.
- Audit bscript.NewFromBytes for unnecessary copies: if the source buffer is already owned, wrap rather than copy.
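As an illustration of the streaming-hash direction, the sketch below hashes a transaction by writing its fields straight into a SHA-256 hasher instead of first materializing the full serialization. It assumes the re-serialization is needed only for hashing, which the profile alone does not confirm. The tx and txOut types, the writeTo method, and the simplified layout (no inputs, fixed 4-byte lengths instead of VarInts) are hypothetical and are not go-bt's API.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"io"
)

// txOut and tx are hypothetical, simplified types for this sketch only.
type txOut struct {
	Satoshis uint64
	Script   []byte
}

type tx struct {
	Version  uint32
	Outputs  []txOut
	LockTime uint32
}

// writeTo streams the simplified serialization to w field by field, so no
// intermediate []byte the size of the whole transaction is built here.
func (t *tx) writeTo(w io.Writer) error {
	var scratch [8]byte
	put32 := func(v uint32) error {
		binary.LittleEndian.PutUint32(scratch[:4], v)
		_, err := w.Write(scratch[:4])
		return err
	}
	if err := put32(t.Version); err != nil {
		return err
	}
	if err := put32(uint32(len(t.Outputs))); err != nil {
		return err
	}
	for _, o := range t.Outputs {
		binary.LittleEndian.PutUint64(scratch[:], o.Satoshis)
		if _, err := w.Write(scratch[:]); err != nil {
			return err
		}
		if err := put32(uint32(len(o.Script))); err != nil {
			return err
		}
		if _, err := w.Write(o.Script); err != nil {
			return err
		}
	}
	return put32(t.LockTime)
}

// hashBuffered serializes into a buffer first, then double-hashes the bytes.
func hashBuffered(t *tx) [32]byte {
	var buf bytes.Buffer
	_ = t.writeTo(&buf) // bytes.Buffer writes do not return errors
	first := sha256.Sum256(buf.Bytes())
	return sha256.Sum256(first[:])
}

// hashStreaming feeds the hasher directly; the full serialization never exists.
func hashStreaming(t *tx) [32]byte {
	h := sha256.New()
	_ = t.writeTo(h) // hash.Hash writes never return an error
	var first [32]byte
	h.Sum(first[:0]) // appends the digest into first's backing array
	return sha256.Sum256(first[:])
}

func main() {
	sample := &tx{Version: 1, Outputs: []txOut{{Satoshis: 5000, Script: make([]byte, 1<<20)}}}
	fmt.Printf("buffered:  %x\nstreaming: %x\n", hashBuffered(sample), hashStreaming(sample))
}
```

Both functions produce the same double SHA-256 digest; the difference is that the buffered variant holds a copy of every script for the duration of the hash, which is the cost the toBytesHelper row in the profile suggests.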
Repro
- Point a Teranode node at testnet, fresh state.
- Let it catch up through block ~5,000.
- Watch blockvalidation RSS; expect a sawtooth pattern crossing 13 GiB.
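For the last step, a small watcher along these lines could decide when to capture a snapshot. This is a sketch under assumptions: the reason names (first-cross, delta, cooldown) are taken from the RSS trajectory table, but their exact semantics, the threshold values in defaultWatcher, and every function name here are hypothetical and do not describe the tool that produced the table.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// parseVmRSSMiB extracts VmRSS from the text of /proc/<pid>/status (Linux).
// The kernel reports the value in kB (units of 1024 bytes).
func parseVmRSSMiB(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return kb / 1024, nil
		}
	}
	return 0, errors.New("VmRSS not found")
}

// watcher tracks when the last snapshot was taken and at what RSS.
type watcher struct {
	thresholdMiB uint64        // first snapshot once RSS reaches this
	deltaMiB     uint64        // snapshot again after this much growth
	cooldown     time.Duration // otherwise snapshot at most this often

	crossed bool
	lastRSS uint64
	lastAt  time.Time
}

// defaultWatcher returns a watcher with made-up example thresholds.
func defaultWatcher() *watcher {
	return &watcher{thresholdMiB: 2560, deltaMiB: 512, cooldown: 5 * time.Minute}
}

// observe returns the reason to take a snapshot now, or "" to skip.
func (w *watcher) observe(now time.Time, rssMiB uint64) string {
	reason := ""
	switch {
	case !w.crossed:
		if rssMiB >= w.thresholdMiB {
			reason = "first-cross"
		}
	case rssMiB >= w.lastRSS+w.deltaMiB:
		reason = "delta"
	case now.Sub(w.lastAt) >= w.cooldown:
		reason = "cooldown"
	}
	if reason != "" {
		w.crossed, w.lastRSS, w.lastAt = true, rssMiB, now
	}
	return reason
}

func main() {
	// A real watcher would poll the target process's status file in a loop;
	// this reads the current process once to keep the example runnable.
	status, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read RSS:", err)
		return
	}
	rss, err := parseVmRSSMiB(string(status))
	if err != nil {
		fmt.Println("cannot parse RSS:", err)
		return
	}
	w := defaultWatcher()
	fmt.Printf("rss=%d MiB reason=%q\n", rss, w.observe(time.Now(), rss))
}
```

On a non-empty reason the watcher would fetch the heap, goroutine, and CPU profiles from the service's profiling endpoint, if one is exposed.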
Captured artifacts (local, not attached)
Heap-raw, goroutine, and CPU profiles for the 13 GiB peak and 14 other snapshots across the sawtooth are available on request (probe/eu3-bv-2026-05-21/watcher/ in my local checkout).
Related
- legacy streaming block decoder (same class of issue, wire side)