feat: arena-backed tx decode + zero-alloc txid hashing#146
Conversation
Single-goroutine bump-pointer arena that amortises per-script []byte allocations across batches of tx decodes. Returned slices use 3-arg slicing so writes cannot overflow into subsequent allocations. MaxArenaAlloc capped at 1 GiB to admit legitimate large BSV transactions while rejecting obviously-bogus varint lengths. API is additive; no existing public surface changed.
The branch was guarded against len(slab) < pos+n, but both NewArena and the grow path use make([]byte, n) which always returns len==cap. Remove the dead branch and tighten the comment to document why the 3-arg slice is always in bounds. Also clarify ResetAndShrink doc for the maxKeep <= 0 case and add a test proving Reset reuses the same backing array.
Decodes Output using a caller-provided Arena for the locking-script []byte. Existing ReadFrom is unchanged. Rejects script lengths greater than MaxArenaAlloc (1 GiB) to prevent unbounded allocation from adversarial input.
Mirrors Input.ReadFrom / ReadFromExtended via an internal helper readFromWithArena. Pulls the unlocking-script byte slice (and the extended-format PreviousTxScript) from a caller-provided Arena. Existing ReadFrom / ReadFromExtended unchanged. Fixed-size fields (previousTxID, prevIndex, sequence, prevSatoshis) use stack arrays. Rejects script lengths exceeding MaxArenaAlloc.
a.Alloc(0) on a fresh arena returns a nil slice, breaking equivalence with the existing ReadFrom which uses make([]byte, 0) (non-nil empty). Add a sentinel branch and a zero-length-script fixture. Same pattern as Input.ReadFromWithArena.
Mirrors Tx.ReadFrom field-for-field with extended-format detection preserved. Routes per-script allocations through the caller-provided Arena via Input/Output ReadFromWithArena variants. Existing ReadFrom is unchanged. Verified equivalent on canonical tx fixtures and a 320 MB OP_RETURN payload.
Lets callers (e.g. teranode's catch-up stream readers) reuse a single scratch buffer across millions of txid hashes instead of allocating tx.Bytes() per call. Honours the existing SetTxHash cache: when cached, the scratch buffer is passed through unchanged and no allocation occurs.
Rejects script lengths greater than MaxArenaAlloc (1 GiB) in the existing Output.ReadFrom and Input.readFrom (unlocking + extended PreviousTxScript). Previously a malformed or adversarial varint claiming e.g. 4 GiB would cause an immediate make([]byte, 2^32) allocation and likely OOM. The cap remains loose enough to admit legitimate large BSV transactions (~320 MB).
Compares Tx.ReadFrom vs Tx.ReadFromWithArena for single-tx and 5000-tx subtree-style streams. Compares HashTxIDInto vs the existing TxIDChainHash. Benchmark results (Apple M3 Max, go test -benchmem -count=1): BenchmarkTx_ReadFrom_vs_WithArena/ReadFrom 1682888 702.2 ns/op 1096 B/op 44 allocs/op BenchmarkTx_ReadFrom_vs_WithArena/ReadFromWithArena 1948824 653.5 ns/op 696 B/op 39 allocs/op → -7% ns/op, -36% B/op, -11% allocs/op per tx BenchmarkSubtreeDecode_5kTxs/PerTxReadFrom 344 3407017 ns/op 5240099 B/op 215001 allocs/op BenchmarkSubtreeDecode_5kTxs/PerSubtreeArena 402 2982810 ns/op 3247887 B/op 190001 allocs/op → -12% ns/op, -38% B/op, -12% allocs/op per 5k-tx batch BenchmarkHashTxIDInto_vs_TxIDChainHash/TxIDChainHash 3291541 353.3 ns/op 608 B/op 2 allocs/op BenchmarkHashTxIDInto_vs_TxIDChainHash/HashTxIDInto 3892976 298.6 ns/op 0 B/op 0 allocs/op → -15% ns/op, -100% B/op, zero allocs
mrz1836
left a comment
There was a problem hiding this comment.
LGTM - CI checks are failing
…cation Output.ReadFrom, Input.readFrom (the helper backing ReadFrom and ReadFromExtended), and Tx.ReadFrom now delegate to their arena variants with a nil Arena. The arena variant checks for nil at script-allocation sites and falls back to make([]byte, l) — preserving byte-identical behaviour with the previous implementations. Net effects: - Removes ~140 lines of duplicated decode body - Stack-array optimisations (var satoshis [8]byte, etc.) now apply to all callers, including the existing ReadFrom paths - MaxArenaAlloc hardening guard now lives in a single place Addresses SonarQube 6.3% duplication-on-new-code gate.
Pulls the varint + cap + alloc-or-make + ReadFull pattern out of Output.ReadFromWithArena and the two script-reading blocks in Input.readFromWithArena into a single readArenaScript helper. Net effect: - ~60 lines of duplicated decode logic removed - Single source of truth for the MaxArenaAlloc cap message and the nil-arena fallback path - Error message labels are now consistent (lockingScript / unlockingScript / prevTxScript) across both cap and short-read paths Addresses SonarQube 3.2% duplication-on-new-code (gate is <= 3%).
- gofmt: input_test.go, output_test.go - misspell (UK -> US): amortise, serialise, serialisation across arena.go, tx.go, tx_arena_test.go, plus a sweep of remaining British spellings in the PR's touched files - prealloc: preallocate fixture-builder slices in input_test.go and output_test.go Restores the Code Quality gate on PR #146.
Addresses PR #146 reviewer flag: Clone/ShallowClone previously wrapped or pointer-shared script bytes from the original, which would silently corrupt clones if the original's scripts were arena-backed and the arena was reset. cloneScript helper produces an isolated heap-owned copy. Used in all three clone paths. The clone is now safe against arena resets and in-place mutation of script bytes — at the cost of one make+copy per script (negligible for typical txs; proportional to script size for large data-carrier txs, which is unavoidable for a real clone). Also: - arena.go: clarify Reset doc re: pre-grow slab survivors - arena.go: add 64-bit assumption note on MaxArenaAlloc - arena_test.go: cover Alloc(0) on zero-value arena - tx_arena_test.go: arena reset + overwrite isolation tests for both Clone and ShallowClone
- Collapse TestTx_Clone_IsolatesFromArena + TestTx_ShallowClone_IsolatesFromArena into a single table-driven TestTx_CloneVariants_IsolateFromArena. Eliminates ~80% of the duplicated test body that pushed SonarQube duplication-on-new-code to 8.4%. - tx_arena_test.go:98: serialisation -> serialization (misspell). - arena_test.go: TestArena_Alloc_ZeroOnZeroValueArena uses require.Empty per testifylint, not require.Len(..., 0).
There was a problem hiding this comment.
Pull request overview
This PR introduces an arena-backed decoding path for transactions to reduce per-script heap allocations during large-scale block validation, and adds a zero-allocation TxID hashing helper that can reuse a caller-provided scratch buffer. It also hardens script-length decoding by rejecting varint script lengths above MaxArenaAlloc (1 GiB).
Changes:
- Add
bt.Arenabump-pointer allocator and a sharedreadArenaScripthelper for script decoding with a 1 GiB per-script cap. - Add
ReadFromWithArenadecode variants forTx,Input(standard + extended), andOutputto route script byte allocations through the arena. - Add
Tx.HashTxIDInto(scratch)to compute txid without allocations when scratch is pre-sized.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
arena.go |
Adds the Arena allocator, MaxArenaAlloc, and shared readArenaScript (including oversize guards). |
tx.go |
Routes decoding through arena-aware Input/Output reads; adds HashTxIDInto; updates clone paths to deep-copy scripts. |
input.go |
Adds arena-backed read variants and routes script reads through readArenaScript (including oversize guards). |
output.go |
Adds arena-backed read variant and routes locking script reads through readArenaScript. |
tx_arena_test.go |
Adds equivalence/zero-alloc/caching tests for arena decoding and HashTxIDInto, plus arena-reset clone isolation test. |
input_test.go |
Adds fixtures and tests for arena decode equivalence and oversize script-length rejection (standard + extended). |
output_test.go |
Adds fixtures and tests for arena decode equivalence, reset invalidation behavior, and oversize rejection. |
arena_test.go |
Adds unit tests for Arena allocation/grow/reset/shrink semantics and limits. |
arena_benchmark_test.go |
Adds benchmarks comparing arena vs non-arena decoding and HashTxIDInto vs TxIDChainHash. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Tx.ReadFromWithArena, Output.ReadFromWithArena: docstrings now spell out that the arena-lifetime restriction only applies when a != nil; the nil-arena path is the heap-allocating implementation behind the standard ReadFrom. - Tx.HashTxIDInto: docstring no longer claims TxIDChainHash populates the cache (it doesn't — only SetTxHash does). - cloneScript: return a non-nil empty bscript.Script for zero-length source so deep-equality matches readArenaScript and make([]byte, 0). - Input.readFromWithArena: prevIndex error label corrected from previousTxID(4) to prevIndex(4).
- serialisation -> serialization (misspell, 2 occurrences) - Drop "Note:" prefix on the SetTxHash-cache clarification (godox flags it as a TODO-style marker). Same content, no prefix.
|



Summary
Adds an additive
Arenabump-pointer allocator and*.ReadFromWithArenadecode variants forTx,Input,Output, plus a zero-allocTx.HashTxIDInto. Targets the per-scriptmake([]byte, l)hot path responsible for ~70% of the heap during teranodeblockvalidationcatch-up sync (see bsv-blockchain/teranode#920).Also hardens the existing
Output.ReadFrom/Input.ReadFrompaths to reject script-length varints above 1 GiB (MaxArenaAlloc).What's in this PR
New public API (additive)
bt.Arena— bump-pointer allocator withNewArena,Alloc,Reset,ResetAndShrink,Cap,Used. Single-goroutine ownership. 1 GiB per-alloc cap. Returned slices use 3-arg slicing so writes cannot overflow into subsequent allocations.Output.ReadFromWithArena(r io.Reader, a *Arena) (int64, error)— Output decode routing locking-script bytes through the arena.Input.ReadFromWithArena+Input.ReadFromExtendedWithArena— Input decode (standard + extended formats) routing unlocking-script andPreviousTxScriptbytes through the arena.Tx.ReadFromWithArena— Tx decode mirroringReadFromfield-for-field, including extended-format detection. Routes every per-script allocation through the arena.Tx.HashTxIDInto(scratch []byte) (chainhash.Hash, []byte)— zero-alloc txid hashing with reusable scratch. Honours the existingSetTxHashcache.Hardening (existing API)
Output.ReadFrom,Input.ReadFrom,Input.ReadFromExtended(via the sharedreadFromhelper): reject script-length varints greater than 1 GiB. Previously a malformed or adversarial input could trigger an immediatemake([]byte, ~4 GiB)allocation.Backwards compatibility
All API additions are additive. Existing
ReadFrom,Bytes,TxIDChainHash,SerializeBytes,AppendBytessemantics are unchanged. The script-length hardening rejects inputs that would previously have allocated >1 GiB — strictly a safety improvement; no legitimate BSV transaction is affected.Benchmark results
Apple M3 Max, arm64:
Key deltas:
HashTxIDIntovsTxIDChainHash: -15% latency, zero allocations vs 608 B / 2 allocsNote: these micro-benchmarks measure CPU + per-decode allocations on a canonical 3-input 2-output tx with small scripts. The real-world win is on cumulative held heap during catch-up sync with large OP_RETURN-style payloads, where the arena bounds peak RSS by the slab size + worker count instead of growing linearly with tx count until GC. A 320 MB output round-trip is verified end-to-end by
TestTx_ReadFromWithArena_320MBOutput.Test plan
go test ./...— 649 tests passgo test -race ./...— cleango vet ./...— cleanFollow-up (separate PR, in teranode)
teranode
services/subtreevalidation/blockpersister/asset/repository/legacy/netsyncwill migrate hot-path decode loops to the new arena variants, bounded by per-subtree / per-block arena pools. Tracked at bsv-blockchain/teranode#920.