Skip to content

feat: arena-backed tx decode + zero-alloc txid hashing#146

Merged
oskarszoon merged 16 commits into
masterfrom
feat/arena-decode
May 21, 2026
Merged

feat: arena-backed tx decode + zero-alloc txid hashing#146
oskarszoon merged 16 commits into
masterfrom
feat/arena-decode

Conversation

@oskarszoon

Copy link
Copy Markdown

Summary

Adds an additive Arena bump-pointer allocator and *.ReadFromWithArena decode variants for Tx, Input, Output, plus a zero-alloc Tx.HashTxIDInto. Targets the per-script make([]byte, l) hot path responsible for ~70% of the heap during teranode blockvalidation catch-up sync (see bsv-blockchain/teranode#920).

Also hardens the existing Output.ReadFrom / Input.ReadFrom paths to reject script-length varints above 1 GiB (MaxArenaAlloc).

What's in this PR

New public API (additive)

  • bt.Arena — bump-pointer allocator with NewArena, Alloc, Reset, ResetAndShrink, Cap, Used. Single-goroutine ownership. 1 GiB per-alloc cap. Returned slices use 3-arg slicing so writes cannot overflow into subsequent allocations.
  • Output.ReadFromWithArena(r io.Reader, a *Arena) (int64, error) — Output decode routing locking-script bytes through the arena.
  • Input.ReadFromWithArena + Input.ReadFromExtendedWithArena — Input decode (standard + extended formats) routing unlocking-script and PreviousTxScript bytes through the arena.
  • Tx.ReadFromWithArena — Tx decode mirroring ReadFrom field-for-field, including extended-format detection. Routes every per-script allocation through the arena.
  • Tx.HashTxIDInto(scratch []byte) (chainhash.Hash, []byte) — zero-alloc txid hashing with reusable scratch. Honours the existing SetTxHash cache.

Hardening (existing API)

  • Output.ReadFrom, Input.ReadFrom, Input.ReadFromExtended (via the shared readFrom helper): reject script-length varints greater than 1 GiB. Previously a malformed or adversarial input could trigger an immediate make([]byte, ~4 GiB) allocation.

Backwards compatibility

All API additions are additive. Existing ReadFrom, Bytes, TxIDChainHash, SerializeBytes, AppendBytes semantics are unchanged. The script-length hardening rejects inputs that would previously have allocated >1 GiB — strictly a safety improvement; no legitimate BSV transaction is affected.

Benchmark results

Apple M3 Max, arm64:

BenchmarkTx_ReadFrom_vs_WithArena/ReadFrom              1682888   702.2 ns/op  1096 B/op  44 allocs/op
BenchmarkTx_ReadFrom_vs_WithArena/ReadFromWithArena     1948824   653.5 ns/op   696 B/op  39 allocs/op

BenchmarkSubtreeDecode_5kTxs/PerTxReadFrom              344   3407017 ns/op  5240099 B/op  215001 allocs/op
BenchmarkSubtreeDecode_5kTxs/PerSubtreeArena            402   2982810 ns/op  3247887 B/op  190001 allocs/op

BenchmarkHashTxIDInto_vs_TxIDChainHash/TxIDChainHash   3291541   353.3 ns/op  608 B/op  2 allocs/op
BenchmarkHashTxIDInto_vs_TxIDChainHash/HashTxIDInto    3892976   298.6 ns/op    0 B/op  0 allocs/op

Key deltas:

  • Single-tx arena decode: -7% latency, -36% B/op, -11% allocs/op
  • 5k-tx batch decode: -12% latency, -38% B/op, -12% allocs/op
  • HashTxIDInto vs TxIDChainHash: -15% latency, zero allocations vs 608 B / 2 allocs

Note: these micro-benchmarks measure CPU + per-decode allocations on a canonical 3-input 2-output tx with small scripts. The real-world win is on cumulative held heap during catch-up sync with large OP_RETURN-style payloads, where the arena bounds peak RSS by the slab size + worker count instead of growing linearly with tx count until GC. A 320 MB output round-trip is verified end-to-end by TestTx_ReadFromWithArena_320MBOutput.

Test plan

  • go test ./... — 649 tests pass
  • go test -race ./... — clean
  • go vet ./... — clean
  • Equivalence tests across canonical fixtures (small, coinbase, 3-in 2-out, 5KB script, 200B unlocking)
  • 320 MB OP_RETURN round-trip
  • Zero-length script handling (Output + Input)
  • Extended-format Input round-trip (unlocking + PreviousTxScript both arena-backed)
  • Adversarial varint guards (Output, Input, Input extended PreviousTxScript)
  • Arena lifetime contract — post-Reset slab reuse explicitly tested

Follow-up (separate PR, in teranode)

teranode services/subtreevalidation / blockpersister / asset/repository / legacy/netsync will migrate hot-path decode loops to the new arena variants, bounded by per-subtree / per-block arena pools. Tracked at bsv-blockchain/teranode#920.

Single-goroutine bump-pointer arena that amortises per-script []byte
allocations across batches of tx decodes. Returned slices use 3-arg
slicing so writes cannot overflow into subsequent allocations.

MaxArenaAlloc capped at 1 GiB to admit legitimate large BSV
transactions while rejecting obviously-bogus varint lengths.

API is additive; no existing public surface changed.
The branch was guarded against len(slab) < pos+n, but both NewArena
and the grow path use make([]byte, n) which always returns len==cap.
Remove the dead branch and tighten the comment to document why the
3-arg slice is always in bounds.

Also clarify ResetAndShrink doc for the maxKeep <= 0 case and add a
test proving Reset reuses the same backing array.
Decodes Output using a caller-provided Arena for the locking-script
[]byte. Existing ReadFrom is unchanged. Rejects script lengths greater
than MaxArenaAlloc (1 GiB) to prevent unbounded allocation from
adversarial input.
Mirrors Input.ReadFrom / ReadFromExtended via an internal helper
readFromWithArena. Pulls the unlocking-script byte slice (and the
extended-format PreviousTxScript) from a caller-provided Arena.
Existing ReadFrom / ReadFromExtended unchanged. Fixed-size fields
(previousTxID, prevIndex, sequence, prevSatoshis) use stack arrays.
Rejects script lengths exceeding MaxArenaAlloc.
a.Alloc(0) on a fresh arena returns a nil slice, breaking equivalence
with the existing ReadFrom which uses make([]byte, 0) (non-nil empty).
Add a sentinel branch and a zero-length-script fixture. Same pattern
as Input.ReadFromWithArena.
Mirrors Tx.ReadFrom field-for-field with extended-format detection
preserved. Routes per-script allocations through the caller-provided
Arena via Input/Output ReadFromWithArena variants. Existing ReadFrom
is unchanged. Verified equivalent on canonical tx fixtures and a
320 MB OP_RETURN payload.
Lets callers (e.g. teranode's catch-up stream readers) reuse a single
scratch buffer across millions of txid hashes instead of allocating
tx.Bytes() per call. Honours the existing SetTxHash cache: when cached,
the scratch buffer is passed through unchanged and no allocation occurs.
Rejects script lengths greater than MaxArenaAlloc (1 GiB) in the
existing Output.ReadFrom and Input.readFrom (unlocking + extended
PreviousTxScript). Previously a malformed or adversarial varint
claiming e.g. 4 GiB would cause an immediate make([]byte, 2^32)
allocation and likely OOM. The cap remains loose enough to admit
legitimate large BSV transactions (~320 MB).
Compares Tx.ReadFrom vs Tx.ReadFromWithArena for single-tx and
5000-tx subtree-style streams. Compares HashTxIDInto vs the
existing TxIDChainHash.

Benchmark results (Apple M3 Max, go test -benchmem -count=1):

BenchmarkTx_ReadFrom_vs_WithArena/ReadFrom          1682888   702.2 ns/op  1096 B/op  44 allocs/op
BenchmarkTx_ReadFrom_vs_WithArena/ReadFromWithArena 1948824   653.5 ns/op   696 B/op  39 allocs/op
  → -7% ns/op, -36% B/op, -11% allocs/op per tx

BenchmarkSubtreeDecode_5kTxs/PerTxReadFrom    344   3407017 ns/op  5240099 B/op  215001 allocs/op
BenchmarkSubtreeDecode_5kTxs/PerSubtreeArena  402   2982810 ns/op  3247887 B/op  190001 allocs/op
  → -12% ns/op, -38% B/op, -12% allocs/op per 5k-tx batch

BenchmarkHashTxIDInto_vs_TxIDChainHash/TxIDChainHash  3291541   353.3 ns/op  608 B/op  2 allocs/op
BenchmarkHashTxIDInto_vs_TxIDChainHash/HashTxIDInto   3892976   298.6 ns/op    0 B/op  0 allocs/op
  → -15% ns/op, -100% B/op, zero allocs
@oskarszoon oskarszoon requested a review from mrz1836 as a code owner May 21, 2026 16:56
@github-actions github-actions Bot added size/XL Very large change (>500 lines) feature Any new significant addition labels May 21, 2026

@mrz1836 mrz1836 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - CI checks are failing

…cation

Output.ReadFrom, Input.readFrom (the helper backing ReadFrom and
ReadFromExtended), and Tx.ReadFrom now delegate to their arena variants
with a nil Arena. The arena variant checks for nil at script-allocation
sites and falls back to make([]byte, l) — preserving byte-identical
behaviour with the previous implementations.

Net effects:
- Removes ~140 lines of duplicated decode body
- Stack-array optimisations (var satoshis [8]byte, etc.) now apply to
  all callers, including the existing ReadFrom paths
- MaxArenaAlloc hardening guard now lives in a single place

Addresses SonarQube 6.3% duplication-on-new-code gate.
Pulls the varint + cap + alloc-or-make + ReadFull pattern out of
Output.ReadFromWithArena and the two script-reading blocks in
Input.readFromWithArena into a single readArenaScript helper.

Net effect:
- ~60 lines of duplicated decode logic removed
- Single source of truth for the MaxArenaAlloc cap message and the
  nil-arena fallback path
- Error message labels are now consistent (lockingScript /
  unlockingScript / prevTxScript) across both cap and short-read paths

Addresses SonarQube 3.2% duplication-on-new-code (gate is <= 3%).
@oskarszoon oskarszoon requested a review from mrz1836 May 21, 2026 17:12
- gofmt: input_test.go, output_test.go
- misspell (UK -> US): amortise, serialise, serialisation across
  arena.go, tx.go, tx_arena_test.go, plus a sweep of remaining
  British spellings in the PR's touched files
- prealloc: preallocate fixture-builder slices in input_test.go
  and output_test.go

Restores the Code Quality gate on PR #146.

@mrz1836 mrz1836 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Addresses PR #146 reviewer flag: Clone/ShallowClone previously wrapped
or pointer-shared script bytes from the original, which would
silently corrupt clones if the original's scripts were arena-backed
and the arena was reset.

cloneScript helper produces an isolated heap-owned copy. Used in all
three clone paths. The clone is now safe against arena resets and
in-place mutation of script bytes — at the cost of one make+copy per
script (negligible for typical txs; proportional to script size for
large data-carrier txs, which is unavoidable for a real clone).

Also:
- arena.go: clarify Reset doc re: pre-grow slab survivors
- arena.go: add 64-bit assumption note on MaxArenaAlloc
- arena_test.go: cover Alloc(0) on zero-value arena
- tx_arena_test.go: arena reset + overwrite isolation tests for both
  Clone and ShallowClone
- Collapse TestTx_Clone_IsolatesFromArena + TestTx_ShallowClone_IsolatesFromArena
  into a single table-driven TestTx_CloneVariants_IsolateFromArena.
  Eliminates ~80% of the duplicated test body that pushed SonarQube
  duplication-on-new-code to 8.4%.
- tx_arena_test.go:98: serialisation -> serialization (misspell).
- arena_test.go: TestArena_Alloc_ZeroOnZeroValueArena uses require.Empty
  per testifylint, not require.Len(..., 0).
@mrz1836 mrz1836 self-requested a review May 21, 2026 17:48

@mrz1836 mrz1836 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR; LGTM

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an arena-backed decoding path for transactions to reduce per-script heap allocations during large-scale block validation, and adds a zero-allocation TxID hashing helper that can reuse a caller-provided scratch buffer. It also hardens script-length decoding by rejecting varint script lengths above MaxArenaAlloc (1 GiB).

Changes:

  • Add bt.Arena bump-pointer allocator and a shared readArenaScript helper for script decoding with a 1 GiB per-script cap.
  • Add ReadFromWithArena decode variants for Tx, Input (standard + extended), and Output to route script byte allocations through the arena.
  • Add Tx.HashTxIDInto(scratch) to compute txid without allocations when scratch is pre-sized.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
arena.go Adds the Arena allocator, MaxArenaAlloc, and shared readArenaScript (including oversize guards).
tx.go Routes decoding through arena-aware Input/Output reads; adds HashTxIDInto; updates clone paths to deep-copy scripts.
input.go Adds arena-backed read variants and routes script reads through readArenaScript (including oversize guards).
output.go Adds arena-backed read variant and routes locking script reads through readArenaScript.
tx_arena_test.go Adds equivalence/zero-alloc/caching tests for arena decoding and HashTxIDInto, plus arena-reset clone isolation test.
input_test.go Adds fixtures and tests for arena decode equivalence and oversize script-length rejection (standard + extended).
output_test.go Adds fixtures and tests for arena decode equivalence, reset invalidation behavior, and oversize rejection.
arena_test.go Adds unit tests for Arena allocation/grow/reset/shrink semantics and limits.
arena_benchmark_test.go Adds benchmarks comparing arena vs non-arena decoding and HashTxIDInto vs TxIDChainHash.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tx.go Outdated
Comment thread tx.go Outdated
Comment thread tx.go Outdated
Comment thread input.go Outdated
Comment thread output.go Outdated
- Tx.ReadFromWithArena, Output.ReadFromWithArena: docstrings now spell
  out that the arena-lifetime restriction only applies when a != nil;
  the nil-arena path is the heap-allocating implementation behind the
  standard ReadFrom.
- Tx.HashTxIDInto: docstring no longer claims TxIDChainHash populates
  the cache (it doesn't — only SetTxHash does).
- cloneScript: return a non-nil empty bscript.Script for zero-length
  source so deep-equality matches readArenaScript and make([]byte, 0).
- Input.readFromWithArena: prevIndex error label corrected from
  previousTxID(4) to prevIndex(4).
- serialisation -> serialization (misspell, 2 occurrences)
- Drop "Note:" prefix on the SetTxHash-cache clarification (godox
  flags it as a TODO-style marker). Same content, no prefix.
@sonarqubecloud

Copy link
Copy Markdown

@oskarszoon oskarszoon requested a review from Copilot May 21, 2026 18:03

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

@oskarszoon oskarszoon merged commit ef8e945 into master May 21, 2026
45 checks passed
@github-actions github-actions Bot deleted the feat/arena-decode branch May 21, 2026 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Any new significant addition size/XL Very large change (>500 lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants