Skip to content

perf: eliminate CappedArraySource heap allocation in SpanSource#10734

Merged
benaadams merged 2 commits into
masterfrom
fix/span-source-allocation
Mar 9, 2026
Merged

perf: eliminate CappedArraySource heap allocation in SpanSource#10734
benaadams merged 2 commits into
masterfrom
fix/span-source-allocation

Conversation

@kamilchodola

Copy link
Copy Markdown
Contributor

Changes

  • Replace the discriminated-union object _obj field in SpanSource with explicit byte[]? _array + int _length fields
  • Remove the CappedArraySource sealed class that was heap-allocated on every SafeRentBuffer call
  • Eliminate type-checking overhead (3 branches) on every Span/Length/IsNull access
  • SpanSource(CappedArray<byte>) now extracts UnderlyingArray + Length directly — zero allocations

Types of changes

  • Optimization

Context

PR #8719 introduced SpanSource replacing RlpFactory/CappedArray<byte> in trie hot paths. The design used a discriminated union over object _obj (either byte[] or a CappedArraySource wrapper class). When SafeRentBuffer is called with a pool (the common path during trie node encode/decode), it called new CappedArraySource(capped) — a heap allocation on every call. This happens thousands of times per block (every trie node encode/decode), adding significant GC pressure.

Benchmark comparison between 79c0da0 (before #8719) and eb4a682 (after) showed:

  • AVG block processing: 42.6ms → 51.2ms (+20%)
  • P90 block processing: 102ms → 135ms (+32%)

The P90 amplification is consistent with GC pauses from short-lived Gen0 CappedArraySource objects.

This fix replaces the single object _obj with byte[]? _array + int _length (struct grows 8→16 bytes), eliminating all heap allocations and type checking. The TryGetCappedArray method reconstructs CappedArray<byte> from the stored fields when needed.

Testing

Requires testing

  • Yes

If yes, did you write tests?

  • No (existing tests cover this — all 3 SpanSource tests and 418 Trie tests pass)

Notes on testing

  • SpanSource unit tests: 3/3 pass
  • Trie tests: 418/418 pass (5 skipped)
  • Full solution builds with 0 errors, 0 warnings
  • Needs benchmark validation to confirm the regression is recovered

@LukaszRozmej LukaszRozmej left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need SpanSource? Can we just replace it with Memory<byte>?

@kamilchodola

Copy link
Copy Markdown
Contributor Author

Good question. Short answer: SpanSource is essentially Memory<byte> with null awareness, and the fix already brings it to the minimal representation.

Why not Memory<byte> directly:

The trie code relies heavily on distinguishing null (RLP not loaded yet / value not present) from empty (loaded, zero-length). There are ~30 IsNull/IsNotNull checks across the trie — 21 in TrieNode.cs alone. Memory<byte> can't distinguish these through its public API: both default(Memory<byte>) and new Memory<byte>(Array.Empty<byte>()) return IsEmpty = true, Length = 0.

Workarounds exist (MemoryMarshal.TryGetArray() to peek at the internal array, or Memory<byte>? nullable) but they're either fragile or grow the struct to ~24 bytes.

What the fix does:

Replaces the original object _obj discriminated union (which heap-allocated CappedArraySource on every trie encode/decode) with byte[]? _array + int _length (16 bytes). This is the same layout as Memory<byte> minus the _index field we don't need (CappedArray always starts at offset 0), with clean null/empty distinction via _array is null.

@benaadams benaadams marked this pull request as ready for review March 6, 2026 12:53

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes SpanSource in Nethermind.Core to eliminate per-call heap allocations and type-dispatch overhead in trie/RLP hot paths by representing the source as byte[]? + length instead of a discriminated object union.

Changes:

  • Replace the previous object-based discriminated union with explicit byte[]? _array and int _length storage.
  • Remove the CappedArraySource wrapper allocation and simplify Span/Length accessors.
  • Add a unit test covering implicit conversion from a null byte array.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/Nethermind/Nethermind.Core/Buffers/SpanSource.cs Refactors SpanSource representation to remove allocations/type checks and reconstruct CappedArray<byte> on demand.
src/Nethermind/Nethermind.Core.Test/Buffers/SpanSourceTests.cs Adds test coverage for implicit conversion from byte[]? when the array is null.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/Nethermind/Nethermind.Core/Buffers/SpanSource.cs Outdated
Comment thread src/Nethermind/Nethermind.Core/Buffers/SpanSource.cs Outdated
@LukaszRozmej

Copy link
Copy Markdown
Member

IMO better to remove SpanSource entirely, now it id basically CappedArray again without the previous optimizations for it.

SpanSource was introduced in PR #8719 as a discriminated union wrapping
byte[] or CappedArraySource. The CappedArraySource heap allocation on
every trie encode/decode caused a ~20% AVG / ~32% P90 block processing
regression.

Instead of fixing SpanSource's allocation, remove it entirely since
CappedArray<byte> already provides all needed functionality (null/empty
semantics, pool compatibility, span access) without any heap allocation.

Key changes:
- Replace SpanSource with CappedArray<byte> in TrieNode._rlp, Value,
  FullRlp, and all trie hot paths
- Delete SpanSource.cs, ISpanSource.cs, ISpanSourcePool.cs
- Add SafeRent/SafeReturn extensions on ICappedArrayPool
- Fix value comparison in PatriciaTree.Set to use SequenceEqual
  (CappedArray struct Equals compares refs, not content)
- Update memory size calculations for CappedArray<byte> layout

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/Nethermind/Nethermind.Trie/TrieNodeFactory.cs:20

  • TrieNodeFactory.CreateLeaf is public and its signature changed from SpanSource to CappedArray<byte>, which is a breaking change for downstream consumers. If this is intended, consider adding a compatibility overload (e.g., accepting byte[]?/ReadOnlySpan<byte>) or keeping the previous signature as [Obsolete] for a transition period.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 41 to 44
private byte _blockAndFlags = 0;
private SpanSource _rlp;
private CappedArray<byte> _rlp;
private INodeData? _nodeData;

Copilot AI Mar 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title/description describe optimizing SpanSource internals (removing CappedArraySource allocations), but the implementation here removes SpanSource from TrieNode entirely and switches the stored RLP representation to CappedArray<byte>. Please update the PR title/description (and any release notes/changelog, if applicable) to reflect the actual approach and the resulting API surface changes (e.g., FullRlp/Value types).

Copilot uses AI. Check for mistakes.
Comment on lines +19 to 24
public static CappedArray<byte> VerifyOneProof(byte[][] proof, Hash256 root)
{
if (proof.Length == 0)
{
return SpanSource.Null;
return default(CappedArray<byte>);
}

Copilot AI Mar 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProofVerifier.VerifyOneProof is a public API and its return type changed from SpanSource to CappedArray<byte>, which is a breaking change for any external callers. If the broader removal of SpanSource is intentional, consider adding a temporary overload returning byte[]? or keeping the old signature as [Obsolete] to reduce upgrade friction.

Copilot uses AI. Check for mistakes.

@LukaszRozmej LukaszRozmej left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it makes any sense in any places to return it by ref?

Comment thread src/Nethermind/Nethermind.Core/Buffers/ICappedArrayPool.cs Outdated
@benaadams benaadams merged commit 334488b into master Mar 9, 2026
111 checks passed
@benaadams benaadams deleted the fix/span-source-allocation branch March 9, 2026 23:20
kamilchodola added a commit that referenced this pull request Mar 17, 2026
…ent torn reads

CappedArray<byte> is a 12+ byte struct (byte[] _array + int _length).
On x64, only 8-byte reference reads/writes are atomic. When one thread
writes both fields and another reads simultaneously, the reader can see
_array from one write and _length from another, causing
IndexOutOfRangeException when _length > _array.Length.

This was introduced by PR #10734 which replaced the old SpanSource
(single object ref, 8 bytes, atomic) with CappedArray<byte>.

Fix: change _rlp from CappedArray<byte> to byte[]? (single 8-byte
reference, atomically readable on x64). In GenerateKey, extract
exact-size byte[] from the pooled CappedArray<byte> at the assignment
boundary. This eliminates both the CappedArraySource allocation from
the original code AND the torn read bug.
kamilchodola added a commit that referenced this pull request Mar 17, 2026
…ent torn reads

CappedArray<byte> is a 12+ byte struct (byte[] _array + int _length).
On x64, only 8-byte reference reads/writes are atomic. When one thread
writes both fields and another reads simultaneously, the reader can see
_array from one write and _length from another, causing
IndexOutOfRangeException when _length > _array.Length.

This was introduced by PR #10734 which replaced the old SpanSource
(single object ref, 8 bytes, atomic) with CappedArray<byte>.

Fix: change _rlp from CappedArray<byte> to byte[]? (single 8-byte
reference, atomically readable on x64). In GenerateKey, extract
exact-size byte[] from the pooled CappedArray<byte> at the assignment
boundary. This eliminates both the CappedArraySource allocation from
the original code AND the torn read bug.
kamilchodola added a commit that referenced this pull request Mar 18, 2026
kamilchodola added a commit that referenced this pull request Mar 18, 2026
node.Value and node.FullRlp return SpanSource instead of CappedArray<byte>,
so access .Span explicitly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants