perf: eliminate CappedArraySource heap allocation in SpanSource#10734
Conversation
LukaszRozmej
left a comment
There was a problem hiding this comment.
Do we even need SpanSource? Can we just replace it with Memory<byte>?
|
Good question. Short answer: Why not The trie code relies heavily on distinguishing null (RLP not loaded yet / value not present) from empty (loaded, zero-length). There are ~30 Workarounds exist ( What the fix does: Replaces the original |
There was a problem hiding this comment.
Pull request overview
This PR optimizes SpanSource in Nethermind.Core to eliminate per-call heap allocations and type-dispatch overhead in trie/RLP hot paths by representing the source as byte[]? + length instead of a discriminated object union.
Changes:
- Replace the previous
object-based discriminated union with explicitbyte[]? _arrayandint _lengthstorage. - Remove the
CappedArraySourcewrapper allocation and simplifySpan/Lengthaccessors. - Add a unit test covering implicit conversion from a
nullbyte array.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/Nethermind/Nethermind.Core/Buffers/SpanSource.cs |
Refactors SpanSource representation to remove allocations/type checks and reconstruct CappedArray<byte> on demand. |
src/Nethermind/Nethermind.Core.Test/Buffers/SpanSourceTests.cs |
Adds test coverage for implicit conversion from byte[]? when the array is null. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
IMO better to remove SpanSource entirely, now it id basically CappedArray again without the previous optimizations for it. |
SpanSource was introduced in PR #8719 as a discriminated union wrapping byte[] or CappedArraySource. The CappedArraySource heap allocation on every trie encode/decode caused a ~20% AVG / ~32% P90 block processing regression. Instead of fixing SpanSource's allocation, remove it entirely since CappedArray<byte> already provides all needed functionality (null/empty semantics, pool compatibility, span access) without any heap allocation. Key changes: - Replace SpanSource with CappedArray<byte> in TrieNode._rlp, Value, FullRlp, and all trie hot paths - Delete SpanSource.cs, ISpanSource.cs, ISpanSourcePool.cs - Add SafeRent/SafeReturn extensions on ICappedArrayPool - Fix value comparison in PatriciaTree.Set to use SequenceEqual (CappedArray struct Equals compares refs, not content) - Update memory size calculations for CappedArray<byte> layout
efef27e to
6a61bf5
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/Nethermind/Nethermind.Trie/TrieNodeFactory.cs:20
TrieNodeFactory.CreateLeafis public and its signature changed fromSpanSourcetoCappedArray<byte>, which is a breaking change for downstream consumers. If this is intended, consider adding a compatibility overload (e.g., acceptingbyte[]?/ReadOnlySpan<byte>) or keeping the previous signature as[Obsolete]for a transition period.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| private byte _blockAndFlags = 0; | ||
| private SpanSource _rlp; | ||
| private CappedArray<byte> _rlp; | ||
| private INodeData? _nodeData; | ||
|
|
There was a problem hiding this comment.
The PR title/description describe optimizing SpanSource internals (removing CappedArraySource allocations), but the implementation here removes SpanSource from TrieNode entirely and switches the stored RLP representation to CappedArray<byte>. Please update the PR title/description (and any release notes/changelog, if applicable) to reflect the actual approach and the resulting API surface changes (e.g., FullRlp/Value types).
| public static CappedArray<byte> VerifyOneProof(byte[][] proof, Hash256 root) | ||
| { | ||
| if (proof.Length == 0) | ||
| { | ||
| return SpanSource.Null; | ||
| return default(CappedArray<byte>); | ||
| } |
There was a problem hiding this comment.
ProofVerifier.VerifyOneProof is a public API and its return type changed from SpanSource to CappedArray<byte>, which is a breaking change for any external callers. If the broader removal of SpanSource is intentional, consider adding a temporary overload returning byte[]? or keeping the old signature as [Obsolete] to reduce upgrade friction.
LukaszRozmej
left a comment
There was a problem hiding this comment.
Does it makes any sense in any places to return it by ref?
…ent torn reads CappedArray<byte> is a 12+ byte struct (byte[] _array + int _length). On x64, only 8-byte reference reads/writes are atomic. When one thread writes both fields and another reads simultaneously, the reader can see _array from one write and _length from another, causing IndexOutOfRangeException when _length > _array.Length. This was introduced by PR #10734 which replaced the old SpanSource (single object ref, 8 bytes, atomic) with CappedArray<byte>. Fix: change _rlp from CappedArray<byte> to byte[]? (single 8-byte reference, atomically readable on x64). In GenerateKey, extract exact-size byte[] from the pooled CappedArray<byte> at the assignment boundary. This eliminates both the CappedArraySource allocation from the original code AND the torn read bug.
…ent torn reads CappedArray<byte> is a 12+ byte struct (byte[] _array + int _length). On x64, only 8-byte reference reads/writes are atomic. When one thread writes both fields and another reads simultaneously, the reader can see _array from one write and _length from another, causing IndexOutOfRangeException when _length > _array.Length. This was introduced by PR #10734 which replaced the old SpanSource (single object ref, 8 bytes, atomic) with CappedArray<byte>. Fix: change _rlp from CappedArray<byte> to byte[]? (single 8-byte reference, atomically readable on x64). In GenerateKey, extract exact-size byte[] from the pooled CappedArray<byte> at the assignment boundary. This eliminates both the CappedArraySource allocation from the original code AND the torn read bug.
node.Value and node.FullRlp return SpanSource instead of CappedArray<byte>, so access .Span explicitly.
Changes
object _objfield inSpanSourcewith explicitbyte[]? _array + int _lengthfieldsCappedArraySourcesealed class that was heap-allocated on everySafeRentBuffercallSpan/Length/IsNullaccessSpanSource(CappedArray<byte>)now extractsUnderlyingArray+Lengthdirectly — zero allocationsTypes of changes
Context
PR #8719 introduced
SpanSourcereplacingRlpFactory/CappedArray<byte>in trie hot paths. The design used a discriminated union overobject _obj(eitherbyte[]or aCappedArraySourcewrapper class). WhenSafeRentBufferis called with a pool (the common path during trie node encode/decode), it callednew CappedArraySource(capped)— a heap allocation on every call. This happens thousands of times per block (every trie node encode/decode), adding significant GC pressure.Benchmark comparison between
79c0da0(before #8719) andeb4a682(after) showed:The P90 amplification is consistent with GC pauses from short-lived Gen0
CappedArraySourceobjects.This fix replaces the single
object _objwithbyte[]? _array + int _length(struct grows 8→16 bytes), eliminating all heap allocations and type checking. TheTryGetCappedArraymethod reconstructsCappedArray<byte>from the stored fields when needed.Testing
Requires testing
If yes, did you write tests?
Notes on testing