Skip to content

perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders#11617

Merged
LukaszRozmej merged 9 commits into
NethermindEth:masterfrom
0xDevNinja:feature/8573-pos-forward-header-cache
Jun 2, 2026
Merged

perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders#11617
LukaszRozmej merged 9 commits into
NethermindEth:masterfrom
0xDevNinja:feature/8573-pos-forward-header-cache

Conversation

@0xDevNinja

Copy link
Copy Markdown
Contributor

Resolves #8573

Changes

  • PosForwardHeaderProvider.GetBlockHeaders now fetches a 4 * maxHeader batch from ChainLevelHelper once, caches it, and serves subsequent calls by slicing the cached array.
  • Cache is invalidated when IBeaconPivot.ProcessDestination (hash or number) changes, when BestKnownNumber advances past the cached end, or when the requested slice cannot be satisfied without going below the cached start.
  • Pre-merge path (base.GetBlockHeaders) is unchanged; the LastResponseBatch mechanism in PowForwardHeaderProvider continues to drive that branch.
  • ValidateSeals runs once per cache fill (on the freshly fetched array) rather than on every peer allocation.

Types of changes

What types of changes does your code introduce?

  • Optimization

Testing

Requires testing

  • Yes

If yes, did you write tests?

  • Yes

Notes on testing

PosForwardHeaderProviderCacheTests (5 cases) covers: cache hit on identical inputs, invalidation when ProcessDestination.Hash changes, invalidation when BestKnownNumber advances past the cached range, slice advances with BestKnownNumber while inside the range, and skipLastN honored on cache hits. Existing ForwardHeaderProviderTests (37/38 pass, 1 pre-existing skip) continue to pass.

Documentation

Requires documentation update

  • No

Requires explanation in Release Notes

  • No

Remarks

Invalidation strategy follows the hint left in the issue: with no explicit signal for when ChainLevelHelper would return different headers, identity of BeaconPivot.ProcessDestination (set by NewPayload/ForkchoiceUpdated) plus BestKnownNumber advance covers the cases where the underlying chain-level data could have moved. Happy to swap to a more aggressive (e.g. BlockTree.NewHeadBlock) invalidation if reviewers prefer.

@LukaszRozmej

Copy link
Copy Markdown
Member

Deep review

Verdict: request changes. The optimization is sound in concept, but the cache slicing has an off-by-one that drops the anchor block on every cache hit in the typical PoS-sync state (the walk-back case). The test suite doesn't catch it because the mock simulates a non-walk-back scenario that doesn't occur in real sync.


1. Critical: off-by-one in desiredStart drops the anchor block

TryServeFromCache computes:

long desiredStart = Math.Min(_blockTree.BestKnownNumber + 1, currentNumber);

The + 1 is wrong for the dominant PoS-sync state:

  • During active beacon sync, beacon-only headers don't advance BestKnownNumber (BlockTree.cs only updates it when !beaconInsert). So BestKnownNumber == latest_canonical_block and BestKnownNumber + 1 is a beacon-info block.
  • ChainLevelHelper.GetStartingPoint() starts at min(BestKnown+1, ProcessDestination.Number). When ProcessDestination > BestKnown (active sync) the block at BestKnown+1 is IsBeaconInfo, so the do/while walks back one level and returns (BestKnown, hash_of_BestKnown_block) — i.e. headers[0].Number == BestKnownNumber in the array GetNextHeaders produces.
  • Same in the caught-up case (ProcessDestination == BestKnownstartingPoint = BestKnown, no walk-back, headers[0] = BestKnown).

So in both real-world states, the original chainLevelHelper.GetNextHeaders returns [block@BestKnown, block@BestKnown+1, …]. BlockDownloader relies on this: headers[0] is the already-known anchor used as parentHeader, and headers.Skip(1) is the to-download set (see AssembleRequest, AssembleSatisfiedEntries, and the ReceiptEdgeCase(…, headers[1].Number, …) call site).

With this PR, on the first call (cache miss) fresh starts at BestKnown as expected. On subsequent cache hits with the same BestKnownNumber, desiredStart = BestKnown+1, offset = 1, and the returned slice is [BestKnown+1, …, BestKnown+N]the anchor at BestKnown is dropped. Downstream effects:

  1. AssembleSatisfiedEntries iterates headers.Skip(1) starting at BestKnown+2, so the body for the block at BestKnown+1 (queued by the previous iteration's AssembleRequest) is never moved into a satisfied entry and never suggested — it lingers in _downloadRequests.
  2. AssembleRequest then sets parentHeader = headers[0] = block@BestKnown+1, which itself is a beacon-only header. BlockTree.SuggestBlock(K+2) will likely accept (beacon header at K+1 exists), but BestKnownNumber then advances past K+1 with the canonical chain having a state-less hole at K+1. In process mode this stalls the processor; in fast sync it leaks the request and silently skips one body download per cache hit.
  3. The previousStartingHeaderNumber == headers[0].Number guard in BlockDownloader.DoPrepareRequest flips one iteration earlier than before, mostly benign.

The new tests don't catch this because the chain-helper mock returns BuildSequentialHeaders(start: 1, count: 64) with BestKnownNumber == 0. That's a "no walk-back, headers start at BestKnown+1" shape — unreachable in real PoS sync. The test asserts second[0]!.Number.Should().Be(21) after advancing BestKnownNumber to 20, but in production the same call would have returned [20, 21, …, 35] (anchor at 20 included).

Fix. The cache must always present the anchor:

long desiredStart = Math.Min(_blockTree.BestKnownNumber, currentNumber);

(or equivalently offset = (int)(_blockTree.BestKnownNumber - cacheStart)), and add a regression test that mocks BuildSequentialHeaders(start: 0, count: 64) so the slice's first element is the anchor at BestKnownNumber, not the first new block.


2. ValidateSeals no longer revalidates the boundary on each slice

Original code called ValidateSeals on the returned slice, force-validating the last element and a random index (and any terminal-block transitions). This PR runs it once on the full fresh array, so cache hits get only the validations from the original fill. For PoS this is mostly fine (seal validation is effectively a no-op), but the forced check on suspected terminal blocks (headers[i+1].Difficulty == 0 transitions) is no longer guaranteed at the slice boundary. Likely benign post-merge, but call it out as an explicit trade-off, or rerun ValidateSeals(slice, cancellation) on cache hits (cheap for PoS, safer for the transition region).


3. skipLastN semantics drift

Original: chainLevelHelper.GetNextHeaders(maxHeader, long.MaxValue, skipLastN)skipLastBlockCount trims from the tail of the available chain (ChainLevelHelper.cs:69-70, 147-155).

PR: always passes skipLastBlockCount: 0, then applies skipLastN as a count cap inside BuildSlice / TryServeFromCache. When fresh.Length >= maxHeader + skipLastN the two are equivalent, but in the small/end-of-chain regime where ChainLevelHelper returns fewer headers than fetchSize, the new code returns up to fresh.Length - skipLastN from the front, while the original truncated from the tail of the available chain. Worth either a clarifying comment or restoring the parameter pass-through on cache miss.


4. Invalidation strategy is too coarse

ProcessDestination identity + BestKnownNumber > cacheEnd will miss:

  • Reorgs landing inside the cached range that don't change ProcessDestination (e.g. concurrent FCU keeping the same head but with a competing branch at lower height). Cached headers are then stale; ChainLevelHelper would re-read chain-level info.
  • ProcessDestination rewinds: the hash check catches this, good — but the Number comparison uses long.MaxValue sentinel for null, masking a null transition. Verify intentional.

The PR remarks already flag this. I'd prefer subscribing to BlockTree.NewHeadBlock (BlockDownloader already does), nuking the cache on reorg, rather than relying on ProcessDestination identity alone.


5. Smaller things

  • TryServeFromCache reads cache fields under _cacheLock but accesses cached[0].Number / cached[^1].Number outside the lock — safe today (arrays are immutable post-publish) but worth a comment.
  • cached.Length == 0 is unreachable given UpdateCache guards on MinCachedHeaderBatchSize. Drop it.
  • MinCachedHeaderBatchSize = 32 duplicates the same const in PowForwardHeaderProvider. Hoist to the base class.
  • TryServeFromCache allocates a new BlockHeader[take] and the caller wraps it via ToPooledList(slice.Length) — two allocations per hit. Either populate ArrayPoolList directly or hand the array to ToPooledList(array, count).
  • _cacheLock is redundant in practice (BlockDownloader._requestLock serializes callers), but fine as defense-in-depth — a one-liner comment would help.
  • BlockHeader[]? vs BlockHeader?[]? mismatch forces slice! casts; clean up the nullable-element shape in the slice path for consistency.

Summary

# Severity Issue
1 Blocker Off-by-one in desiredStart drops the anchor on cache hits in the walk-back path (typical PoS sync). Test mock hides it.
2 Medium ValidateSeals no longer runs per slice — terminal-transition forced check may be skipped.
3 Low skipLastN semantics drift from tail-trim to front-cap.
4 Medium Cache invalidation misses reorgs that don't touch ProcessDestination; consider subscribing to BlockTree.NewHeadBlock.
5 Nit Const duplication, double allocation on hit, dead branch, nullable-shape cleanup.

Fixing #1 and adding a walk-back regression test (mock start: 0 so the anchor is included) would unblock; #4 worth revisiting before merge.

@0xDevNinja 0xDevNinja force-pushed the feature/8573-pos-forward-header-cache branch 2 times, most recently from 30acbd5 to 376311a Compare May 20, 2026 07:28
@LukaszRozmej

Copy link
Copy Markdown
Member

@claude review

@claude

claude Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

Claude finished @LukaszRozmej's task in 3m 32s —— View job


Code Review: perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders

  • Gather context and read rules
  • Read changed files (PosForwardHeaderProvider.cs, ChainLevelHelper.cs, PowForwardHeaderProvider.cs, test file)
  • Analyze code correctness, performance, and test coverage
  • Post full review

Summary

The caching concept is sound — avoiding repeated ChainLevelHelper round-trips by serving a 4× pre-fetched batch is a worthwhile optimization. However there is one critical blocker: an off-by-one that silently drops the anchor block on every single cache hit, paired with a test mock that was accidentally written to hide that exact bug.


Critical

1. Off-by-one in TryServeFromCache drops the anchor on every cache hit (line 98)

ChainLevelHelper.GetStartingPoint() always returns the last processed (non-beacon-info) block as the anchor. In the typical active-sync state the walk-back loop returns (BestKnownNumber, hash_of_BestKnownNumber), so fresh[0].Number == BestKnownNumber and the anchor is stored at index 0 in the cache.

On the first call (cache miss), BuildSlice correctly returns [block@BestKnown, block@BestKnown+1, …].
On subsequent calls (cache hit, unchanged BestKnownNumber):

desiredStart = BestKnownNumber + 1   // +1 is wrong
offset       = 1
slice        = [block@BestKnown+1, …]  // anchor dropped

BlockDownloader relies on headers[0] being the already-known anchor that seeds AssembleSatisfiedEntries. Dropping it means:

  • The body for BestKnown+1 is never moved to a satisfied entry and lingers in _downloadRequests.
  • parentHeader on the next AssembleRequest becomes a beacon-only header, potentially causing a state hole in process mode or a silent skipped body in fast-sync mode.

Fix: long desiredStart = Math.Min(_blockTree.BestKnownNumber, currentNumber);

2. Test mock hides the bug (line 36 of test file)

The mock returns BuildSequentialHeaders(start: 1, count: 64) with BestKnownNumber = 0.
This gives cacheStart = 1, desiredStart = BestKnownNumber + 1 = 1, offset = 0 — the only case where the off-by-one is invisible.

In production GetStartingPoint() returns (BestKnownNumber, …) so cacheStart == BestKnownNumber and offset would be 1 on the next hit. The mock should use start: 0 to match. The Cached_slice_advances_with_best_known_number assertion second[0]!.Number.Should().Be(21) also asserts the buggy behaviour — it should be 20 (the anchor at the new BestKnownNumber).


Medium

3. ValidateSeals not re-run per slice (line 72)

Running it once on the large fill means the forced last-element check and terminal-block transition detection are fixed at fill positions, not at the slice boundaries served on cache hits. Safe in pure PoS, but a latent risk in the merge-transition region. Consider calling ValidateSeals(slice, cancellation) on cache hits — it is a no-op post-merge and restores the original safety invariant.

4. Cache invalidation misses intra-range reorgs (noted in PR remarks)

ProcessDestination identity + BestKnownNumber > cacheEnd won't catch a reorg that lands inside the cached range without changing ProcessDestination (e.g. a competing FCU at a lower height than current ProcessDestination). BlockDownloader already subscribes to BlockTree.NewHeadBlock; subscribing there and evicting the cache on reorg would be the more robust approach.


Low / Nit

5. skipLastN semantics drift (line 64)
Passing skipLastBlockCount: 0 to the large fetch includes headers right up to the chain tip; the original forwarded skipLastN to ChainLevelHelper which trimmed from the tail of the available window. In the end-of-chain regime the two diverge and may serve headers too close to a finalization boundary.

6. MinCachedHeaderBatchSize = 32 duplicated from base class — hoist to PowForwardHeaderProvider or a shared constant.

7. cached.Length == 0 guard on line 91 is unreachableUpdateCache only stores arrays of at least MinCachedHeaderBatchSize elements.

8. Double allocation on cache hitnew BlockHeader[take] in TryServeFromCache + ToPooledList(slice.Length) at the call site. Either populate ArrayPoolList directly inside TryServeFromCache, or expose an overload that accepts the pre-allocated array to avoid one copy.

9. _cacheLock comment — A one-liner noting it is defense-in-depth (callers are already serialized by BlockDownloader._requestLock) would help future readers understand why the lock exists given the serialized call pattern.


Verdict

Not mergeable — the off-by-one on desiredStart drops the anchor block on every cache hit in production PoS sync. The test suite does not exercise the real start == BestKnownNumber shape that ChainLevelHelper produces.

Fixing line 98 (+ 1 → no + 1) and updating the mock to start: 0 with the matching assertion change will unblock this. The medium findings (items 3–4) are worth addressing before merge; the rest are polish.

long currentNumber = processDestination?.Number ?? long.MaxValue;
if (cachedHash != currentHash || cachedNumber != currentNumber) return null;

long desiredStart = Math.Min(_blockTree.BestKnownNumber + 1, currentNumber);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical: off-by-one drops the anchor block on every cache hit

ChainLevelHelper.GetStartingPoint() always returns the last processed (non-beacon) block as the anchor. In the typical active-sync state where BestKnownNumber + 1 is a beacon-info block, the walk-back loop returns (BestKnownNumber, hash_of_BestKnownNumber), so fresh[0].Number == BestKnownNumber. The cache therefore stores the anchor at index 0.

On the first call (cache miss) BuildSlice correctly returns [block@BestKnown, …]. On subsequent calls (cache hit, same BestKnownNumber):

desiredStart = BestKnownNumber + 1   // ← wrong
offset       = 1
slice        = [block@BestKnown+1, …]  // anchor dropped

BlockDownloader relies on headers[0] being the already-known anchor that seeds AssembleSatisfiedEntries / AssembleRequest. Dropping it means the body for BestKnown+1 is never moved to a satisfied entry and lingers in _downloadRequests, and parentHeader becomes a beacon-only header on the next iteration.

Fix:

Suggested change
long desiredStart = Math.Min(_blockTree.BestKnownNumber + 1, currentNumber);
long desiredStart = Math.Min(_blockTree.BestKnownNumber, currentNumber);

public void SetUp()
{
_chainLevelHelper = Substitute.For<IChainLevelHelper>();
_chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 1, count: CachedBatchSize));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test mock hides the off-by-one bug

The mock returns BuildSequentialHeaders(start: 1, count: 64) with BestKnownNumber = 0. This means desiredStart = Math.Min(0+1, 1000) = 1 = cacheStart, so offset = 0 — the exact case that avoids the off-by-one.

In production, ChainLevelHelper.GetStartingPoint() returns the anchor at BestKnownNumber, so fresh[0].Number == BestKnownNumber. The mock should match this:

Suggested change
_chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 1, count: CachedBatchSize));
_chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 0, count: CachedBatchSize));

With this change cacheStart = 0 = BestKnownNumber, so desiredStart = BestKnownNumber + 1 = 1 ≠ 0 and the off-by-one is exposed. The Cached_slice_advances_with_best_known_number assertion second[0]!.Number.Should().Be(21) would also need updating to Be(20) (the anchor at the new BestKnownNumber).


// Fetch a larger batch than asked so subsequent peer allocations can be served from the cache.
int fetchSize = Math.Max(maxHeader * CacheBatchMultiplier, MinCachedHeaderBatchSize);
BlockHeader?[]? fresh = chainLevelHelper.GetNextHeaders(fetchSize, long.MaxValue, skipLastBlockCount: 0);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: skipLastN semantics differ from the original

The original code forwarded skipLastN directly to ChainLevelHelper as skipLastBlockCount, which appended skipLastN extra headers to the fetch window and then trimmed them from the tail — effectively ensuring skipLastN blocks at the tip of the available chain are not included.

With skipLastBlockCount: 0 here, the fetch retrieves fetchSize headers right up to the tip, caches them all, and then trims skipLastN at serve time from the end of the cached window. In the end-of-chain regime where ChainLevelHelper returns fewer than fetchSize headers (i.e. close to ProcessDestination), the two semantics diverge: the original would exclude the last skipLastN of the available chain; this version may include them if the slice happens to reach the end of the cache.

At minimum worth a comment; worst case it may serve headers too close to an in-progress finalization boundary.


// Alternatively we can do this in BeaconHeadersSyncFeed, but this seems easier.
ValidateSeals(headers!, cancellation);
ValidateSeals(fresh!, cancellation);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: ValidateSeals no longer runs on each served slice

The original code called ValidateSeals on every returned batch. Post-merge seal validation is a no-op, but the method also contains the terminal-block transition check:

bool terminalBlock = !lastBlock && headers[i+1].Difficulty == 0 && headers[i].Difficulty != 0;
bool forceValidation = lastBlock || i == randomNumberForValidation || terminalBlock;

Running it once on the large fresh array means: (a) the forced last-element check and the random-index check are fixed at fill time, not per-slice; (b) terminal-block transitions at a slice boundary (not present in the original fetch) are never force-validated on cache hits. This is safe in pure PoS but is a latent risk in the merge-transition region. Consider calling ValidateSeals(slice, cancellation) on cache hits too — it's cheap post-merge and restores the original safety invariant.

@0xDevNinja 0xDevNinja force-pushed the feature/8573-pos-forward-header-cache branch from 376311a to d9849ce Compare May 20, 2026 08:35
@LukaszRozmej

Copy link
Copy Markdown
Member

@claude re-review

@claude

claude Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

Claude finished @LukaszRozmej's task in 3m 48s —— View job


Re-review: perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders

  • Load tools
  • Read changed files (PosForwardHeaderProvider.cs, PosForwardHeaderProviderCacheTests.cs, ChainLevelHelper.cs)
  • Verify prior findings against current code
  • Post final re-review

The PR has not been updated since the initial deep review (May 15) and the first automated review (today, May 20). All previously identified findings remain open on the current code.


Status of Prior Findings

# Severity Finding Status
1 Critical Off-by-one in TryServeFromCache (desiredStart = BKN+1) drops anchor on cache hit in walk-back path Still present — line 98 unchanged
2 Critical Test mock uses start: 1 / BKN=0 (no-walk-back shape) and hides the bug Still present — line 36 unchanged
3 Medium ValidateSeals not re-run per slice; terminal-transition forced check missed at slice boundaries Still present — line 72 unchanged
4 Medium Cache invalidation misses intra-range reorgs that don't change ProcessDestination Still present
5 Low skipLastN semantics drift: original trimmed from chain tail; PR front-caps the cached window Still present — line 64 unchanged
6 Nit MinCachedHeaderBatchSize = 32 duplicated from PowForwardHeaderProvider (base class already has it at line 40) Still present — line 33 unchanged
7 Nit cached.Length == 0 branch on line 91 is unreachable (UpdateCache guards on MinCachedHeaderBatchSize) Still present
8 Nit Double allocation per cache hit (new BlockHeader[take] + ToPooledList) Still present

Critical Analysis (re-confirmed against ChainLevelHelper.cs)

Reading ChainLevelHelper.GetStartingPoint() directly confirms the blocker. The method has two exit paths:

Path A — no walk-back (block at BKN+1 is already processed, !IsBeaconInfo):

// line 179-182
if (!beaconMainChainBlock.IsBeaconInfo)
    return (startingPoint, beaconMainChainBlock.BlockHash); // returns BKN+1

Here headers[0].Number == BKN+1. The +1 in line 98 is coincidentally correct.

Path B — walk-back (typical active beacon sync, BKN+1 is IsBeaconInfo):

// ChainLevelHelper.cs:207-208 (explicit comment):
// "Note: the starting point, points to the non-beacon info block.
//  MergeBlockDownloader does not download the first header so this is deliberate"
// Decrements startingPoint → returns (BKN, hash_of_BKN)

Here headers[0].Number == BKN. On every cache hit: desiredStart = BKN+1, offset = 1, anchor dropped.

The test only exercises Path A (start: 1, BKN=0cacheStart == BKN+1). Path B (the dominant production state) is not covered. Inline comments with exact fix suggestions have been posted on lines 98 and 36.


Verdict

Not mergeable. The critical off-by-one (finding #1) and the concealing test mock (finding #2) are both unaddressed. The medium findings (#3, #4) should also be resolved or explicitly acknowledged before merge.

long currentNumber = processDestination?.Number ?? long.MaxValue;
if (cachedHash != currentHash || cachedNumber != currentNumber) return null;

long desiredStart = Math.Min(_blockTree.BestKnownNumber + 1, currentNumber);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical — still present] Off-by-one drops the anchor on every cache hit in typical PoS sync

ChainLevelHelper.GetStartingPoint() has two exit paths:

  1. No walk-back (block at BestKnownNumber+1 is already processed): returns (BestKnownNumber+1, hash)headers[0].Number == BestKnownNumber+1. The +1 here is correct.
  2. Walk-back (typical active beacon sync — block at BestKnownNumber+1 is IsBeaconInfo): the do/while loop decrements startingPoint once to BestKnownNumber and returns (BestKnownNumber, hash) — confirmed by ChainLevelHelper.cs:207-208: "the starting point, points to the non-beacon info block. MergeBlockDownloader does not download the first header so this is deliberate". In this case headers[0].Number == BestKnownNumber and the +1 is wrong.

On a cache hit (same BestKnownNumber, walk-back path):

cacheStart    = BestKnownNumber      (from first call)
desiredStart  = BestKnownNumber + 1  (← off by one)
offset        = 1
slice         = [block@BKN+1, …]     (anchor at BKN dropped)

BlockDownloader.AssembleRequest relies on headers[0] as the parent anchor (parentHeader), iterating headers.Skip(1) for actual downloads. Dropping the anchor means the body for BKN+1 is never moved into a satisfied entry.

Suggested change
long desiredStart = Math.Min(_blockTree.BestKnownNumber + 1, currentNumber);
long desiredStart = Math.Min(_blockTree.BestKnownNumber, currentNumber);

public void SetUp()
{
_chainLevelHelper = Substitute.For<IChainLevelHelper>();
_chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 1, count: CachedBatchSize));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical — still present] Test mock covers the wrong code path and hides the off-by-one

BuildSequentialHeaders(start: 1, count: CachedBatchSize) with BestKnownNumber = 0 models the no-walk-back path where headers[0].Number == BestKnownNumber+1 == 1. In this shape desiredStart = BKN+1 = 1 = cacheStart, so offset = 0 and the off-by-one is invisible.

Production uses the walk-back path (BestKnownNumber+1 is IsBeaconInfo) which returns headers[0].Number == BestKnownNumber == 0. The mock should match that:

Suggested change
_chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 1, count: CachedBatchSize));
_chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 0, count: CachedBatchSize));

With start: 0: cacheStart = 0 = BestKnownNumber. The buggy desiredStart = BKN+1 = 1 ≠ 0offset = 1 → anchor dropped. The test would then fail, exposing the bug. The Cached_slice_advances_with_best_known_number assertion on line 107 (second[0]!.Number.Should().Be(21)) would also need to change to Be(20) once the code fix is applied.

@LukaszRozmej

Copy link
Copy Markdown
Member

@claude re-review

@claude

claude Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

Claude finished @LukaszRozmej's task in 3m 9s —— View job


Re-review: perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders

  • Load tools
  • Read current state of changed files
  • Check git log for new commits since last review
  • Verify status of all prior findings
  • Post final re-review

The PR has been updated with a new commit (556a96a fix(merge-sync): preserve anchor block on PosForwardHeaderProvider cache hit) that addresses all three critical/blocking findings from prior reviews.

Resolved findings

# Severity Finding Status
1 Critical Off-by-one: desiredStart = BKN+1 drops anchor on cache hit in walk-back path FixeddesiredStart = BestKnownNumber (line 109), with explanation comment
2 Critical Test mock start: 1/BKN=0 modelled no-walk-back path, hiding the bug Fixed — mock now uses start: 0, assertion updated to .Be(20) (line 109 test)
3 Medium ValidateSeals not re-run per slice; terminal-block check frozen at fill time FixedValidateSeals(slice!, cancellation) added on cache hits (lines 58–60)
5 Low skipLastN semantics drift: original trimmed chain tail; PR front-capped cached window FixedskipLastBlockCount: skipLastN forwarded to ChainLevelHelper; only full-sized batches cached (lines 69, 81)

Remaining findings

Medium — acknowledged, not fixed

4. Cache invalidation misses intra-range reorgs (PosForwardHeaderProvider.cs:103)

A competing FCU that rewrites part of the canonical chain inside the cached [BKN, BKN+fetchSize] window without advancing ProcessDestination to a new hash/number will leave stale headers in the cache. BlockDownloader already subscribes to BlockTree.NewHeadBlock; subscribing there and evicting the cache on reorg is more robust. The PR remarks explicitly acknowledge this with the rationale that ProcessDestination identity covers the dominant NewPayload/ForkchoiceUpdated path.


Nits — not blockers

6. MinCachedHeaderBatchSize = 32 duplicated from base class (PosForwardHeaderProvider.cs:33 vs PowForwardHeaderProvider.cs:40) — hoist to PowForwardHeaderProvider or a shared constant.

7. cached.Length == 0 guard on line 98 is unreachableUpdateCache guards on MinCachedHeaderBatchSize, so the array is always non-empty when cached.

8. Double allocation per cache hitnew BlockHeader[take] in TryServeFromCache (line 119) + ToPooledList(slice.Length) at the call site (line 62). Consider populating ArrayPoolList directly or passing the pre-allocated array to avoid one copy.


Verdict

The three critical/blocking findings from prior reviews are all addressed. The only remaining medium concern (#4, reorg invalidation) is explicitly acknowledged in the PR remarks with a valid rationale. Nits are non-blocking.

The code is sound to merge; whether to require the NewHeadBlock subscription before merge is a judgment call for the reviewer.

Fetches a 4x batch from ChainLevelHelper, slices per-call, and serves
from cache while BeaconPivot.ProcessDestination is unchanged and the
desired start sits inside the cached range. Closes NethermindEth#8573.
…che hit

`ChainLevelHelper.GetStartingPoint` returns the anchor at `BestKnownNumber`
in the active beacon-sync walk-back path, so the cache stores the anchor at
index 0. Using `BestKnownNumber + 1` as the serve start dropped that anchor
on every cache hit, breaking `BlockDownloader.AssembleRequest` which relies
on `headers[0]` as the parent header.

Additional fixes:
- Forward `skipLastN` to `ChainLevelHelper` and only cache full-sized
  batches, restoring the original chain-tip exclusion semantics.
- Re-run `ValidateSeals` on each served slice so terminal-block and
  random-index checks are not frozen at fill time.

Tests updated so the mock models the walk-back path (`start: 0` matching
`BestKnownNumber = 0`); the assertion `second[0].Number == 20` now exercises
the anchor inclusion that previously slipped through the off-by-one.
@0xDevNinja 0xDevNinja force-pushed the feature/8573-pos-forward-header-cache branch from 556a96a to 99af2db Compare May 25, 2026 06:06
@LukaszRozmej

Copy link
Copy Markdown
Member

@claude re-review

@claude

claude Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

Claude finished @LukaszRozmej's task in 2m 10s —— View job


Re-review: perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders

  • Read current state of changed files
  • Check git log for new commits since last review
  • Verify status of all prior findings
  • Post final re-review

The branch now has two commits: b52fdcf (original perf change) and 99af2db (fix commit addressing review feedback). All critical and medium findings from the previous round have been addressed.

Resolved findings

# Severity Finding Status
1 Critical Off-by-one in TryServeFromCache (desiredStart = BKN+1) drops anchor on cache hit Fixedlong desiredStart = Math.Min(_blockTree.BestKnownNumber, currentNumber) with explanatory comment (line 109)
2 Critical Test mock used start: 1/BKN=0 (no-walk-back shape), hiding the bug Fixed — mock now uses start: 0 matching the walk-back path; assertion correctly updated to .Be(20)
3 Medium ValidateSeals not re-run per slice; terminal-block forced check skipped at slice boundaries FixedValidateSeals(slice!, cancellation) on cache hits (lines 58–60)
5 Low skipLastN semantics drift: original trimmed from chain tail; PR front-capped cached window FixedskipLastBlockCount: skipLastN forwarded to ChainLevelHelper (line 69); only full-sized batches cached (line 81)

Remaining: acknowledged medium

4. Cache invalidation misses intra-range reorgs — a competing FCU that rewrites part of the canonical chain inside the cached range without changing ProcessDestination (hash or number) leaves stale headers in the cache. The PR remarks explicitly acknowledge this and note the dominant NewPayload/ForkchoiceUpdated path is covered. Subscribing to BlockTree.NewHeadBlock (as BlockDownloader already does) would be more robust, but is an explicit, reasoned trade-off not a surprise.

Remaining nits (non-blocking)

6. MinCachedHeaderBatchSize = 32 duplicatedPosForwardHeaderProvider line 33 and PowForwardHeaderProvider line 40 both define this constant. Hoist to the base class.

7. cached.Length == 0 guard is unreachable (line 98) — UpdateCache only stores arrays with at least MinCachedHeaderBatchSize elements; the cached is null branch is sufficient.

8. Double allocation per cache hitnew BlockHeader[take] in TryServeFromCache (line 119) plus ToPooledList(slice.Length) at the call site (line 62). Consider populating ArrayPoolList directly to save one allocation per hit.


Verdict

Mergeable. All three critical/blocking findings from prior reviews are resolved. The remaining medium concern (#4, reorg invalidation) is explicitly acknowledged in the PR with a valid rationale. Nits are non-blocking.

- Hoist MinCachedHeaderBatchSize from PosForwardHeaderProvider to
  PowForwardHeaderProvider (protected const), removing the duplicate.
- Drop unreachable `cached.Length == 0` guard in TryServeFromCache;
  UpdateCache only stores arrays >= MinCachedHeaderBatchSize, so the
  null check alone suffices.
- Eliminate double allocation on cache hit by making TryServeFromCache
  populate an ArrayPoolList directly and return it, instead of
  allocating BlockHeader[] then a second time via ToPooledList.
@flcl42

flcl42 commented May 27, 2026

Copy link
Copy Markdown
Contributor

Could you check the following:

  1. src/Nethermind/Nethermind.Merge.Plugin/Synchronization/PosForwardHeaderProvider.cs:54-59 allocates an ArrayPoolList cache-hit slice, then calls ValidateSeals; if validation or cancellation throws, ownership is never returned to the caller and the pooled list is leaked. Wrap validation in try/catch and dispose the slice before rethrowing.
  2. src/Nethermind/Nethermind.Merge.Plugin.Test/Synchronization/PosForwardHeaderProviderCacheTests.cs:70 and similar calls ignore returned IOwnedReadOnlyList<BlockHeader?> values; the contract requires callers to dispose returned owned lists. Store them in using declarations, even when the test only verifies call counts.
  3. src/Nethermind/Nethermind.Merge.Plugin.Test/Synchronization/PosForwardHeaderProviderCacheTests.cs:114-122 does not actually prove skipLastN is honored: with Requested = 16, the first 16 cached headers are returned even if skipLastN is ignored. Move BestKnownNumber near the cache tail or request enough headers to make tail trimming observable, then assert the exact last header/count.

LukaszRozmej and others added 2 commits May 28, 2026 12:41
…on intra-range reorg

- Wrap the per-slice `ValidateSeals` call in try/catch so the `ArrayPoolList`
  is returned to the pool on cancellation/seal-validator exceptions instead
  of leaking the underlying buffer.
- Subscribe to `BlockTree.OnUpdateMainChain` and drop the cache whenever the
  first block of a main-chain update has a different hash than the cached
  header at the same number — catches reorgs that don't change
  `BeaconPivot.ProcessDestination`. Required converting the primary
  constructor to an explicit one and implementing `IDisposable` so the
  subscription is released for non-singleton consumers (tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…over reorg invalidation

- Dispose every returned `IOwnedReadOnlyList` via `using` so the pooled
  buffers are returned to `ArrayPool.Shared`.
- Extract `Get`, `RaiseMainChainUpdate`, and `AssertChainLevelCalls` helpers
  to remove the per-test boilerplate around the common cache-hit/miss pattern.
- `SkipLastN_is_honored_on_cache_hit`: request `CachedBatchSize` headers so
  the `min(available, maxHeader)` front-cap actually exposes `skipLastN`
  (previous `Requested = 16` made the test pass even if `skipLastN` was
  silently dropped) and assert exact count plus tail header number.
- Add `Cache_is_invalidated_on_reorg_within_cached_range` and
  `Cache_survives_main_chain_update_that_matches_cached_hash` to cover the
  new `OnUpdateMainChain` invalidation path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LukaszRozmej

Copy link
Copy Markdown
Member

@claude re-review

@claude

claude Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Claude finished @LukaszRozmej's task in 3m 50s —— View job


Re-review: perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders

  • Read current state of changed files
  • Check git log for new commits since last review (May 25)
  • Verify status of all prior findings + flcl42's new concerns
  • Post final re-review

Three new commits have landed since the last review (May 25):

Commit Message
8e27c13 perf(merge-sync): address PR #11617 review nits
eb2aacb fix(merge-sync): dispose slice on validation throw, invalidate cache on intra-range reorg
865824e test(merge-sync): dedup, dispose owned lists, strengthen SkipLastN, cover reorg invalidation

flcl42's Concerns (from May 27)

# Finding Status
f1 ArrayPoolList slice leaked if ValidateSeals throws Fixedtry { ValidateSeals(slice, ct); } catch { slice.Dispose(); throw; } at lines 73–83
f2 Test methods ignored IOwnedReadOnlyList return values without disposing Fixed — all Get() calls in tests now use using declarations
f3 SkipLastN test didn't make skipLastN observable (request was too small to expose tail-trim) Fixed — new test requests max: CachedBatchSize (64) with skip: 4; asserts Count == 60 and last.Number == 59. A no-op skipLastN would give Count == 64, so the assertion is now meaningful

All Prior Findings

# Severity Finding Status
1 Critical Off-by-one desiredStart = BKN+1 drops anchor on cache hit Fixedlong desiredStart = Math.Min(_blockTree.BestKnownNumber, currentNumber) (line 132)
2 Critical Test mock start:1/BKN=0 hid the bug Fixed — mock uses start: 0; assertion .Be(20)
3 Medium ValidateSeals not re-run per cache-hit slice FixedValidateSeals(slice, cancellation) called on every cache hit
4 Medium Cache invalidation missed intra-range reorgs FixedBlockTreeOnUpdateMainChain handler implemented (lines 160–184); two new tests cover reorg invalidation and cache survival on matching-hash update
5 Low skipLastN semantics drift from tail-trim to front-cap FixedskipLastBlockCount: skipLastN forwarded to ChainLevelHelper (line 92); only full-sized batches cached (line 104)
6 Nit MinCachedHeaderBatchSize duplicated FixedPosForwardHeaderProvider now inherits the constant from PowForwardHeaderProvider
7 Nit Dead cached.Length == 0 guard Fixed — removed
8 Nit Double allocation on cache hit FixedTryServeFromCache now directly populates a single ArrayPoolList<BlockHeader?>

Remaining: One Low Edge Case

BlockTreeOnUpdateMainChain only checks e.Blocks[0] (line 164)

BlockTree.UpdateMainChain passes blocks in ascending order; e.Blocks[0] is the lowest-numbered new-canonical block. If a reorg starts below the cache range but extends into it (e.g., cache covers blocks 10–73, reorg replaces blocks 3–20), then firstNumber = 3 < cacheStart = 10 and the handler returns early without invalidating the stale entries for blocks 10–20.

The practical risk is low: ProcessDestination identity already catches the dominant reorg path (a ForkchoiceUpdated with a competing head will change ProcessDestination.Hash, invalidating the cache through the first guard at line 126). This gap only matters for a reorg that lands below the cached range and doesn't update ProcessDestination. In PoS that scenario is rare, but worth noting.

A simple fix would be to also check e.Blocks[^1] when firstNumber < cacheStart, or iterate over all blocks in the update that fall within the cache range. Not a blocker for merge.


Verdict

Mergeable. All critical, high, and medium findings from prior reviews are fully resolved, and all three concerns raised by flcl42 on May 27 are addressed. The remaining finding is a Low-severity edge case in the reorg handler with low practical impact given the ProcessDestination guard.

Master removed FluentAssertions in NethermindEth#11567; this PR's new
PosForwardHeaderProviderCacheTests still used Should() and failed to
compile against the updated test props. Convert to the unified
Assert.That style.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LukaszRozmej

Copy link
Copy Markdown
Member

Deep review — perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders

Verdict: mergeable, with one medium concern I'd want addressed and a handful of nits. The critical blockers from the prior automated rounds (off-by-one anchor drop, hidden by the test mock, ValidateSeals not re-run per slice, skipLastN semantics drift) have all been resolved in 99af2db, eb2aac and 865824e, and flcl42's three concerns (validation-throw leak, undisposed test lists, skipLastN test that didn't observe skipLastN) are all addressed too.


1. Medium — reorg invalidation handler can miss reorgs that start below the cache

PosForwardHeaderProvider.cs:160-184 (BlockTreeOnUpdateMainChain) only inspects e.Blocks[0]:

Block firstBlock = e.Blocks[0];
long firstNumber = firstBlock.Number;
...
if (firstNumber < cacheStart || firstNumber > cacheEnd) return;
int idx = (int)(firstNumber - cacheStart);
if (cached[idx].Hash != firstHash) { /* invalidate */ }

Two things to know about the event source:

  • BlockTree.UpdateMainChain (BlockTree.cs:970-1064) accepts blocks in either ascending or descending order (bool ascendingOrder = blocks[^1].Number < blocks[0].Number ? false : true). So e.Blocks[0] is not guaranteed to be the lowest-numbered block — it can be the highest.
  • For reorgs landing below the cache and extending into it (e.g. cache covers [BKN..BKN+63], reorg replaces [BKN-5..BKN+10]), in ascending order e.Blocks[0].Number = BKN-5 < cacheStart and the handler returns early. The cached entries for [BKN..BKN+10] are now stale.

In descending order the handler would catch it (the highest-numbered block lands inside the cache), but the handler doesn't rely on the order being consistent — it should work in both.

The ProcessDestination guard catches the dominant case (a ForkchoiceUpdated flipping the head changes the hash). The window where this matters is narrow — a reorg overlap that happens to leave ProcessDestination unchanged, with a start point below cacheStart and an end point inside the cache — but it does exist, and the symptom (serving stale headers) is exactly what the new handler was added to prevent.

Suggested fix — handle both directions by computing the actual range of the update, or simpler: nuke the cache on any overlap (cost is one extra ChainLevelHelper call on rare events):

long updateStart = Math.Min(e.Blocks[0].Number, e.Blocks[^1].Number);
long updateEnd   = Math.Max(e.Blocks[0].Number, e.Blocks[^1].Number);
if (updateEnd < cacheStart || updateStart > cacheEnd) return;
_cachedHeaders = null;
_cachedProcessDestinationHash = null;
_cachedProcessDestinationNumber = 0;

The matching Cache_survives_main_chain_update_that_matches_cached_hash test would need to use a non-overlapping block under that variant.


2. Low — skipLastN semantics still drift when the value varies between calls

UpdateCache stores the array returned by ChainLevelHelper.GetNextHeaders(fetchSize, _, skipLastBlockCount: skipLastN_at_fill). The cached tail is therefore at chain position chain_tip - skipLastN_at_fill.

On a cache hit with a different skipLastN_new, the tail trim is available = cached.Length - offset - skipLastN_new, which assumes the cached tail is at chain_tip. It isn't — it's skipLastN_at_fill headers below. So the effective skip from chain tip is skipLastN_at_fill + skipLastN_new, not skipLastN_new.

In practice BlockDownloader passes a fixed fastSyncLag through one sync session (BlockDownloader.cs:155), so skipLastN is stable per session and this drift doesn't bite. Worth either a // callers must use a stable skipLastN within a cache lifetime note on GetBlockHeaders, or invalidating the cache when skipLastN_new != skipLastN_at_fill.


3. Low — BuildSlice's skipLastN parameter is dead

PosForwardHeaderProvider.cs:107 only ever calls BuildSlice(fresh!, maxHeader, skipLastN: 0). Inline the slice logic into the cache-miss path or drop the parameter — keeping it invites future drift back into the same bug class.


4. Low — TOCTOU between TryServeFromCache and ValidateSeals returning stale headers

TryServeFromCache reads _cachedHeaders under _cacheLock, copies references into the slice, and releases the lock. If BlockTreeOnUpdateMainChain fires immediately after and nulls _cachedHeaders, the slice still holds the (now-stale) headers and is returned to BlockDownloader. The slice is internally consistent (immutable header objects), so this is not a memory-safety issue; the downstream consequence is the same as case #1 — the downloader works on stale headers until the next GetBlockHeaders call.

If you go with the aggressive-invalidation suggestion in #1, this becomes purely cosmetic. Otherwise, a one-line comment on TryServeFromCache would help.


5. Nits

  • PosForwardHeaderProvider.cs:138 — the slice copy is a hand-rolled loop. ArrayPoolList<BlockHeader?> has AddRange(Span<T>); slice.AddRange(cached.AsSpan(offset, take)) lets the JIT vectorize.
  • PosForwardHeaderProvider.cs:131int available = cached.Length - offset - skipLastN; if (available <= 1) return null; — the <= 1 (rather than <= 0) presumably reflects "anchor only, no progress headers". A short comment would be nice.
  • PosForwardHeaderProvider.cs:64_cacheLock is now genuinely load-bearing (the OnUpdateMainChain handler is fired from an event source, not serialized with BlockDownloader._requestLock). A one-line comment noting that would help future readers.
  • Nullable typingBlockHeader[]? for _cachedHeaders vs BlockHeader?[]? for fresh vs ArrayPoolList<BlockHeader?> for the slice. ChainLevelHelper.GetNextHeaders returns BlockHeader[]? (non-null elements per the interface), so fresh should be BlockHeader[]? too and the fresh! plus BlockHeader?[]? chain in GetBlockHeaders is unnecessary noise.
  • UpdateCache first guardif (headers.Length < MinCachedHeaderBatchSize) return; is reachable only from one call site that already guards on fresh.Length >= fetchSize where fetchSize >= MinCachedHeaderBatchSize. The inner guard is redundant.

6. Tests — coverage gaps

  • Descending-order OnUpdateMainChainRaiseMainChainUpdate(params Block[]) always feeds in declaration order; no test forces blocks[^1].Number < blocks[0].Number. Given the production source can fire either way, a descending-order test would have surfaced the case-1 issue.
  • Reorg starting below cache — extending into the cache is the case-1 finding above; explicit test would be: RaiseMainChainUpdate(Build.A.Block.WithNumber(cacheStart - 5)...) and expect invalidation, then a follow-up Get() triggering a re-fetch.
  • skipLastN varying between calls — first call with skipLastN=0, second with skipLastN=4, second should serve from cache and respect the new skip.

Summary

# Severity Issue Recommendation
1 Medium OnUpdateMainChain only checks e.Blocks[0]; misses descending-order events and reorgs starting below the cache Fix the handler to consider the actual update range (or nuke on any overlap)
2 Low skipLastN cached at fill time; per-call drift if value varies Document the invariant or invalidate on mismatch
3 Low BuildSlice's skipLastN parameter is dead Inline or drop the parameter
4 Low TOCTOU: slice may outlive invalidation Comment, or fold into #1
5 Nit AddRange over manual loop; lock-scope comment; nullable typing inconsistency; redundant UpdateCache guard Polish
6 Nit Missing tests: descending event order, below-cache reorg, varying skipLastN Add for hardening

I would request the case-1 fix before merge; the rest is reviewer's discretion.

- `BlockTreeOnUpdateMainChain` now scans the event for any block in the
  cached range instead of inspecting only `Blocks[0]`. `UpdateMainChain`
  can fire in either ascending or descending order, and reorgs may start
  below `cacheStart` and only extend into the cache from there — the old
  shape returned early in both situations and left stale headers in the
  cache.
- Track `skipLastN` per cache entry and invalidate when it changes; the
  cached array is the result of `ChainLevelHelper.GetNextHeaders` with a
  specific `skipLastBlockCount`, so reusing it for a different
  `skipLastN` would double-trim from the tail.
- Drop the dead `skipLastN` parameter from the cache-miss slice builder
  (it was always called with 0) and inline the slice.
- Replace the per-element copy loop with `ArrayPoolList.AddRange(span)`.
- Drop the redundant `headers.Length < MinCachedHeaderBatchSize` guard
  in `UpdateCache` — the single caller already gates on `fresh.Length
  >= fetchSize` where `fetchSize >= MinCachedHeaderBatchSize`.
- Dedup tests via a small `ExpectCalls` helper; add tests for the new
  invalidation paths (skipLastN change, ascending-order reorg starting
  below cache, descending-order reorg with top above cache).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LukaszRozmej

Copy link
Copy Markdown
Member

@claude re-review

@claude

claude Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Claude finished @LukaszRozmej's task in 5m 49s —— View job


Re-review: perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders

  • Read current state of changed files
  • Check git log for new commits since last review (May 28)
  • Assess LukaszRozmej's May 29 deep review findings against current code
  • Post final re-review

Two new commits have landed since the May 28 review:

Commit Message
431ba46 fix(merge-sync): widen reorg invalidation, key cache on skipLastN
22086a5 test(merge-sync): convert FluentAssertions to NUnit Assert.That

LukaszRozmej's May 29 Deep Review — Finding Status

# Severity Finding Status
1 Medium OnUpdateMainChain only checks e.Blocks[0]; misses descending-order events and reorgs starting below cache Fixed — handler now does a full loop scan (PosForwardHeaderProvider.cs:179-193), skipping out-of-range blocks and stopping at the first in-range block. Handles both ascending and descending order correctly, as well as reorgs starting below cacheStart.
2 Low skipLastN semantics drift when value varies between calls Fixed — cache now keyed on skipLastN via _cachedSkipLastN field (line 37); TryServeFromCache returns null on mismatch (line 128), causing a fresh fetch with the correct value.
3 Low BuildSlice's skipLastN parameter is dead ResolvedBuildSlice was removed entirely in the refactor; no dead parameter remains.
4 Low TOCTOU: slice may outlive invalidation Acknowledged / cosmeticTryServeFromCache copies refs into the slice under lock, then releases; a concurrent BlockTreeOnUpdateMainChain could null _cachedHeaders after. The slice is internally consistent (immutable headers), and the worst case is one stale sync iteration that self-corrects. With finding #1 now fixed, this is purely cosmetic as noted in prior reviews.
5 Nit AddRange over manual loop; nullable typing; lock comment; redundant guard Mostly fixedAddRange is now used (line 148: slice.AddRange(cached.AsSpan(offset, take))); redundant MinCachedHeaderBatchSize guard in UpdateCache was removed. Remaining: _cacheLock still lacks a comment noting it is now load-bearing (the OnUpdateMainChain handler fires from an event source unsynchronized with BlockDownloader._requestLock, unlike the previous version); nullable element type on _cachedHeaders (BlockHeader?[]?) remains noisier than BlockHeader[]?.
6 Nit Missing tests: descending order, below-cache reorg, varying skipLastN Fixed — three new tests added: Cache_is_invalidated_on_ascending_reorg_starting_below_cached_range, Cache_is_invalidated_on_descending_reorg_with_top_above_cache, Cache_is_invalidated_when_skipLastN_changes.

Correctness Note on the Updated BlockTreeOnUpdateMainChain

The handler's return after finding the first block inside the cache range (line 192) is correct. In a linear chain, hash depends on parent hash, so if the first block inside the cache range matches the cached header, it's cryptographically guaranteed that no earlier block in the same canonical update changed either. A reorg that changes block X necessarily changes every descendant's hash; if the first in-range block matches, the fork point is outside the cached range. Cache_survives_main_chain_update_that_matches_cached_hash validates this path.


All Findings Across All Review Rounds

# Severity Finding Status
Critical-1 Critical Off-by-one desiredStart = BKN+1 drops anchor on cache hit Fixed (99af2db)
Critical-2 Critical Test mock start:1/BKN=0 hid the bug Fixed (99af2db)
Medium-3 Medium ValidateSeals not re-run per slice Fixed (eb2aacb)
Low-4 Low skipLastN semantics drift (tail-trim vs front-cap) Fixed (8e27c13)
Nit-5 Nit MinCachedHeaderBatchSize duplicated Fixed (8e27c13)
Nit-6 Nit Dead cached.Length == 0 guard Fixed (8e27c13)
Nit-7 Nit Double allocation per cache hit Fixed (8e27c13)
flcl42-f1 Medium ArrayPoolList slice leaked on ValidateSeals throw Fixed (eb2aacb)
flcl42-f2 Medium Undisposed IOwnedReadOnlyList in tests Fixed (865824e)
flcl42-f3 Medium skipLastN test didn't observe tail-trim Fixed (865824e)
May29-1 Medium OnUpdateMainChain single-block-check misses descending/below-cache reorgs Fixed (431ba46)
May29-2 Low skipLastN varying between calls causes drift Fixed (431ba46)
May29-3 Low Dead BuildSlice skipLastN parameter Resolved (method removed)
May29-4 Low TOCTOU: slice may outlive cache invalidation Cosmetic; no action needed
May29-5 Nit _cacheLock comment missing; nullable typing noise Remaining nits

Verdict

Mergeable. All Critical and Medium findings across all review rounds are fully resolved. The May 29 medium concern (reorg handler) is addressed in 431ba46 with a correct full-scan implementation and two regression tests. The only remaining items are Low/Nit that don't affect correctness.

`ToPooledList` was the only call site that needed it; the inlined
cache-miss slice no longer uses extension methods from that namespace.
Triggers IDE0005 in CI lint job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@svlachakis

Copy link
Copy Markdown
Contributor

@claude please check the following review

🔴 CRITICAL

C1 — TOCTOU between snapshot-under-lock and live IBeaconPivot.ProcessDestination read
lock (_cacheLock)
{
cached = _cachedHeaders;
cachedHash = _cachedProcessDestinationHash;
cachedNumber = _cachedProcessDestinationNumber;
cachedSkip = _cachedSkipLastN;
}
// gap
BlockHeader? processDestination = _beaconPivot.ProcessDestination; // outside lock
Hash256? currentHash = processDestination?.Hash;
Between the lock release and the pivot read, another thread can call UpdateCache (rewriting _cachedProcessDestinationHash) and a NewPayload
can flip ProcessDestination. The validation then compares a fresh currentHash against a stale snapshot. Fix: read
_beaconPivot.ProcessDestination inside the lock — the pivot getter is just a field read, no nested locking — and validate the relationship
in one critical section.

C2 — _cachedHeaders nulled while another thread holds a snapshot of it
BlockTreeOnUpdateMainChain zeros all four cache fields under _cacheLock. But TryServeFromCache's snapshot is used outside the lock to
compute desiredStart/cacheEnd/slice offset. A thread that snapshotted _cachedHeaders non-null right before the invalidation will continue
indexing into an array whose contents are now logically stale (same array reference, but no longer representing the current chain). Same
TOCTOU root cause as C1 — the lock must cover validation, not just the snapshot.

C3 — fresh array ownership unclear; potential ArrayPool leak on truncated batches
When fresh.Length < fetchSize the cache-write path is skipped and the method returns a new ArrayPoolList<BlockHeader?> copied from fresh.
fresh itself is whatever IChainLevelHelper.GetNextHeaders returns. If that implementation rents from ArrayPool and transfers ownership to
the caller (a common pattern in the codebase), nothing here returns it — straight pool leak on the truncation path. Either confirm
ChainLevelHelper returns a plain BlockHeader[] (not a rental), or call ArrayPool.Shared.Return(fresh) after the copy.


🟠 WARNING

W1 — ValidateSeals runs on every cache hit, defeating part of the optimisation
The cache-hit path unconditionally calls ValidateSeals(slice, cancellation), which spawns a Parallel.For and allocates a
ConcurrentQueue per call. The point of the cache was to avoid re-walking headers; spawning the full parallel seal-validation
loop on every hit erodes the win. Either: validate seals only once at cache fill, or restrict per-hit validation to the terminal + one
random index (matching the random-coverage intent without the full parallel fanout).

W2 — Double-allocation pattern: new ArrayPoolList(take) + AddRange(span)
Both the hit and miss paths do:
ArrayPoolList<BlockHeader?> slice = new(take);
slice.AddRange(cached.AsSpan(offset, take));
ArrayPoolList has a ReadOnlySpan constructor that rents once and copies in one shot. Use it; saves one allocation/initialisation per
call.

W3 — (BlockHeader?[])fresh cast is dead noise
fresh is declared BlockHeader[]. The covariant cast to BlockHeader?[] adds nothing — AsSpan(0, take) accepts the array directly and there's
an implicit covariant span conversion. Drop the cast.

W4 — IDisposable on a singleton creates a false-promise lifecycle
PosForwardHeaderProvider is registered as a singleton via the Merge plugin. Dispose() will not run in production; the OnUpdateMainChain
subscription lives for the node's lifetime — which is the intended behaviour. Adding IDisposable purely so tests can unsubscribe muddies
the contract. Either:

  • document that Dispose is test-only and a no-op in production lifetime, or
  • expose an internal Unsubscribe() method instead, gated on InternalsVisibleTo for the test assembly.

The current shape will mislead a future maintainer into treating the class as having a managed lifetime.

W5 — Test naming + missing precondition labels
Project standard is {WhatUnderTest}{AtWhatState}{ExpectedOutcome}. Rename:

  • Second_call_with_same_inputs_is_served_from_cache → GetBlockHeaders_WhenCalledTwiceWithSameInputs_ServesSecondCallFromCache
  • Cache_is_invalidated_when_process_destination_hash_changes → GetBlockHeaders_WhenProcessDestinationHashChanges_InvalidatesCache
  • Cache_is_invalidated_on_reorg_within_cached_range → GetBlockHeaders_WhenReorgWithinCachedRange_InvalidatesCache

And precondition assertions need a "why" message:
Assert.That(reorgBlock.Header.Hash, Is.Not.EqualTo(first[10]!.Hash),
"precondition: reorg block must hash differently from cached header at same height");


🟡 SUGGESTION

S1 — TryServeFromCache mixes abstraction levels
Lock + snapshot + pivot validation + arithmetic + allocation all live in one method. Extract bool TryGetCacheSnapshot(out BlockHeader[]
cached, out long start, out int skip) and have the public method call it under the same lock as the pivot validation. Reduces nesting and
makes the critical section's scope explicit.

S2 — between: _ => { } in tests
Empty delegates allocate on each test run and add semantic noise. Make between optional (between = null) or add a sibling
ExpectCacheHits(int count) overload.

S3 — _blockTree.BestKnownNumber read on every check
Virtual property, possibly synchronised internally. Hoist into a single local before/after entering the lock — both for clarity and to
avoid the property being re-evaluated mid-arithmetic.

S4 — Document the "stop on first in-range block" semantics in BlockTreeOnUpdateMainChain
The early return after the first in-range block is intentional (contiguous-prefix invariant), but easy to misread as a bug. One comment
line saves a future maintainer's reading time.

S5 — Inline cachedSkip
Read once, used once. Move the comparison into the lock-guarded block; remove the local.

@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @svlachakis's task in 6m 52s —— View job


Re-review: perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders

  • Load tools
  • Read current state of changed files
  • Read ChainLevelHelper.GetNextHeaders implementation (for C3)
  • Post findings

Findings from @svlachakis's review — verdict per item

# Reviewer severity My assessment Status
C1 Critical Medium — produces safe false misses, not false hits; previously acknowledged as Low/Cosmetic Open, acknowledged
C2 Critical Medium — one stale batch served during a concurrent reorg; self-correcting; same root cause as C1 Open, acknowledged
C3 Critical InvalidChainLevelHelper.GetNextHeaders returns headers.ToArray(), a plain heap allocation. No ArrayPool rental, no leak Not a finding
W1 Warning LowValidateSeals per-slice was an explicit correctness fix (May 21); post-merge validators return immediately Acknowledged trade-off
W2 Warning Low/Nitnew ArrayPoolList(take) + AddRange(span). Worth checking for a span constructor; not a correctness issue Open
W3 Warning Invalid — the (BlockHeader?[]) cast is required; Span<T> is invariant, so Span<BlockHeader> cannot implicitly become Span<BlockHeader?> Not a finding
W4 Warning Medium, newIDisposable on a singleton that's never disposed in production misleads about the class's managed lifetime Not previously acknowledged
W5 Warning Nit — test names don't follow {Subject}_{State}_{Outcome}; precondition assertion lacks a message Open
S1–S5 Suggestion Low/Nit — refactoring suggestions; S4 (contiguous-prefix comment) is particularly useful Open

Detail on C1/C2

Both were explicitly raised and accepted as Low/Cosmetic in the May 29 automated re-review (finding #4: "Acknowledged / cosmetic; no action needed"). I agree they are real concurrency issues but disagree with the CRITICAL label:

  • C1 — the comparison of a stale cachedHash snapshot against a fresh processDestination.Hash can only result in a false miss (unnecessary cache bypass). For a false hit (stale data served), you'd need ProcessDestination to have the same hash as the old snapshot while the cached array is actually stale — the BlockTreeOnUpdateMainChain invalidation covers that path.
  • C2 — Thread A holding a snapshot of array1 while the invalidation fires is the standard "snapshot survives invalidation" pattern. The array is internally consistent (immutable headers); the worst case is one wasted download round, after which the cache misses and fresh headers are fetched.

The fix (extend _cacheLock to cover the ProcessDestination read and comparison) is straightforward and would tighten the design. Given the existing acknowledgment it's up to reviewers whether to require it now.

Detail on W4 (new Medium finding)

IDisposable is implemented solely so tests can call _provider.Dispose() in [TearDown]. Since PosForwardHeaderProvider is registered as a singleton, Dispose() never runs in production — the OnUpdateMainChain subscription lives for the node's lifetime (which is the intended behavior). Adding IDisposable to a class that is never disposed creates a false promise: future maintainers may assume lifetime management is handled and add cleanup logic expecting it to execute. A doc comment on Dispose() explaining it is test-only, or an internal UnsubscribeForTest() method guarded by [InternalsVisibleTo], would resolve this with no functional change.


Verdict

Not mergeable as-is. W4 is a new Medium finding not previously raised or acknowledged. All other findings are either invalid (C3, W3), previously acknowledged (C1/C2, W1), or nits (W2, W5, S1–S5).

@svlachakis

Copy link
Copy Markdown
Contributor

@0xDevNinja please check claude review

- Replace IDisposable with internal UnsubscribeForTest(); the singleton
  is never disposed in production, the subscription is intentionally
  permanent, and IDisposable falsely advertises managed lifetime.
- Move the IBeaconPivot.ProcessDestination read inside _cacheLock so the
  cache snapshot and the pivot comparison happen in one critical section
  (closes the TOCTOU window between snapshot release and pivot read).
- Switch both ArrayPoolList allocations to the ReadOnlySpan<T> ctor;
  rents the buffer and copies in one step instead of new() + AddRange().
- Add a precondition message on the reorg-cache test assertion.
- One-line note on the BlockTreeOnUpdateMainChain early return:
  first in-range block's hash commits to all earlier ones, no need to
  scan further (the early return is otherwise easy to misread).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@svlachakis

Copy link
Copy Markdown
Contributor

@claude review again

@claude

claude Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @svlachakis's task in 2m 38s —— View job


Re-review: perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders

  • Load tools
  • Read current state of changed files
  • Check git log for new commits since last review (Jun 2)
  • Verify status of all prior findings (svlachakis Jun 2 + previous rounds)
  • Post final re-review

Two new commits have landed since the Jun 2 automated review:

Commit Message
3fa5853 fix(merge-sync): drop unused Nethermind.Core.Extensions using
8d9dc25 fix(merge-sync): tighten PosForwardHeaderProvider cache contract

@svlachakis June 2 Findings — Status

# Reviewer severity My assessment Status
C1 Critical Medium Fixed_beaconPivot.ProcessDestination read moved inside _cacheLock (8d9dc25); snapshot and pivot comparison now happen atomically
C2 Critical Low/Cosmetic Effectively resolved — with C1 fixed, all validation (pivot hash check, offset/take computation) runs inside the lock; the only post-lock work is copying pre-validated, immutable header references into ArrayPoolList. The snapshot pattern is safe.
C3 Critical Invalid ChainLevelHelper.GetNextHeaders ends with headers.ToArray() — a plain heap allocation; no ArrayPool.Rent, no leak.
W1 Warning Low, accepted trade-off Still present. ValidateSeals per-slice was an explicit correctness fix (May 21) to keep terminal-block transition detection at the served window boundary. In pure PoS the validators return immediately; cost is minimal.
W2 Warning Low/Nit Fixed — both allocation sites now use the ReadOnlySpan<T> constructor (8d9dc25 lines 108, 141).
W3 Warning Invalid The (BlockHeader?[])fresh cast is necessary. Span<T> is invariant; Span<BlockHeader> cannot implicitly become Span<BlockHeader?>. Removing it would be a compile error.
W4 Warning Medium FixedIDisposable replaced with internal UnsubscribeForTest() (8d9dc25 line 248); test TearDown calls _provider.UnsubscribeForTest(). The class no longer advertises a managed lifetime it doesn't have.
W5 Warning Nit Partially addressed — precondition message added on the reorg-cache assertion (line 109). Test method names not renamed to {Subject}_{State}_{Outcome}. Remaining as a nit only.
S1–S5 Suggestion Low/Nit S4 (early-return comment) fixed (8d9dc25 line 184: "First in-range block's hash commits to all earlier blocks in this update."). S3 (dead parameter) already resolved when BuildSlice was removed. Others (S1, S2, S5) are open suggestions.

All Findings Across All Review Rounds

# Severity Finding Status
Critical-1 Critical Off-by-one desiredStart = BKN+1 drops anchor on cache hit Fixed (99af2db)
Critical-2 Critical Test mock start:1/BKN=0 hid the bug Fixed (99af2db)
Medium-3 Medium ValidateSeals not re-run per slice Fixed (eb2aacb)
Low-4 Low skipLastN semantics drift (tail-trim vs front-cap) Fixed (8e27c13)
Nit-5 Nit MinCachedHeaderBatchSize duplicated Fixed (8e27c13)
Nit-6 Nit Dead cached.Length == 0 guard Fixed (8e27c13)
Nit-7 Nit Double allocation per cache hit Fixed (8d9dc25)
flcl42-f1 Medium ArrayPoolList slice leaked on ValidateSeals throw Fixed (eb2aacb)
flcl42-f2 Medium Undisposed IOwnedReadOnlyList in tests Fixed (865824e)
flcl42-f3 Medium skipLastN test didn't observe tail-trim Fixed (865824e)
May29-1 Medium OnUpdateMainChain single-block-check misses descending/below-cache reorgs Fixed (431ba46)
May29-2 Low skipLastN varying between calls causes drift Fixed (431ba46)
May29-3 Low Dead BuildSlice skipLastN parameter Resolved (method removed)
May29-4 Low TOCTOU: slice may outlive cache invalidation Resolved by C1 fix; post-lock copy is of pre-validated immutable data
Jun2-C1 Medium TOCTOU: ProcessDestination read outside lock Fixed (8d9dc25)
Jun2-C2 Low Snapshot outlives invalidation Acceptable snapshot pattern; safe
Jun2-C3 Invalid ArrayPool leak on truncation N/A — heap-allocated array
Jun2-W1 Low ValidateSeals per cache hit Accepted trade-off
Jun2-W2 Nit Double allocation per hit Fixed (8d9dc25)
Jun2-W3 Invalid Cast is dead noise N/A — cast is required
Jun2-W4 Medium IDisposable on singleton Fixed (8d9dc25)
Jun2-W5 Nit Test naming; precondition message Precondition message added; names unchanged (nit)

Verdict

Mergeable. All Critical and Medium findings across all review rounds are fully resolved. The two new Medium concerns from @svlachakis's June 2 review (C1 TOCTOU, W4 IDisposable) are both addressed in 8d9dc25. The three "Critical" findings (C1, C2, C3) were correctly reassessed as Medium/Low/Invalid respectively, and all Medium items are now closed. The remaining open items are Low accepted trade-offs or Nits.

@LukaszRozmej LukaszRozmej merged commit 9869a75 into NethermindEth:master Jun 2, 2026
545 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Perf: Cache PosForwardHeaderProvider.GetBlockHeaders

5 participants