perf(merge-sync): cache PosForwardHeaderProvider.GetBlockHeaders#11617
Conversation
Deep reviewVerdict: request changes. The optimization is sound in concept, but the cache slicing has an off-by-one that drops the anchor block on every cache hit in the typical PoS-sync state (the walk-back case). The test suite doesn't catch it because the mock simulates a non-walk-back scenario that doesn't occur in real sync. 1. Critical: off-by-one in
|
| # | Severity | Issue |
|---|---|---|
| 1 | Blocker | Off-by-one in desiredStart drops the anchor on cache hits in the walk-back path (typical PoS sync). Test mock hides it. |
| 2 | Medium | ValidateSeals no longer runs per slice — terminal-transition forced check may be skipped. |
| 3 | Low | skipLastN semantics drift from tail-trim to front-cap. |
| 4 | Medium | Cache invalidation misses reorgs that don't touch ProcessDestination; consider subscribing to BlockTree.NewHeadBlock. |
| 5 | Nit | Const duplication, double allocation on hit, dead branch, nullable-shape cleanup. |
Fixing #1 and adding a walk-back regression test (mock start: 0 so the anchor is included) would unblock; #4 worth revisiting before merge.
30acbd5 to
376311a
Compare
|
@claude review |
|
Claude finished @LukaszRozmej's task in 3m 32s —— View job Code Review:
|
| long currentNumber = processDestination?.Number ?? long.MaxValue; | ||
| if (cachedHash != currentHash || cachedNumber != currentNumber) return null; | ||
|
|
||
| long desiredStart = Math.Min(_blockTree.BestKnownNumber + 1, currentNumber); |
There was a problem hiding this comment.
Critical: off-by-one drops the anchor block on every cache hit
ChainLevelHelper.GetStartingPoint() always returns the last processed (non-beacon) block as the anchor. In the typical active-sync state where BestKnownNumber + 1 is a beacon-info block, the walk-back loop returns (BestKnownNumber, hash_of_BestKnownNumber), so fresh[0].Number == BestKnownNumber. The cache therefore stores the anchor at index 0.
On the first call (cache miss) BuildSlice correctly returns [block@BestKnown, …]. On subsequent calls (cache hit, same BestKnownNumber):
desiredStart = BestKnownNumber + 1 // ← wrong
offset = 1
slice = [block@BestKnown+1, …] // anchor dropped
BlockDownloader relies on headers[0] being the already-known anchor that seeds AssembleSatisfiedEntries / AssembleRequest. Dropping it means the body for BestKnown+1 is never moved to a satisfied entry and lingers in _downloadRequests, and parentHeader becomes a beacon-only header on the next iteration.
Fix:
| long desiredStart = Math.Min(_blockTree.BestKnownNumber + 1, currentNumber); | |
| long desiredStart = Math.Min(_blockTree.BestKnownNumber, currentNumber); |
| public void SetUp() | ||
| { | ||
| _chainLevelHelper = Substitute.For<IChainLevelHelper>(); | ||
| _chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 1, count: CachedBatchSize)); |
There was a problem hiding this comment.
Test mock hides the off-by-one bug
The mock returns BuildSequentialHeaders(start: 1, count: 64) with BestKnownNumber = 0. This means desiredStart = Math.Min(0+1, 1000) = 1 = cacheStart, so offset = 0 — the exact case that avoids the off-by-one.
In production, ChainLevelHelper.GetStartingPoint() returns the anchor at BestKnownNumber, so fresh[0].Number == BestKnownNumber. The mock should match this:
| _chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 1, count: CachedBatchSize)); | |
| _chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 0, count: CachedBatchSize)); |
With this change cacheStart = 0 = BestKnownNumber, so desiredStart = BestKnownNumber + 1 = 1 ≠ 0 and the off-by-one is exposed. The Cached_slice_advances_with_best_known_number assertion second[0]!.Number.Should().Be(21) would also need updating to Be(20) (the anchor at the new BestKnownNumber).
|
|
||
| // Fetch a larger batch than asked so subsequent peer allocations can be served from the cache. | ||
| int fetchSize = Math.Max(maxHeader * CacheBatchMultiplier, MinCachedHeaderBatchSize); | ||
| BlockHeader?[]? fresh = chainLevelHelper.GetNextHeaders(fetchSize, long.MaxValue, skipLastBlockCount: 0); |
There was a problem hiding this comment.
Medium: skipLastN semantics differ from the original
The original code forwarded skipLastN directly to ChainLevelHelper as skipLastBlockCount, which appended skipLastN extra headers to the fetch window and then trimmed them from the tail — effectively ensuring skipLastN blocks at the tip of the available chain are not included.
With skipLastBlockCount: 0 here, the fetch retrieves fetchSize headers right up to the tip, caches them all, and then trims skipLastN at serve time from the end of the cached window. In the end-of-chain regime where ChainLevelHelper returns fewer than fetchSize headers (i.e. close to ProcessDestination), the two semantics diverge: the original would exclude the last skipLastN of the available chain; this version may include them if the slice happens to reach the end of the cache.
At minimum worth a comment; worst case it may serve headers too close to an in-progress finalization boundary.
|
|
||
| // Alternatively we can do this in BeaconHeadersSyncFeed, but this seems easier. | ||
| ValidateSeals(headers!, cancellation); | ||
| ValidateSeals(fresh!, cancellation); |
There was a problem hiding this comment.
Medium: ValidateSeals no longer runs on each served slice
The original code called ValidateSeals on every returned batch. Post-merge seal validation is a no-op, but the method also contains the terminal-block transition check:
bool terminalBlock = !lastBlock && headers[i+1].Difficulty == 0 && headers[i].Difficulty != 0;
bool forceValidation = lastBlock || i == randomNumberForValidation || terminalBlock;Running it once on the large fresh array means: (a) the forced last-element check and the random-index check are fixed at fill time, not per-slice; (b) terminal-block transitions at a slice boundary (not present in the original fetch) are never force-validated on cache hits. This is safe in pure PoS but is a latent risk in the merge-transition region. Consider calling ValidateSeals(slice, cancellation) on cache hits too — it's cheap post-merge and restores the original safety invariant.
376311a to
d9849ce
Compare
|
@claude re-review |
|
Claude finished @LukaszRozmej's task in 3m 48s —— View job Re-review:
|
| # | Severity | Finding | Status |
|---|---|---|---|
| 1 | Critical | Off-by-one in TryServeFromCache (desiredStart = BKN+1) drops anchor on cache hit in walk-back path |
Still present — line 98 unchanged |
| 2 | Critical | Test mock uses start: 1 / BKN=0 (no-walk-back shape) and hides the bug |
Still present — line 36 unchanged |
| 3 | Medium | ValidateSeals not re-run per slice; terminal-transition forced check missed at slice boundaries |
Still present — line 72 unchanged |
| 4 | Medium | Cache invalidation misses intra-range reorgs that don't change ProcessDestination |
Still present |
| 5 | Low | skipLastN semantics drift: original trimmed from chain tail; PR front-caps the cached window |
Still present — line 64 unchanged |
| 6 | Nit | MinCachedHeaderBatchSize = 32 duplicated from PowForwardHeaderProvider (base class already has it at line 40) |
Still present — line 33 unchanged |
| 7 | Nit | cached.Length == 0 branch on line 91 is unreachable (UpdateCache guards on MinCachedHeaderBatchSize) |
Still present |
| 8 | Nit | Double allocation per cache hit (new BlockHeader[take] + ToPooledList) |
Still present |
Critical Analysis (re-confirmed against ChainLevelHelper.cs)
Reading ChainLevelHelper.GetStartingPoint() directly confirms the blocker. The method has two exit paths:
Path A — no walk-back (block at BKN+1 is already processed, !IsBeaconInfo):
// line 179-182
if (!beaconMainChainBlock.IsBeaconInfo)
return (startingPoint, beaconMainChainBlock.BlockHash); // returns BKN+1Here headers[0].Number == BKN+1. The +1 in line 98 is coincidentally correct.
Path B — walk-back (typical active beacon sync, BKN+1 is IsBeaconInfo):
// ChainLevelHelper.cs:207-208 (explicit comment):
// "Note: the starting point, points to the non-beacon info block.
// MergeBlockDownloader does not download the first header so this is deliberate"
// Decrements startingPoint → returns (BKN, hash_of_BKN)
Here headers[0].Number == BKN. On every cache hit: desiredStart = BKN+1, offset = 1, anchor dropped.
The test only exercises Path A (start: 1, BKN=0 → cacheStart == BKN+1). Path B (the dominant production state) is not covered. Inline comments with exact fix suggestions have been posted on lines 98 and 36.
Verdict
Not mergeable. The critical off-by-one (finding #1) and the concealing test mock (finding #2) are both unaddressed. The medium findings (#3, #4) should also be resolved or explicitly acknowledged before merge.
| long currentNumber = processDestination?.Number ?? long.MaxValue; | ||
| if (cachedHash != currentHash || cachedNumber != currentNumber) return null; | ||
|
|
||
| long desiredStart = Math.Min(_blockTree.BestKnownNumber + 1, currentNumber); |
There was a problem hiding this comment.
[Critical — still present] Off-by-one drops the anchor on every cache hit in typical PoS sync
ChainLevelHelper.GetStartingPoint() has two exit paths:
- No walk-back (block at
BestKnownNumber+1is already processed): returns(BestKnownNumber+1, hash)→headers[0].Number == BestKnownNumber+1. The+1here is correct. - Walk-back (typical active beacon sync — block at
BestKnownNumber+1isIsBeaconInfo): the do/while loop decrementsstartingPointonce toBestKnownNumberand returns(BestKnownNumber, hash)— confirmed byChainLevelHelper.cs:207-208: "the starting point, points to the non-beacon info block. MergeBlockDownloader does not download the first header so this is deliberate". In this caseheaders[0].Number == BestKnownNumberand the+1is wrong.
On a cache hit (same BestKnownNumber, walk-back path):
cacheStart = BestKnownNumber (from first call)
desiredStart = BestKnownNumber + 1 (← off by one)
offset = 1
slice = [block@BKN+1, …] (anchor at BKN dropped)
BlockDownloader.AssembleRequest relies on headers[0] as the parent anchor (parentHeader), iterating headers.Skip(1) for actual downloads. Dropping the anchor means the body for BKN+1 is never moved into a satisfied entry.
| long desiredStart = Math.Min(_blockTree.BestKnownNumber + 1, currentNumber); | |
| long desiredStart = Math.Min(_blockTree.BestKnownNumber, currentNumber); |
| public void SetUp() | ||
| { | ||
| _chainLevelHelper = Substitute.For<IChainLevelHelper>(); | ||
| _chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 1, count: CachedBatchSize)); |
There was a problem hiding this comment.
[Critical — still present] Test mock covers the wrong code path and hides the off-by-one
BuildSequentialHeaders(start: 1, count: CachedBatchSize) with BestKnownNumber = 0 models the no-walk-back path where headers[0].Number == BestKnownNumber+1 == 1. In this shape desiredStart = BKN+1 = 1 = cacheStart, so offset = 0 and the off-by-one is invisible.
Production uses the walk-back path (BestKnownNumber+1 is IsBeaconInfo) which returns headers[0].Number == BestKnownNumber == 0. The mock should match that:
| _chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 1, count: CachedBatchSize)); | |
| _chainLevelHelper.GetNextHeaders(default, default, default).ReturnsForAnyArgs(_ => BuildSequentialHeaders(start: 0, count: CachedBatchSize)); |
With start: 0: cacheStart = 0 = BestKnownNumber. The buggy desiredStart = BKN+1 = 1 ≠ 0 → offset = 1 → anchor dropped. The test would then fail, exposing the bug. The Cached_slice_advances_with_best_known_number assertion on line 107 (second[0]!.Number.Should().Be(21)) would also need to change to Be(20) once the code fix is applied.
|
@claude re-review |
|
Claude finished @LukaszRozmej's task in 3m 9s —— View job Re-review:
|
| # | Severity | Finding | Status |
|---|---|---|---|
| 1 | Off-by-one: desiredStart = BKN+1 drops anchor on cache hit in walk-back path |
Fixed — desiredStart = BestKnownNumber (line 109), with explanation comment |
|
| 2 | Test mock start: 1/BKN=0 modelled no-walk-back path, hiding the bug |
Fixed — mock now uses start: 0, assertion updated to .Be(20) (line 109 test) |
|
| 3 | ValidateSeals not re-run per slice; terminal-block check frozen at fill time |
Fixed — ValidateSeals(slice!, cancellation) added on cache hits (lines 58–60) |
|
| 5 | skipLastN semantics drift: original trimmed chain tail; PR front-capped cached window |
Fixed — skipLastBlockCount: skipLastN forwarded to ChainLevelHelper; only full-sized batches cached (lines 69, 81) |
Remaining findings
Medium — acknowledged, not fixed
4. Cache invalidation misses intra-range reorgs (PosForwardHeaderProvider.cs:103)
A competing FCU that rewrites part of the canonical chain inside the cached [BKN, BKN+fetchSize] window without advancing ProcessDestination to a new hash/number will leave stale headers in the cache. BlockDownloader already subscribes to BlockTree.NewHeadBlock; subscribing there and evicting the cache on reorg is more robust. The PR remarks explicitly acknowledge this with the rationale that ProcessDestination identity covers the dominant NewPayload/ForkchoiceUpdated path.
Nits — not blockers
6. MinCachedHeaderBatchSize = 32 duplicated from base class (PosForwardHeaderProvider.cs:33 vs PowForwardHeaderProvider.cs:40) — hoist to PowForwardHeaderProvider or a shared constant.
7. cached.Length == 0 guard on line 98 is unreachable — UpdateCache guards on MinCachedHeaderBatchSize, so the array is always non-empty when cached.
8. Double allocation per cache hit — new BlockHeader[take] in TryServeFromCache (line 119) + ToPooledList(slice.Length) at the call site (line 62). Consider populating ArrayPoolList directly or passing the pre-allocated array to avoid one copy.
Verdict
The three critical/blocking findings from prior reviews are all addressed. The only remaining medium concern (#4, reorg invalidation) is explicitly acknowledged in the PR remarks with a valid rationale. Nits are non-blocking.
The code is sound to merge; whether to require the NewHeadBlock subscription before merge is a judgment call for the reviewer.
Fetches a 4x batch from ChainLevelHelper, slices per-call, and serves from cache while BeaconPivot.ProcessDestination is unchanged and the desired start sits inside the cached range. Closes NethermindEth#8573.
…che hit `ChainLevelHelper.GetStartingPoint` returns the anchor at `BestKnownNumber` in the active beacon-sync walk-back path, so the cache stores the anchor at index 0. Using `BestKnownNumber + 1` as the serve start dropped that anchor on every cache hit, breaking `BlockDownloader.AssembleRequest` which relies on `headers[0]` as the parent header. Additional fixes: - Forward `skipLastN` to `ChainLevelHelper` and only cache full-sized batches, restoring the original chain-tip exclusion semantics. - Re-run `ValidateSeals` on each served slice so terminal-block and random-index checks are not frozen at fill time. Tests updated so the mock models the walk-back path (`start: 0` matching `BestKnownNumber = 0`); the assertion `second[0].Number == 20` now exercises the anchor inclusion that previously slipped through the off-by-one.
556a96a to
99af2db
Compare
|
@claude re-review |
|
Claude finished @LukaszRozmej's task in 2m 10s —— View job Re-review:
|
| # | Severity | Finding | Status |
|---|---|---|---|
| 1 | Off-by-one in TryServeFromCache (desiredStart = BKN+1) drops anchor on cache hit |
Fixed — long desiredStart = Math.Min(_blockTree.BestKnownNumber, currentNumber) with explanatory comment (line 109) |
|
| 2 | Test mock used start: 1/BKN=0 (no-walk-back shape), hiding the bug |
Fixed — mock now uses start: 0 matching the walk-back path; assertion correctly updated to .Be(20) |
|
| 3 | ValidateSeals not re-run per slice; terminal-block forced check skipped at slice boundaries |
Fixed — ValidateSeals(slice!, cancellation) on cache hits (lines 58–60) |
|
| 5 | skipLastN semantics drift: original trimmed from chain tail; PR front-capped cached window |
Fixed — skipLastBlockCount: skipLastN forwarded to ChainLevelHelper (line 69); only full-sized batches cached (line 81) |
Remaining: acknowledged medium
4. Cache invalidation misses intra-range reorgs — a competing FCU that rewrites part of the canonical chain inside the cached range without changing ProcessDestination (hash or number) leaves stale headers in the cache. The PR remarks explicitly acknowledge this and note the dominant NewPayload/ForkchoiceUpdated path is covered. Subscribing to BlockTree.NewHeadBlock (as BlockDownloader already does) would be more robust, but is an explicit, reasoned trade-off not a surprise.
Remaining nits (non-blocking)
6. MinCachedHeaderBatchSize = 32 duplicated — PosForwardHeaderProvider line 33 and PowForwardHeaderProvider line 40 both define this constant. Hoist to the base class.
7. cached.Length == 0 guard is unreachable (line 98) — UpdateCache only stores arrays with at least MinCachedHeaderBatchSize elements; the cached is null branch is sufficient.
8. Double allocation per cache hit — new BlockHeader[take] in TryServeFromCache (line 119) plus ToPooledList(slice.Length) at the call site (line 62). Consider populating ArrayPoolList directly to save one allocation per hit.
Verdict
Mergeable. All three critical/blocking findings from prior reviews are resolved. The remaining medium concern (#4, reorg invalidation) is explicitly acknowledged in the PR with a valid rationale. Nits are non-blocking.
- Hoist MinCachedHeaderBatchSize from PosForwardHeaderProvider to PowForwardHeaderProvider (protected const), removing the duplicate. - Drop unreachable `cached.Length == 0` guard in TryServeFromCache; UpdateCache only stores arrays >= MinCachedHeaderBatchSize, so the null check alone suffices. - Eliminate double allocation on cache hit by making TryServeFromCache populate an ArrayPoolList directly and return it, instead of allocating BlockHeader[] then a second time via ToPooledList.
|
Could you check the following:
|
…on intra-range reorg - Wrap the per-slice `ValidateSeals` call in try/catch so the `ArrayPoolList` is returned to the pool on cancellation/seal-validator exceptions instead of leaking the underlying buffer. - Subscribe to `BlockTree.OnUpdateMainChain` and drop the cache whenever the first block of a main-chain update has a different hash than the cached header at the same number — catches reorgs that don't change `BeaconPivot.ProcessDestination`. Required converting the primary constructor to an explicit one and implementing `IDisposable` so the subscription is released for non-singleton consumers (tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…over reorg invalidation - Dispose every returned `IOwnedReadOnlyList` via `using` so the pooled buffers are returned to `ArrayPool.Shared`. - Extract `Get`, `RaiseMainChainUpdate`, and `AssertChainLevelCalls` helpers to remove the per-test boilerplate around the common cache-hit/miss pattern. - `SkipLastN_is_honored_on_cache_hit`: request `CachedBatchSize` headers so the `min(available, maxHeader)` front-cap actually exposes `skipLastN` (previous `Requested = 16` made the test pass even if `skipLastN` was silently dropped) and assert exact count plus tail header number. - Add `Cache_is_invalidated_on_reorg_within_cached_range` and `Cache_survives_main_chain_update_that_matches_cached_hash` to cover the new `OnUpdateMainChain` invalidation path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude re-review |
|
Claude finished @LukaszRozmej's task in 3m 50s —— View job Re-review:
|
| Commit | Message |
|---|---|
8e27c13 |
perf(merge-sync): address PR #11617 review nits |
eb2aacb |
fix(merge-sync): dispose slice on validation throw, invalidate cache on intra-range reorg |
865824e |
test(merge-sync): dedup, dispose owned lists, strengthen SkipLastN, cover reorg invalidation |
flcl42's Concerns (from May 27)
| # | Finding | Status |
|---|---|---|
| f1 | ArrayPoolList slice leaked if ValidateSeals throws |
Fixed — try { ValidateSeals(slice, ct); } catch { slice.Dispose(); throw; } at lines 73–83 |
| f2 | Test methods ignored IOwnedReadOnlyList return values without disposing |
Fixed — all Get() calls in tests now use using declarations |
| f3 | SkipLastN test didn't make skipLastN observable (request was too small to expose tail-trim) |
Fixed — new test requests max: CachedBatchSize (64) with skip: 4; asserts Count == 60 and last.Number == 59. A no-op skipLastN would give Count == 64, so the assertion is now meaningful |
All Prior Findings
| # | Severity | Finding | Status |
|---|---|---|---|
| 1 | Off-by-one desiredStart = BKN+1 drops anchor on cache hit |
Fixed — long desiredStart = Math.Min(_blockTree.BestKnownNumber, currentNumber) (line 132) |
|
| 2 | Test mock start:1/BKN=0 hid the bug |
Fixed — mock uses start: 0; assertion .Be(20) |
|
| 3 | ValidateSeals not re-run per cache-hit slice |
Fixed — ValidateSeals(slice, cancellation) called on every cache hit |
|
| 4 | Cache invalidation missed intra-range reorgs | Fixed — BlockTreeOnUpdateMainChain handler implemented (lines 160–184); two new tests cover reorg invalidation and cache survival on matching-hash update |
|
| 5 | skipLastN semantics drift from tail-trim to front-cap |
Fixed — skipLastBlockCount: skipLastN forwarded to ChainLevelHelper (line 92); only full-sized batches cached (line 104) |
|
| 6 | MinCachedHeaderBatchSize duplicated |
Fixed — PosForwardHeaderProvider now inherits the constant from PowForwardHeaderProvider |
|
| 7 | Dead cached.Length == 0 guard |
Fixed — removed | |
| 8 | Double allocation on cache hit | Fixed — TryServeFromCache now directly populates a single ArrayPoolList<BlockHeader?> |
Remaining: One Low Edge Case
BlockTreeOnUpdateMainChain only checks e.Blocks[0] (line 164)
BlockTree.UpdateMainChain passes blocks in ascending order; e.Blocks[0] is the lowest-numbered new-canonical block. If a reorg starts below the cache range but extends into it (e.g., cache covers blocks 10–73, reorg replaces blocks 3–20), then firstNumber = 3 < cacheStart = 10 and the handler returns early without invalidating the stale entries for blocks 10–20.
The practical risk is low: ProcessDestination identity already catches the dominant reorg path (a ForkchoiceUpdated with a competing head will change ProcessDestination.Hash, invalidating the cache through the first guard at line 126). This gap only matters for a reorg that lands below the cached range and doesn't update ProcessDestination. In PoS that scenario is rare, but worth noting.
A simple fix would be to also check e.Blocks[^1] when firstNumber < cacheStart, or iterate over all blocks in the update that fall within the cache range. Not a blocker for merge.
Verdict
Mergeable. All critical, high, and medium findings from prior reviews are fully resolved, and all three concerns raised by flcl42 on May 27 are addressed. The remaining finding is a Low-severity edge case in the reorg handler with low practical impact given the ProcessDestination guard.
Master removed FluentAssertions in NethermindEth#11567; this PR's new PosForwardHeaderProviderCacheTests still used Should() and failed to compile against the updated test props. Convert to the unified Assert.That style. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deep review —
|
| # | Severity | Issue | Recommendation |
|---|---|---|---|
| 1 | Medium | OnUpdateMainChain only checks e.Blocks[0]; misses descending-order events and reorgs starting below the cache |
Fix the handler to consider the actual update range (or nuke on any overlap) |
| 2 | Low | skipLastN cached at fill time; per-call drift if value varies |
Document the invariant or invalidate on mismatch |
| 3 | Low | BuildSlice's skipLastN parameter is dead |
Inline or drop the parameter |
| 4 | Low | TOCTOU: slice may outlive invalidation | Comment, or fold into #1 |
| 5 | Nit | AddRange over manual loop; lock-scope comment; nullable typing inconsistency; redundant UpdateCache guard |
Polish |
| 6 | Nit | Missing tests: descending event order, below-cache reorg, varying skipLastN |
Add for hardening |
I would request the case-1 fix before merge; the rest is reviewer's discretion.
- `BlockTreeOnUpdateMainChain` now scans the event for any block in the cached range instead of inspecting only `Blocks[0]`. `UpdateMainChain` can fire in either ascending or descending order, and reorgs may start below `cacheStart` and only extend into the cache from there — the old shape returned early in both situations and left stale headers in the cache. - Track `skipLastN` per cache entry and invalidate when it changes; the cached array is the result of `ChainLevelHelper.GetNextHeaders` with a specific `skipLastBlockCount`, so reusing it for a different `skipLastN` would double-trim from the tail. - Drop the dead `skipLastN` parameter from the cache-miss slice builder (it was always called with 0) and inline the slice. - Replace the per-element copy loop with `ArrayPoolList.AddRange(span)`. - Drop the redundant `headers.Length < MinCachedHeaderBatchSize` guard in `UpdateCache` — the single caller already gates on `fresh.Length >= fetchSize` where `fetchSize >= MinCachedHeaderBatchSize`. - Dedup tests via a small `ExpectCalls` helper; add tests for the new invalidation paths (skipLastN change, ascending-order reorg starting below cache, descending-order reorg with top above cache). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude re-review |
|
Claude finished @LukaszRozmej's task in 5m 49s —— View job Re-review:
|
| Commit | Message |
|---|---|
431ba46 |
fix(merge-sync): widen reorg invalidation, key cache on skipLastN |
22086a5 |
test(merge-sync): convert FluentAssertions to NUnit Assert.That |
LukaszRozmej's May 29 Deep Review — Finding Status
| # | Severity | Finding | Status |
|---|---|---|---|
| 1 | OnUpdateMainChain only checks e.Blocks[0]; misses descending-order events and reorgs starting below cache |
Fixed — handler now does a full loop scan (PosForwardHeaderProvider.cs:179-193), skipping out-of-range blocks and stopping at the first in-range block. Handles both ascending and descending order correctly, as well as reorgs starting below cacheStart. |
|
| 2 | skipLastN semantics drift when value varies between calls |
Fixed — cache now keyed on skipLastN via _cachedSkipLastN field (line 37); TryServeFromCache returns null on mismatch (line 128), causing a fresh fetch with the correct value. |
|
| 3 | BuildSlice's skipLastN parameter is dead |
Resolved — BuildSlice was removed entirely in the refactor; no dead parameter remains. |
|
| 4 | Low | TOCTOU: slice may outlive invalidation | Acknowledged / cosmetic — TryServeFromCache copies refs into the slice under lock, then releases; a concurrent BlockTreeOnUpdateMainChain could null _cachedHeaders after. The slice is internally consistent (immutable headers), and the worst case is one stale sync iteration that self-corrects. With finding #1 now fixed, this is purely cosmetic as noted in prior reviews. |
| 5 | Nit | AddRange over manual loop; nullable typing; lock comment; redundant guard |
Mostly fixed — AddRange is now used (line 148: slice.AddRange(cached.AsSpan(offset, take))); redundant MinCachedHeaderBatchSize guard in UpdateCache was removed. Remaining: _cacheLock still lacks a comment noting it is now load-bearing (the OnUpdateMainChain handler fires from an event source unsynchronized with BlockDownloader._requestLock, unlike the previous version); nullable element type on _cachedHeaders (BlockHeader?[]?) remains noisier than BlockHeader[]?. |
| 6 | Nit | Missing tests: descending order, below-cache reorg, varying skipLastN |
Fixed — three new tests added: Cache_is_invalidated_on_ascending_reorg_starting_below_cached_range, Cache_is_invalidated_on_descending_reorg_with_top_above_cache, Cache_is_invalidated_when_skipLastN_changes. |
Correctness Note on the Updated BlockTreeOnUpdateMainChain
The handler's return after finding the first block inside the cache range (line 192) is correct. In a linear chain, hash depends on parent hash, so if the first block inside the cache range matches the cached header, it's cryptographically guaranteed that no earlier block in the same canonical update changed either. A reorg that changes block X necessarily changes every descendant's hash; if the first in-range block matches, the fork point is outside the cached range. Cache_survives_main_chain_update_that_matches_cached_hash validates this path.
All Findings Across All Review Rounds
| # | Severity | Finding | Status |
|---|---|---|---|
| Critical-1 | Off-by-one desiredStart = BKN+1 drops anchor on cache hit |
Fixed (99af2db) |
|
| Critical-2 | Test mock start:1/BKN=0 hid the bug |
Fixed (99af2db) |
|
| Medium-3 | ValidateSeals not re-run per slice |
Fixed (eb2aacb) |
|
| Low-4 | skipLastN semantics drift (tail-trim vs front-cap) |
Fixed (8e27c13) |
|
| Nit-5 | MinCachedHeaderBatchSize duplicated |
Fixed (8e27c13) |
|
| Nit-6 | Dead cached.Length == 0 guard |
Fixed (8e27c13) |
|
| Nit-7 | Double allocation per cache hit | Fixed (8e27c13) |
|
| flcl42-f1 | ArrayPoolList slice leaked on ValidateSeals throw |
Fixed (eb2aacb) |
|
| flcl42-f2 | Undisposed IOwnedReadOnlyList in tests |
Fixed (865824e) |
|
| flcl42-f3 | skipLastN test didn't observe tail-trim |
Fixed (865824e) |
|
| May29-1 | OnUpdateMainChain single-block-check misses descending/below-cache reorgs |
Fixed (431ba46) |
|
| May29-2 | skipLastN varying between calls causes drift |
Fixed (431ba46) |
|
| May29-3 | Dead BuildSlice skipLastN parameter |
Resolved (method removed) | |
| May29-4 | Low | TOCTOU: slice may outlive cache invalidation | Cosmetic; no action needed |
| May29-5 | Nit | _cacheLock comment missing; nullable typing noise |
Remaining nits |
Verdict
Mergeable. All Critical and Medium findings across all review rounds are fully resolved. The May 29 medium concern (reorg handler) is addressed in 431ba46 with a correct full-scan implementation and two regression tests. The only remaining items are Low/Nit that don't affect correctness.
`ToPooledList` was the only call site that needed it; the inlined cache-miss slice no longer uses extension methods from that namespace. Triggers IDE0005 in CI lint job. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude please check the following review 🔴 CRITICAL C1 — TOCTOU between snapshot-under-lock and live IBeaconPivot.ProcessDestination read C2 — _cachedHeaders nulled while another thread holds a snapshot of it C3 — fresh array ownership unclear; potential ArrayPool leak on truncated batches 🟠 WARNING W1 — ValidateSeals runs on every cache hit, defeating part of the optimisation W2 — Double-allocation pattern: new ArrayPoolList(take) + AddRange(span) W3 — (BlockHeader?[])fresh cast is dead noise W4 — IDisposable on a singleton creates a false-promise lifecycle
The current shape will mislead a future maintainer into treating the class as having a managed lifetime. W5 — Test naming + missing precondition labels
And precondition assertions need a "why" message: 🟡 SUGGESTION S1 — TryServeFromCache mixes abstraction levels S2 — between: _ => { } in tests S3 — _blockTree.BestKnownNumber read on every check S4 — Document the "stop on first in-range block" semantics in BlockTreeOnUpdateMainChain S5 — Inline cachedSkip |
|
Claude finished @svlachakis's task in 6m 52s —— View job Re-review:
|
| # | Reviewer severity | My assessment | Status |
|---|---|---|---|
| C1 | Critical | Medium — produces safe false misses, not false hits; previously acknowledged as Low/Cosmetic | Open, acknowledged |
| C2 | Critical | Medium — one stale batch served during a concurrent reorg; self-correcting; same root cause as C1 | Open, acknowledged |
| C3 | Critical | Invalid — ChainLevelHelper.GetNextHeaders returns headers.ToArray(), a plain heap allocation. No ArrayPool rental, no leak |
Not a finding |
| W1 | Warning | Low — ValidateSeals per-slice was an explicit correctness fix (May 21); post-merge validators return immediately |
Acknowledged trade-off |
| W2 | Warning | Low/Nit — new ArrayPoolList(take) + AddRange(span). Worth checking for a span constructor; not a correctness issue |
Open |
| W3 | Warning | Invalid — the (BlockHeader?[]) cast is required; Span<T> is invariant, so Span<BlockHeader> cannot implicitly become Span<BlockHeader?> |
Not a finding |
| W4 | Warning | Medium, new — IDisposable on a singleton that's never disposed in production misleads about the class's managed lifetime |
Not previously acknowledged |
| W5 | Warning | Nit — test names don't follow {Subject}_{State}_{Outcome}; precondition assertion lacks a message |
Open |
| S1–S5 | Suggestion | Low/Nit — refactoring suggestions; S4 (contiguous-prefix comment) is particularly useful | Open |
Detail on C1/C2
Both were explicitly raised and accepted as Low/Cosmetic in the May 29 automated re-review (finding #4: "Acknowledged / cosmetic; no action needed"). I agree they are real concurrency issues but disagree with the CRITICAL label:
- C1 — the comparison of a stale
cachedHashsnapshot against a freshprocessDestination.Hashcan only result in a false miss (unnecessary cache bypass). For a false hit (stale data served), you'd needProcessDestinationto have the same hash as the old snapshot while the cached array is actually stale — theBlockTreeOnUpdateMainChaininvalidation covers that path. - C2 — Thread A holding a snapshot of
array1while the invalidation fires is the standard "snapshot survives invalidation" pattern. The array is internally consistent (immutable headers); the worst case is one wasted download round, after which the cache misses and fresh headers are fetched.
The fix (extend _cacheLock to cover the ProcessDestination read and comparison) is straightforward and would tighten the design. Given the existing acknowledgment it's up to reviewers whether to require it now.
Detail on W4 (new Medium finding)
IDisposable is implemented solely so tests can call _provider.Dispose() in [TearDown]. Since PosForwardHeaderProvider is registered as a singleton, Dispose() never runs in production — the OnUpdateMainChain subscription lives for the node's lifetime (which is the intended behavior). Adding IDisposable to a class that is never disposed creates a false promise: future maintainers may assume lifetime management is handled and add cleanup logic expecting it to execute. A doc comment on Dispose() explaining it is test-only, or an internal UnsubscribeForTest() method guarded by [InternalsVisibleTo], would resolve this with no functional change.
Verdict
Not mergeable as-is. W4 is a new Medium finding not previously raised or acknowledged. All other findings are either invalid (C3, W3), previously acknowledged (C1/C2, W1), or nits (W2, W5, S1–S5).
|
@0xDevNinja please check claude review |
- Replace IDisposable with internal UnsubscribeForTest(); the singleton is never disposed in production, the subscription is intentionally permanent, and IDisposable falsely advertises managed lifetime. - Move the IBeaconPivot.ProcessDestination read inside _cacheLock so the cache snapshot and the pivot comparison happen in one critical section (closes the TOCTOU window between snapshot release and pivot read). - Switch both ArrayPoolList allocations to the ReadOnlySpan<T> ctor; rents the buffer and copies in one step instead of new() + AddRange(). - Add a precondition message on the reorg-cache test assertion. - One-line note on the BlockTreeOnUpdateMainChain early return: first in-range block's hash commits to all earlier ones, no need to scan further (the early return is otherwise easy to misread). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude review again |
|
Claude finished @svlachakis's task in 2m 38s —— View job Re-review:
|
| Commit | Message |
|---|---|
3fa5853 |
fix(merge-sync): drop unused Nethermind.Core.Extensions using |
8d9dc25 |
fix(merge-sync): tighten PosForwardHeaderProvider cache contract |
@svlachakis June 2 Findings — Status
| # | Reviewer severity | My assessment | Status |
|---|---|---|---|
| C1 | Critical | Medium | Fixed — _beaconPivot.ProcessDestination read moved inside _cacheLock (8d9dc25); snapshot and pivot comparison now happen atomically |
| C2 | Critical | Low/Cosmetic | Effectively resolved — with C1 fixed, all validation (pivot hash check, offset/take computation) runs inside the lock; the only post-lock work is copying pre-validated, immutable header references into ArrayPoolList. The snapshot pattern is safe. |
| C3 | Critical | Invalid | ChainLevelHelper.GetNextHeaders ends with headers.ToArray() — a plain heap allocation; no ArrayPool.Rent, no leak. |
| W1 | Warning | Low, accepted trade-off | Still present. ValidateSeals per-slice was an explicit correctness fix (May 21) to keep terminal-block transition detection at the served window boundary. In pure PoS the validators return immediately; cost is minimal. |
| W2 | Warning | Low/Nit | Fixed — both allocation sites now use the ReadOnlySpan<T> constructor (8d9dc25 lines 108, 141). |
| W3 | Warning | Invalid | The (BlockHeader?[])fresh cast is necessary. Span<T> is invariant; Span<BlockHeader> cannot implicitly become Span<BlockHeader?>. Removing it would be a compile error. |
| W4 | Warning | Medium | Fixed — IDisposable replaced with internal UnsubscribeForTest() (8d9dc25 line 248); test TearDown calls _provider.UnsubscribeForTest(). The class no longer advertises a managed lifetime it doesn't have. |
| W5 | Warning | Nit | Partially addressed — precondition message added on the reorg-cache assertion (line 109). Test method names not renamed to {Subject}_{State}_{Outcome}. Remaining as a nit only. |
| S1–S5 | Suggestion | Low/Nit | S4 (early-return comment) fixed (8d9dc25 line 184: "First in-range block's hash commits to all earlier blocks in this update."). S3 (dead parameter) already resolved when BuildSlice was removed. Others (S1, S2, S5) are open suggestions. |
All Findings Across All Review Rounds
| # | Severity | Finding | Status |
|---|---|---|---|
| Critical-1 | Off-by-one desiredStart = BKN+1 drops anchor on cache hit |
Fixed (99af2db) |
|
| Critical-2 | Test mock start:1/BKN=0 hid the bug |
Fixed (99af2db) |
|
| Medium-3 | ValidateSeals not re-run per slice |
Fixed (eb2aacb) |
|
| Low-4 | skipLastN semantics drift (tail-trim vs front-cap) |
Fixed (8e27c13) |
|
| Nit-5 | MinCachedHeaderBatchSize duplicated |
Fixed (8e27c13) |
|
| Nit-6 | Dead cached.Length == 0 guard |
Fixed (8e27c13) |
|
| Nit-7 | Double allocation per cache hit | Fixed (8d9dc25) |
|
| flcl42-f1 | ArrayPoolList slice leaked on ValidateSeals throw |
Fixed (eb2aacb) |
|
| flcl42-f2 | Undisposed IOwnedReadOnlyList in tests |
Fixed (865824e) |
|
| flcl42-f3 | skipLastN test didn't observe tail-trim |
Fixed (865824e) |
|
| May29-1 | OnUpdateMainChain single-block-check misses descending/below-cache reorgs |
Fixed (431ba46) |
|
| May29-2 | skipLastN varying between calls causes drift |
Fixed (431ba46) |
|
| May29-3 | Dead BuildSlice skipLastN parameter |
Resolved (method removed) | |
| May29-4 | TOCTOU: slice may outlive cache invalidation | Resolved by C1 fix; post-lock copy is of pre-validated immutable data | |
| Jun2-C1 | TOCTOU: ProcessDestination read outside lock |
Fixed (8d9dc25) |
|
| Jun2-C2 | Snapshot outlives invalidation | Acceptable snapshot pattern; safe | |
| Jun2-C3 | Invalid | ArrayPool leak on truncation | N/A — heap-allocated array |
| Jun2-W1 | Low | ValidateSeals per cache hit |
Accepted trade-off |
| Jun2-W2 | Double allocation per hit | Fixed (8d9dc25) |
|
| Jun2-W3 | Invalid | Cast is dead noise | N/A — cast is required |
| Jun2-W4 | IDisposable on singleton |
Fixed (8d9dc25) |
|
| Jun2-W5 | Nit | Test naming; precondition message | Precondition message added; names unchanged (nit) |
Verdict
Mergeable. All Critical and Medium findings across all review rounds are fully resolved. The two new Medium concerns from @svlachakis's June 2 review (C1 TOCTOU, W4 IDisposable) are both addressed in 8d9dc25. The three "Critical" findings (C1, C2, C3) were correctly reassessed as Medium/Low/Invalid respectively, and all Medium items are now closed. The remaining open items are Low accepted trade-offs or Nits.
Resolves #8573
Changes
PosForwardHeaderProvider.GetBlockHeadersnow fetches a4 * maxHeaderbatch fromChainLevelHelperonce, caches it, and serves subsequent calls by slicing the cached array.IBeaconPivot.ProcessDestination(hash or number) changes, whenBestKnownNumberadvances past the cached end, or when the requested slice cannot be satisfied without going below the cached start.base.GetBlockHeaders) is unchanged; theLastResponseBatchmechanism inPowForwardHeaderProvidercontinues to drive that branch.ValidateSealsruns once per cache fill (on the freshly fetched array) rather than on every peer allocation.Types of changes
What types of changes does your code introduce?
Testing
Requires testing
If yes, did you write tests?
Notes on testing
PosForwardHeaderProviderCacheTests(5 cases) covers: cache hit on identical inputs, invalidation whenProcessDestination.Hashchanges, invalidation whenBestKnownNumberadvances past the cached range, slice advances withBestKnownNumberwhile inside the range, andskipLastNhonored on cache hits. ExistingForwardHeaderProviderTests(37/38 pass, 1 pre-existing skip) continue to pass.Documentation
Requires documentation update
Requires explanation in Release Notes
Remarks
Invalidation strategy follows the hint left in the issue: with no explicit signal for when
ChainLevelHelperwould return different headers, identity ofBeaconPivot.ProcessDestination(set byNewPayload/ForkchoiceUpdated) plusBestKnownNumberadvance covers the cases where the underlying chain-level data could have moved. Happy to swap to a more aggressive (e.g.BlockTree.NewHeadBlock) invalidation if reviewers prefer.