Skip to content

fix root hash calculation during reorgs#2080

Merged
marcello33 merged 4 commits intodevelopfrom
mardizzone/reorg_detection_roothash
Feb 25, 2026
Merged

fix root hash calculation during reorgs#2080
marcello33 merged 4 commits intodevelopfrom
mardizzone/reorg_detection_roothash

Conversation

@marcello33
Copy link
Copy Markdown
Collaborator

Description

Fix root hash calculation during bor reorgs by always checking the end block's hash instead of current header.
More context here: #1950

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

Please complete this section if any breaking changes have been made, otherwise delete it

Nodes audience

In case this PR includes changes that must be applied only to a subset of nodes, please specify how you handled it (e.g. by adding a flag with a default value...)

Checklist

  • I have added at least 2 reviewer or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC methods changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
    • In case link the PR here:
  • This PR requires changes to matic-cli
    • In case link the PR here:

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

Manual tests

Please complete this section with the steps you performed if you ran manual tests for this functionality, otherwise delete it

Additional comments

Please post additional comments in this section if you have them, otherwise delete it

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 86.36364% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.61%. Comparing base (d7b2be8) to head (eca8045).
⚠️ Report is 5 commits behind head on develop.

Files with missing lines Patch % Lines
consensus/bor/api.go 86.36% 2 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (86.36%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2080      +/-   ##
===========================================
- Coverage    50.62%   50.61%   -0.01%     
===========================================
  Files          875      875              
  Lines       151794   151799       +5     
===========================================
- Hits         76842    76835       -7     
- Misses       69879    69895      +16     
+ Partials      5073     5069       -4     
Files with missing lines Coverage Δ
consensus/bor/api.go 86.12% <86.36%> (-1.93%) ⬇️

... and 15 files with indirect coverage changes

Files with missing lines Coverage Δ
consensus/bor/api.go 86.12% <86.36%> (-1.93%) ⬇️

... and 15 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lucca30
Copy link
Copy Markdown
Contributor

lucca30 commented Feb 24, 2026

@claude Can you review this replacement check from the root hash to the end hash on the function and analyze the impacts? Do a deep analysis based on your deep knowledge on reorgs and how they should be handled. Get the context of the affected function and analyze the impact of this change. By the end evaluate if the tests made for these changes are compatible with the changes or if we should cover any other scenario

@claude
Copy link
Copy Markdown

claude bot commented Feb 24, 2026

Claude finished @lucca30's task in 1m 24s —— View job


Deep Analysis: Root Hash Calculation During Reorgs

  • Read and understand the changed files
  • Analyze the core fix and its implications for reorg handling
  • Examine how this relates to blockchain reorg fundamentals
  • Evaluate test coverage and identify any gaps

Summary

This PR fixes a subtle but important bug in the GetRootHash function related to cache invalidation during blockchain reorganizations. The fix changes from tracking the chain tip hash to tracking the end block hash of the checkpoint range, which is a more precise and correct approach.


Core Change Analysis

Before (Bug)

type rootHashCacheEntry struct {
    headHash common.Hash  // Tracked chain tip
    root     string
}

// Cache invalidation checked:
if entry.headHash == headHash { /* cache hit */ }

// Reorg detection checked:
newHead := api.chain.CurrentHeader()
if newHead.Hash() != headHash { /* reorg detected */ }

After (Fix)

type rootHashCacheEntry struct {
    endHash common.Hash  // Tracks end block of range
    root    string
}

// Cache invalidation checks:
if entry.endHash == endHash { /* cache hit */ }

// Reorg detection checks:
newEndHeader := api.chain.GetHeaderByNumber(end)
if newEndHeader.Hash() != endHash { /* reorg detected */ }

Why This Fix is Correct

The Problem with Tracking Head Hash

Consider a checkpoint request for blocks [100, 200] when the chain is at block 500:

Scenario: Normal block production (no reorg)

  1. Call GetRootHash(100, 200) → computes root, caches with headHash=hash(500)
  2. Chain advances to block 501
  3. Call GetRootHash(100, 200)cache miss because hash(500) != hash(501)
  4. Unnecessarily recomputes the same root hash

This caused spurious cache invalidations whenever the chain tip advanced, even though the blocks [100, 200] hadn't changed.

The Solution: Track End Block Hash

By tracking endHash (hash of block 200 in the range [100, 200]):

  1. Call GetRootHash(100, 200) → computes root, caches with endHash=hash(200)
  2. Chain advances to block 501
  3. Call GetRootHash(100, 200)cache hit because hash(200) is unchanged
  4. No recomputation needed

Reorg Detection Remains Sound

The fix also correctly updates the reorg detection mechanism:

Scenario Old Behavior New Behavior
Tip advances past end (normal) Detects as "reorg" (false positive) No false positive
Actual reorg affecting [start, end] Correctly detected Correctly detected
Reorg only affecting blocks after end Detected (unnecessary) Not detected (correct - irrelevant)

Defense in Depth: Multiple Reorg Safety Mechanisms

The function employs three layers of protection against mixed-fork data:

  1. End-block hash binding (lines 302-318): Cache entries are invalidated if the canonical block at end changes
  2. Post-computation verification (lines 351-353): After fetching all headers concurrently, re-checks that the end block hasn't changed mid-flight
  3. Continuity validation (lines 367-372): Verifies H[n].ParentHash == H[n-1].Hash() for all headers, catching any interleaved forks

This multi-layered approach is sound from a reorg-safety perspective.


Test Coverage Evaluation

Tests Added/Modified

Test Purpose Coverage
TestGetRootHash_ReorgDuringCall_ReturnsError In-flight reorg detection
TestGetRootHash_PureChainsStable Stable roots for consistent chains
TestGetRootHashCacheInvalidatedOnReorg Cache invalidation on actual reorg
TestAPI_GetRootHash_StableWhenTipAdvancesPastEnd NEW: Cache hit when tip advances

The New Test is Key

The new test TestAPI_GetRootHash_StableWhenTipAdvancesPastEnd specifically validates the fix:

  • Computes root for range [0, 3]
  • Advances tip to block 5
  • Verifies the same root is returned (cache hit)
  • Confirms only 1 additional read (the end header check), not a full recomputation

This directly tests the scenario that was broken before.


Potential Gap: Non-contiguous Header Error Test

The errNonContiguousHeaderRange path (line 371) is not explicitly tested in the reorg tests. While the continuity check is a safety mechanism that would catch mixed-fork headers, adding a test that triggers this specific error path would improve coverage.

Suggested additional test scenario:

  • Configure a mid-flight reorg that switches chains after fetching some headers but before the end-hash check
  • Verify that either errReorgDuringRootComputation or errNonContiguousHeaderRange is returned

However, this is a minor gap since:

  1. The existing TestGetRootHash_ReorgDuringCall_ReturnsError covers the end-hash detection
  2. The continuity check is a secondary safety net

Conclusion

This fix is correct and improves both performance and precision:

Aspect Assessment
Correctness ✅ Fixes spurious cache invalidation
Reorg safety ✅ Maintains protection via end-hash binding + continuity check
Performance ✅ Eliminates unnecessary recomputation when tip advances
Test coverage ✅ New test validates the specific fix
Backward compatibility ✅ No API changes, only internal behavior improvement

The change from headHash to endHash is semantically correct: the root hash for range [start, end] depends only on the blocks in that range, not on blocks that come after. Binding the cache to the end block hash precisely captures this invariant.


Copy link
Copy Markdown
Contributor

@lucca30 lucca30 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@lucca30
Copy link
Copy Markdown
Contributor

lucca30 commented Feb 24, 2026

govuln check is pointing some errors, also kurtosis e2e is not properly running. Would be good having both working before merging.

@marcello33
Copy link
Copy Markdown
Collaborator Author

govuln check is pointing some errors, also kurtosis e2e is not properly running. Would be good having both working before merging.

@lucca30
govuln is fixed here #2081
And kurtosis is failing due to pumba which was fixed in my PRs and in @kamuikatsurgi's too

@sonarqubecloud
Copy link
Copy Markdown

@marcello33 marcello33 merged commit bbc3b53 into develop Feb 25, 2026
20 of 24 checks passed
lucca30 pushed a commit that referenced this pull request Mar 5, 2026
…othash

fix root hash calculation during reorgs

(cherry picked from commit bbc3b53)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants