
Allow peer sync recovery from a wrong fork #1588

Merged
marcello33 merged 1 commit into 0xPolygon:v2.2.0-beta-candidate from cffls:v2.2.0-beta-candidate on Jun 24, 2025

@cffls commented on Jun 24, 2025

This commit adds fork detection to the whitelist milestone validation to resolve synchronization failures that occur when nodes sync from a new milestone with extremely low latency. Previously, nodes on different chain forks could not sync with peers because milestone validation rejected chains whose block hashes differed, which surfaced as "mismatch error" failures.

Root Cause Analysis:
The issue was caused by chain fork lock-in: milestone validation prevented nodes stuck on a wrong fork from syncing to the correct canonical chain. Debug logs showed that different nodes held different block hashes for the same block numbers, confirming a fork scenario in which milestone validation was blocking recovery.

Solution Implemented:

  • Added a ChainReader interface to milestone validation for blockchain access
  • Implemented fork detection in the IsValidPeer method, which checks whether the local blockchain has a different hash than the milestone for the same block number
  • When a fork is detected, validation allows peer synchronization to proceed, enabling automatic recovery to the canonical chain
  • Added a SetBlockchain method to inject the blockchain reference after both the whitelist service and the blockchain are initialized (avoiding a circular dependency)
  • Updated service creation in eth/backend.go to set the blockchain reference after initialization

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes


Checklist

  • I have added at least 2 reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC method changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
    • If so, link the PR here:
  • This PR requires changes to matic-cli
    • If so, link the PR here:

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually in a local environment
  • I have tested this code manually on a remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests in express-cli


@cffls requested review from a team and marcello33 on June 24, 2025 06:31
@marcello33 merged commit fdf6cd6 into 0xPolygon:v2.2.0-beta-candidate on Jun 24, 2025
10 of 13 checks passed
cffls added a commit to cffls/bor that referenced this pull request on Jun 30, 2025