
Merge v2.2.4 beta candidate to master#1610

Merged
Raneet10 merged 20 commits into master from v2.2.4-beta-candidate
Jul 3, 2025

Conversation

Contributor

@Raneet10 Raneet10 commented Jul 3, 2025

Description

PR to merge v2.2.4-beta-candidate to master.

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Checklist

  • I have added at least 2 reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC methods changes, and the Notion documentation has been updated

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

pratikspatil024 and others added 20 commits June 12, 2025 14:00
* cmd,consensus/bor,eth,internal,tests: init heimdall-v2 integration

* cmd,consensus/bor,eth,internal: refactor and todo comments

* go mod tidy

* update cosmos-sdk/client/v2

* consensus/bor: fix setting of proto message value to correct type

* Fix milestone response

* Latest milestone response fixes

* Milestone custom unmarshal

* Fix milestone parsing

* Fix milestone parsing

* Fix milestone no-ack parsing

* Fix milestone no-ack parsing

* Remove log

* Test fixes

* checkpoint: fix response

* Custom unmarshaling for checkpoint

* Fix fetching milestone by id

* Fix milestone count response

* Fix checkpoint responses

* Remove FetchMilestoneByID

* Fix MinimalVal

* consensus/bor: add page param to state sync fetch url

* consensus/bor: fix state sync url

* consensus/bor: add debug log for state sync

* consensus/bor: minor change to fetchStateSyncEventsFormat

* consensus/bor: fix state sync event record decoding

* update heimdall-v2 commit

* consensus/bor: remove debug print logs

* consensus/bor: fix fetching list of state sync

* consensus/bor: remove unused const

* Fast consensus

* Disable milestone lock

This is to unblock an issue where more than one milestone is proposed at the same time.

* Modify docker file to access private repo

* Only apply early block announcement to primary block producer

* Remove milestone no-ack fetching

* Find and insert whitelisted chain on new milestone

* dependencies

* update CI

* integration tests update

* milestone ws connection & handler refactor

* logs to validate

* more logs

* more logs

* more logs

* remove deadline

* connection recover and logs to validate

* small make fixes

* remove logs

* refactor on retry mechanism

* empty ws address by default

* install solc-select on e2e ci tests

* Bor without heimdall

* bor nil check

* Bor without heimdall

* chore: add removePrefix

* Milestone using gRPC

* Checkpoint using gRPC

* Span and State Sync using gRPC

* Minor nit

* Change milestone log to Debug

* fix milestone count, remove Span, and lint fixes

* fix FetchMilestone func

* Bor without heimdall

* remove extra line to fix lint

* Handle race condition cases - requesting heimdallv2 via v1 client

* fix: generate-mocks and test

* chore: comment out Span in heimdallapp

* minor chores

* address govuln check

* Return empty state sync events on error

* Increase milestone ticker

* Parse API server url and set tendermint RPC port

* fix nil pointer dereference on DoCall

* Continue even if we cannot fetch state sync events

* fix: metrics

* address comments

* chore: remove .EXPECT().Span from tests

* chore: fix test warnings

* fix: Bor-HV2 tests

* Bor without heimdall

* fix mocks

* Fix checkpoint v2 and milestone v2 responses

* Use proper url to fetch halt height

* Fix checkpoint and milestone v2 response

* Add debug logs

* Remove debug logs

* Set HF is approaching to true

* Debug logging

* Use correct latest span url

* If HF is approaching dont request v1 span with retry

* Fix querying latest stored span

* Add logs

* Correctly calculate self committed span start and end blocks

* Fix self committing of span start block

* Fix halt height types

* Halt height monitor fixes

* Fetch halt height every 5 seconds

* Fix heimdall halt height monitor

* Remove halt height monitor

* Dont panic if span not stored

* Retry until span is available in v1

* Enable self committing of spans for heimdallv2

* Don't retry GetSpanV2 indefinitely

* Fix span length calculation

* Cleanup

* gRPC client to support heimdall v1 and v2

* Fix tests

* Fixing comments

* Fixing comments

* Remove bor without heimdall changes

* Uncomment fake miner

* Bor without heimdall

* Add GetStartBlockHeimdallSpanID endpoint

* Dont use websocket connection if it failed to connect

* Fix storing start block original span id

* Store IsHeimdallV2 in persistent storage

* Fix go.mod

* update APIs / bump v2

* Fix go.mod

* Use chainparams to detect hv1 and hv2

* Fix test

* Add missing func

* Audit report improvements

* Fix storing version flag

* Fix matic-cli-config

* Move bor span self commit logic to the span store

* Retry with backoff - detect if heimdall version changed

* Fix panics

* Fix return

* Fix tests

* Span start and end block calculation fixes

* fix: ci.yml and matic-cli config (#1571)

* Fix CI comments

* Update CI go version

* Remove alias

* Fix comment

* Fix wrong comments

* Remove unused url

* Added panic for not implemented running of heimdall from bor

* Handle error on SetReadDeadline

* Added error handling and falling back to HTTP for ws milestones fetching

* Added retrying if subscribeAndHandleMilestone fails

* Remove unused fnName parameter from retryHeimdallHandler

* Remove unused function

* Remove unused function

* Fix imports ordering

* Updated heimdallv2 dependency version and added total difficulty to milestone

* Fix typo

* Fix tests

* Fix tests

* Fix tests

* Fix tests

* Fix milestones and checkpoints count

* Fix tests

* Fix tests

* Fix TestInsertingSpanSizeBlocks

* Fix TestSimulatedBackend

* Fix lint and tests

* Fix TestSimulatedBackend

* Fix lint

* Cleanup logs

* Fix tests

* Turn serve warning to debug

* Increase testGenerateBlockAndImport timeout

---------

Co-authored-by: Raneet Debnath <raneetdebnath10@gmail.com>
Co-authored-by: Pratik Patil <pratikspatil024@gmail.com>
Co-authored-by: Raneet Debnath <35629432+Raneet10@users.noreply.github.com>
Co-authored-by: Jerry <jerrycgh@gmail.com>
Co-authored-by: marcello33 <marcelloardizzone@hotmail.it>
Co-authored-by: Lucca Martins <lucca_martins30@yahoo.com.br>
Co-authored-by: kamuikatsurgi <shahkrishang11@gmail.com>
Co-authored-by: Krishang Shah <109511742+kamuikatsurgi@users.noreply.github.com>
This commit adds fork detection to the whitelist milestone validation to resolve synchronization issues that occur when nodes sync from a new milestone with extremely low latency. Nodes on different chain forks were unable to sync with peers because milestone validation rejected chains with different block hashes, producing "mismatch error" failures.

Root Cause Analysis:
The issue was caused by chain fork lock-in where milestone validation prevented nodes stuck on wrong forks from syncing to the correct canonical chain. Debug logs revealed that different nodes had different block hashes for the same block numbers, confirming a fork scenario where milestone validation was blocking recovery.

Solution Implemented:
- Added a ChainReader interface to milestone validation for blockchain access
- Implemented fork detection logic in the IsValidPeer method, which checks whether the local blockchain has a different hash than the milestone hash for the same block number
- When a fork is detected, validation allows peer synchronization to proceed, enabling automatic recovery to the canonical chain
- Added a SetBlockchain method to inject the blockchain reference after both the whitelist service and the blockchain are initialized (avoiding a circular dependency)
- Updated service creation in eth/backend.go to set the blockchain reference after initialization
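
The fork check described in the list above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the `Hash`, `ChainReader`, `mapChain`, and `milestone` types are hypothetical stand-ins, and the peer is reduced to the single hash it reports at the milestone height. It also assumes that a detected local fork is enough to let the peer through.

```go
package main

import "fmt"

// Hash stands in for a 32-byte block hash (hypothetical simplification).
type Hash [32]byte

// ChainReader is a hypothetical, minimal read-only view of the local chain.
type ChainReader interface {
	// CanonicalHash returns the local canonical hash at the given height,
	// or false if the local chain has no block there.
	CanonicalHash(number uint64) (Hash, bool)
}

// mapChain is a toy ChainReader backed by a map, used only for this sketch.
type mapChain map[uint64]Hash

func (c mapChain) CanonicalHash(n uint64) (Hash, bool) {
	h, ok := c[n]
	return h, ok
}

// milestone holds the whitelisted block and an optional chain reference.
type milestone struct {
	number uint64
	hash   Hash
	chain  ChainReader // nil until SetBlockchain is called
}

// SetBlockchain injects the chain once both objects exist, which avoids
// needing the chain at construction time.
func (m *milestone) SetBlockchain(c ChainReader) { m.chain = c }

// localForked reports whether the local chain holds a different hash than
// the milestone at the milestone height.
func (m *milestone) localForked() bool {
	if m.chain == nil {
		return false
	}
	local, ok := m.chain.CanonicalHash(m.number)
	return ok && local != m.hash
}

// IsValidPeer accepts a peer whose hash matches the milestone. On a
// mismatch it still accepts the peer if the local chain is itself forked,
// so that sync can proceed and recover (assumed behavior).
func (m *milestone) IsValidPeer(peerHash Hash) bool {
	if peerHash == m.hash {
		return true
	}
	return m.localForked()
}

func main() {
	good, bad := Hash{0xaa}, Hash{0xbb}
	m := &milestone{number: 100, hash: good}
	m.SetBlockchain(mapChain{100: bad}) // local node is on the wrong fork
	fmt.Println(m.IsValidPeer(good), m.IsValidPeer(bad))
}
```

Injecting the chain through a setter rather than the constructor mirrors the circular-dependency point in the last two bullets: the validator can be built first and wired to the chain afterwards.
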

When the checkpoint verifier inserts the canonical chain after a rewind, the fork
choice logic in writeBlockAndSetHead() may determine that no reorg is needed,
causing blocks to be inserted with SideStatTy instead of CanonStatTy. This prevents
writeHeadBlock() from being called, leaving the canonical hash mappings outdated.

The issue manifests as:
- "Successfully inserted canonical chain" log appears
- GetBlockByNumber() still returns old block hash
- Fork detection triggers false positives for whitelisted blocks
- Milestone reorg protection fails

Fix by explicitly calling SetCanonical() after InsertChain() to force
canonical status and ensure proper canonical hash mapping updates.
eth: fix canonical chain state inconsistency in checkpoint verifier
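
The effect of calling SetCanonical() after InsertChain() can be illustrated with a toy model. The sketch below does not use go-ethereum types: `block`, `toyChain`, `insertChain`, and `setCanonical` are hypothetical, and the "no reorg needed" decision is imitated by never overwriting an existing number-to-hash entry during insertion.

```go
package main

import "fmt"

// block is a toy block identified by string hashes (hypothetical).
type block struct {
	number       uint64
	hash, parent string
}

// toyChain models only the two pieces of state relevant here.
type toyChain struct {
	known     map[string]block  // every stored block, by hash
	canonical map[uint64]string // height -> canonical hash
}

func newToyChain(genesis block) *toyChain {
	return &toyChain{
		known:     map[string]block{genesis.hash: genesis},
		canonical: map[uint64]string{genesis.number: genesis.hash},
	}
}

// insertChain stores blocks but leaves existing canonical entries alone,
// imitating a fork choice that decides no reorg is needed.
func (c *toyChain) insertChain(blocks []block) {
	for _, b := range blocks {
		c.known[b.hash] = b
		if _, ok := c.canonical[b.number]; !ok {
			c.canonical[b.number] = b.hash
		}
	}
}

// setCanonical forces head to be canonical and walks back through parents,
// rewriting height -> hash entries until it meets the existing mapping.
func (c *toyChain) setCanonical(head block) error {
	cur, ok := c.known[head.hash]
	if !ok {
		return fmt.Errorf("unknown block %q", head.hash)
	}
	for c.canonical[cur.number] != cur.hash {
		c.canonical[cur.number] = cur.hash
		parent, ok := c.known[cur.parent]
		if !ok {
			break
		}
		cur = parent
	}
	return nil
}

func main() {
	c := newToyChain(block{0, "g", ""})
	c.insertChain([]block{{1, "a1", "g"}, {2, "a2", "a1"}}) // old fork
	fork := []block{{1, "b1", "g"}, {2, "b2", "b1"}}        // whitelisted chain
	c.insertChain(fork)
	fmt.Println("after insert:      ", c.canonical[2]) // still the old fork
	if err := c.setCanonical(fork[len(fork)-1]); err != nil {
		panic(err)
	}
	fmt.Println("after setCanonical:", c.canonical[2])
}
```

In this model a lookup by number keeps returning the old fork until `setCanonical` runs, which corresponds to the GetBlockByNumber() symptom listed in the commit message.
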
…rite the state sync records for a range of blocks (#1597)
…age after heimdall (v1 -> v2) migration (#1598)

* consensus/bor, params: added OverrideStateSyncRecordsInRange to overwrite the state sync records for a range of blocks

* amoy: added blocks in the OverrideStateSyncRecordsInRange for the outage after heimdall (v1 -> v2) migration
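
As a rough illustration of a range-based override, the sketch below looks up a block number in a list of inclusive ranges and trims the fetched state sync records accordingly. The `rangeOverride` type, its field names, and the "keep only the first N records" interpretation are assumptions made for this example, and the block numbers are placeholders rather than the amoy values.

```go
package main

import "fmt"

// rangeOverride is a hypothetical config entry: for every block in
// [StartBlock, EndBlock], use exactly Value state sync records.
type rangeOverride struct {
	StartBlock uint64
	EndBlock   uint64
	Value      int
}

// overrideFor returns the override for a block, if any range covers it.
func overrideFor(ranges []rangeOverride, block uint64) (int, bool) {
	for _, r := range ranges {
		if block >= r.StartBlock && block <= r.EndBlock {
			return r.Value, true
		}
	}
	return 0, false
}

// applyOverride trims records to the override count when one applies.
// Values outside [0, len(records)] are clamped so slicing cannot panic.
func applyOverride(records []string, ranges []rangeOverride, block uint64) []string {
	n, ok := overrideFor(ranges, block)
	if !ok {
		return records
	}
	if n < 0 {
		n = 0
	}
	if n > len(records) {
		n = len(records)
	}
	return records[:n]
}

func main() {
	// Placeholder range, not real network values.
	ranges := []rangeOverride{{StartBlock: 100, EndBlock: 200, Value: 0}}
	records := []string{"event-1", "event-2"}
	fmt.Println(len(applyOverride(records, ranges, 150)))
	fmt.Println(len(applyOverride(records, ranges, 201)))
}
```

A range entry avoids listing every affected block individually, which matters when an outage spans many consecutive blocks.
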
@Raneet10 Raneet10 requested a review from a team July 3, 2025 04:23
@Raneet10 Raneet10 merged commit 81caac8 into master Jul 3, 2025
14 of 18 checks passed

6 participants