bridge: improvements / app: improvements #427
Pull Request Overview
This PR implements significant improvements to bridge service integration and reliability. The bridge is now embedded as a child process within heimdalld instead of running as a separate process, requiring validators to use heimdalld start --bridge --all --rest-server. The PR also improves error handling for unresponsive REST servers with timeout mechanisms and better logging, validates bor_chain_id in checkpoint workflows, and includes dependency updates.
Key changes:
- Bridge service embedded within `heimdalld` as a child process instead of running standalone
- Enhanced REST server timeout handling, with warning logs once the server has been unresponsive for 30 minutes
- Chain ID validation in checkpoint processing to prevent mismatched submissions
- Dependency updates including cosmos-sdk bump and websocket library replacement
Reviewed Changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `bridge/service/bridge.go` | New bridge service implementation for embedded operation within `heimdalld` |
| `cmd/heimdalld/cmd/commands.go` | Enhanced REST server polling with timeout handling and bridge integration |
| `x/checkpoint/keeper/side_msg_server.go` | Added bor chain ID validation in checkpoint processing |
| `bridge/broadcaster/broadcaster.go` | Improved account polling logic with context-aware waiting |
| `bridge/util/common.go` | Updated `GetAccount` function to accept a context parameter |
| `app/app.go` | Added fee transfer event emission in `EndBlocker` |
| `go.mod` | Updated cosmos-sdk version and websocket library replacement |
* Delete spans backfill
* Always vote NO on spans backfill
* Return type assertion for MsgBackfillSpan
* Merge pull request #413 from 0xPolygon/kamui/bump-kurtosis chore: remove validator test case and bump kurtosis
* chore: side msg and abci handler metrics (#410)
* chore: add side msg metrics
* timer metrics for Pre, Begin, and End blocker funcs
* chore: nits
* abci handler metrics
* chore: use Namespace from metrics package
* misc: migrated from maticnetwork to 0xPolygon (#420)
* misc: migrated from maticnetwork to 0xPolygon
* misc: bump cosmos to v0.2.3-polygon
* misc: bump bor
* misc: addressed comments
* chore: bump kurtosis (#422)
* chore: bump kurtosis
* chore: add milestones test
* fix: test case name
* chore: update test_runner image
* Re-enable voting power check in tally votes (#409)
* app,helper,x/stake: add logic to store penultimate valset
* app: use correct valset in ValidateVoteExtensions
* app: fix if condition
* app: add condition for genesis
* app,x/chainmanager: track initial chain height for UTs
* app,helper,x/milestone: re-enable skipped validator not found errors
* app: fix some tests and refactor a bit
* app,bridge: fix more tests
* app,x/milestone: some cleanup
* app: dedup redundant stubs
* app: more cleanup
* app,helper,x/chainmanager: move setting of initial height outside of x/chainmanager
* app: improved error logging
* app,x/chainmanager: rm unused key + minor nit
* app: deterministically extract max hash and VP from non-rp VE
* x/stake: fix lint
* helper: refactoring
* app: add comment
* bump go to 1.24.6
* helper: rm unnecessary todo
* fix: build (#423)
* Fix generate-keystore command (#424)
* cmd: allow generate-keystore to accept private key
* cmd: add --generate-new flag
* update: cosmos-sdk (#425)
* helper: fix conflicts
* feat: bump kurtosis and migrate to pos-workflows (#431)
* feat: remove matic-cli e2e-tests (#432)
* bridge: improvements / app: improvements (#427)
* improve bridge, remove bridge cli, minor fixes
* validate bor_chain_id in checkpoints flow
* test fee-transfer events emissions
* remove test block
* logs on rest server being unresponsive
* improve initRootCmd
* don't return on ctx done / restServerTimeOutInMinutes to 1m for tests
* improve logs / restore restServerTimeOutInMinutes
* sort imports
* fix comment
* fix tests and linter issues
* bump cosmos-sdk dep
* move const outside of the func
* refactor StartWithCtx function
* address comments
* log anomaly if account not found with node being in sync
* Set voting power and valset check heights for amoy (#435)
* helper: set tallyFixHeight and disableVPCheckHeight for amoy
* helper: set disableValSetCheckHeight for amoy
* helper: set initialHeight for amoy
* api,proto,x/stake: add penultimate valset to genesis export + add some tests (#436)
* set tallyfix mainnet hf heights

---------

Co-authored-by: Angel Valkov <avalkov@polygon.technology>
Co-authored-by: Krishang Shah <109511742+kamuikatsurgi@users.noreply.github.com>
Co-authored-by: Pratik Patil <pratikspatil024@gmail.com>
Co-authored-by: Raneet Debnath <35629432+Raneet10@users.noreply.github.com>
Co-authored-by: Raneet Debnath <raneetdebnath10@gmail.com>



Description
In this PR we implement the following improvements:
- `heimdalld` will wait until the account is visible locally, meaning the node is synced and past its join height.
- The bridge now runs within `heimdalld` as a child process, enabled via the `--bridge` flag. Validators running the bridge will therefore need to adapt and use `heimdalld start --bridge --all --rest-server` to start the bridge within the `heimdalld` service.
- When the `rest-server` is not responding, `heimdalld` won't crash unless the context is intentionally canceled. Instead, the service waits for a set time (`restServerTimeOutInMinutes = 30`) and then starts printing meaningful logs, so that operators can check the status of their rest server. These logs keep being printed until the service is stopped.
In the happy path (healthy `rest-server`), no such logs are printed.
- Use an `httpClient` with a `timeout` instead of `DefaultClient` in `InitRootCmd` when contacting the rest server.
- `chainId` is initialized in the context at bridge startup; fetching it from the rest-server remains as a fallback.
- `bor_chain_id` is validated during the checkpoint workflow, at the `side_server` level, against the chain parameters.
- Bump `cosmos-sdk` to the latest release.
- Replace the `nhooyr.io` websocket library with `github.com/coder/websocket`, as per "Fix: Replace nhooyr.io => github.com/coder ws" #421 (thanks @DaveWK for your contribution).
- Emit fee-transfer events in `EndBlocker`, as per "feat: Expose proposer transfer event during end of block" #376 (thanks @haiyanghe for your contribution), so these events can now be observed.
Breaking changes
The breaking changes here require all validators (generally speaking, nodes running the bridge) to run the bridge service within
`heimdalld` (via `heimdalld start --bridge --all --rest-server`) instead of using a separate custom `bridge` CLI process.
Testing
Remotely, this has been tested with a fresh devnet anchored to the PR's source branch, and also with a staggered upgrade starting from
`develop`. It is recommended to further test everything with a staggered upgrade (on top of the current latest release) once the release candidate branch is cut from `develop`.