fix(provider): recover from panics in reward-claim gas estimation #2304
Merged
A node can return a successful Simulate response with a nil GasInfo; cosmos-sdk's CalculateGas dereferences it without a nil check and panics. Because the reward claim runs in an un-recovered goroutine, that panic crashed the entire lavap provider process (observed SIGSEGV in TxRelayPayment -> simulateTxWithRetry -> CalculateGas).
- Wrap CalculateGas (simulateTxWithRetry) so a panic becomes a retryable error handled by the existing retry/fail path; this also protects every gas-simulated tx, not just reward claims.
- Add a backstop recover() in the reward-claim goroutine so no panic in the async claim can take down the provider; it logs the stack and marks the claim for retry.
The root nil-deref lives in the cosmos-sdk fork; this is the lavap-side guard so a bad node response degrades gracefully instead of crashing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
avitenzer
approved these changes
May 27, 2026
Problem
A provider can crash (SIGSEGV) while claiming rewards:
cosmos-sdk's CalculateGas does simRes.GasInfo.GasUsed with no nil check (client/tx/tx.go). A node can return a successful Simulate response with GasInfo == nil (the proto field is optional), so the dereference panics; addr=0x8 in the signal is exactly the offset of GasUsed inside GasInfo. The reward claim runs in an un-recovered goroutine, so that panic crashes the entire lavap provider process (downtime, missed relays, lost claim).
Fix (lavap-side, two layers)
- simulateTxWithRetry now calls calculateGasWithRecover, which wraps tx.CalculateGas and converts a panic into a normal error handled by the existing retry/fail path. It lives in the shared simulation path, so it protects every gas-estimated tx (consumer + provider), not just reward claims.
- A backstop recover() in the reward-claim goroutine logs the panic + stack and marks the claim for retry, so no panic in that async path can take the provider down.
Normal behavior is unchanged.
Verification
- TestSendRewardsClaim_RecoversFromTxPanic (panicking TxRelayPayment mock): proven to fail without the change (the goroutine panic crashes the test binary) and pass with it; the recovered claim is marked for retry.
- rewardserver + statetracker packages, go vet, gofmt all clean.
Related (not in this PR)
The root nil-deref is in the cosmos-sdk fork (github.com/lavanet/cosmos-sdk, client/tx/tx.go): CalculateGas should nil-check simRes.GasInfo and return an error instead of dereferencing. That's the complementary deeper fix in the fork; this PR is the lavap-side guard so a bad node response degrades gracefully regardless. Independent of #2301/#2302 (different subsystem).
🤖 Generated with Claude Code
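As a rough sketch of what the nil check in the fork could look like, under the assumption that the gas figure is derived from GasUsed scaled by an adjustment factor. The types, the adjustedGas helper, and errNilGasInfo are hypothetical stand-ins, not the actual cosmos-sdk code or signature:

```go
package main

import (
	"errors"
	"fmt"
)

// gasInfo and simulateResponse are simplified stand-ins for the SDK types.
type gasInfo struct{ GasUsed uint64 }
type simulateResponse struct{ GasInfo *gasInfo }

// errNilGasInfo is a hypothetical sentinel error for a response without gas data.
var errNilGasInfo = errors.New("simulate response has nil GasInfo")

// adjustedGas checks for a missing response or GasInfo before dereferencing,
// returning an error instead of panicking. The gasAdjustment multiplier is an
// assumption made for this sketch.
func adjustedGas(res *simulateResponse, gasAdjustment float64) (uint64, error) {
	if res == nil || res.GasInfo == nil {
		return 0, errNilGasInfo
	}
	return uint64(gasAdjustment * float64(res.GasInfo.GasUsed)), nil
}

func main() {
	// A bad node response becomes a plain error the caller can retry on.
	_, err := adjustedGas(&simulateResponse{}, 1.5)
	fmt.Println("nil GasInfo:", err)

	gas, err := adjustedGas(&simulateResponse{GasInfo: &gasInfo{GasUsed: 100000}}, 1.5)
	fmt.Println("valid response:", gas, err)
}
```

With a check like this in place the recover-based guard would no longer fire for this particular case, but it would still cover any other panic in the simulation path.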