Skip to content

fix(provider): recover from panics in reward-claim gas estimation#2304

Merged
nimrod-teich merged 1 commit into
mainfrom
fix/reward-claim-gas-panic-recover
Jun 2, 2026
Merged

fix(provider): recover from panics in reward-claim gas estimation#2304
nimrod-teich merged 1 commit into
mainfrom
fix/reward-claim-gas-panic-recover

Conversation

@AnnaR-prog

Copy link
Copy Markdown
Contributor

Problem

A provider can crash (SIGSEGV) while claiming rewards:

RewardServer.sendRewardsClaim (goroutine)
  └─ ProviderTxSender.TxRelayPayment
       └─ SimulateAndBroadCastTxWithRetryOnSeqMismatch
            └─ simulateTxWithRetry
                 └─ tx.CalculateGas   💥 nil pointer dereference

cosmos-sdk's CalculateGas does simRes.GasInfo.GasUsed with no nil check (client/tx/tx.go). A node can return a successful Simulate response with GasInfo == nil (the proto field is optional), so the dereference panics — addr=0x8 in the signal is exactly the offset of GasUsed inside GasInfo. The reward claim runs in an un-recovered goroutine, so that panic crashes the entire lavap provider process (downtime, missed relays, lost claim).

Fix (lavap-side, two layers)

  1. simulateTxWithRetry now calls calculateGasWithRecover, which wraps tx.CalculateGas and converts a panic into a normal error handled by the existing retry/fail path. It lives in the shared simulation path, so it protects every gas-estimated tx (consumer + provider), not just reward claims.
  2. A backstop recover() in the reward-claim goroutine logs the panic + stack and marks the claim for retry, so no panic in that async path can take the provider down.

Normal behavior is unchanged.

Verification

  • New regression test TestSendRewardsClaim_RecoversFromTxPanic (panicking TxRelayPayment mock): proven to fail without the change (the goroutine panic crashes the test binary) and pass with it; the recovered claim is marked for retry.
  • Full rewardserver + statetracker packages, go vet, gofmt all clean.

Related (not in this PR)

The root nil-deref is in the cosmos-sdk fork (github.com/lavanet/cosmos-sdk, client/tx/tx.go): CalculateGas should nil-check simRes.GasInfo and return an error instead of dereferencing. That's the complementary deeper fix in the fork; this PR is the lavap-side guard so a bad node response degrades gracefully regardless. Independent of #2301/#2302 (different subsystem).

🤖 Generated with Claude Code

A node can return a successful Simulate response with a nil GasInfo;
cosmos-sdk's CalculateGas dereferences it without a nil check and panics.
Because the reward claim runs in an un-recovered goroutine, that panic
crashed the entire lavap provider process (observed SIGSEGV in
TxRelayPayment -> simulateTxWithRetry -> CalculateGas).

- Wrap CalculateGas (simulateTxWithRetry) so a panic becomes a retryable
  error handled by the existing retry/fail path; this also protects every
  gas-simulated tx, not just reward claims.
- Add a backstop recover() in the reward-claim goroutine so no panic in the
  async claim can take down the provider; it logs the stack and marks the
  claim for retry.

The root nil-deref lives in the cosmos-sdk fork; this is the lavap-side guard
so a bad node response degrades gracefully instead of crashing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@codecov

codecov Bot commented May 27, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 40.00000% with 9 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
protocol/statetracker/tx_sender.go 0.00% 9 Missing ⚠️
Flag Coverage Δ
consensus 8.96% <ø> (ø)
protocol 35.54% <40.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
protocol/rpcprovider/rewardserver/reward_server.go 68.07% <100.00%> (+0.38%) ⬆️
protocol/statetracker/tx_sender.go 0.00% <0.00%> (ø)

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions

Copy link
Copy Markdown

Test Results

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
7 files   ±0   0 ❌ ±0 

Results for commit 7c5e70e. ± Comparison against base commit 8fae505.

@nimrod-teich nimrod-teich merged commit 9d16b20 into main Jun 2, 2026
30 checks passed
@nimrod-teich nimrod-teich deleted the fix/reward-claim-gas-panic-recover branch June 2, 2026 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants