Description
Bug Report
Setup
CometBFT version (use cometbft version or git rev-parse --verify HEAD if installed from source): v0.38.5 (as part of palomad v1.13.0)
Have you tried the latest version: no
ABCI app (name for built-in, URL for self-written if it's publicly available): https://github.com/palomachain/paloma v1.13.0
Environment:
- OS (e.g. from /etc/os-release): Ubuntu 22.04.3 LTS
- Install tools: go version go1.21.4 linux/amd64; the same behaviour occurs when running the pre-built palomad binary
- Others: tmkms on another machine, v0.14.0
node command runtime flags: start
Config
Nothing particularly different from default, other than the node being connected to a remote signer, and having metrics and state sync enabled.
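For orientation, the non-default settings described above map roughly onto the following config.toml keys. This is a sketch only: the listen address and port are illustrative assumptions, not values copied from the affected node, and the remaining state sync parameters are left out.

```toml
# Top-level setting: address the node listens on for the remote signer (tmkms).
# The host and port below are placeholders, not the real values.
priv_validator_laddr = "tcp://0.0.0.0:26659"

[instrumentation]
# Expose Prometheus metrics.
prometheus = true

[statesync]
# State sync enabled; rpc_servers, trust_height and trust_hash omitted here.
enable = true
```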
What happened?
Whenever the validator attempts to sign a block using a remote signer (tmkms v0.14.0), we get a consensus failure error, regardless of which block it tries to sign. The node stops syncing completely afterwards. Restarting the node causes it to sync back up to the latest block height, until it tries to sign again and fails again.
What did you expect to happen?
The validator signs blocks normally.
How to reproduce it
Run a validator connected to a remote tmkms signer (v0.14.0) and let it attempt to sign a block.
Logs
2:59PM ERR CONSENSUS FAILURE!!! err="non-recoverable error when signing vote (15385390/1)" module=consensus stack="goroutine 139403 [running]:
runtime/debug.Stack()
/opt/hostedtoolcache/go/1.21.8/x64/src/runtime/debug/stack.go:24 +0x5e
github.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()
/home/runner/go/pkg/mod/github.com/cometbft/cometbft@v0.38.5/consensus/state.go:800 +0x46
panic({0x2d03080?, 0xc0b5748830?})
/opt/hostedtoolcache/go/1.21.8/x64/src/runtime/panic.go:914 +0x21f
github.com/cometbft/cometbft/consensus.(*State).signVote(0xc000aaca80, 0x2, {0xc03d3b9360, 0x20, 0x20}, {0xe0eb5c?, {0xc03d3b9380?, 0xc054dc92ec?, 0x3035160?}}, 0xc0aff57a40)
/home/runner/go/pkg/mod/github.com/cometbft/cometbft@v0.38.5/consensus/state.go:2388 +0x639
github.com/cometbft/cometbft/consensus.(*State).signAddVote(0xc000aaca80, 0x1?, {0xc03d3b9360, 0x20, 0x20}, {0x1?, {0xc03d3b9380?, 0xc0fd083a60?, 0x20?}}, 0xc0aff57a40)
/home/runner/go/pkg/mod/github.com/cometbft/cometbft@v0.38.5/consensus/state.go:2439 +0x212
github.com/cometbft/cometbft/consensus.(*State).enterPrecommit(0xc000aaca80, 0xeac32e, 0x1)
/home/runner/go/pkg/mod/github.com/cometbft/cometbft@v0.38.5/consensus/state.go:1536 +0x1337
github.com/cometbft/cometbft/consensus.(*State).addVote(0xc000aaca80, 0xc1677dc270, {0xc00675e120, 0x28})
/home/runner/go/pkg/mod/github.com/cometbft/cometbft@v0.38.5/consensus/state.go:2296 +0x186f
github.com/cometbft/cometbft/consensus.(*State).tryAddVote(0xc000aaca80, 0xc1677dc270, {0xc00675e120?, 0xc054dc9c08?})
/home/runner/go/pkg/mod/github.com/cometbft/cometbft@v0.38.5/consensus/state.go:2056 +0x26
github.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc000aaca80, {{0x3c0ebc0, 0xc0b4f8ebc8}, {0xc00675e120, 0x28}})
/home/runner/go/pkg/mod/github.com/cometbft/cometbft@v0.38.5/consensus/state.go:928 +0x3ce
github.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc000aaca80, 0x0)
/home/runner/go/pkg/mod/github.com/cometbft/cometbft@v0.38.5/consensus/state.go:835 +0x3d1
created by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 334
/home/runner/go/pkg/mod/github.com/cometbft/cometbft@v0.38.5/consensus/state.go:397 +0x10c"
2:59PM INF service stop impl=baseWAL module=consensus msg="Stopping baseWAL service" wal=/home/validator/.paloma/data/cs.wal/wal
2:59PM INF service stop impl=Group module=consensus msg="Stopping Group service" wal=/home/validator/.paloma/data/cs.wal/wal
dump_consensus_state output
Anything else we need to know
We have not tried signing without a remote signer, but we seem to be the only ones with this particular issue since the upgrade.