Consensus Failure when remote signer connection drops #2926

@liamsi

Tendermint version
develop branch

What happened:

When the remote signer drops the connection, the node logs either
CONSENSUS FAILURE!!! module=consensus err=EOF
or
CONSENSUS FAILURE!!! module=consensus err=remote signer timed out

Additionally, ConsensusState/receiveRoutine shuts down.

What you expected to happen:

A clearer error message. Tendermint should probably also retry or continue receiving instead of shutting down.

Probably, these methods should not panic but return an error instead:

func (sc *RemoteSignerClient) GetAddress() types.Address {
	pubKey, err := sc.getPubKey()
	if err != nil {
		panic(err)
	}
	return pubKey.Address()
}

// GetPubKey implements PrivValidator.
func (sc *RemoteSignerClient) GetPubKey() crypto.PubKey {
	pubKey, err := sc.getPubKey()
	if err != nil {
		panic(err)
	}
	return pubKey
}

Depending on the error (timeout/eof vs unknown other), we should continue or exit.

Have you tried the latest version: yes

How to reproduce it (as minimally and precisely as possible):

./tendermint node --priv_validator_laddr=tcp://127.0.0.1:26659 --proxy_app=kvstore
Then start e.g. priv_val_server (or the kms) as a remote signer and shut it down after a few rounds of signing.

Logs (paste a small part showing an error (< 10 lines) or link a pastebin, gist, etc. containing more of the log file):

I[27116-11-27|19:20:13.952] Executed block                               module=state height=492 validTxs=0 invalidTxs=0
I[27116-11-27|19:20:13.952] Committed state                              module=state height=492 txs=0 appHash=0000000000000000
I[27116-11-27|19:20:13.953] Indexed block                                module=txindex height=492
I[27116-11-27|19:20:14.954] Timed out                                    module=consensus dur=998.016427ms height=493 round=0 step=RoundStepNewHeight
I[27116-11-27|19:20:14.954] enterNewRound(493/0). Current: 493/0/RoundStepNewHeight module=consensus height=493 round=0
I[27116-11-27|19:20:14.954] enterPropose(493/0). Current: 493/0/RoundStepNewRound module=consensus height=493 round=0c

E[27116-11-27|19:20:14.955] CONSENSUS FAILURE!!!                         module=consensus err=EOF stack="goroutine 93 [running]:\nruntime/debug.Stack(0xc420ca5550, 0x1, 0x1)\n\t/usr/local/Cellar/go/1.10.3/libexec/src/runtime/debug/stack.go:24 +0xa7\ngithub.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine.func2(0xc4200b6a80, 0x1b74350)\n\t/Users/ismail/go/src/github.com/tendermint/tendermint/consensus/state.go:583 +0xf9\npanic(0x18dc060, 0xc420074040)\n\t/usr/local/Cellar/go/1.10.3/libexec/src/runtime/panic.go:502 +0x229\ngithub.com/tendermint/tendermint/privval.(*RemoteSignerClient).GetAddress(0xc42019e560, 0x0, 0x0, 0x0)\n\t/Users/ismail/go/src/github.com/tendermint/tendermint/privval/remote_signer.go:39 +0x91\ngithub.com/tendermint/tendermint/consensus.(*ConsensusState).enterPropose(0xc4200b6a80, 0x1ed, 0x0)\n\t/Users/ismail/go/src/github.com/tendermint/tendermint/consensus/state.go:872 +0x613\ngithub.com/tendermint/tendermint/consensus.(*ConsensusState).enterNewRound(0xc4200b6a80, 0x1ed, 0x0)\n\t/Users/ismail/go/src/github.com/tendermint/tendermint/consensus/state.go:790 +0x81a\ngithub.com/tendermint/tendermint/consensus.(*ConsensusState).handleTimeout(0xc4200b6a80, 0x3b7c85ab, 0x1ed, 0x0, 0x1, 0x1ed, 0x0, 0x1, 0x38b16ab2, 0xed38f81de, ...)\n\t/Users/ismail/go/src/github.com/tendermint/tendermint/consensus/state.go:701 +0x551\ngithub.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine(0xc4200b6a80, 0x0)\n\t/Users/ismail/go/src/github.com/tendermint/tendermint/consensus/state.go:623 +0x430\ncreated by github.com/tendermint/tendermint/consensus.(*ConsensusState).OnStart\n\t/Users/ismail/go/src/github.com/tendermint/tendermint/consensus/state.go:307 +0x140\n"
I[27116-11-27|19:20:14.955] Stopping baseWAL                             module=consensus wal=/Users/ismail/.tendermint/data/cs.wal/wal impl=baseWAL
I[27116-11-27|19:20:14.955] Stopping Group                               module=consensus wal=/Users/ismail/.tendermint/data/cs.wal/wal impl=Group
E[27116-11-27|19:20:15.853] Ping                                         module=privval err="remote signer timed out"

ref tendermint/tmkms#116
ref #2923

Labels: C:consensus (Component: Consensus), T:bug (Type Bug (Confirmed)), T:validator (Type: Validator related)
