Skip to content

privval: Consumes out of order message if previous sign request timed out #3486

@joe-bowman

Description

@joe-bowman

Tendermint version (use tendermint version or git rev-parse --verify HEAD if installed from source): 0.30.1 (as reported by localhost:26657/status

ABCI app (name for built-in, URL for self-written if it's publicly available): cosmos-sdk:0.33.0

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 18.04.2
  • Install tools:
  • Others: tendermint/KMS: 0.4.0

What happened:
After signing for a period of time (first time, about 12 hours; second time about a week), the KMS sends a message gaiad does not expect, and gaiad logs the following message:

ESC[32mgaia_1                  |ESC[0m E[2019-03-22|18:36:54.561] Error signing vote                           module=consensus height=112215 round=0 vote="Vote{15:31920F9BC3A3 112215/00/1(Prevote) 89AA573A77A7 00000
0000000 @ 2019-03-22T18:36:51.561061017Z}" err="remote signer timed out"
ESC[32mgaia_1                  |ESC[0m E[2019-03-22|18:36:54.704] Ping                                         module=privval err="received unexpected response"
ESC[32mgaia_1                  |ESC[0m E[2019-03-22|18:36:54.705] Error signing vote                           module=consensus height=112215 round=0 vote="Vote{15:31920F9BC3A3 112215/00/2(Precommit) 89AA573A77A7 000000000000 @ 2019-03-22T18:36:54.705527937Z}" err="received unexpected response"

Logs past this point appear normal, and the KMS logs show that gaiad was continuing to make requests to the KMS after these errors, but the prevote/precommits after height 112214 are missing from blocks until gaiad was restarted.

To summarise:
Gaiad (< 112215): OK, 100% blocks signed
KMS (< 112215): OK, 100% blocks signed
Gaiad (112215): Error messages as above
KMS (112215): Received sign requests from gaiad, no errors
Gaiad (>112215): No errors, no prevotes/precommits/proposals included in blocks
KMS (>112215): Received sign requests from gaiad, no errors

What you expected to happen:
No errors, blocks signed as expected.

Have you tried the latest version: yes/no
Unable to test current version in this environment

How to reproduce it (as minimally and precisely as possible):
Not reliably reproducible; but repeats after random period of time.

Having looked into code, this error only displays when the message returned by the external signer does not match the type it expected to deal with (e.g. Ping received instead of SignedVote, etc.) (privval/remote_signer.go; search ErrUnexpectedResponse).

This could potentially be an invalid message being emitted by the signer, or a race condition.

Metadata

Metadata

Assignees

Labels

T:bugType Bug (Confirmed)T:validatorType: Validator related

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions