This repository was archived by the owner on Jun 3, 2020. It is now read-only.

Restarting tmkms leads to Tendermint "remote signer timed out" #116

@mdyring

Hi guys,

Sorry, I'm not sure whether the right place to report this is the Tendermint repo or the KMS repo. It seems very KMS-specific, so I'm trying here first.

I'm doing some testing with tmkms running as a systemd service alongside gaia. When I restart tmkms, Tendermint does not recover gracefully and has to be restarted as well:

Nov 24 22:50:04 val2 gaiad[86004]: I[24116-11-24|21:50:04.406] Starting BlockPool                           module=blockchain impl=BlockPool
Nov 24 22:50:04 val2 gaiad[86004]: I[24116-11-24|21:50:04.406] Starting IndexerService                      module=txindex impl=IndexerService
Nov 24 22:52:08 val2 systemd[1]: Stopping Tendermint KMS Service...
Nov 24 22:52:08 val2 systemd[1]: Stopped Tendermint KMS Service.
Nov 24 22:52:08 val2 systemd[1]: Started Tendermint KMS Service.
Nov 24 22:52:08 val2 gaiad[86004]: E[24116-11-24|21:52:08.402] Ping                                         module=privval err=EOF
Nov 24 22:52:08 val2 kernel: usb 1-3: reset full-speed USB device number 23 using xhci_hcd
Nov 24 22:52:08 val2 kernel: usb 1-10: reset full-speed USB device number 5 using xhci_hcd
Nov 24 22:52:09 val2 kernel: usb 1-10: reset full-speed USB device number 5 using xhci_hcd
Nov 24 22:52:10 val2 gaiad[86004]: E[24116-11-24|21:52:10.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:12 val2 gaiad[86004]: E[24116-11-24|21:52:12.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:14 val2 gaiad[86004]: E[24116-11-24|21:52:14.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:16 val2 gaiad[86004]: E[24116-11-24|21:52:16.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:18 val2 gaiad[86004]: E[24116-11-24|21:52:18.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:20 val2 gaiad[86004]: E[24116-11-24|21:52:20.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:22 val2 gaiad[86004]: E[24116-11-24|21:52:22.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:24 val2 gaiad[86004]: E[24116-11-24|21:52:24.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:26 val2 gaiad[86004]: E[24116-11-24|21:52:26.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:28 val2 gaiad[86004]: E[24116-11-24|21:52:28.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:30 val2 gaiad[86004]: E[24116-11-24|21:52:30.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:32 val2 gaiad[86004]: E[24116-11-24|21:52:32.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:34 val2 gaiad[86004]: E[24116-11-24|21:52:34.401] Ping                                         module=privval err="remote signer timed out"
Nov 24 22:52:36 val2 gaiad[86004]: E[24116-11-24|21:52:36.401] Ping                                         module=privval err="remote signer timed out"

When I restart gaia after the above errors, the following also shows up:

Nov 24 22:57:26 val2 systemd[1]: Started Gaia Service.
Nov 24 22:57:26 val2 gaiad[86067]: I[24116-11-24|21:57:26.468] Starting ABCI with Tendermint                module=main
Nov 24 22:57:26 val2 gaiad[86067]: I[24116-11-24|21:57:26.480] Starting multiAppConn                        module=proxy impl=multiAppConn
Nov 24 22:57:26 val2 gaiad[86067]: I[24116-11-24|21:57:26.480] Starting localClient                         module=abci-client connection=query impl=localClient
Nov 24 22:57:26 val2 gaiad[86067]: I[24116-11-24|21:57:26.480] Starting localClient                         module=abci-client connection=mempool impl=localClient
Nov 24 22:57:26 val2 gaiad[86067]: I[24116-11-24|21:57:26.480] Starting localClient                         module=abci-client connection=consensus impl=localClient
Nov 24 22:57:26 val2 gaiad[86067]: I[24116-11-24|21:57:26.480] ABCI Handshake App Info                      module=consensus height=0 hash= software-version= protocol-version=0
Nov 24 22:57:26 val2 gaiad[86067]: I[24116-11-24|21:57:26.480] ABCI Replay Blocks                           module=consensus appHeight=0 storeHeight=0 stateHeight=0
Nov 24 22:57:26 val2 gaiad[86067]: I[24116-11-24|21:57:26.544] Completed ABCI Handshake - Tendermint and App are synced module=consensus appHeight=0 appHash=
Nov 24 22:57:26 val2 gaiad[86067]: I[24116-11-24|21:57:26.544] Starting TCPVal                              module=privval impl=TCPVal
Nov 24 22:57:29 val2 gaiad[86067]: E[24116-11-24|21:57:29.544] OnStart                                      module=privval err="accept tcp 127.0.0.1:26658: i/o timeout"
Nov 24 22:57:29 val2 gaiad[86067]: ERROR: Error starting private validator client: accept tcp 127.0.0.1:26658: i/o timeout
Nov 24 22:57:29 val2 systemd[1]: gaia.service: Main process exited, code=exited, status=1/FAILURE
Nov 24 22:57:29 val2 systemd[1]: gaia.service: Unit entered failed state.
Nov 24 22:57:29 val2 systemd[1]: gaia.service: Failed with result 'exit-code'.
Nov 24 22:57:32 val2 systemd[1]: gaia.service: Service hold-off time over, scheduling restart.
Nov 24 22:57:32 val2 systemd[1]: Stopped Gaia Service.
Nov 24 22:57:32 val2 systemd[1]: Started Gaia Service.
Nov 24 22:57:33 val2 gaiad[86096]: I[24116-11-24|21:57:33.087] Starting ABCI with Tendermint                module=main
Nov 24 22:57:33 val2 gaiad[86096]: I[24116-11-24|21:57:33.100] Starting multiAppConn                        module=proxy impl=multiAppConn
Nov 24 22:57:33 val2 gaiad[86096]: I[24116-11-24|21:57:33.100] Starting localClient                         module=abci-client connection=query impl=localClient
Nov 24 22:57:33 val2 gaiad[86096]: I[24116-11-24|21:57:33.100] Starting localClient                         module=abci-client connection=mempool impl=localClient
Nov 24 22:57:33 val2 gaiad[86096]: I[24116-11-24|21:57:33.100] Starting localClient                         module=abci-client connection=consensus impl=localClient
Nov 24 22:57:33 val2 gaiad[86096]: I[24116-11-24|21:57:33.100] ABCI Handshake App Info                      module=consensus height=0 hash= software-version= protocol-version=0
Nov 24 22:57:33 val2 gaiad[86096]: I[24116-11-24|21:57:33.100] ABCI Replay Blocks                           module=consensus appHeight=0 storeHeight=0 stateHeight=0
Nov 24 22:57:33 val2 gaiad[86096]: I[24116-11-24|21:57:33.163] Completed ABCI Handshake - Tendermint and App are synced module=consensus appHeight=0 appHash=
Nov 24 22:57:33 val2 gaiad[86096]: I[24116-11-24|21:57:33.163] Starting TCPVal                              module=privval impl=TCPVal
Nov 24 22:57:36 val2 gaiad[86096]: I[24116-11-24|21:57:36.124] This node is not a validator                 module=consensus addr=2369786F94AECAABEE11A1242A395EC9C6303BF9 pubKey=PubKeyEd25519{6C0B225542087B267B312F09424CF9E58C23519F9EC7B85181E036BB8E20E720}
Nov 24 22:57:36 val2 gaiad[86096]: I[24116-11-24|21:57:36.127] P2P Node ID                                  module=p2p ID=06430257c53430df262d5010a26175db590b4154 file=/config/node_key.json
Nov 24 22:57:36 val2 gaiad[86096]: I[24116-11-24|21:57:36.127] Starting Node                                module=node impl=Node
Nov 24 22:57:36 val2 gaiad[86096]: I[24116-11-24|21:57:36.127] Starting EventBus                            module=events impl=EventBus
Nov 24 22:57:36 val2 gaiad[86096]: I[24116-11-24|21:57:36.127] Starting PubSub                              module=pubsub impl=PubSub

I've also noticed that Tendermint often times out rather quickly while waiting for tmkms to connect. In the first start attempt above, it gives up about 3 seconds after "Starting TCPVal".

Thanks for the great work so far.
