Description
Right now, TCPVal (I think that's the right name?) waits for a TCP connection until a timeout expires, and terminates the process if it doesn't receive one.
I'd like to suggest an alternative behavior I think would be helpful for validator failover while preserving double signing protection: have a node with priv_validator_laddr configured start up as if it were a normal full node if it does not receive a connection from the KMS. As soon as it does receive a KMS connection, it can flip over from being a full node to being a validator.
With this approach, it's possible to run several full nodes, each capable of being promoted to a validator, and use the KMS client connection to control which one is currently active.
In particular I think it'd be nice to have KMS connect to a virtual IP (a.k.a. VIP, e.g. single address NAT, or a TCP proxy) managed by a network device with its own built-in failover. This "VIP" can be pointed at the presently active validator. To flip it over, the VIP only needs to be pointed at a different "standby" validator, which can become active and start signing as soon as it receives a connection.