privval: allow full node startup without a connection for improved failover #3455

@tarcieri

Description

Right now, the behavior of TCPVal (I think that's the right name?) is to wait for a TCP connection until a timeout and then terminate the process if none arrives.

I'd like to suggest an alternative behavior that I think would help with validator failover while preserving double-signing protection: a node with priv_validator_laddr configured should start up as a normal full node if it does not receive a connection from the KMS. As soon as it does receive a KMS connection, it can switch from being a full node to being a validator.

With this approach, several full nodes that can each be promoted to validator can run at the same time, while the KMS client connection controls which one is currently active.

In particular, I think it'd be nice to have the KMS connect to a virtual IP (a.k.a. VIP, e.g. a single-address NAT or a TCP proxy) managed by a network device with its own built-in failover. The VIP can point at the currently active validator. To fail over, the VIP only needs to be repointed at a different "standby" validator, which becomes active and starts signing as soon as it receives the connection.

Labels: T:enhancement (Type: Enhancement), T:validator (Type: Validator related), stale (for use by stalebot)