kvserver,ui: remove usage of NodeLivenessProto #50707
Description
Let us consider the NodeLivenessProto. It enumerates across two distinct sets of things: the "membership" status of the corresponding node as it pertains to the cluster (live/active, decommissioning, decommissioned), and the "process" state of that node as seen by the rest of the cluster (live/active, unavailable, dead). "Unavailable" is the state we use to represent a node that has only recently started misbehaving, one that we're unable to contact. We use this intermediate state for a fixed deadThreshold time.Duration before marking the node as "dead". It's possible for two of these "states" to hold simultaneously: when we decide to decommission a dead node, per this enum it can be considered both "unavailable" and "decommissioning". This makes for confusing semantics, and the code attempting to disambiguate across these possible states (in kvserver.LivenessStatus) is wholly arbitrary (it takes in this deadThreshold time.Duration to determine when a node transitions from being simply unavailable to being considered dead).
There's no good reason for the "deadness" of a node to be something that's proto-backed. For callers interested in whether a node is considered "dead", "unavailable", or "live", it should suffice to capture the given node's liveness record (which includes the expiration time), compare it against the current time, and use just that to determine "deadness" or "unavailability". That aside, a node we mark as "dead" may not actually be dead. According to this API, we consider a node "dead" when it hasn't been able to heartbeat its liveness record for deadThreshold (which defaults to 5m) after it was set to expire. It could very well be that the node sits behind a persistent network failure, and is still up and running.
Now, let's consider all the current consumers of this NodeLivenessProto enum, and how we'd go about changing them. Use of this enum is primarily exposed through our StatusServer interface:
(a) cockroach {gen haproxy,debug zip}: Uses the StatusServer API to poll the targeted node for membership status, in order to skip over decommissioning nodes
(b) cockroach node {recommission,decommission}: Uses the StatusServer API to poll the targeted nodes for membership status, to generate warnings for users (when re-decommissioning a node, etc.)
(c) kvserver.Allocator: Working through storeDetail.status and kvserver.LivenessStatus, it uses both the "membership" status and the "process" state to inform placement of replicas.
(d) Admin UI: It uses the StatusServer API to poll the "process" state of given nodes, which powers the "unavailable"/"dead" view in the Admin UI.
Now, for our proposal: let's get rid of NodeLivenessProto. We're separately working (#50329) to introduce a decommissioned bit to CRDB, which does not make use of this existing infrastructure. For (a) and (b), since they depend on having a view of the decommissioning/decommissioned status of a given node, it should suffice to simply consult this new API. For (c), as suggested above, we can simply fetch the liveness record of interest (which captures its expiration time), and use the current time to determine whether or not a given store is a viable placement target for replicas. This can be well insulated behind storeDetail.status. For (d), we can expose a view of the liveness record (which after #50329 will contain the "membership" status as well) and use the expiration time in the Admin UI code to determine "dead"-ness. In fact, digging through why we introduced this proto-backed "dead"-ness in the first place, we did so in #20086 precisely to capture node "process" state in the Admin UI.
Jira issue: CRDB-4099