kvserver,server: discourage further use of NodeStatusLiveness#50478
Conversation
73dec16 to
c49a112
Compare
knz
left a comment
There was a problem hiding this comment.
I reviewed this and I have two general remarks:
-
you forgot to update the UI sources that refer to the enum entries (there's a few - the failing UI checks in CI will tell you. You can also run
make ui-checklocally I believe) -
I come to this PR with the assumption that "deprecated" things are also rarely-used or that their remaining uses are planned to change soon to instead use non-deprecated things. Instead in this PR I can see a bunch of uses of the enum values which have various purposes, and for which you do not make a plan/proposal to move them away. So the "deprecation" is non-sensical (or, specifically, it does not mean anything - it does not inform the team about what's going to happen next)
This last remark might suggest you're not going about this in the right order. If I were you, I'd have done it the other way around:
- first have created a tech note or RFC draft that draws an inventory of existing uses of the enum, with some (prose) context for each.
- then, for each existing use, point out which other things could be used instead, or what we'd need to get rid of the liveness value in that case.
- once we have confidence that we can eliminate or replace all uses (with a sketch of "how"), proceed to issue this deprecation PR
- however, you'd then augment it with
// TODOannotations for every existing use, that indicates "the plan" - whether the use will be removed or replaced, and how.
- however, you'd then augment it with
Reviewed 18 of 18 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @tbg)
|
Yeah, I broadly share Raphael's view that we need a plan here. If there's no time to make a plan I would rather leave this in place for now. BTW, you can rename the enum to something like |
c49a112 to
b342929
Compare
irfansharif
left a comment
There was a problem hiding this comment.
Thank you for your comments, I learned that I went about it incorrectly. I've gone ahead and filed #50707 to outline what I had left implicit in my thinking when sending out this PR. My motivation here originally was to free up usage of the {DECOMMISSIONING,DECOMMISSIONED} symbols to use in #50329. I see that trying to deprecate it (partially) in one go, instead of simply renaming it, was not necessary. Updated this PR to refer to #50707 in outlining future clean up work we could possibly do, and just renaming the existing symbols. PTAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz)
...and discourage further use of it altogether. This came up in trying to introduce a new enum type for node membership (active, decommissioning, decommissioned), and realizing the symbols were already used by `NodeStatusLiveness`. On investigating further, it doesn't seem like we should be using NodeStatusLiveness anymore. It's unclear if the enum was ever well considered. It enumerates across two distinct set of things: the "membership" status (live/active, decommissioning, decommissioned), and the node "process" status (live, unavailable, available). It's possible for two of these "states" to be true, simultaneously (consider a decommissioned, dead node). It makes for confusing semantics, and the code attempting to disambiguate across these states (kvserver.LivenessStatus() for e.g.) seem wholly arbitrary. See cockroachdb#50707 for greater detail. Short of replacing the usage of NodeStatusLiveness, we're adding some text pointing to cockroachdb#50707 to discourage further usage. We also rename the enums to be able to reuse those symbols in introducing the node membership enum. --- This was pulled out of cockroachdb#50329. Release note: None.
b342929 to
5b5df15
Compare
knz
left a comment
There was a problem hiding this comment.
Reviewed 24 of 25 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained
|
TFTR! |
|
bors r+ |
Build succeeded |
… not just availability Fixes cockroachdb#53515. We should have the autoupgrade process look at the fully decommissioned bit we added in cockroachdb#50329, instead of just looking at availability. It would avoid the hazard described in cockroachdb#53515. Previously the autoupgrade process was also looking at NodeStatusLiveness, which we've since soured upon (see cockroachdb#50478). Now that we always create a liveness record on start up (cockroachdb#53805), we can simply fetch all liveness records from KV. We add a helper to do this, which we'll also rely on in future PRs for other purposes. It's a bit unfortunate that we're further adding on to the NodeLiveness API without changing the caching structure, but the methods fetching records from KV is the world we're hoping to move towards going forward. Release note: None
We mark the NodeLivenessStatus proto as deprecated. It's unclear if the
enum was ever well considered. It enumerates across two distinct set of
things: the "membership" status (live/active, decommissioning,
decommissioned), and the node "process" status (live, unavailable,
available). It's possible for two of these "states" to be true,
simultaneously (consider a decommissioned, dead node). It makes for
confusing semantics, and the code attempting to disambiguate across
these states (kvserver.LivenessStatus() for e.g.) seem wholly arbitrary.
It's currently used by the Allocator to detect replicas placed on dead
nodes. But there's no good reason for the "deadness" to be proto-backed.
It should suffice to capture the "membership" status and view of node's
liveness (when was the record last heartbeated) + current time to inform
allocator decisions.
This was pulled out of #50329.
Release note: None.