Skip to content

kvserver,server: discourage further use of NodeStatusLiveness#50478

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
irfansharif:200622.deprecate-NodeStatusLiveness
Jun 29, 2020
Merged

kvserver,server: discourage further use of NodeStatusLiveness#50478
craig[bot] merged 1 commit intocockroachdb:masterfrom
irfansharif:200622.deprecate-NodeStatusLiveness

Conversation

@irfansharif
Copy link
Copy Markdown
Contributor

We mark the NodeLivenessStatus proto as deprecated. It's unclear if the
enum was ever well considered. It enumerates across two distinct set of
things: the "membership" status (live/active, decommissioning,
decommissioned), and the node "process" status (live, unavailable,
available). It's possible for two of these "states" to be true,
simultaneously (consider a decommissioned, dead node). It makes for
confusing semantics, and the code attempting to disambiguate across
these states (kvserver.LivenessStatus() for e.g.) seem wholly arbitrary.

It's currently used by the Allocator to detect replicas placed on dead
nodes. But there's no good reason for the "deadness" to be proto-backed.
It should suffice to capture the "membership" status and view of node's
liveness (when was the record last heartbeated) + current time to inform
allocator decisions.


This was pulled out of #50329.

Release note: None.

@irfansharif irfansharif requested a review from tbg June 22, 2020 15:12
@irfansharif irfansharif requested a review from a team as a code owner June 22, 2020 15:12
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@irfansharif irfansharif force-pushed the 200622.deprecate-NodeStatusLiveness branch from 73dec16 to c49a112 Compare June 22, 2020 15:14
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this and I have two general remarks:

  • you forgot to update the UI sources that refer to the enum entries (there's a few - the failing UI checks in CI will tell you. You can also run make ui-check locally I believe)

  • I come to this PR with the assumption that "deprecated" things are also rarely-used or that their remaining uses are planned to change soon to instead use non-deprecated things. Instead in this PR I can see a bunch of uses of the enum values which have various purposes, and for which you do not make a plan/proposal to move them away. So the "deprecation" is non-sensical (or, specifically, it does not mean anything - it does not inform the team about what's going to happen next)

This last remark might suggest you're not going about this in the right order. If I were you, I'd have done it the other way around:

  1. first have created a tech note or RFC draft that draws an inventory of existing uses of the enum, with some (prose) context for each.
  2. then, for each existing use, point out which other things could be used instead, or what we'd need to get rid of the liveness value in that case.
  3. once we have confidence that we can eliminate or replace all uses (with a sketch of "how"), proceed to issue this deprecation PR
    • however, you'd then augment it with // TODO annotations for every existing use, that indicates "the plan" - whether the use will be removed or replaced, and how.

Reviewed 18 of 18 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

@tbg
Copy link
Copy Markdown
Member

tbg commented Jun 25, 2020

Yeah, I broadly share Raphael's view that we need a plan here. If there's no time to make a plan I would rather leave this in place for now.

BTW, you can rename the enum to something like NODESTATUS_DECOMMISSIONING and there's a gogoproto option to drop the NodeStatus prefix in the generated code. That way it won't get in your way any more.

@tbg tbg removed their request for review June 25, 2020 09:58
@irfansharif irfansharif force-pushed the 200622.deprecate-NodeStatusLiveness branch from c49a112 to b342929 Compare June 26, 2020 19:12
@irfansharif irfansharif changed the title kvserver,server: deprecate NodeStatusLiveness, and dependencies kvserver,server: discourage further use of NodeStatusLiveness Jun 26, 2020
Copy link
Copy Markdown
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comments, I learned that I went about it incorrectly. I've gone ahead and filed #50707 to outline what I had left implicit in my thinking when sending out this PR. My motivation here originally was to free up usage of the {DECOMMISSIONING,DECOMMISSIONED} symbols to use in #50329. I see that trying to deprecate it (partially) in one go, instead of simply renaming it, was not necessary. Updated this PR to refer to #50707 in outlining future clean up work we could possibly do, and just renaming the existing symbols. PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)

...and discourage further use of it altogether.

This came up in trying to introduce a new enum type for node membership
(active, decommissioning, decommissioned), and realizing the symbols
were already used by `NodeStatusLiveness`. On investigating further, it
doesn't seem like we should be using NodeStatusLiveness anymore.

It's unclear if the enum was ever well considered. It enumerates across
two distinct set of things: the "membership" status (live/active,
decommissioning, decommissioned), and the node "process" status (live,
unavailable, available). It's possible for two of these "states" to be
true, simultaneously (consider a decommissioned, dead node). It makes
for confusing semantics, and the code attempting to disambiguate across
these states (kvserver.LivenessStatus() for e.g.) seem wholly arbitrary.
See cockroachdb#50707 for greater
detail.

Short of replacing the usage of NodeStatusLiveness, we're adding some
text pointing to cockroachdb#50707
to discourage further usage. We also rename the enums to be able to
reuse those symbols in introducing the node membership enum.

---

This was pulled out of cockroachdb#50329.

Release note: None.
@irfansharif irfansharif force-pushed the 200622.deprecate-NodeStatusLiveness branch from b342929 to 5b5df15 Compare June 26, 2020 23:06
@irfansharif irfansharif requested a review from a team June 26, 2020 23:06
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewed 24 of 25 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@irfansharif
Copy link
Copy Markdown
Contributor Author

TFTR!

@irfansharif
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 29, 2020

Build succeeded

@craig craig bot merged commit c1f9a84 into cockroachdb:master Jun 29, 2020
@irfansharif irfansharif deleted the 200622.deprecate-NodeStatusLiveness branch June 29, 2020 13:59
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 22, 2020
… not just availability

Fixes cockroachdb#53515.

We should have the autoupgrade process look at the fully decommissioned
bit we added in cockroachdb#50329, instead of just looking at availability. It
would avoid the hazard described in cockroachdb#53515.

Previously the autoupgrade process was also looking at
NodeStatusLiveness, which we've since soured upon (see cockroachdb#50478). Now that
we always create a liveness record on start up (cockroachdb#53805), we can simply
fetch all liveness records from KV. We add a helper to do this, which
we'll also rely on in future PRs for other purposes. It's a bit
unfortunate that we're further adding on to the NodeLiveness API without
changing the caching structure, but the methods fetching records from KV
is the world we're hoping to move towards going forward.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants