release-22.2: gossip: reintroduce GossipClientsKey gossip#103788
release-22.2: gossip: reintroduce GossipClientsKey gossip#103788erikgrinaker merged 1 commit intocockroachdb:release-22.2from
GossipClientsKey gossip#103788Conversation
|
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
|
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
irfansharif
left a comment
There was a problem hiding this comment.
I'm reminding myself of the history in #89613. I'm not sure this is worth fixing given the CPU overhead this is introducing in turn, but perhaps we could just publish a release note calling it out instead.
- We know exactly the 22.2.x/23.1 releases where
is_liveis a false positive, and those users can use newer patch releases of 22.2.x while still in their mixed version state and be fine. - 22.1.x is out of the maintenance window tomorrow (2023-05-24, https://www.cockroachlabs.com/docs/releases/release-support-policy.html). This
is_livefalse positive, given it happens with 22.1 clusters present, could be spun as as a known issue in their release that's now out of support but is fixed in 22.2.3+.
What do you think?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)
|
I'm inclined to agree. We could version gate this such that we stop gossiping in finalized 22.2 clusters while retaining 22.1 compatibility, and live with the <22.2.3 incompatibility. |
dc25d5c to
5e935c7
Compare
|
I gated this on |
5e935c7 to
158b09f
Compare
In 2114678, we stopped gossiping the connectivity graph via `GossipClientsKey`. However, this broke backwards compatibility with `cockroach node status` which relied on this key to determine `is_live` liveness. This patch partially reverts that change, by reintroducing gossip of these keys in mixed 22.1/22.2 clusters, but otherwise not using them for anything. We stop gossiping them in finalized 22.2 clusters, since the cost of gossiping these is significant in large clusters (which motivated their removal in the first place). This breaks backwards compatibility with <22.2.3, we'll just have to live with that. Epic: none Release note (bug fix): Fixed a bug where `cockroach node status` could incorrectly report nodes as `is_live = false` in mixed 22.1/22.2 clusters. The bug still exists between 22.2 patch versions before and after 22.2.3.
158b09f to
a6ee652
Compare
|
Verified the fix in a mixed-version clusters. TFTR! |
In 2114678, we stopped gossiping the connectivity graph via
GossipClientsKey. However, this broke backwards compatibility withcockroach node statuswhich relied on this key to determineis_liveliveness.This patch partially reverts that change, by reintroducing gossip of these keys in mixed 22.1/22.2 clusters, but otherwise not using them for anything. We stop gossiping them in finalized 22.2 clusters. This breaks backwards compatibility with <22.2.3, we'll just have to live with that.
Touches #89613.
Touches #51838.
Epic: none
Release note (bug fix): Fixed a bug where
cockroach node statuscould incorrectly report nodes asis_live = falsein mixed 22.1/22.2 clusters. The bug still exists between 22.2 patch versions before and after 22.2.3.