Skip to content

Conversation

@zuiderkwast
Copy link
Contributor

@zuiderkwast zuiderkwast commented Jun 16, 2022

Gossip the cluster node blacklist in ping and pong messages. This means that CLUSTER FORGET doesn't need to be sent to all nodes in a cluster. It can be sent to one or more nodes and then be propagated to the rest of them.

For each blacklisted node, its node id and its remaining blacklist TTL is gossiped in a cluster bus ping extension (introduced in #9530).

Release notes:
Automatically propagate node deletion to other nodes in a cluster when `CLUSTER FORGET` is called, allowing nodes to be deleted with a single call in most cases.

Fixes #10861.

@zuiderkwast zuiderkwast requested a review from madolson June 16, 2022 13:09
Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. This will be sequenced after @PingXie Cr once we start merging 7.2 stuff.

@madolson madolson changed the title Gossip forgotten nodes Gossip forgotten nodes on CLUSTER FORGET Jul 12, 2022
@madolson madolson added state:major-decision Requires core team consensus release-notes indication that this issue needs to be mentioned in the release notes approval-needed Waiting for core team approval to be merged labels Jul 12, 2022
@madolson
Copy link
Contributor

@redis/core-team The major change here is that CLUSTER FORGET will now attempt to gossip the deletion to all nodes in the cluster. You can still use the old style of calling CLUSTER FORGET on all nodes without errors.

@oranagra
Copy link
Member

LGTM on high level.
i understand that the implication on older versions (mixed cluster) is that older nodes will print a log warning, and will require an explicit FORGET.
so the feature can be relied on, only if all nodes supports this, but will not cause harm in mixed clusters.

I understand redis-cli still sends that command to all nodes (and we can probably never change that), so is this change just to resolve a race condition (mentioned in the related issue?).
let's copy the relevant justification text to the top comment here (for the purpose of release notes).

while on that topic, considering that it's gonna take a long time before we actually release it, i wanna propose a new guideline:
whenever we tag an issue to be mentioned in the release notes, we also add a line in the top comment of the PR with a short text to be copied to the release notes.
something like:

Release notes: 
<some short text that's enough to draw the reader attention if this might affect them, or cause them to skip otherwise>

WDYT?

@madolson
Copy link
Contributor

@oranagra Since you are the one that is generating the release notes most of the time, I'm happy to follow/enforce that.

@madolson
Copy link
Contributor

@zuiderkwast do you want to rebase?

@oranagra oranagra merged commit 5032de5 into redis:unstable Jul 26, 2022
@oranagra
Copy link
Member

tests sporadically failed on ARM + TLS
https://github.com/redis/redis-extra-ci/runs/7935419226?check_suite_focus=true

*** [err]: Set cluster hostnames and verify they are propagated in tests/unit/cluster/hostnames.tcl
cluster config did not reach a consistent state
*** [err]: Update hostnames and make sure they are all eventually propagated in tests/unit/cluster/hostnames.tcl
cluster config did not reach a consistent state
*** [err]: Remove hostnames and make sure they are all eventually propagated in tests/unit/cluster/hostnames.tcl
cluster config did not reach a consistent state
*** [err]: Verify cluster-preferred-endpoint-type behavior for redirects and info in tests/unit/cluster/hostnames.tcl
cluster config did not reach a consistent state

@zuiderkwast
Copy link
Contributor Author

tests sporadically failed on ARM + TLS https://github.com/redis/redis-extra-ci/runs/7935419226?check_suite_focus=true

cluster config did not reach a consistent state

Any idea why and what we can do about it? Shall we increase the time to wait to reach a consistent cluster state?

@zuiderkwast zuiderkwast deleted the gossip-forgotten-nodes branch August 22, 2022 08:52
@madolson
Copy link
Contributor

Increasing the timing seems like one such option, we can see if it helps. An immediate change is we should probably print out the delta.

@enjoy-binbin
Copy link
Contributor

i see it's usually fast, so i suppose increase the time won't help much

[ok]: Set cluster hostnames and verify they are propagated (311 ms)
[ok]: Update hostnames and make sure they are all eventually propagated (309 ms)
[ok]: Remove hostnames and make sure they are all eventually propagated (312 ms)
[ok]: Verify cluster-preferred-endpoint-type behavior for redirects and info (307 ms)

@madolson
Copy link
Contributor

It might be a failover causing some havoc? I have seen that happening on some of the other ARM tests. We should also probably clean up the logging a bit, it's hard to follow.

enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
Gossip the cluster node blacklist in ping and pong messages.
This means that CLUSTER FORGET doesn't need to be sent to all nodes in a cluster.
It can be sent to one or more nodes and then be propagated to the rest of them.

For each blacklisted node, its node id and its remaining blacklist TTL is gossiped in a
cluster bus ping extension (introduced in redis#9530).
nosammai added a commit to nosammai/docs that referenced this pull request Jan 15, 2025
Update CLUSTER FORGET docs for changes in redis/redis#10869
nosammai added a commit to nosammai/redis that referenced this pull request Jan 15, 2025
dwdougherty added a commit to redis/docs that referenced this pull request Jan 15, 2025
* Update forget docs for ban-list propagation

Update CLUSTER FORGET docs for changes in redis/redis#10869

* Update content/commands/cluster-forget/index.md

Co-authored-by: David Dougherty <david.dougherty@redis.com>

---------

Co-authored-by: David Dougherty <david.dougherty@redis.com>
sundb added a commit that referenced this pull request Jan 27, 2025
Update CLUSTER FORGET docs for changes in
#10869

Docs PR:
redis/docs#1057

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
Update CLUSTER FORGET docs for changes in
redis#10869

Docs PR:
redis/docs#1057

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Gossiped node deletion

7 participants