Generate a new shard_id when the replica executes CLUSTER RESET SOFT#2283
Conversation
When a replica doing the CLUSTER RESET SOFT, we should generate a new shard_id for it. Since if the node was a replica, it inherits the shard_id of its primary, and after the CLUSTER RESET SOFT, the replica become a new empty primary, it should has its own shard_id. Otherwise, when it joins back into the cluster or responds the PONG, we will have two primaries in the same shard. Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #2283 +/- ##
============================================
- Coverage 71.52% 71.46% -0.06%
============================================
Files 123 123
Lines 66926 67032 +106
============================================
+ Hits 47868 47905 +37
- Misses 19058 19127 +69
🚀 New features to boost your workflow:
|
hpatro
left a comment
There was a problem hiding this comment.
Agree with the change around shard id generation. But would like the extension flag marking part left as is.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
|
@hpatro and I discussed and decided to temporarily remove link->support_extension for now. After this merge, we will discuss it in another new PR. The new link->extension is aim for this case:
That is something like that. So if we don't have link->extension, Node B needs to wait until the cluster meets (I remember we have a compensation mechanism that will allow node B to eventually meet back into the cluster.) again before sending the extension with link->support_extension time: without the link->support_extension |
… faster during handshake (#2310) Add support_extension to cluster link, so that when myself receives a message from sender, even if myself does not know the sender, for example during handshake the sender node is still treated as random node, in thi case, the packet we reply to sender can also carry the extension, since the sender link explicitly indicates that it supports the extension, by doing this, we can speed up the spread of the extension. Also see #2283 and #2279 for more details.
When a cluster reset is performed on a replica node, a new shard ID is generated because the node is about to become an empty primary node, see valkey-io#2283. However, the log added in valkey-io#2510 caused some confusions. In clusterSetNodeAsPrimary we will print: ``` serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id); ``` In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new shard ID, which causes us to print an error shard ID log first. There is an exmaple, when a replica node performs a cluster reset, we will print: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4 ``` But the node shard id is actually: ``` xxx> cluster myshardid "52ede26d1554dd203161ba09011af14574b2cc84" ``` Now after a new shard ID is generated we will print a log, and we also move the call to clusterSetNodeAsPrimary after the new shard id, so that we can have the right one. After this PR: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66. xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66 ``` This is part of valkey-io#2989, but i guess we won't merge the extension fix in a short time, so i am gonna extracting it separately as a log fix (or improvement). Signed-off-by: Binbin <binloveplay1314@qq.com>
When a cluster reset is performed on a replica node, a new shard ID is generated because the node is about to become an empty primary node, see #2283. However, the log added in #2510 caused some confusions. In clusterSetNodeAsPrimary we will print: ``` serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id); ``` In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new shard ID, which causes us to print an error shard ID log first. There is an exmaple, when a replica node performs a cluster reset, we will print: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4 ``` But the node shard id is actually: ``` xxx> cluster myshardid "52ede26d1554dd203161ba09011af14574b2cc84" ``` Now after a new shard ID is generated we will print a log, and we also move the call to clusterSetNodeAsPrimary after the new shard id, so that we can have the right one. After this PR: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66. xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66 ``` This is part of #2989, but i guess we won't merge the extension fix in a short time, so i am gonna extracting it separately as a log fix (or improvement). Signed-off-by: Binbin <binloveplay1314@qq.com>
…3192) When a cluster reset is performed on a replica node, a new shard ID is generated because the node is about to become an empty primary node, see valkey-io#2283. However, the log added in valkey-io#2510 caused some confusions. In clusterSetNodeAsPrimary we will print: ``` serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id); ``` In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new shard ID, which causes us to print an error shard ID log first. There is an exmaple, when a replica node performs a cluster reset, we will print: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4 ``` But the node shard id is actually: ``` xxx> cluster myshardid "52ede26d1554dd203161ba09011af14574b2cc84" ``` Now after a new shard ID is generated we will print a log, and we also move the call to clusterSetNodeAsPrimary after the new shard id, so that we can have the right one. After this PR: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66. xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66 ``` This is part of valkey-io#2989, but i guess we won't merge the extension fix in a short time, so i am gonna extracting it separately as a log fix (or improvement). Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
When a replica doing the CLUSTER RESET SOFT, we should generate a
new shard_id for it. Since if the node was a replica, it inherits
the shard_id of its primary, and after the CLUSTER RESET SOFT, the
replica become a new empty primary, it should has its own shard_id.
Otherwise, when it joins back into the cluster or responds the PONG,
we will have two primaries in the same shard.