Introduce Shard IDs to logically group nodes in cluster mode #10536
Conversation
@PingXie Given that we are after the code cut-off for Redis 7, we probably won't include this; we will probably merge it after 7 goes GA.
@madolson I see strong value in getting this change rolled out together with … @oranagra FYI
@PingXie I see a lot of your points, and I agree with most of them. I have two concerns. Let's focus on the implementation; if it's ready, we can of course make the decision to merge it earlier.
Sounds good @madolson.
Yes, replicas will pick up their primary's shard id upon joining the shard via …
Shard ids are persisted in nodes.conf, so nodes should be able to retrieve their previous shard ids on restart, assuming their nodes.conf files are not corrupted. If a replica loses its nodes.conf, it can still recover its shard id via the gossip message from its primary, provided the primary's nodes.conf is still good. If, for whatever reason, all nodes.conf files in the same shard are lost, we will rotate to a new shard id. I added a new command …; I can update the documentation after we close this PR.
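To make the recovery rules above concrete, here is a minimal, self-contained C sketch (hypothetical names and layout, not the actual Redis code) of how a node could settle on a shard id from gossip, from its primary, or by generating a fresh one:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SHARD_ID_LEN 40            /* hex chars, same width as a cluster node id */

typedef struct node {
    char shard_id[SHARD_ID_LEN + 1];
    struct node *primary;          /* NULL if this node is a primary */
} node;

/* Generate a random hex shard id (stand-in for Redis' own id generator). */
static void new_shard_id(char *dst) {
    static const char hex[] = "0123456789abcdef";
    for (int i = 0; i < SHARD_ID_LEN; i++) dst[i] = hex[rand() % 16];
    dst[SHARD_ID_LEN] = '\0';
}

/* Called when a shard id is learned via gossip or loaded from nodes.conf. */
static void update_shard_id(node *n, const char *gossiped) {
    if (gossiped && *gossiped) {
        memcpy(n->shard_id, gossiped, SHARD_ID_LEN + 1);
    } else if (n->primary) {
        /* Replicas always follow their primary's shard id. */
        memcpy(n->shard_id, n->primary->shard_id, SHARD_ID_LEN + 1);
    } else if (n->shard_id[0] == '\0') {
        /* Nothing recoverable: rotate to a brand new shard id. */
        new_shard_id(n->shard_id);
    }
}

int main(void) {
    node primary = { .primary = NULL };
    node replica = { .primary = &primary };
    update_shard_id(&primary, NULL);             /* fresh shard: new random id */
    update_shard_id(&replica, primary.shard_id); /* replica follows its primary */
    printf("%s\n%s\n", primary.shard_id, replica.shard_id);
    return 0;
}
```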
@madolson I pushed another commit to make … Below are the externally visible changes (hence my preference to include them along with …).
@zuiderkwast in case you are interested - this is to improve the native shard support started with …
zuiderkwast
left a comment
I like the idea in general.
Regarding the form `ip:port@cport,shard,hostname` (which is already a pretty strange format without shard), I have two incompatible suggestions:
- Since every part so far uses a different separator (`:`, `@`, `,`), we could pick yet another separator for the shard id, for example `$` (looks like S for shard). Then we would be able to parse these strings from 7.0 release candidates too (with hostname, without shard), so the format would be `ip:port@cport[$shard][,hostname]`.
- Now might also be a good time to make this format future-proof in some way, so that more fields can be added without breaking old nodes and clients. For example, add a URI-style query string at the end, `?shard=s98s98dus98du&something=foo&bla=bla`, where unknown future keys can be safely ignored.
WDYT?
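To illustrate the second suggestion above (an extensible tail that unknown readers can simply skip), here is a rough C sketch. It assumes a hypothetical comma-separated key=value aux-field convention after the hostname; it is not the exact parser or format Redis ended up with.

```c
#include <stdio.h>
#include <string.h>

/* Parse the part of a node line that follows ip:port@cport. Bare tokens are
 * treated as the legacy hostname; key=value tokens are extensions, and
 * unknown keys are skipped, keeping old readers forward compatible. */
static void parse_aux_fields(char *tail) {
    for (char *tok = strtok(tail, ","); tok; tok = strtok(NULL, ",")) {
        char *eq = strchr(tok, '=');
        if (!eq) {
            printf("hostname: %s\n", tok);
        } else {
            *eq = '\0';
            if (strcmp(tok, "shard-id") == 0)
                printf("shard-id: %s\n", eq + 1);
            else
                printf("skipping unknown aux field: %s\n", tok);
        }
    }
}

int main(void) {
    /* example tail, purely illustrative */
    char tail[] = "hostname-1,shard-id=11829057fd6af31adeb6db3ce43464e51defe0fd,future-key=foo";
    parse_aux_fields(tail);
    return 0;
}
```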
Great point on future-proofing the format now, especially your 2nd point. It is going to be super hard to introduce new fields in the future if we don't come up with a systematic solution in 7.0, even with just the hostname field. Mulling over your suggestion now ...
@zuiderkwast I think JSON would be the ideal long-term solution. A potential migration/upgrade strategy would be to attempt to load … Here is an example of nodes.json for illustration:
{
"nodes": {
"node": {
"id": "fbc2329fc9e9f2c9f9ed9e6cdcee656df65d191c",
"ip": "127.0.0.1",
"port": "30001",
"cluster-port": "40001",
"hostname": "hostname-1",
"role": "master",
"shard-id": "11829057fd6af31adeb6db3ce43464e51defe0fd",
"slots": [{
"start": 0,
"end": 10
},
{
"start": 10,
"end": 20
},
{
"start": 30,
"end": 40
}
],
"importing": [{
"slot": 9,
"from": "e6977cc6c45110a05f3e862717b1120b30589498"
},
{
"slot": 15,
"from": "45110a05f3e862717b1120b30589498e6977cc6c"
}
],
"migrating": [{
"slot": 12,
"to": "e6977cc6c45110a05f3e862717b1120b30589498"
},
{
"slot": 34,
"to": "45110a05f3e862717b1120b30589498e6977cc6c"
}
]
}
}
}
Update - I think an in-place update scheme that reuses the same file name …
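As a sketch of the in-place update idea, here is one possible write-temp-then-rename scheme in C, so readers never observe a half-written config. File names and contents are illustrative only; this is not what Redis actually does for nodes.conf.

```c
#include <stdio.h>

/* Write the new contents to a temporary file, then rename() it over the old
 * one; rename is atomic on POSIX filesystems, so the visible file is always
 * either the old or the new version. */
static int rewrite_config(const char *path, const char *contents) {
    char tmp[512];
    snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    FILE *fp = fopen(tmp, "w");
    if (!fp) return -1;
    if (fputs(contents, fp) == EOF || fflush(fp) == EOF) { fclose(fp); return -1; }
    fclose(fp);
    return rename(tmp, path);
}

int main(void) {
    /* example line, purely illustrative */
    return rewrite_config("nodes.conf.example",
        "fbc2329f... 127.0.0.1:30001@40001 myself,master - 0 0 1 connected\n");
}
```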
@madolson this change is ready. I moved shard-id behind hostname, so it should be compatible with 7.0 GA. Can you please take a look? Pre 7.0: …
@madolson can you please review this change when you get a chance? Thanks!
@PingXie Ok! I don't know about JSON; we don't have a JSON parser in Redis today and I don't think we really want to take a dependency on one. I think there are easier and more extensible ways to improve nodes.conf without moving to JSON though. In another PR I proposed having key/value fields alongside the slots data.
madolson
left a comment
The high level all makes sense.
My primary point is to show that there is a way forward. I don't have a strong opinion that we should use JSON. Can you point me to your other PR where key/value fields were proposed? The idea sounds reasonable to me.
It was discussed here: #9564. It still uses the older convention though; one of the two PRs can implement it and we can cross-port it.
madolson
left a comment
Some minor nitpicky comments.
@ushachar Will you take a look at this, since we presumably want the "shard ID" to be the same between cluster v2 and cluster v1?
IDs in flotilla are monotonically increasing numbers (padded in order to preserve compatibility with Cluster v1), but the general concept of this PR does seem to align well with our plans there.
@ushachar One of the assumptions being discussed here is that shard IDs may not necessarily be monotonically increasing, and may be externally imposed.
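For contrast, a tiny illustrative C sketch of the two id shapes being compared in this exchange: a monotonic counter zero-padded to the 40-character width of a cluster v1 node id (the flotilla-style approach described above), versus an arbitrary random/externally supplied hex id. Purely illustrative; neither implementation's code.

```c
#include <stdio.h>
#include <stdlib.h>

#define ID_LEN 40   /* width of a cluster v1 node id, in hex characters */

/* Monotonic counter padded to the v1 id width, e.g. 7 -> "000...0007". */
static void counter_id(unsigned long long counter, char *dst) {
    snprintf(dst, ID_LEN + 1, "%040llu", counter);
}

/* Random (or externally imposed) 40-character hex id. */
static void random_id(char *dst) {
    static const char hex[] = "0123456789abcdef";
    for (int i = 0; i < ID_LEN; i++) dst[i] = hex[rand() % 16];
    dst[ID_LEN] = '\0';
}

int main(void) {
    char a[ID_LEN + 1], b[ID_LEN + 1];
    counter_id(7, a);
    random_id(b);
    printf("padded counter: %s\nrandom id:      %s\n", a, b);
    return 0;
}
```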
madolson
left a comment
Change LGTM, I think the only remaining question is what we want to do about making the nodes.conf more extensible for the moment.
Thanks for the offer @madolson. I will rebase this PR tonight and get on the documentation PR sometime this week.
Introduce Shard IDs to logically group nodes in cluster mode (#10536)
1. Added a new "shard_id" field to "cluster nodes" output and nodes.conf after "hostname"
2. Added a new PING extension to propagate "shard_id"
3. Handled upgrade from pre-7.2 releases automatically
4. Refactored PING extension assembling/parsing logic
Behavior of Shard IDs: Replicas will always follow the shards of their reported primaries. If a primary updates its shard ID, the replica will follow. (This need not follow for cluster v2; it is not an expected use case.)
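For readers unfamiliar with the PING extension mechanism referenced in items 2 and 4 above, here is a rough sketch of what a "shard_id" extension record could look like on the wire: a type/length header followed by the payload. Field names, sizes, and the type value are illustrative, not the exact Redis structures.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SHARD_ID_LEN 40

enum { EXT_TYPE_SHARD_ID = 2 };    /* arbitrary value for this sketch */

typedef struct {
    uint16_t type;                 /* e.g. EXT_TYPE_SHARD_ID */
    uint32_t length;               /* total extension length, used for skipping */
    char shard_id[SHARD_ID_LEN];   /* sender's shard id, not NUL terminated */
} ping_ext_shard_id;

int main(void) {
    ping_ext_shard_id ext;
    ext.type = EXT_TYPE_SHARD_ID;
    ext.length = sizeof(ext);
    memcpy(ext.shard_id, "11829057fd6af31adeb6db3ce43464e51defe0fd", SHARD_ID_LEN);

    /* A receiver that does not understand `type` can still skip `length`
     * bytes, which is what makes extensions upgrade friendly. */
    printf("type=%u length=%u shard_id=%.40s\n",
           (unsigned)ext.type, (unsigned)ext.length, ext.shard_id);
    return 0;
}
```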
There is still the …
So how come it wasn't mentioned in the RC1 release notes? In any case, we need to decide what to do in the RC3 release notes. BTW, I don't remember being aware of that new command, and I was under the impression that when we removed the metadata in #12166 we reverted all the interface changes of this one. Either I missed something or I have a memory corruption.
I guess editing the RC1 release notes and mentioning the breaking change is the better alternative. @madolson either way, can you please update the top comment in this PR (mention the new command) and mention that the other interface changes were reverted and when...
I think I missed adding it; the bigger decision at the time was showing it in …
I think adding that it was dropped in RC3 is the right documentation (we did break it, but in a way that makes it more compatible). This also means that for GA it's more compatible. I think documenting the command in RC3 is fine.
An unintentional change was introduced in #10536: we used to use addReplyLongLong and now it is addReplyBulkLonglong; revert back to the previous behavior. It's already released in 7.2, so it might be a breaking change.
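For context on why this counted as an interface change: the two reply helpers produce different RESP types, an integer reply versus a bulk string. A minimal illustration of the two encodings (plain C, not Redis code):

```c
#include <stdio.h>

int main(void) {
    long long v = 12345;
    char buf[32];
    int n = snprintf(buf, sizeof(buf), "%lld", v);

    printf(":%lld\r\n", v);          /* RESP integer reply (addReplyLongLong-style)        */
    printf("$%d\r\n%s\r\n", n, buf); /* RESP bulk string reply (addReplyBulkLongLong-style) */
    return 0;
}
```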
…d 7.2 (#12604) In #10536 we introduced an assert. Some older server versions (like 7.0) don't gossip shard_id, so such a node is never added to cluster->shards, and its node->shard_id is filled in randomly and may not be found here. As a result, if we add a 7.2 node to a 7.0 cluster and allocate slots to the 7.2 node, the 7.2 node will crash when it hits this assert, similar to #12538. In this PR, we remove the assert and replace it with an unconditional removal. Fixes #12603. (cherry picked from commit e5ef161)
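A schematic C sketch of the shape of that fix, using a toy shard registry instead of the real cluster->shards structure (all names here are hypothetical):

```c
#include <assert.h>
#include <string.h>

#define MAX_SHARD_NODES 16

typedef struct { const char *members[MAX_SHARD_NODES]; } shard;

/* Before: crashes if the node was never registered in the shard
 * (e.g. because a 7.0 peer never gossiped its shard_id). */
static void detach_node_asserting(shard *s, const char *node_id) {
    int found = -1;
    for (int i = 0; i < MAX_SHARD_NODES; i++)
        if (s->members[i] && strcmp(s->members[i], node_id) == 0) found = i;
    assert(found != -1);
    s->members[found] = NULL;
}

/* After: unconditional removal, a no-op when the node is unknown. */
static void detach_node(shard *s, const char *node_id) {
    for (int i = 0; i < MAX_SHARD_NODES; i++)
        if (s->members[i] && strcmp(s->members[i], node_id) == 0)
            s->members[i] = NULL;
}

int main(void) {
    shard s = {{0}};
    detach_node(&s, "fbc2329f");   /* safe even though the node was never added */
    (void)detach_node_asserting;   /* shown only for contrast */
    return 0;
}
```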
- We generate a random shard id when creating a cluster node, so `auxFieldHandlers[af_shard_id].isPresent(n) == 0` is never true; this means we never add master nodes into `cluster.shards` when loading. This bug was introduced in #10536, which added the `shard-id` concept. BTW, #13468 makes replicas get added into `cluster.shards`.
- A replica's shard_id may differ from its master's, so before we add it again we should remove it; otherwise `cluster.shards` will contain dirty shards (introduced in #13468).
These bugs corrupt the output of `cluster shards`: the temporary `slot_info` may not be cleaned up, we may see duplicated slot ranges, and a shard entry without a master.
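A short C sketch of the "remove before re-adding" rule implied by the second bullet, using a toy registry keyed by node id (types and helpers are hypothetical, not the Redis implementation):

```c
#include <stdio.h>
#include <string.h>

typedef struct { char shard_id[41]; char node_id[41]; } membership;

static membership registry[8];   /* toy stand-in for a shard registry */

static void register_node(const char *node_id, const char *shard_id) {
    /* Drop any stale entry for this node first (it may sit under an old shard id). */
    for (int i = 0; i < 8; i++)
        if (strcmp(registry[i].node_id, node_id) == 0) registry[i].node_id[0] = '\0';
    /* Then add it under the authoritative (primary's) shard id. */
    for (int i = 0; i < 8; i++) {
        if (registry[i].node_id[0] == '\0') {
            snprintf(registry[i].node_id, sizeof(registry[i].node_id), "%s", node_id);
            snprintf(registry[i].shard_id, sizeof(registry[i].shard_id), "%s", shard_id);
            return;
        }
    }
}

int main(void) {
    register_node("replica-1", "old-shard");
    register_node("replica-1", "new-shard");   /* no duplicate left behind */
    for (int i = 0; i < 8; i++)
        if (registry[i].node_id[0])
            printf("%s -> %s\n", registry[i].node_id, registry[i].shard_id);
    return 0;
}
```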
CLUSTER MYSHARDID
Behavior of Shard IDs: replicas will always follow the shards of their reported primaries; if a primary updates its shard ID, the replica will follow.
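For completeness, a hedged sketch of what a CLUSTER MYSHARDID-style reply could look like. The helper and type names below are stand-ins, not the actual Redis implementation; the command is understood to return the node's shard id as a bulk string.

```c
#include <stdio.h>
#include <string.h>

#define SHARD_ID_LEN 40

typedef struct { char shard_id[SHARD_ID_LEN + 1]; } cluster_node;

/* Stand-in for a RESP bulk-string reply helper. */
static void reply_bulk_string(const char *s) {
    printf("$%zu\r\n%s\r\n", strlen(s), s);
}

/* CLUSTER MYSHARDID: return the shard id of the node the client is talking to. */
static void cluster_myshardid_command(const cluster_node *myself) {
    reply_bulk_string(myself->shard_id);
}

int main(void) {
    cluster_node myself = { "11829057fd6af31adeb6db3ce43464e51defe0fd" };
    cluster_myshardid_command(&myself);
    return 0;
}
```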