Introduce Shard IDs to logically group nodes in cluster mode #10536
Conversation
@PingXie Given that we are after the code cut-off for Redis 7, we probably won't include this; we will probably merge it after 7 goes GA.
@madolson I see strong value in getting this change rolled out together with … @oranagra FYI
@PingXie I see a lot of your points, and I agree with most of them. I have two concerns. Let's focus on the implementation; if it's ready, we can of course make the decision to merge it earlier.
Sounds good @madolson.
Yes, replicas will pick up their primary's shard id upon joining the shard via …
Shard ids are persisted in nodes.conf, so nodes should be able to retrieve their previous shard ids on restart, assuming their nodes.conf files are not corrupted. If a replica loses its nodes.conf, it can still recover its shard id via the gossip message from its primary, provided the primary's nodes.conf is still good. If, for whatever reason, all nodes.conf files in the same shard are lost, we will rotate to a new shard id. I added a new command …; I can update the documentation after we close this PR.
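To make the recovery rules above concrete, here is a minimal, self-contained C sketch (hypothetical names and layout, not the actual Redis code) of how a node could settle on a shard id from gossip, from its primary, or by generating a fresh one:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SHARD_ID_LEN 40            /* hex chars, same width as a cluster node id */

typedef struct node {
    char shard_id[SHARD_ID_LEN + 1];
    struct node *primary;          /* NULL if this node is a primary */
} node;

/* Generate a random hex shard id (stand-in for Redis' own id generator). */
static void new_shard_id(char *dst) {
    static const char hex[] = "0123456789abcdef";
    for (int i = 0; i < SHARD_ID_LEN; i++) dst[i] = hex[rand() % 16];
    dst[SHARD_ID_LEN] = '\0';
}

/* Called when a shard id is learned via gossip or loaded from nodes.conf. */
static void update_shard_id(node *n, const char *gossiped) {
    if (gossiped && *gossiped) {
        memcpy(n->shard_id, gossiped, SHARD_ID_LEN + 1);
    } else if (n->primary) {
        /* Replicas always follow their primary's shard id. */
        memcpy(n->shard_id, n->primary->shard_id, SHARD_ID_LEN + 1);
    } else if (n->shard_id[0] == '\0') {
        /* Nothing recoverable: rotate to a brand new shard id. */
        new_shard_id(n->shard_id);
    }
}

int main(void) {
    node primary = { .primary = NULL };
    node replica = { .primary = &primary };
    update_shard_id(&primary, NULL);             /* fresh shard: new random id */
    update_shard_id(&replica, primary.shard_id); /* replica follows its primary */
    printf("%s\n%s\n", primary.shard_id, replica.shard_id);
    return 0;
}
```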
@madolson I pushed another commit to make … Below are the externally visible changes (hence my preference to include them along with …).
@zuiderkwast in case you are interested - this is to improve the native shard support started with …
zuiderkwast
left a comment
I like the idea in general.
Regarding the form `ip:port@cport,shard,hostname` (which is already a pretty strange format without shard), I have two incompatible suggestions:
- Since every part so far uses a different separator (`:`, `@`, `,`), we could pick yet another separator for the shard id, for example `$` (looks like S for shard). Then we would be able to parse these strings from 7.0 release candidates too (with hostname, without shard), so the format would be `ip:port@cport[$shard][,hostname]`.
- Now might also be a good time to make this format future-proof in some way, so that more fields can be added without breaking old nodes and clients. For example, add a URI-style query string at the end, `?shard=s98s98dus98du&something=foo&bla=bla`, where unknown future keys can be safely ignored.
WDYT?
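To illustrate the second suggestion above (an extensible tail that unknown readers can simply skip), here is a rough C sketch. It assumes a hypothetical comma-separated key=value aux-field convention after the hostname; it is not the exact parser or format Redis ended up with.

```c
#include <stdio.h>
#include <string.h>

/* Parse the part of a node line that follows ip:port@cport. Bare tokens are
 * treated as the legacy hostname; key=value tokens are extensions, and
 * unknown keys are skipped, keeping old readers forward compatible. */
static void parse_aux_fields(char *tail) {
    for (char *tok = strtok(tail, ","); tok; tok = strtok(NULL, ",")) {
        char *eq = strchr(tok, '=');
        if (!eq) {
            printf("hostname: %s\n", tok);
        } else {
            *eq = '\0';
            if (strcmp(tok, "shard-id") == 0)
                printf("shard-id: %s\n", eq + 1);
            else
                printf("skipping unknown aux field: %s\n", tok);
        }
    }
}

int main(void) {
    /* example tail, purely illustrative */
    char tail[] = "hostname-1,shard-id=11829057fd6af31adeb6db3ce43464e51defe0fd,future-key=foo";
    parse_aux_fields(tail);
    return 0;
}
```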
Great point on future-proofing the format now, especially your 2nd point. It is going to be super hard to introduce new fields in the future if we don't come up with a systematic solution in 7.0, even with just the hostname field. Mulling over your suggestion now ...
@zuiderkwast I think JSON would be the ideal long-term solution. A potential migration/upgrade strategy would be to attempt to load … Here is an example of nodes.json for illustration:
{
"nodes": {
"node": {
"id": "fbc2329fc9e9f2c9f9ed9e6cdcee656df65d191c",
"ip": "127.0.0.1",
"port": "30001",
"cluster-port": "40001",
"hostname": "hostname-1",
"role": "master",
"shard-id": "11829057fd6af31adeb6db3ce43464e51defe0fd",
"slots": [{
"start": 0,
"end": 10
},
{
"start": 10,
"end": 20
},
{
"start": 30,
"end": 40
}
],
"importing": [{
"slot": 9,
"from": "e6977cc6c45110a05f3e862717b1120b30589498"
},
{
"slot": 15,
"from": "45110a05f3e862717b1120b30589498e6977cc6c"
}
],
"migrating": [{
"slot": 12,
"to": "e6977cc6c45110a05f3e862717b1120b30589498"
},
{
"slot": 34,
"to": "45110a05f3e862717b1120b30589498e6977cc6c"
}
]
}
}
}
Update - I think an in-place update scheme that reuses the same file name …
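As a sketch of the in-place update idea, here is one possible write-temp-then-rename scheme in C, so readers never observe a half-written config. File names and contents are illustrative only; this is not what Redis actually does for nodes.conf.

```c
#include <stdio.h>

/* Write the new contents to a temporary file, then rename() it over the old
 * one; rename is atomic on POSIX filesystems, so the visible file is always
 * either the old or the new version. */
static int rewrite_config(const char *path, const char *contents) {
    char tmp[512];
    snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    FILE *fp = fopen(tmp, "w");
    if (!fp) return -1;
    if (fputs(contents, fp) == EOF || fflush(fp) == EOF) { fclose(fp); return -1; }
    fclose(fp);
    return rename(tmp, path);
}

int main(void) {
    /* example line, purely illustrative */
    return rewrite_config("nodes.conf.example",
        "fbc2329f... 127.0.0.1:30001@40001 myself,master - 0 0 1 connected\n");
}
```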
@madolson this change is ready. I moved shard-id behind hostname, so it should be compatible with 7.0 GA. Can you please take a look? Pre 7.0: …
@madolson can you please review this change when you get a chance? Thanks!
@PingXie Ok! I don't know about JSON; we don't have a JSON parser in Redis today and I don't think we really want to take a dependency on one. I think there are easier and more extensible ways to improve nodes.conf without moving to JSON though. In another PR I proposed having key/value fields alongside the slots data.
madolson
left a comment
The high level all makes sense.
My primary point is to show that there is a way forward. I don't have a strong opinion that we should use JSON. Can you point me to your other PR where key/value fields were proposed? The idea sounds reasonable to me.
It was discussed here: #9564. It still uses the older convention though; one of the two PRs can implement it and we can cross-port it.
madolson
left a comment
Some minor nitpicky comments.
@ushachar Will you take a look at this, since we presumably want the "shard ID" to be the same between cluster v2 and cluster v1?
IDs in flotilla are monotonically increasing numbers (padded in order to preserve compatibility with Cluster v1), but the general concept of this PR does seem to align well with our plans there.
@ushachar One of the assumptions being discussed here is that shard IDs may not necessarily be monotonically increasing, and may be externally imposed.
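For contrast, a tiny illustrative C sketch of the two id shapes being compared in this exchange: a monotonic counter zero-padded to the 40-character width of a cluster v1 node id (the flotilla-style approach described above), versus an arbitrary random/externally supplied hex id. Purely illustrative; neither implementation's code.

```c
#include <stdio.h>
#include <stdlib.h>

#define ID_LEN 40   /* width of a cluster v1 node id, in hex characters */

/* Monotonic counter padded to the v1 id width, e.g. 7 -> "000...0007". */
static void counter_id(unsigned long long counter, char *dst) {
    snprintf(dst, ID_LEN + 1, "%040llu", counter);
}

/* Random (or externally imposed) 40-character hex id. */
static void random_id(char *dst) {
    static const char hex[] = "0123456789abcdef";
    for (int i = 0; i < ID_LEN; i++) dst[i] = hex[rand() % 16];
    dst[ID_LEN] = '\0';
}

int main(void) {
    char a[ID_LEN + 1], b[ID_LEN + 1];
    counter_id(7, a);
    random_id(b);
    printf("padded counter: %s\nrandom id:      %s\n", a, b);
    return 0;
}
```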
madolson
left a comment
Change LGTM, I think the only remaining question is what we want to do about making the nodes.conf more extensible for the moment.
Thanks for the offer @madolson. I will rebase this PR tonight and get on the documentation PR sometime this week.
Introduce Shard IDs to logically group nodes in cluster mode (#10536)
1. Added a new "shard_id" field to "cluster nodes" output and nodes.conf after "hostname"
2. Added a new PING extension to propagate "shard_id"
3. Handled upgrade from pre-7.2 releases automatically
4. Refactored PING extension assembling/parsing logic
Behavior of Shard IDs: Replicas will always follow the shards of their reported primaries. If a primary updates its shard ID, the replica will follow. (This need not follow for cluster v2; it is not an expected use case.)
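For readers unfamiliar with the PING extension mechanism referenced in items 2 and 4 above, here is a rough sketch of what a "shard_id" extension record could look like on the wire: a type/length header followed by the payload. Field names, sizes, and the type value are illustrative, not the exact Redis structures.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SHARD_ID_LEN 40

enum { EXT_TYPE_SHARD_ID = 2 };    /* arbitrary value for this sketch */

typedef struct {
    uint16_t type;                 /* e.g. EXT_TYPE_SHARD_ID */
    uint32_t length;               /* total extension length, used for skipping */
    char shard_id[SHARD_ID_LEN];   /* sender's shard id, not NUL terminated */
} ping_ext_shard_id;

int main(void) {
    ping_ext_shard_id ext;
    ext.type = EXT_TYPE_SHARD_ID;
    ext.length = sizeof(ext);
    memcpy(ext.shard_id, "11829057fd6af31adeb6db3ce43464e51defe0fd", SHARD_ID_LEN);

    /* A receiver that does not understand `type` can still skip `length`
     * bytes, which is what makes extensions upgrade friendly. */
    printf("type=%u length=%u shard_id=%.40s\n",
           (unsigned)ext.type, (unsigned)ext.length, ext.shard_id);
    return 0;
}
```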
There is still the …
So how come it wasn't mentioned in the RC1 release notes? In any case, we need to decide what to do in the RC3 release notes. BTW, I don't remember being aware of that new command, and I was under the impression that when we removed the metadata in #12166 we reverted all the interface changes of this one. Either I missed something or I have a memory corruption.
I guess editing the RC1 release notes and mentioning the breaking change is the better alternative. @madolson either way, can you please update the top comment in this PR (mention the new command) and mention that the other interface changes were reverted and when...
I think I missed adding it; the bigger decision at the time was showing it in …
I think adding that it was dropped in RC3 is the right documentation (we did break it, but in a way that makes it more compatible). This also means that for GA it's more compatible. I think documenting the command in RC3 is fine.
An unintentional change was introduced in #10536: we used to use addReplyLongLong and now it is addReplyBulkLonglong; revert back to the previous behavior. It's already released in 7.2, so it might be a breaking change.
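For context on why this counted as an interface change: the two reply helpers produce different RESP types, an integer reply versus a bulk string. A minimal illustration of the two encodings (plain C, not Redis code):

```c
#include <stdio.h>

int main(void) {
    long long v = 12345;
    char buf[32];
    int n = snprintf(buf, sizeof(buf), "%lld", v);

    printf(":%lld\r\n", v);          /* RESP integer reply (addReplyLongLong-style)        */
    printf("$%d\r\n%s\r\n", n, buf); /* RESP bulk string reply (addReplyBulkLongLong-style) */
    return 0;
}
```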
…d 7.2 (#12604) In #10536 we introduced an assert. Some older server versions (like 7.0) don't gossip shard_id, so such a node is never added to cluster->shards, and its node->shard_id is filled in randomly and may not be found here. As a result, if we add a 7.2 node to a 7.0 cluster and allocate slots to the 7.2 node, the 7.2 node will crash when it hits this assert, similar to #12538. In this PR, we remove the assert and replace it with an unconditional removal. Fixes #12603. (cherry picked from commit e5ef161)
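A schematic C sketch of the shape of that fix, using a toy shard registry instead of the real cluster->shards structure (all names here are hypothetical):

```c
#include <assert.h>
#include <string.h>

#define MAX_SHARD_NODES 16

typedef struct { const char *members[MAX_SHARD_NODES]; } shard;

/* Before: crashes if the node was never registered in the shard
 * (e.g. because a 7.0 peer never gossiped its shard_id). */
static void detach_node_asserting(shard *s, const char *node_id) {
    int found = -1;
    for (int i = 0; i < MAX_SHARD_NODES; i++)
        if (s->members[i] && strcmp(s->members[i], node_id) == 0) found = i;
    assert(found != -1);
    s->members[found] = NULL;
}

/* After: unconditional removal, a no-op when the node is unknown. */
static void detach_node(shard *s, const char *node_id) {
    for (int i = 0; i < MAX_SHARD_NODES; i++)
        if (s->members[i] && strcmp(s->members[i], node_id) == 0)
            s->members[i] = NULL;
}

int main(void) {
    shard s = {{0}};
    detach_node(&s, "fbc2329f");   /* safe even though the node was never added */
    (void)detach_node_asserting;   /* shown only for contrast */
    return 0;
}
```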
- We generate a random shard id when creating a cluster node, so `auxFieldHandlers[af_shard_id].isPresent(n) == 0` is never true; this means we never add master nodes into `cluster.shards` when loading. This bug was introduced in #10536, which added the `shard-id` concept. BTW, #13468 makes replicas get added into `cluster.shards`.
- A replica's shard_id may differ from its master's, so before we add it again we should remove it; otherwise `cluster.shards` will contain dirty shards (introduced in #13468).
These bugs corrupt the output of `cluster shards`: the temporary `slot_info` may not be cleaned up, we may see duplicated slot ranges, and a shard entry without a master.
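A short C sketch of the "remove before re-adding" rule implied by the second bullet, using a toy registry keyed by node id (types and helpers are hypothetical, not the Redis implementation):

```c
#include <stdio.h>
#include <string.h>

typedef struct { char shard_id[41]; char node_id[41]; } membership;

static membership registry[8];   /* toy stand-in for a shard registry */

static void register_node(const char *node_id, const char *shard_id) {
    /* Drop any stale entry for this node first (it may sit under an old shard id). */
    for (int i = 0; i < 8; i++)
        if (strcmp(registry[i].node_id, node_id) == 0) registry[i].node_id[0] = '\0';
    /* Then add it under the authoritative (primary's) shard id. */
    for (int i = 0; i < 8; i++) {
        if (registry[i].node_id[0] == '\0') {
            snprintf(registry[i].node_id, sizeof(registry[i].node_id), "%s", node_id);
            snprintf(registry[i].shard_id, sizeof(registry[i].shard_id), "%s", shard_id);
            return;
        }
    }
}

int main(void) {
    register_node("replica-1", "old-shard");
    register_node("replica-1", "new-shard");   /* no duplicate left behind */
    for (int i = 0; i < 8; i++)
        if (registry[i].node_id[0])
            printf("%s -> %s\n", registry[i].node_id, registry[i].shard_id);
    return 0;
}
```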
CLUSTER MYSHARDID
Behavior of Shard IDs: replicas will always follow the shards of their reported primaries; if a primary updates its shard ID, the replica will follow.
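For completeness, a hedged sketch of what a CLUSTER MYSHARDID-style reply could look like. The helper and type names below are stand-ins, not the actual Redis implementation; the command is understood to return the node's shard id as a bulk string.

```c
#include <stdio.h>
#include <string.h>

#define SHARD_ID_LEN 40

typedef struct { char shard_id[SHARD_ID_LEN + 1]; } cluster_node;

/* Stand-in for a RESP bulk-string reply helper. */
static void reply_bulk_string(const char *s) {
    printf("$%zu\r\n%s\r\n", strlen(s), s);
}

/* CLUSTER MYSHARDID: return the shard id of the node the client is talking to. */
static void cluster_myshardid_command(const cluster_node *myself) {
    reply_bulk_string(myself->shard_id);
}

int main(void) {
    cluster_node myself = { "11829057fd6af31adeb6db3ce43464e51defe0fd" };
    cluster_myshardid_command(&myself);
    return 0;
}
```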