
Conversation

@PingXie PingXie commented Apr 5, 2022

  1. Added a new "shard_id" field to "cluster nodes" output and nodes.conf after "hostname". (removed in RC3)
  2. Added a new PING extension to propagate "shard_id"
  3. Handled upgrade from pre-7.2 releases automatically
  4. Refactored PING extension assembling/parsing logic
  5. Expose the shard-id in a new command CLUSTER MYSHARDID

Behavior of Shard IDs:

  1. Replicas will always follow the shards of their reported primaries. If a primary updates its shard ID, the replica will follow. (This need not hold for cluster v2.) This is not an expected use case.
Release notes R1:
Introduce Shard IDs to logically group nodes in cluster mode based on replication. Shard IDs are automatically assigned and visible via `CLUSTER MYSHARDID`.

Release notes R3:
Cluster SHARD IDs are no longer visible in the cluster nodes output.
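Item 2 above piggybacks on the cluster bus message extensions. A simplified sketch of what a shard-id PING extension looks like conceptually (field and type names here are approximate, not the exact declarations in the patch):

```c
/* Simplified sketch of a shard-id PING/PONG extension; names and layout are
 * approximate and only meant to illustrate the mechanism. */
#define CLUSTER_NAMELEN 40

typedef struct {
    char shard_id[CLUSTER_NAMELEN];   /* sender's shard id, 40 raw hex chars */
} clusterMsgPingExtShardId;

/* Each extension is appended to the packet as a (type, length, payload)
 * record, so a receiver that does not recognize a particular extension type
 * can skip over it by its length. */
```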

madolson commented Apr 5, 2022

@PingXie Given that we are after the code cut-off for Redis 7, we probably won't include this; we will probably merge it after 7 goes GA.

PingXie commented Apr 5, 2022

@madolson I see strong value in getting this change rolled out together with CLUSTER SHARDS, as it helps complete the shard scenario end to end. Additionally, it has externally visible impact in areas like CLUSTER NODES, CLUSTER SHARDS, and nodes.conf, which are already updated in 7.0, so it would be great if we can piggyback on the same release (as opposed to churning these areas release after release). I will close the remaining two issues today (the missing tests and a way to return a node's own shard_id). I am happy to prioritize this fix for 7.0 GA on my end.

@oranagra FYI

madolson commented Apr 6, 2022

@PingXie I see a lot of your points, and I agree with most of them. I have two concerns. Let's focus on the implementation; if it's ready, we can of course make the decision to merge it earlier.

  1. The replica should follow the shard ID of the primary, and reconcile itself automatically.
  2. You should be able to enforce a shard id, since I don't think shard ids should rotate if all nodes in a shard die.

PingXie commented Apr 6, 2022

> @PingXie I see a lot of your points, and I agree with most of them. I have two concerns. Let's focus on the implementation; if it's ready, we can of course make the decision to merge it earlier.

Sounds good @madolson.

>   1. The replica should follow the shard ID of the primary, and reconcile itself automatically.

Yes, replicas will pick up their primary's shard id upon joining the shard via CLUSTER REPLICATE.

>   2. You should be able to enforce a shard id, since I don't think shard ids should rotate if all nodes in a shard die.

Shard ids are persisted in nodes.conf. Nodes should be able to retrieve their previous shard ids on restart, assuming their nodes.conf is not corrupted. In the case where replicas lose their nodes.conf, they can still recover their shard ids via the gossip message from their primary, if the primary's nodes.conf is still good. If, for whatever reason, all nodes.conf files in the same shard are lost, we will rotate to a new shard id.
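To make that resolution order concrete, roughly (purely illustrative; the helper names below are not the actual Redis functions):

```c
#include <stdlib.h>
#include <string.h>

#define SHARD_ID_LEN 40  /* shard ids are 40 hex chars, the same width as node ids */

/* Hypothetical helper: fill 'out' with a fresh random hex id. */
static void random_hex_id(char *out) {
    for (int i = 0; i < SHARD_ID_LEN; i++) out[i] = "0123456789abcdef"[rand() % 16];
    out[SHARD_ID_LEN] = '\0';
}

/* Sketch of the recovery order described above (not the actual Redis code).
 * Empty strings stand in for "nodes.conf lost" / "primary's id unknown". */
static void resolve_shard_id(char *out, const char *from_nodes_conf,
                             const char *from_primary_gossip) {
    if (from_nodes_conf[0]) {               /* 1. persisted copy survived the restart */
        strcpy(out, from_nodes_conf);
    } else if (from_primary_gossip[0]) {    /* 2. replica re-learns it from its primary */
        strcpy(out, from_primary_gossip);
    } else {                                /* 3. every copy in the shard is gone: rotate */
        random_hex_id(out);
    }
}
```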

I added a new command CLUSTER MYSHARDID to expose a node's shard id, similar to CLUSTER MYID. The reason for a new command is to reduce backcompat risk but let me know if you think otherwise.
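For illustration, querying it would look something like this (the shard id value is made up):

127.0.0.1:30001> CLUSTER MYSHARDID
"5ec50a13c34fc1d4db518770c34c04d364b19593"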

I can update the documentation after we close this PR.

PingXie commented Apr 13, 2022

@madolson I pushed another commit to make CLUSTER SHARDS work better with failed primaries and this should make the shard id support complete.

Below are the externally visible changes (hence my preference to include them along with CLUSTER SHARDS in a major release).

  • Nodes.conf - added a new field in the endpoint column before hostname
208c9c887b8d3c148ca3b19f6585748816be54cc 127.0.0.1:30002@40002,5ec50a13c34fc1d4db518770c34c04d364b19593,hostname-1 master - 0 1649868449278 2 connected 5461-10922
  1. This is not the most logical place, but it seems to me the only extensible column with minimum backcompat risk
  2. With this change, since shard-id is always on, I moved it ahead of hostname. I am assuming this incompatible change is OK as we are still in the RC phase, but let me know if you have concerns.
  • CLUSTER SHARDS has a new shard-id row
1) "shard-id"
   2) "5ec50a13c34fc1d4db518770c34c04d364b19593"
  • CLUSTER SHARDS now groups failed primaries with their old replicas/new primary in the same shard; previously they were separated after the failover
2) 1) "shard-id"
   2) "9eac95730a6d80fdaac65bfbea09b74647677171"
   3) "slots"
   4) 1) "0"
      2) "5460"
   5) "nodes"
   6) 1)  1) "id"
          2) "f8e798e4ea98bb2339c1b07237aab93121b25664"
          3) "port"
          4) (integer) 30001
          5) "ip"
          6) "127.0.0.1"
          7) "endpoint"
          8) "127.0.0.1"
          9) "hostname"
         10) ""
         11) "role"
         12) "master"
         13) "replication-offset"
         14) (integer) 602
         15) "health"
         16) "fail"
      2)  1) "id"
          2) "d0595e25f43a6242482f7380b8039af5a88f4adb"
          3) "port"
          4) (integer) 30005
          5) "ip"
          6) "127.0.0.1"
          7) "endpoint"
          8) "127.0.0.1"
          9) "hostname"
         10) ""
         11) "role"
         12) "master"
         13) "replication-offset"
         14) (integer) 602
         15) "health"
         16) "online"

@zuiderkwast in case you are interested - this is to improve the native shard support started with CLUSTER SHARDS

@zuiderkwast zuiderkwast left a comment

I like the idea in general.

Regarding the form ip:port@cport,shard,hostname (which is already a pretty strange format without shard), I have two incompatible suggestions:

  1. Since every part so far uses a different separator (:@,), we could pick yet another separator for shard-id, for example $ (looks like S for shard). Then, we would be able to parse these strings from 7.0 release candidates too (with hostname, without shard), so the format would be ip:port@cport[$shard][,hostname].

  2. Now might also be a good time to make this format future-proof in some way, so that more fields can be added without breaking old nodes and clients. For example, add a URI query string at the end ?shard=s98s98dus98du&something=foo&bla=bla where unknown future keys can be safely ignored.

WDYT?

PingXie commented Apr 14, 2022

> Regarding the form ip:port@cport,shard,hostname (which is already a pretty strange format without shard), I have two incompatible suggestions:
>
>   1. Since every part so far uses a different separator (:@,), we could pick yet another separator for shard-id, for example $ (looks like S for shard). Then, we would be able to parse these strings from 7.0 release candidates too (with hostname, without shard), so the format would be ip:port@cport[$shard][,hostname].
>   2. Now might also be a good time to make this format future-proof in some way, so that more fields can be added without breaking old nodes and clients. For example, add a URI query string at the end ?shard=s98s98dus98du&something=foo&bla=bla where unknown future keys can be safely ignored.
>
> WDYT?

Great point on future-proofing the format now, especially your 2nd point. It is going to be super hard to introduce new fields in the future if we don't come up with a systematic solution in 7.0, even with just the hostname field. Mulling over your suggestion now ...

PingXie commented Apr 15, 2022

@zuiderkwast I think JSON would be the ideal long term solution. A potential migration/upgrade strategy would be to attempt to load nodes.json first and, if it is not found, fall back to loading nodes.conf and immediately write out nodes.json. This is IMO better than hacking nodes.conf further. So one option could be to stick with ip:port@cport,shard[,hostname] in 7.0 and then make the JSON switch in 7.2. Thoughts?

Here is an example of nodes.json for illustration:

{
    "nodes": {
        "node": {
            "id": "fbc2329fc9e9f2c9f9ed9e6cdcee656df65d191c",
            "ip": "127.0.0.1",
            "port": "30001",
            "cluster-port": "40001",
            "hostname": "hostname-1",
            "role": "master",
            "shard-id": "11829057fd6af31adeb6db3ce43464e51defe0fd",
            "slots": [{
                    "start": 0,
                    "end": 10
                },
                {
                    "start": 10,
                    "end": 20
                },
                {
                    "start": 30,
                    "end": 40
                }
            ],
            "importing": [{
                    "slot": 9,
                    "from": "e6977cc6c45110a05f3e862717b1120b30589498"
                },
                {
                    "slot": 15,
                    "from": "45110a05f3e862717b1120b30589498e6977cc6c"
                }
            ],
            "migrating": [{
                    "slot": 12,
                    "to": "e6977cc6c45110a05f3e862717b1120b30589498"
                },
                {
                    "slot": 34,
                    "to": "45110a05f3e862717b1120b30589498e6977cc6c"
                }
            ]
        }
    }
}

Update - I think an in-place update scheme that reuses the same file name nodes.conf for the JSON content would work fine too.
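To make the proposed fallback order concrete, something along these lines (hypothetical helper names; this is only a sketch of the proposal, none of it exists in this PR):

```c
/* Hypothetical sketch of the proposed load order; not code from this PR. */
int load_nodes_json(const char *path);   /* assumed helpers, 0 on success */
int load_nodes_conf(const char *path);
int save_nodes_json(const char *path);

int load_cluster_config(void) {
    if (load_nodes_json("nodes.json") == 0) return 0;   /* new format already present */
    if (load_nodes_conf("nodes.conf") != 0) return -1;  /* fall back to the legacy file */
    return save_nodes_json("nodes.json");               /* immediately rewrite in the new format */
}
```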

PingXie commented Apr 28, 2022

@madolson this change is ready. I moved shard-id behind hostname so it should be compatible with 7.0 GA. Can you please take a look?

Pre 7.0: ip:port[@cport]
7.0: ip:port[@cport] or ip:port[@cport],hostname
7.0 + shard-id: ip:port[@cport],,shard_id or ip:port[@cport],hostname,shard_id
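A rough sketch of how a client could parse that column across all three forms (illustrative C, not the code in this PR):

```c
#include <stdio.h>
#include <string.h>

/* Illustrative parser for the endpoint column; not the actual Redis code.
 * Handles "ip:port@cport", "ip:port@cport,hostname" and
 * "ip:port@cport,hostname,shard_id" (hostname may be empty, as in
 * "ip:port@cport,,shard_id"). Modifies 'field' in place. */
static void parse_endpoint(char *field, char **addr, char **hostname, char **shard_id) {
    *addr = field;
    *hostname = *shard_id = NULL;
    char *p = strchr(field, ',');
    if (!p) return;                  /* pre-7.0 form: address only */
    *p++ = '\0';
    *hostname = p;                   /* may be an empty string */
    p = strchr(p, ',');
    if (!p) return;                  /* 7.0 form: address + hostname */
    *p++ = '\0';
    *shard_id = p;                   /* trailing shard id */
}

int main(void) {
    char line[] = "127.0.0.1:30002@40002,,5ec50a13c34fc1d4db518770c34c04d364b19593";
    char *addr, *hostname, *shard_id;
    parse_endpoint(line, &addr, &hostname, &shard_id);
    printf("addr=%s hostname=%s shard_id=%s\n",
           addr, (hostname && hostname[0]) ? hostname : "(none)",
           shard_id ? shard_id : "(none)");
    return 0;
}
```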

PingXie commented May 2, 2022

FYI @yossigo @oranagra - in case @madolson's bandwidth is limited recently. This PR brings native shard-id to 7.0.0 in a fully compatible way. It is ready for code review. Please feel free to let me know if there is anything I can do to help make progress on this PR.

PingXie commented May 18, 2022

@madolson can you please review this change when you get a chance? Thanks!

madolson commented Jun 4, 2022

@PingXie Ok!

I don't know about JSON; we don't have a JSON parser in Redis today, and I don't think we really want to take a dependency on one. I think there are easier and more extensible ways to improve nodes.conf without moving to JSON, though. In another PR I proposed having key/value fields for both the node and the slots data.
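For illustration, a key/value style for that column might look something like the line below (hypothetical syntax, not what either PR ultimately settled on), the point being that new keys could be added later without relying on field position:

208c9c887b8d3c148ca3b19f6585748816be54cc 127.0.0.1:30002@40002,hostname=hostname-1,shard-id=5ec50a13c34fc1d4db518770c34c04d364b19593 master - 0 1649868449278 2 connected 5461-10922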

@madolson madolson left a comment

The high level all makes sense.

PingXie commented Jun 9, 2022

> I don't know about JSON; we don't have a JSON parser in Redis today, and I don't think we really want to take a dependency on one. I think there are easier and more extensible ways to improve nodes.conf without moving to JSON, though. In another PR I proposed having key/value fields for both the node and the slots data.

My primary point is to show that there is a way forward. I don't have a strong opinion that we should use JSON. Can you point me to your other PR where key/value fields were proposed? The idea sounds reasonable to me.

@PingXie PingXie mentioned this pull request Jun 16, 2022
@madolson madolson added the state:major-decision (Requires core team consensus) label Jun 20, 2022
madolson commented Jun 20, 2022

> My primary point is to show that there is a way forward. I don't have a strong opinion that we should use JSON. Can you point me to your other PR where key/value fields were proposed? The idea sounds reasonable to me.

It was discussed here: #9564. It still uses the older convention though; one of the two PRs can implement it and we can cross-port it.

@madolson madolson left a comment

Some minor nitpicky comments.

@madolson madolson changed the title from "Fix Introduce shard ID to Redis cluster #10474" to "Introduce Shard IDs to logically group nodes in cluster mode" Jun 20, 2022
@madolson madolson linked an issue Jun 20, 2022 that may be closed by this pull request
@madolson

@ushachar Will you take a look at this, since we presumably want the "shard ID" to be the same between cluster v2 and cluster v1?

@ushachar

> @ushachar Will you take a look at this, since we presumably want the "shard ID" to be the same between cluster v2 and cluster v1?

IDs in flotilla are monotonically increasing numbers (padded in order to preserve compatibility with Cluster v1), but the general concept of this PR does seem to align well with our plans there.

@madolson

@ushachar One of the assumptions being discussed here is that shard IDs may not necessarily be monotonically increasing and may be externally imposed.

PingXie commented Jul 3, 2022

> @ushachar One of the assumptions being discussed here is that shard IDs may not necessarily be monotonically increasing and may be externally imposed.

I commented on the "monotonically increasing" property on @ushachar's cluster V2 spec.

ushachar commented Jul 4, 2022

@madolson / @PingXie - Externally imposed is fine - the IDs allocated by Flotilla will still be monotonically increasing (internally, we maintain a counter and verify that the value is indeed unused before returning it)
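For example, a control plane could render such a counter at the 40-hex-character width that Cluster v1 uses for node and shard ids (illustrative snippet; the counter value is made up):

```c
#include <stdio.h>

int main(void) {
    unsigned long long next_counter_value = 12345;  /* hypothetical allocator counter */
    char shard_id[41];
    /* Zero-pad to 40 hex chars so the id has the same width as a randomly generated one. */
    snprintf(shard_id, sizeof(shard_id), "%040llx", next_counter_value);
    printf("%s\n", shard_id);  /* prints 36 zeros followed by 3039 */
    return 0;
}
```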

@madolson madolson left a comment

Change LGTM, I think the only remaining question is what we want to do about making the nodes.conf more extensible for the moment.

PingXie commented Nov 9, 2022

Thanks for the offer @madolson. I will rebase this PR tonight and get on the documentation PR sometime this week.

@madolson madolson merged commit 203b12e into redis:unstable Nov 17, 2022
@PingXie PingXie deleted the shard_id branch November 17, 2022 07:50
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
…0536)

Introduce Shard IDs to logically group nodes in cluster mode.
1. Added a new "shard_id" field to "cluster nodes" output and nodes.conf after "hostname"
2. Added a new PING extension to propagate "shard_id"
3. Handled upgrade from pre-7.2 releases automatically
4. Refactored PING extension assembling/parsing logic

Behavior of Shard IDs:

Replicas will always follow the shards of their reported primaries. If a primary updates its shard ID, the replica will follow. (This need not follow for cluster v2) This is not an expected use case.
oranagra commented Jul 2, 2023

@madolson please help me figure out something.
i see this PR is marked for release-notes, but i don't see it mentioned in the 7.2 release notes.
maybe you know / remember why?
it bothers me since i need to mention breaking it in 7.2-RC3 (#12166)

madolson commented Jul 4, 2023

> i see this PR is marked for release-notes, but i don't see it mentioned in the 7.2 release notes.

There is still the myshardid command, which reveals the shardID used by all the nodes in the shard.

oranagra commented Jul 5, 2023

so how come it wasn't mentioned in RC1 release notes?
could it be that i just missed it somehow? or did we discuss it and decide to skip it?

in any case, we need to decide what to do in RC3 release notes.
i can retroactively edit the RC1 release notes and then mention it was dropped in RC3 (breaking change), or just add a new note just about MYSHARDID in RC3.

btw, i don't remember being aware of that new command, and i was under the impression that when we removed the metadata in #12166 we reverted all the interface changes of this one. either i missed something or have a memory corruption.

oranagra commented Jul 5, 2023

i guess editing the RC1 release notes and mentioning the breaking change is the better alternative.

@madolson either way, can you please update the top comment in this PR (mention the new command) and mention the other interface changes were reverted and when...

madolson commented Jul 5, 2023

> so how come it wasn't mentioned in RC1 release notes?

I think I missed adding it; the bigger decision at the time was showing it in CLUSTER NODES, so it probably got overshadowed.

> i can retroactively edit the RC1 release notes and then mention it was dropped in RC3 (breaking change), or just add a new note just about MYSHARDID in RC3.

I think adding that it was dropped in RC3 is the right documentation (we did break it, but in a way that makes it more compatible). This also means that for GA it's more compatible. I think documenting the command in RC3 is fine.

@oranagra oranagra mentioned this pull request Jul 10, 2023
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
…0536)

Introduce Shard IDs to logically group nodes in cluster mode.
1. Added a new "shard_id" field to "cluster nodes" output and nodes.conf after "hostname"
2. Added a new PING extension to propagate "shard_id"
3. Handled upgrade from pre-7.2 releases automatically
4. Refactored PING extension assembling/parsing logic

Behavior of Shard IDs:

Replicas will always follow the shards of their reported primaries. If a primary updates its shard ID, the replica will follow. (This need not follow for cluster v2) This is not an expected use case.
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Sep 8, 2023
An unintentional change was introduced in redis#10536: we used to use addReplyLongLong and now it is addReplyBulkLonglong; revert it back to the previous behavior.

It's already released in 7.2, so it might be a breaking change.
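For context, the practical difference is a RESP integer reply versus a RESP bulk string reply, roughly:

addReplyLongLong(c, 123)      ->  :123\r\n         (RESP integer)
addReplyBulkLongLong(c, 123)  ->  $3\r\n123\r\n    (RESP bulk string)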
madolson pushed a commit that referenced this pull request Sep 10, 2023
An unintentional change was introduced in #10536: we used to use addReplyLongLong and now it is addReplyBulkLonglong; revert it back to the previous behavior.
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Sep 23, 2023
…d 7.2

In redis#10536 we introduced the assert; some older versions of servers (like 7.0) don't gossip shard_id, so we will not add the node to cluster->shards, and node->shard_id is filled in randomly and may not be found here.

It means that if we add a 7.2 node to a 7.0 cluster and allocate slots to the 7.2 node, the 7.2 node will crash when it hits this assert, somewhat like redis#12538.

In this PR, we remove the assert and replace it with a plain if.
Fixes redis#12603.
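Schematically, the kind of change described above (purely illustrative, not the actual diff; the surrounding function is omitted):

```c
/* Before (schematic): hard assert that the node was registered in a shard. */
serverAssert(dictFind(server.cluster->shards, node->shard_id) != NULL);

/* After (schematic): tolerate peers (e.g. 7.0 nodes) that never gossiped a shard id. */
dictEntry *de = dictFind(server.cluster->shards, node->shard_id);
if (de != NULL) {
    /* ... proceed only when the shard entry actually exists ... */
}
```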
madolson pushed a commit that referenced this pull request Oct 12, 2023
…d 7.2 (#12604)

In #10536 we introduced the assert; some older versions of servers (like 7.0) don't gossip shard_id, so we will not add the node to cluster->shards, and node->shard_id is filled in randomly and may not be found here.

It means that if we add a 7.2 node to a 7.0 cluster and allocate slots to the 7.2 node, the 7.2 node will crash when it hits this assert, somewhat like #12538.

In this PR, we remove the assert and replace it with an unconditional removal.
oranagra pushed a commit that referenced this pull request Oct 18, 2023
An unintentional change was introduced in #10536: we used to use addReplyLongLong and now it is addReplyBulkLonglong; revert it back to the previous behavior.

(cherry picked from commit 4031a18)
oranagra pushed a commit that referenced this pull request Oct 18, 2023
…d 7.2 (#12604)

In #10536 we introduced the assert; some older versions of servers (like 7.0) don't gossip shard_id, so we will not add the node to cluster->shards, and node->shard_id is filled in randomly and may not be found here.

It means that if we add a 7.2 node to a 7.0 cluster and allocate slots to the 7.2 node, the 7.2 node will crash when it hits this assert, somewhat like #12538.

In this PR, we remove the assert and replace it with an unconditional removal.

(cherry picked from commit e5ef161)
ShooterIT added a commit that referenced this pull request Nov 1, 2025
- We will generate a random shard id when creating a cluster node, so
`auxFieldHandlers[af_shard_id].isPresent(n) == 0` is never met, which means
we never add master nodes into `cluster.shards` when loading. This bug was
introduced in #10536, which added the `shard-id` concept. BTW, #13468 can
make replicas get added into `cluster.shards`.

- A replica's shard_id may be different from its master's, so before we add
it again we should remove it; otherwise `cluster.shards` will have dirty
shards. This was introduced in #13468.

These bugs cause the output of `cluster shards` to be corrupt: the temporary
`slot_info` may not be cleaned up, and we may see duplicated slot ranges and
shard info without a master.
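Schematically, the second fix amounts to detaching a node from its previously recorded shard before re-registering it under the new id (illustrative pseudocode; the helper names are assumed, not necessarily the real functions):

```c
/* Illustrative only; helper names are assumed. */
clusterRemoveNodeFromShard(node);                      /* drop it from the old shard entry */
memcpy(node->shard_id, new_shard_id, CLUSTER_NAMELEN); /* adopt the primary's shard id */
clusterAddNodeToShard(node->shard_id, node);           /* register it exactly once */
```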

Labels

  • approval-needed: Waiting for core team approval to be merged
  • release-notes: indication that this issue needs to be mentioned in the release notes
  • state:major-decision: Requires core team consensus
  • state:needs-doc-pr: requires a PR to redis-doc repository

Projects

Status: Done
Archived in project

Development

Successfully merging this pull request may close these issues.

Introduce shard ID to Redis cluster