Add cluster shards support. #10293
Conversation
src/cluster.c
Outdated
void *shard_replylen = addReplyDeferredLen(c);
int shard_cnt = 0;
dict *nodeToSlotPair = dictCreate(&clusterNodesToSlotDictType);
for (int i = 0; i <= CLUSTER_SLOTS; i++) {
This exact slot enumeration logic has already been used twice in cluster.c. I wonder if it helps to single source it with a callback pattern that takes two parameters: the start index and the length. More specifically, I am thinking that the code block between line 5080 and 5094, inclusive, can be moved into a callback function while the rest of the for loop can be extracted into a generic enumeration function.
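For illustration, a rough sketch of the callback idea (the names clusterEnumerateSlotRanges and slotRangeCallback are invented for this example, not taken from the PR): a single generic walker scans the slot table and invokes the callback once per contiguous range owned by a node, passing the start index and the length.

```
typedef void (*slotRangeCallback)(clusterNode *node, int start, int len, void *privdata);

void clusterEnumerateSlotRanges(slotRangeCallback cb, void *privdata) {
    clusterNode *prev = NULL;
    int start = 0;
    for (int i = 0; i <= CLUSTER_SLOTS; i++) {
        clusterNode *n = (i < CLUSTER_SLOTS) ? server.cluster->slots[i] : NULL;
        if (n != prev) {
            /* Close the previous range; unassigned stretches (prev == NULL) are skipped. */
            if (prev != NULL) cb(prev, start, i - start, privdata);
            start = i;
            prev = n;
        }
    }
}
```

Each caller (CLUSTER SLOTS, CLUSTER NODES, CLUSTER SHARDS) would then only supply the callback body that formats the range.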
I've thought about how to deduplicate this slot range enumeration logic too. My idea is to create an iterator function which returns each slot range of a node:
int start = 0, end = 0;
while (nextSlotRange(node, &start, &end)) {
// do something for each range
}
I have an implementation. I can post it tomorrow.
Worth mentioning we have two types of iterations:
- We do a straight iteration through all of the slots and group them by slot ranges (see CLUSTER SLOTS).
- We want a result which is the ordered list of nodes, and then we want to iterate through those nodes' slots (see CLUSTER NODES).
Need to make sure the iterator supports both, or at least keeps case 2 easily attainable.
@madolson Good point. The version posted below only iterates over a single node (i.e. it covers 2 but not 1) but I guess it can be adapted, or possibly 1 is different enough to have a separate iteration function for that.
int nextSlotRange(clusterNode *node, int *begin, int *end);
/* Iterate over the slot ranges of a node. Use like this:
*
* int start = 0, end = 0;
* while (nextSlotRange(node, &start, &end)) {
* // This node has range start..end.
* }
*/
int nextSlotRange(clusterNode *node, int *begin, int *end) {
int j;
int start = -1;
for (j = (*end == 0) ? 0 : *end + 1; j < CLUSTER_SLOTS; j++) {
int bit = clusterNodeGetSlotBit(node, j);
if (bit && start == -1) start = j;
if (start != -1 && (!bit || j == CLUSTER_SLOTS-1)) {
if (bit && j == CLUSTER_SLOTS-1) j++;
*begin = start;
*end = j - 1;
return 1;
}
}
return 0;
}
@zuiderkwast I have added a conditional check: if the node provided is null, we move on to the next slot. WDYT?
int nextSlotRange(clusterNode *node, int *begin, int *end) {
int j;
int start = -1;
if (node == NULL) {
*begin = *begin + 1;
*end = *begin;
if (*end == CLUSTER_SLOTS) {
return 0;
} else {
return 1;
}
}
for (j = (*end == 0) ? 0 : *end + 1; j < CLUSTER_SLOTS; j++) {
int bit = clusterNodeGetSlotBit(node, j);
if (bit && start == -1) start = j;
if (start != -1 && (!bit || j == CLUSTER_SLOTS - 1)) {
if (bit && j == CLUSTER_SLOTS - 1) j++;
*begin = start;
*end = j - 1;
return 1;
}
}
return 0;
}
int start = 0, end = 0;
clusterNode *n = server.cluster->slots[start];
while (nextSlotRange(n, &start, &end)) {
if (n != NULL) {
sds name = sdsnewlen(n->name, CLUSTER_NAMELEN);
dictEntry *de = dictFind(nodeToSlotPair, name);
sdsfree(name);
list *slotPair;
if (de == NULL) {
slotPair = listCreate();
dictAdd(nodeToSlotPair, sdsnewlen(n->name, CLUSTER_NAMELEN), slotPair);
} else {
slotPair = dictGetVal(de);
}
if (start != end) {
listAddNodeTail(slotPair, sdscatfmt(sdsempty(), "%i-%i", start, end));
} else {
listAddNodeTail(slotPair, sdscatfmt(sdsempty(), "%i", start));
}
}
if ((end + 1) != CLUSTER_SLOTS) {
n = server.cluster->slots[end + 1];
}
}
@hpatro if n == NULL, the function returns a range with a single slot for no node? I think it is strange. It relies on the caller to make sure that begin + 1 has no node. Maybe you meant *begin = *end + 1? I think the next range starts where the previous range ends. Or did you change this?
Anyway, this function cannot check this slot has no node, so I think it is strange. Better check if n == NULL and find node with a slot before calling the iterator function.
Maybe we need another function nextSlotRangeAndNode(*begin, *end, *node) which also returns the node of the next range...? This logic is different and it is less complex to make them separate functions IMO.
int start = 0, end = 0;
clusterNode *n;
while (nextSlotRangeAndNode(&start, &end, &n)) {
if (n == NULL) continue;
// use node and range
}
Implementation (untested):
/* Finds the next slot range and its node, starting at end + 1. The returned node
* is NULL if the slot range is unassigned. */
int nextSlotRangeAndNode(int *begin, int *end, clusterNode **node) {
int start = (*end == 0) ? 0 : *end + 1;
if (start >= CLUSTER_SLOTS)
return 0;
clusterNode *n = server.cluster->slots[start];
int j = start;
while (j < CLUSTER_SLOTS && server.cluster->slots[j] == n)
j++;
*begin = start;
*end = j - 1;
*node = n;
return 1;
}
One question: Why do we need to return the slots in ascending order? Can't we just use the order of the master nodes?
For now, I'm not picking this up. I have shared the code between CLUSTER NODES and CLUSTER SHARDS. Maybe we can come back later to reduce the redundancy for CLUSTER SLOTS.
Probably fine, @zuiderkwast maybe make an issue for improvement?
I think the slot ranges should be represented as two integers (as @madolson suggested in the issue) rather than a string of the form "start-end". This way we use RESP for all structure and no extra parsing of a string is needed.
@zuiderkwast Actually Madelyn and I discussed this offline and thought that with an extremely fragmented slot distribution, this would be cheaper for network out and has minimal parsing overhead for the client.
I think @zuiderkwast has a good point. Dealing with exceptions always adds complexity. Based on my quick back-of-the-envelope calculation, I think we could achieve an overall similar footprint (if not better) using the original proposal with one trick, which is to encode the length of the slot range as opposed to the end slot number. Here is an example:
`$9\r\n1000-1000\r\n`: the total length is 15 bytes.
`*2\r\n:1000\r\n:1\r\n`: the total length is also 15 bytes.
Note that a 4-digit slot number is where the two encoding schemes result in the same footprint. When the slot number is greater than 9999, the second encoding scheme yields a smaller footprint, while when the slot number is smaller than 1000, the first encoding scheme yields a smaller footprint. Therefore, in the most extreme case when no two slots are next to each other in the same shard, the second encoding scheme would actually yield an overall smaller footprint while remaining friendly to generic RESP parsing.
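To make the crossover concrete, here are the byte counts for a 3-digit and a 5-digit single-slot range (my own arithmetic, not taken from the thread):

```
slot 100:    $7\r\n100-100\r\n       = 13 bytes    vs    *2\r\n:100\r\n:1\r\n     = 14 bytes
slot 10000:  $11\r\n10000-10000\r\n  = 18 bytes    vs    *2\r\n:10000\r\n:1\r\n   = 16 bytes
```

Below four digits the dash string wins, above four digits the (start, length) pair wins, consistent with the estimate above.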
Edit: After looking at the code some more, it seems inconsistent to have different mechanisms return different values. The intention is that this is a "replacement" of sorts for cluster nodes, which actually gives slots in the form "a-b" or "a", so I think it's actually reasonable to re-use the code that generates the nodes output.
@madolson did you actually mean a replacement for CLUSTER SLOTS? I thought one of the issues called out in #10168 was about the CLUSTER NODES output not being RESP compliant. Also, the output format implemented here is modeled more after CLUSTER SLOTS than CLUSTER NODES. That said, the use of "length" instead of the "end slot number" does break away from the CLUSTER SLOTS norm. We are probably looking at an additional ~55KB (6.4K*4 + 9K*3 + 0.9K*2) overhead in the most fragmented case if we were to go with full RESP compatibility. Assuming each node takes up 300B in the output, a 100-node cluster, and 15B per slot range with the "-" representation, the total output size is about 300B*100 + 15B*16K = 270KB. 55KB is about a 20% increase, but again this is an extreme case. So here is where I see the tough call lies:
In case a node owns a single slot, the network output would be something like this for the dash representation:
I think the difference of 5-10 bytes per slot range is not very important, even if the difference is 50K. The real difference is when compared to CLUSTER SLOTS, where the complete nodes are repeated for each slot range. Assuming a maximally fragmented cluster with one master and two replicas per shard, assuming 100B per node info, CLUSTER SLOTS is 3*100*16K = 4.8M. So, even if CLUSTER SHARDS is 100K, we save 98% compared to CLUSTER SLOTS. Among the options we have:
I think 1 and 2 are acceptable, but 3 is rather obscure.
src/cluster.c
Outdated
int start = -1;
void *shard_replylen = addReplyDeferredLen(c);
int shard_cnt = 0;
dict *nodeToSlotPair = dictCreate(&clusterNodesToSlotDictType);
Creating a temporary dict seems unnecessary. Maybe we can avoid it?
I think it's cheaper to iterate over the structures and add replies directly without allocating any extra structures, even if we might need to iterate over the nodes one extra time, e.g. once just to count the shards for addReplyArrayLen(c, numshards).
If we need to create a structure like this one, then perhaps it can be stored permanently or maybe we can change the existing structures to be more optimized for CLUSTER SHARDS (if possible).
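For illustration, a minimal sketch of the "count first, then reply" idea (the one-shard-per-master assumption is a simplification for this example, not the PR's exact code):

```
/* First pass: count shards so the array length can be emitted up front,
 * with no temporary dict. */
int numshards = 0;
dictIterator *di = dictGetSafeIterator(server.cluster->nodes);
dictEntry *de;
while ((de = dictNext(di)) != NULL) {
    clusterNode *n = dictGetVal(de);
    if (nodeIsMaster(n)) numshards++; /* replicas are folded into their master's shard */
}
dictReleaseIterator(di);
addReplyArrayLen(c, numshards);

/* Second pass: iterate the masters again and emit each shard's "slots" and
 * "nodes" entries directly. */
```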
I think we could re-use the iterator logic for CLUSTER NODES that loops through all the slots and caches the response on each node. We can then re-iterate through the cached nodes to add the output.
+1
Agreed and 2) would be my preference.
@zuiderkwast / @madolson Do you guys have any preference?
I'm fine with 1 or 2. I have a slight preference towards option 2 because it's the simplest for clients to implement. @PingXie I did mean cluster nodes. Anyone that didn't like the cluster slots output switched to using cluster nodes.
yossigo
left a comment
I haven't looked at the details yet, will do shortly, but I want to point out a potential ambiguity using the term "shard".
It has not been used by Redis Cluster until now, and in other contexts it has been used either as a synonym for "cluster node" or (as used in this PR) to describe a master and a set of replicas sharing the same slots. To avoid confusion, I think we should avoid using it here.
@yossigo With the new pubsub feature we decided to use "Sharded PubSub" as the terminology to describe a master and set of replicas sharing the same slots. Shall we continue using the same definition here?
@hpatro I think "sharded" and "sharding" are not a problem; my only concern is the current ambiguity around "shard" and whether it refers to a single cluster node or a set of nodes with the same hash slots (master + replicas).
@yossigo I think the term shard in this sense is very convenient. The result of this command is an array of info per shard. Do you prefer TOPOLOGY, or do you have other suggestions? Then we can try to compare pros and cons... Another possible term is "slice". The term "network slicing" is used in telecom with a similar meaning.
The notion of "shard" was probably unofficially introduced already based on the documentation:
I feel that it is very natural to go ahead and make it official. Besides, "shard" is a well-established term in distributed systems and I don't see the use of "shard" in this context diverging from the norm. The other (less popular) term that I have seen is "partition", but that is quite foreign to Redis, to the best of my knowledge. I have not seen "slice" used in distributed storage/database systems.
While we are here, can we also consider having the node's epoch included in the output as well? The client application can bounce between nodes at run time, so there are times it'd receive conflicting "shard" topologies. An embedded epoch value would go a long way to help the client application resolve the collisions. @zuiderkwast this is somewhat related to our discussion on your topology change pubsub proposal #10150.
Sorry for taking so long to weigh in, but I just also wanted to voice that I think it's good to make the term "shard" more official. The only downside I can think of is that it will become more complex if we ever want to introduce more flexible configurations. For example, consider the scenario where we wanted to allow masters to be replicas for other masters, so that on node failure we have the data replicated but all nodes are able to take writes.
@madolson trying to get a better understanding of "make config generations include all of the state information". Are you talking about adding the remaining states like
Sure. I don't have a strong opinion here and I haven't seen updates to #10195 either. Agreed we can revisit this later when there is a real use case.
Forgot to mention that I think we'd also need my changes (#10163) to ensure that failed nodes are included after a replica takes over the primary-ship in the shard. I can port it over after this PR is merged.
@PingXie Include other metadata such as which replicas are serving data from which masters, updates for announced hostnames and IP addresses, any other future config state changes.
Added documentation: redis/redis-doc#1815
Thanks @madolson. I would like to clarify that there is a legit and real scenario for having epochs in |
oranagra
left a comment
Approved in a core-team meeting. Please merge away (and make sure the docs are updated too).
set node_0_id [R 0 CLUSTER MYID]

test "Kill a node and tell the replica to immediately takeover" {
This fails in my GH cluster test
https://github.com/enjoy-binbin/redis/runs/5563545978?check_suite_focus=true#step:9:681
02:18:22> Kill a node and tell the replica to immediately takeover: FAILED: caught an error in the test Port 30000 does not return available after killing instance.
Port 30000 does not return available after killing instance.
Will take a look here.
Bane of my existence, for some reason this test only fails when you run the entire suite but passes individually. I'm committing to deprecating this test suite now.
int tls_port = server.cluster_announce_tls_port ? server.cluster_announce_tls_port : server.tls_port;
if (tls_port) {
    addReplyBulkCString(c, "tls-port");
    addReplyLongLong(c, tls_port);
please take a look here (tls_port and the above port in tls mode)
the cluster-shards test failed in tls. (https://github.com/redis/redis/runs/5578694306?check_suite_focus=true#step:9:682)
with this patch, the test passes, but I'm not very familiar with tls mode:
int port = server.cluster_announce_port ? server.cluster_announce_port : server.port;
int tls_port = server.cluster_announce_tls_port ? server.cluster_announce_tls_port : server.tls_port;
if (port) {
addReplyBulkCString(c, "port");
if (tls_port)
addReplyLongLong(c, node->pport);
else
addReplyLongLong(c, node->port);
reply_count++;
}
if (tls_port) {
addReplyBulkCString(c, "tls-port");
addReplyLongLong(c, node->port);
reply_count++;
}
in case anyone will fix it, i would like to mention:
- since we deprecated cluster-slots, maybe we need to add a doc_flags in cluster-slots.json like other commands
- i notice cluster-shards.json is indented by two spaces, we use four spaces elsewhere
- the three `wait_for_condition 50 100` in 28-cluster-shards.tcl may have a timing issue, at least I've encountered it locally (centos7), so maybe change it to 500 100
For 1, we marked it as deprecated. Was there something else that you meant?
i mean the doc_flags, like other commands?
"deprecated_since": "6.2.0",
"replaced_by": "`GEOSEARCH` and `GEOSEARCHSTORE` with the `BYRADIUS` argument",
"doc_flags": [
"DEPRECATED"
],
Oh, I guess I didn't realize that was a thing.
@hpatro Do you have time to address this? Otherwise I can take a look tomorrow.
@enjoy-binbin Regarding the port and tls-port logic: server.tls_port is always a TLS port and server.port is always a plaintext (AKA cleartext) port if specified. For the cluster bus, it depends on the server.tls_cluster variable:
if (server.tls_cluster) {
// node->port is TLS port (always set)
// node->pport is plaintext port, optional (if non-zero)
} else {
// node->port is plaintext port (always set)
// There is no TLS port
}
There are a few places that use a hard-coded const of 128 to allocate a buffer for d2string. Replace these with a clear macro. Note that, in theory, converting a double into a string could take as much as nearly 400 chars, but since d2string uses `%g` and not `%f`, it won't exceed some 40 chars. Unrelated: restore some changes to the auto-generated commands.c that got accidentally reverted in #10293.
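For reference, a minimal sketch of the macro replacement that commit describes (the macro name here is an assumption; the commit's actual identifier may differ):

```
/* Hypothetical macro name; 128 is comfortably larger than the ~40 chars %g can produce. */
#define MAX_D2STRING_CHARS 128

char buf[MAX_D2STRING_CHARS];
d2string(buf, sizeof(buf), value); /* instead of a bare hard-coded 128 at each call site */
```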
This commit improves malloc efficiency of the slots_info_pairs mechanism in cluster.c by changing the adlist into an array that is realloc'd with a greedy growth mechanism. Recently the cluster tests have been consistently failing when executed with ASAN in the CI. I tried to track down the commit that started it, and it appears to be #10293. Looking at the commit, I realized it didn't affect this test / flow, other than the replacement of the slots_info_pairs from sds to list. I concluded that what could be happening is that the slot range is very fragmented, and that results in many allocations. With sds, it results in one allocation, and we also have a greedy growth mechanism, but with adlist we just have many many small allocations. This probably causes stress on ASAN and causes it to be slow at termination.
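A rough sketch of the greedy-growth idea described in that commit (the struct and helper names are invented for illustration, not the commit's actual identifiers):

```
/* Flat array of slot numbers (appended as consecutive start/end pairs), grown
 * greedily so a heavily fragmented slot map costs O(log n) reallocs instead of
 * one allocation per adlist node. */
typedef struct {
    uint16_t *pairs;   /* slot numbers, stored as consecutive (start, end) pairs */
    int count;         /* entries used */
    int capacity;      /* entries allocated */
} slotInfoPairs;

static void slotInfoPairsAppend(slotInfoPairs *sp, uint16_t slot) {
    if (sp->count == sp->capacity) {
        sp->capacity = (sp->capacity == 0) ? 8 : sp->capacity * 2; /* greedy growth */
        sp->pairs = zrealloc(sp->pairs, sp->capacity * sizeof(uint16_t));
    }
    sp->pairs[sp->count++] = slot;
}
```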
while((ln = listNext(&li))) {
    addReplyBulkLongLong(c, (unsigned long)listNodeValue(ln));
    ln = listNext(&li);
    addReplyBulkLongLong(c, (unsigned long)listNodeValue(ln));
the slots here are not integers, see #10680
inconsistent with the top comment and the CLUSTER SHARDS docs:
127.0.0.1:5001> cluster shards
1) 1) "slots"
2) 1) "0"
2) "16383"
It used to return slots as strings, like:
```
redis> cluster shards
1) 1) "slots"
2) 1) "10923"
2) "16383"
```
CLUSTER SHARDS docs and the top comment of #10293 says that it returns integers.
Note other commands like CLUSTER SLOTS, it returns slots as integers.
Use addReplyLongLong instead of addReplyBulkLongLong, now it returns slots as integers:
```
redis> cluster shards
1) 1) "slots"
2) 1) (integer) 10923
2) (integer) 16383
```
This is a small breaking change, introduced in 7.0.0 (7.0 RC3, #10293)
Fixes #10680
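In code terms, the fix amounts to switching the reply call in the loop quoted earlier (this mirrors that snippet; the surrounding code may have changed since):

```
while ((ln = listNext(&li))) {
    /* Emit the slot number as a RESP integer instead of a bulk string. */
    addReplyLongLong(c, (unsigned long)listNodeValue(ln));
    ln = listNext(&li);
    addReplyLongLong(c, (unsigned long)listNodeValue(ln));
}
```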
Ref #10168
Implement a new CLUSTER SHARDS command which provides more information to clients in an extensible format. This is intended to be an alternative to CLUSTER SLOTS that provides more information and is more efficient for clusters with disjoint slot assignment. It is also to help migrate some clients currently using CLUSTER NODES. The CLUSTER SHARDS command returns a map of "shards", where a shard is a collection of nodes serving the same set of slots. If a node is serving no slots, it is said to exist in its own shard. Each shard contains a dictionary with two properties, "slots" and "nodes". Slots are a list of integer pairs which indicate the start and stop of slot ranges. Nodes is a dictionary that contains the following attributes:
(Note, I think endpoint is missing, also replication-offset is missing from masters)
Sample request/response:
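For orientation only, an invented example of the reply shape described above (all values are made up, and the exact field set may differ; see the note about endpoint and replication-offset):

```
127.0.0.1:6379> CLUSTER SHARDS
1) 1) "slots"
   2) 1) (integer) 0
      2) (integer) 5460
   3) "nodes"
   4) 1)  1) "id"
          2) "e10b7051d6bf2d5febd39a2be297bbaea6084111"
          3) "port"
          4) (integer) 6379
          5) "ip"
          6) "127.0.0.1"
          7) "role"
          8) "master"
          9) "replication-offset"
         10) (integer) 72156
         11) "health"
         12) "online"
```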
Open considerations