Skip to content

Add extensions supported to cluster link layer to propagate extension faster during handshake#2310

Merged
enjoy-binbin merged 8 commits into
valkey-io:unstablefrom
enjoy-binbin:link_support_extension
Jul 16, 2025
Merged

Add extensions supported to cluster link layer to propagate extension faster during handshake#2310
enjoy-binbin merged 8 commits into
valkey-io:unstablefrom
enjoy-binbin:link_support_extension

Conversation

@enjoy-binbin

@enjoy-binbin enjoy-binbin commented Jul 5, 2025

Copy link
Copy Markdown
Member

Add support_extension to cluster link, so that when myself receives a message from sender,
even if myself does not know the sender, for example during handshake the sender node is
still treated as random node, in thi case, the packet we reply to sender can also carry the
extension, since the sender link explicitly indicates that it supports the extension, by doing
this, we can speed up the spread of the extension.

See #2283 (comment) and
#2279 (comment) for more details.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@codecov

codecov Bot commented Jul 5, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.45%. Comparing base (57b1761) to head (cafd613).
⚠️ Report is 154 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2310      +/-   ##
============================================
+ Coverage     71.41%   71.45%   +0.04%     
============================================
  Files           123      123              
  Lines         67069    67135      +66     
============================================
+ Hits          47898    47973      +75     
+ Misses        19171    19162       -9     
Files with missing lines Coverage Δ
src/cluster_legacy.c 87.13% <100.00%> (+0.19%) ⬆️

... and 25 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@hpatro hpatro left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is during handshake the sender node is still treated as random node and is still returned as NULL from the below method.

valkey/src/cluster_legacy.c

Lines 3119 to 3137 in dcf3222

static clusterNode *getNodeFromLinkAndMsg(clusterLink *link, clusterMsg *hdr) {
clusterNode *sender;
if (link->node && !nodeInHandshake(link->node)) {
/* If the link has an associated node, use that so that we don't have to look it
* up every time, except when the node is still in handshake, the node still has
* a random name thus not truly "known". */
sender = link->node;
} else {
/* Otherwise, fetch sender based on the message */
sender = clusterLookupNode(hdr->sender, CLUSTER_NAMELEN);
/* We know the sender node but haven't associate it with the link. This must
* be an inbound link because only for inbound links we didn't know which node
* to associate when they were created. */
if (sender && !link->node) {
setClusterNodeToInboundClusterLink(sender, link);
}
}
return sender;
}

I think we can store these information at two places but use the link->extension explicitly to determine if we can send extensions or not and node->flags to gossip about extension support to others.

WDYT @enjoy-binbin ?

Comment thread src/cluster_legacy.c Outdated
Comment thread src/cluster_legacy.c Outdated
@enjoy-binbin

Copy link
Copy Markdown
Member Author

I think the problem is during handshake the sender node is still treated as random node and is still returned as NULL from the below method.

  1. Node B cluster reset soft, and remove all the nodes in its cluster_nodes_dict
  2. Node A sends the PING, and Node B will reply a PONG even it does not know A, and in here, since Node B does not know Node A, it won't send the extension. But the link has mf_flags that we all know Node A can handle extension. So in these packets, Node A will know B change the role but not able to learn the shard-id

Let me confirm, we are saying the same thing? Node B does not know Node A (during handshake, random node), so just two different ways of saying it (yours is a little more specific)?

I think we can store these information at two places but use the link->extension explicitly to determine if we can send extensions or not and node->flags to gossip about extension support to others.

SGTM, i will make the change

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Jul 9, 2025

@hpatro hpatro left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hpatro hpatro requested a review from madolson July 9, 2025 20:52
@hpatro

hpatro commented Jul 9, 2025

Copy link
Copy Markdown
Contributor

@madolson Could you also give a look at this?

@hpatro

hpatro commented Jul 9, 2025

Copy link
Copy Markdown
Contributor

2. Node A sends the PING, and Node B will reply a PONG even it does not know A, and in here, since Node B does not know Node A, it won't send the extension. But the link has mf_flags that we all know Node A can handle extension. So in these packets, Node A will know B change the role but not able to learn the shard-id

Didn't follow the mf_flags part.

I was explaining from the code point of view where until we haven't finished the handshake and settled on the actual nodename, we don't retrieve the sender node object, which leads us to not send extensions message. Now with this change, the information moves to the link object and the support_extension field gets set on the first PING/MEET message itself and we are able to send extensions in the first PONG response itself.

@enjoy-binbin

Copy link
Copy Markdown
Member Author

It suddenly occurred to me that we might need to keep this node check? Like when myself add a unknown node from the gossip part, in this case, aren't we need this node condition to send the packet in the very first time (before myself recevie the packet)

if (link->support_extension || (link->node && nodeSupportsExtensions(link->node))) {

@enjoy-binbin

Copy link
Copy Markdown
Member Author

Didn't follow the mf_flags part.

Sorry i gave the wrong info, i mean if (hdr->mflags[0] & CLUSTERMSG_FLAG0_EXT_DATA) {

I was explaining from the code point of view where until we haven't finished the handshake and settled on the actual nodename, we don't retrieve the sender node object, which leads us to not send extensions message. Now with this change, the information moves to the link object and the support_extension field gets set on the first PING/MEET message itself and we are able to send extensions in the first PONG response itself.

great, good to know, i just want to be sure we are in the same page, that is a good explanation.

@hpatro

hpatro commented Jul 10, 2025

Copy link
Copy Markdown
Contributor

It suddenly occurred to me that we might need to keep this node check? Like when myself add a unknown node from the gossip part, in this case, aren't we need this node condition to send the packet in the very first time (before myself recevie the packet)

if (link->support_extension || (link->node && nodeSupportsExtensions(link->node))) {

That's right 👍 .

Comment thread src/cluster_legacy.c Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin changed the title Add support_extension to cluster link to propagate extension faster Add extensions supported to cluster link layer to propagate extension faster during handshake Jul 11, 2025
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Comment thread src/cluster_legacy.c Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin merged commit 78e7208 into valkey-io:unstable Jul 16, 2025
98 of 101 checks passed
@enjoy-binbin enjoy-binbin deleted the link_support_extension branch July 16, 2025 02:26
sungming2 pushed a commit to sungming2/valkey that referenced this pull request Jul 29, 2025
…xtension faster during handshake (valkey-io#2310)"

This reverts commit 78e7208.
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request Oct 20, 2025
This clusterLink->flags was added in valkey-io#2310, during the review, we at
the end chose to add a new CLUSTER_LINK_XXX flag instead of sharing the
old CLUSTER_NODE_XXX flag.

Signed-off-by: Binbin <binloveplay1314@qq.com>
enjoy-binbin added a commit that referenced this pull request Oct 21, 2025
This clusterLink->flags was added in #2310, during the review, we at
the end chose to add a new CLUSTER_LINK_XXX flag instead of sharing the
old CLUSTER_NODE_XXX flag.

Signed-off-by: Binbin <binloveplay1314@qq.com>
diego-ciciani01 pushed a commit to diego-ciciani01/valkey that referenced this pull request Oct 21, 2025
This clusterLink->flags was added in valkey-io#2310, during the review, we at
the end chose to add a new CLUSTER_LINK_XXX flag instead of sharing the
old CLUSTER_NODE_XXX flag.

Signed-off-by: Binbin <binloveplay1314@qq.com>
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 15, 2025
This clusterLink->flags was added in valkey-io#2310, during the review, we at
the end chose to add a new CLUSTER_LINK_XXX flag instead of sharing the
old CLUSTER_NODE_XXX flag.

Signed-off-by: Binbin <binloveplay1314@qq.com>
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 15, 2025
This clusterLink->flags was added in valkey-io#2310, during the review, we at
the end chose to add a new CLUSTER_LINK_XXX flag instead of sharing the
old CLUSTER_NODE_XXX flag.

Signed-off-by: Binbin <binloveplay1314@qq.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
This clusterLink->flags was added in valkey-io#2310, during the review, we at
the end chose to add a new CLUSTER_LINK_XXX flag instead of sharing the
old CLUSTER_NODE_XXX flag.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cluster run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants