Skip to content

Conversation

@sundb
Copy link
Collaborator

@sundb sundb commented Aug 6, 2024

This PR is based on the commits from PR valkey-io/valkey#52.
Ref: #12760
Close #13401
This PR will replace #13449

Fixes compatibilty of Redis cluster (7.2 - extensions enabled by default) with older Redis cluster (< 7.0 - extensions not handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix does the following things:

  1. Always set CLUSTERMSG_FLAG0_EXT_DATA, because during the meet phase, we do not know whether the connected node supports ext data, we need to make sure that it knows and send back its ext data if it has.
  2. If another node does not support ext data, we will not send it ext data to avoid the handshake failure due to the incorrect payload length.

Note: A successful PING/PONG is required as a sender for a given
node to be marked as CLUSTERMSG_FLAG0_EXT_DATA and then extensions message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).


Signed-off-by: Harkrishn Patro harkrisp@amazon.com
Co-authored-by: Harkrishn Patro harkrisp@amazon.com

hpatro and others added 3 commits August 6, 2024 20:01
Ref: redis#12760

enabled by default) with older Redis cluster (< 7.0 - extensions not
handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix introduces a flag `extensions_supported` on the clusterMsg
indicating the sender node can handle extensions parsing. Once, a
receiver nodes receives a message with this flag set to 1, it would
update clusterNode new field extensions_supported and start sending out
extensions if it has any.

This PR also introduces a DEBUG sub command to enable/disable cluster
message extensions `process-clustermsg-extensions` feature.

Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `extensions_supported` and then extensions message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).

TCL test verifying the cluster state is healthy irrespective of
enabling/disabling cluster message extensions feature.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
@sundb
Copy link
Collaborator Author

sundb commented Aug 6, 2024

note that i mark the hflag as CLUSTERMSG_FLAG0_EXT_DATA in the clusterProcessPingExtensions instead of marking it as it's being used.
see the commit.

@sundb sundb requested a review from oranagra August 6, 2024 17:23
oranagra
oranagra previously approved these changes Aug 7, 2024
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +3542 to +3543
hdr->mflags[0] |= CLUSTERMSG_FLAG0_EXT_DATA; /* Always make other nodes know that
* this node supports extension data. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not familiar with this area in the code, maybe you can elaborate.

it looks like before your commit, we set that flag only when the other node doesn't support extensions.
which seems odd. if anything we should set new flags only when talking to new nodes?
although it does make mode sense to always set the flag (assuming old nodes ignore it, rather than breaking on unknown flags).

do you know why Valkey didn't do that too?
did you test the new code against a wide range of versions?

Copy link
Collaborator Author

@sundb sundb Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, Valkey now always sets this flag, but adds it when writing epingext.
I prefer to build hdr and always set this flag.
Valkey code:

if (link->node && nodeSupportsExtensions(link->node)) {
    totlen += writePingExt(hdr, gossipcount); <- always set the mark
} else {
    serverLog(LL_DEBUG, "Unable to send extensions data, however setting ext data flag to true");
    hdr->mflags[0] |= CLUSTERMSG_FLAG0_EXT_DATA; <- if node doesn't support, we still set the flag.
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know why Valkey didn't do that too? did you test the new code against a wide range of versions?

yes, i tested it between 6.2, 7.0,and 7.2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but they still set it conditionally. only when the node doesn't support extensions (or is NULL?)

Copy link
Collaborator Author

@sundb sundb Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason of the issue is that the new version of the node sends ext data to nodes that do not support it.
This causes a failure due to a mismatch between the expected length (explen) and the actual total length of the received data.

redis/src/cluster_legacy.c

Lines 2731 to 2736 in 731f2dc

if ((totlen - explen) < extlen) {
serverLog(LL_WARNING, "Received invalid %s packet with extension data that exceeds "
"total packet length (%lld)", clusterGetMessageTypeString(type),
(unsigned long long) totlen);
return 1;
}

this PR does the follow things:

  1. always set CLUSTERMSG_FLAG0_EXT_DATA, because during the meet phase, we do not know whether the connected node supports ext data, we need to make sure that it knows and brings ext data.
  2. if another node does not support ext data, we will not send it ext data to avoid the handshake failure mentioned above due to explen inconsistencies.

Copy link
Collaborator Author

@sundb sundb Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but they still set it conditionally. only when the node doesn't support extensions (or is NULL?)

no, if node support ext data, we always set the flag in the writePingExt no matter the extensions is 0 or not.

    if (hdr != NULL) {
        hdr->mflags[0] |= CLUSTERMSG_FLAG0_EXT_DATA;
        hdr->extensions = htons(extensions);
    }

so Valkey also sets this flag unconditionally.

@sundb
Copy link
Collaborator Author

sundb commented Aug 7, 2024

@oranagra i updated the top comment, the original top comment is incorrect.

@sundb sundb added the release-notes indication that this issue needs to be mentioned in the release notes label Aug 8, 2024
@sundb sundb merged commit 6f0ddc9 into redis:unstable Aug 8, 2024
@sundb sundb deleted the cluster_upgrade branch August 8, 2024 02:48
@YaacovHazan YaacovHazan mentioned this pull request Sep 11, 2024
YaacovHazan added a commit that referenced this pull request Sep 12, 2024
### New Features in binary distributions

- 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter,
Count-min sketch, Top-k, t-digest
- Redis scalable query engine (including vector search)

### Potentially breaking changes

- #12272 `GETRANGE` returns an empty bulk when the negative end index is
out of range
- #12395 Optimize `SCAN` command when matching data type

### Bug fixes

- #13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed
- #13489 `ACL CAT` - return module commands
- #13476 Fix a race condition in the `cache_memory` of `functionsLibCtx`
- #13473 Fix incorrect lag due to trimming stream via `XTRIM` command
- #13338 Fix incorrect lag field in `XINFO` when tombstone is after the
`last_id` of the consume group
- #13470 On `HDEL` of last field - update the global hash field
expiration data structure
- #13465 Cluster: Pass extensions to node if extension processing is
handled by it
- #13443 Cluster: Ensure validity of myself when loading cluster config
- #13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array

### Modules API

- #13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and
`RM_RegisterDefragCallbacks` - defrag API to allocate and free raw
memory

### Performance and resource utilization improvements

- #13503 Avoid overhead of comparison function pointer calls in listpack
`lpFind`
- #13505 Optimize `STRING` datatype write commands
- #13499 Optimize `SMEMBERS` command
- #13494 Optimize `GEO*` commands reply
- #13490 Optimize `HELLO` command
- #13488 Optimize client query buffer
- #12395 Optimize `SCAN` command when matching data type
- #13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands
- #13516 Optimize `LRANGE` and other commands that perform several
writes to client buffers per call
- #13431 Avoid `used_memory` contention when updating from multiple
threads

### Other general improvements

- #13495 Reply `-LOADING` on replica while flushing the db

### CLI tools

- #13411 redis-cli: Fix wrong `dbnum` showed after the client
reconnected

### Notes

- No backward compatibility for replication or persistence.
- Additional distributions, upgrade paths, features, and improvements
will be introduced in upcoming pre-releases.
- With the GA release of 8.0 we will deprecate Redis Stack.
YaacovHazan pushed a commit that referenced this pull request Oct 30, 2024
)

This PR is based on the commits from PR
valkey-io/valkey#52.
Ref: #12760
Close #13401
This PR will replace #13449

Fixes compatibilty of Redis cluster (7.2 - extensions enabled by
default) with older Redis cluster (< 7.0 - extensions not handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix does the following things:
1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet
phase, we do not know whether the connected node supports ext data, we
need to make sure that it knows and send back its ext data if it has.
2. If another node does not support ext data, we will not send it ext
data to avoid the handshake failure due to the incorrect payload length.

Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions
message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
This was referenced Oct 30, 2024
YaacovHazan pushed a commit that referenced this pull request Nov 4, 2024
)

This PR is based on the commits from PR
valkey-io/valkey#52.
Ref: #12760
Close #13401
This PR will replace #13449

Fixes compatibilty of Redis cluster (7.2 - extensions enabled by
default) with older Redis cluster (< 7.0 - extensions not handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix does the following things:
1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet
phase, we do not know whether the connected node supports ext data, we
need to make sure that it knows and send back its ext data if it has.
2. If another node does not support ext data, we will not send it ext
data to avoid the handshake failure due to the incorrect payload length.

Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions
message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
This was referenced Nov 4, 2024
@YaacovHazan YaacovHazan mentioned this pull request Jan 6, 2025
YaacovHazan pushed a commit that referenced this pull request Jan 6, 2025
)

This PR is based on the commits from PR
valkey-io/valkey#52.
Ref: #12760
Close #13401
This PR will replace #13449

Fixes compatibilty of Redis cluster (7.2 - extensions enabled by
default) with older Redis cluster (< 7.0 - extensions not handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix does the following things:
1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet
phase, we do not know whether the connected node supports ext data, we
need to make sure that it knows and send back its ext data if it has.
2. If another node does not support ext data, we will not send it ext
data to avoid the handshake failure due to the incorrect payload length.

Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions
message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
YaacovHazan pushed a commit that referenced this pull request Jan 6, 2025
)

This PR is based on the commits from PR
valkey-io/valkey#52.
Ref: #12760
Close #13401
This PR will replace #13449

Fixes compatibilty of Redis cluster (7.2 - extensions enabled by
default) with older Redis cluster (< 7.0 - extensions not handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix does the following things:
1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet
phase, we do not know whether the connected node supports ext data, we
need to make sure that it knows and send back its ext data if it has.
2. If another node does not support ext data, we will not send it ext
data to avoid the handshake failure due to the incorrect payload length.

Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions
message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
sundb added a commit to sundb/redis that referenced this pull request Jan 12, 2025
…is#13465)

This PR is based on the commits from PR
valkey-io/valkey#52.
Ref: redis#12760
Close redis#13401
This PR will replace redis#13449

Fixes compatibilty of Redis cluster (7.2 - extensions enabled by
default) with older Redis cluster (< 7.0 - extensions not handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix does the following things:
1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet
phase, we do not know whether the connected node supports ext data, we
need to make sure that it knows and send back its ext data if it has.
2. If another node does not support ext data, we will not send it ext
data to avoid the handshake failure due to the incorrect payload length.

Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions
message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
@YaacovHazan YaacovHazan moved this from Todo to Done in Redis 7.4 Backport Apr 21, 2025
@YaacovHazan YaacovHazan moved this from In Progress to Done in Redis 7.2 Backport Apr 21, 2025
@sundb sundb added this to Redis 8.0 Aug 15, 2025
@sundb sundb removed this from Redis 8.2 Aug 15, 2025
@sundb sundb moved this from Todo to Done in Redis 8.0 Aug 15, 2025
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
…is#13465)

This PR is based on the commits from PR
valkey-io/valkey#52.
Ref: redis#12760
Close redis#13401
This PR will replace redis#13449

Fixes compatibilty of Redis cluster (7.2 - extensions enabled by
default) with older Redis cluster (< 7.0 - extensions not handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix does the following things:
1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet
phase, we do not know whether the connected node supports ext data, we
need to make sure that it knows and send back its ext data if it has.
2. If another node does not support ext data, we will not send it ext
data to avoid the handshake failure due to the incorrect payload length.

Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions
message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
### New Features in binary distributions

- 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter,
Count-min sketch, Top-k, t-digest
- Redis scalable query engine (including vector search)

### Potentially breaking changes

- redis#12272 `GETRANGE` returns an empty bulk when the negative end index is
out of range
- redis#12395 Optimize `SCAN` command when matching data type

### Bug fixes

- redis#13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed
- redis#13489 `ACL CAT` - return module commands
- redis#13476 Fix a race condition in the `cache_memory` of `functionsLibCtx`
- redis#13473 Fix incorrect lag due to trimming stream via `XTRIM` command
- redis#13338 Fix incorrect lag field in `XINFO` when tombstone is after the
`last_id` of the consume group
- redis#13470 On `HDEL` of last field - update the global hash field
expiration data structure
- redis#13465 Cluster: Pass extensions to node if extension processing is
handled by it
- redis#13443 Cluster: Ensure validity of myself when loading cluster config
- redis#13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array

### Modules API

- redis#13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and
`RM_RegisterDefragCallbacks` - defrag API to allocate and free raw
memory

### Performance and resource utilization improvements

- redis#13503 Avoid overhead of comparison function pointer calls in listpack
`lpFind`
- redis#13505 Optimize `STRING` datatype write commands
- redis#13499 Optimize `SMEMBERS` command
- redis#13494 Optimize `GEO*` commands reply
- redis#13490 Optimize `HELLO` command
- redis#13488 Optimize client query buffer
- redis#12395 Optimize `SCAN` command when matching data type
- redis#13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands
- redis#13516 Optimize `LRANGE` and other commands that perform several
writes to client buffers per call
- redis#13431 Avoid `used_memory` contention when updating from multiple
threads

### Other general improvements

- redis#13495 Reply `-LOADING` on replica while flushing the db

### CLI tools

- redis#13411 redis-cli: Fix wrong `dbnum` showed after the client
reconnected

### Notes

- No backward compatibility for replication or persistence.
- Additional distributions, upgrade paths, features, and improvements
will be introduced in upcoming pre-releases.
- With the GA release of 8.0 we will deprecate Redis Stack.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] Can't add a v7.2 node into v6.2 cluster

3 participants