Skip to content

Conversation

@sundb
Copy link
Collaborator

@sundb sundb commented Jul 25, 2024

First, we need to ensure that curmaster in clusterUpdateSlotsConfigWith() is not NULL in the line

curmaster = clusterNodeIsMaster(myself) ? myself : myself->slaveof;
otherwise, it will crash in the
if (newmaster && curmaster->numslots == 0 &&

So when loading cluster node config, we need to ensure that the following conditions are met:

  1. A node must be at least one of the master or replica.
  2. If a node is replica, its master can't be NULL.

In the following two nodes conf, we did not specify myself as master or slave, or set it as slave but did not set the master node, which could cause the above crash below.

this crash is same as #12441, but i'm sure if this is also caused by the same reason.

invalid conf1:

bbe6dea077542cae00bfb065f8d994b6a2ed6dab 127.0.0.1:30001@40001 master - 0 1721879042695 0 connected
3db5d0bf79d3621b1471b548e3d1c08260185f57 127.0.0.1:7001@17001 myself - 0 0 1 connected
vars currentEpoch 1 lastVoteEpoch 0

invalid conf2:

bbe6dea077542cae00bfb065f8d994b6a2ed6dab 127.0.0.1:30001@40001 master - 0 1721879042695 0 connected
3db5d0bf79d3621b1471b548e3d1c08260185f57 127.0.0.1:7001@17001 myself,slave - 0 0 1 connected
vars currentEpoch 1 lastVoteEpoch 0

Reproduce steps:

  1. Start a cluster

nodes_master.conf

bbe6dea077542cae00bfb065f8d994b6a2ed6dab 127.0.0.1:30001@40001 myself,master - 0 0 1 connected 0-16383
vars currentEpoch 1 lastVoteEpoch 0

start a master node

./src/redis-server --cluster-enabled yes --port 30001 --cluster-config-file nodes_master.conf
  1. start a slave node

nodes_slave.conf

bbe6dea077542cae00bfb065f8d994b6a2ed6dab 127.0.0.1:30001@40001 master - 0 1721879042695 0 connected
3db5d0bf79d3621b1471b548e3d1c08260185f57 127.0.0.1:7001@17001 myself - 0 0 1 connected
vars currentEpoch 1 lastVoteEpoch 0

start the slave node

./src/redis-server --port 7001 --cluster-enabled yes  --cluster-config-file nodes_slave.conf

then the slave will crash:

140617:M 25 Jul 2024 14:11:44.202 * Node configuration loaded, I'm 3db5d0bf79d3621b1471b548e3d1c08260185f57
140617:M 25 Jul 2024 14:11:44.202 * Server initialized
140617:M 25 Jul 2024 14:11:44.202 . The AOF manifest file appendonly.aof.manifest doesn't exist
140617:M 25 Jul 2024 14:11:44.202 * Loading RDB produced by version 255.255.255
140617:M 25 Jul 2024 14:11:44.202 * RDB age 35 seconds
140617:M 25 Jul 2024 14:11:44.202 * RDB memory usage when created 2.18 Mb
140617:M 25 Jul 2024 14:11:44.202 * Done loading RDB, keys loaded: 0, keys expired: 0.
140617:M 25 Jul 2024 14:11:44.202 * DB loaded from disk: 0.000 seconds
140617:M 25 Jul 2024 14:11:44.202 * Ready to accept connections tcp
140617:M 25 Jul 2024 14:11:44.203 . 0 clients connected (0 replicas), 2281704 bytes in use
140617:M 25 Jul 2024 14:11:44.203 . Connecting with Node bbe6dea077542cae00bfb065f8d994b6a2ed6dab at 127.0.0.1:40001
140617:M 25 Jul 2024 14:11:44.203 . --- Processing packet of type pong, 2304 bytes
140617:M 25 Jul 2024 14:11:44.203 . pong packet received: bbe6dea077542cae00bfb065f8d994b6a2ed6dab


=== REDIS BUG REPORT START: Cut & paste starting from here ===
140617:M 25 Jul 2024 14:11:44.204 # Redis 255.255.255 crashed by signal: 11, si_code: 1
140617:M 25 Jul 2024 14:11:44.204 # Accessing address: 0x874
140617:M 25 Jul 2024 14:11:44.204 # Crashed running the instruction at: 0x5f3c2749f482

------ STACK TRACE ------
EIP:
./src/redis-server 127.0.0.1:7001 [cluster](clusterUpdateSlotsConfigWith+0x142)[0x5f3c2749f482]

140618 bio_close_file
/lib/x86_64-linux-gnu/libc.so.6(+0x98d61)[0x7b821fa98d61]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x7b821fa9b7dd]
./src/redis-server 127.0.0.1:7001 [cluster](bioProcessBackgroundJobs+0x1c2)[0x5f3c274b3e22]
/lib/x86_64-linux-gnu/libc.so.6(+0x9ca94)[0x7b821fa9ca94]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7b821fb29c3c]

140620 bio_lazy_free
/lib/x86_64-linux-gnu/libc.so.6(+0x98d61)[0x7b821fa98d61]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x7b821fa9b7dd]
./src/redis-server 127.0.0.1:7001 [cluster](bioProcessBackgroundJobs+0x1c2)[0x5f3c274b3e22]
/lib/x86_64-linux-gnu/libc.so.6(+0x9ca94)[0x7b821fa9ca94]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7b821fb29c3c]

@sundb sundb marked this pull request as ready for review July 25, 2024 06:16
@sundb sundb changed the title Ensure validity of myself as master or replica Ensure validity of myself as master or replica when loading cluster config Aug 5, 2024
@sundb sundb requested a review from oranagra August 5, 2024 09:21
@sundb sundb added the release-notes indication that this issue needs to be mentioned in the release notes label Aug 5, 2024
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to confirm since i'm not very keen on the details.

could this be a breaking change? i.e. is there a configuration that used to work (user may not have noticed the invalid config), and will now fail?

@sundb
Copy link
Collaborator Author

sundb commented Aug 6, 2024

@oranagra it's not break change, i verified in 6.2.4 and it still crashed.

@sundb sundb merged commit bf643a6 into redis:unstable Aug 6, 2024
@sundb sundb deleted the cluster_invalid_conf branch August 6, 2024 12:40
@YaacovHazan YaacovHazan mentioned this pull request Sep 11, 2024
YaacovHazan added a commit that referenced this pull request Sep 12, 2024
### New Features in binary distributions

- 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter,
Count-min sketch, Top-k, t-digest
- Redis scalable query engine (including vector search)

### Potentially breaking changes

- #12272 `GETRANGE` returns an empty bulk when the negative end index is
out of range
- #12395 Optimize `SCAN` command when matching data type

### Bug fixes

- #13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed
- #13489 `ACL CAT` - return module commands
- #13476 Fix a race condition in the `cache_memory` of `functionsLibCtx`
- #13473 Fix incorrect lag due to trimming stream via `XTRIM` command
- #13338 Fix incorrect lag field in `XINFO` when tombstone is after the
`last_id` of the consume group
- #13470 On `HDEL` of last field - update the global hash field
expiration data structure
- #13465 Cluster: Pass extensions to node if extension processing is
handled by it
- #13443 Cluster: Ensure validity of myself when loading cluster config
- #13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array

### Modules API

- #13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and
`RM_RegisterDefragCallbacks` - defrag API to allocate and free raw
memory

### Performance and resource utilization improvements

- #13503 Avoid overhead of comparison function pointer calls in listpack
`lpFind`
- #13505 Optimize `STRING` datatype write commands
- #13499 Optimize `SMEMBERS` command
- #13494 Optimize `GEO*` commands reply
- #13490 Optimize `HELLO` command
- #13488 Optimize client query buffer
- #12395 Optimize `SCAN` command when matching data type
- #13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands
- #13516 Optimize `LRANGE` and other commands that perform several
writes to client buffers per call
- #13431 Avoid `used_memory` contention when updating from multiple
threads

### Other general improvements

- #13495 Reply `-LOADING` on replica while flushing the db

### CLI tools

- #13411 redis-cli: Fix wrong `dbnum` showed after the client
reconnected

### Notes

- No backward compatibility for replication or persistence.
- Additional distributions, upgrade paths, features, and improvements
will be introduced in upcoming pre-releases.
- With the GA release of 8.0 we will deprecate Redis Stack.
YaacovHazan pushed a commit that referenced this pull request Oct 30, 2024
…onfig (#13443)

First, we need to ensure that `curmaster` in
`clusterUpdateSlotsConfigWith()` is not NULL in the line
https://github.com/redis/redis/blob/82f00f5179720c8cee6cd650763d184ba943be92/src/cluster_legacy.c#L2320
otherwise, it will crash in the
https://github.com/redis/redis/blob/82f00f5179720c8cee6cd650763d184ba943be92/src/cluster_legacy.c#L2395

So when loading cluster node config, we need to ensure that the
following conditions are met:
1. A node must be at least one of the master or replica.
2. If a node is a replica, its master can't be NULL.
This was referenced Oct 30, 2024
YaacovHazan pushed a commit that referenced this pull request Nov 4, 2024
…onfig (#13443)

First, we need to ensure that `curmaster` in
`clusterUpdateSlotsConfigWith()` is not NULL in the line
https://github.com/redis/redis/blob/82f00f5179720c8cee6cd650763d184ba943be92/src/cluster_legacy.c#L2320
otherwise, it will crash in the
https://github.com/redis/redis/blob/82f00f5179720c8cee6cd650763d184ba943be92/src/cluster_legacy.c#L2395

So when loading cluster node config, we need to ensure that the
following conditions are met:
1. A node must be at least one of the master or replica.
2. If a node is a replica, its master can't be NULL.
This was referenced Nov 4, 2024
@YaacovHazan YaacovHazan mentioned this pull request Jan 6, 2025
YaacovHazan pushed a commit that referenced this pull request Jan 6, 2025
…onfig (#13443)

First, we need to ensure that `curmaster` in
`clusterUpdateSlotsConfigWith()` is not NULL in the line
https://github.com/redis/redis/blob/82f00f5179720c8cee6cd650763d184ba943be92/src/cluster_legacy.c#L2320
otherwise, it will crash in the
https://github.com/redis/redis/blob/82f00f5179720c8cee6cd650763d184ba943be92/src/cluster_legacy.c#L2395

So when loading cluster node config, we need to ensure that the
following conditions are met:
1. A node must be at least one of the master or replica.
2. If a node is a replica, its master can't be NULL.
YaacovHazan pushed a commit that referenced this pull request Jan 6, 2025
…onfig (#13443)

First, we need to ensure that `curmaster` in
`clusterUpdateSlotsConfigWith()` is not NULL in the line
https://github.com/redis/redis/blob/82f00f5179720c8cee6cd650763d184ba943be92/src/cluster_legacy.c#L2320
otherwise, it will crash in the
https://github.com/redis/redis/blob/82f00f5179720c8cee6cd650763d184ba943be92/src/cluster_legacy.c#L2395

So when loading cluster node config, we need to ensure that the
following conditions are met:
1. A node must be at least one of the master or replica.
2. If a node is a replica, its master can't be NULL.
@YaacovHazan YaacovHazan moved this from Todo to Done in Redis 7.4 Backport Apr 21, 2025
@YaacovHazan YaacovHazan moved this from In Progress to Done in Redis 7.2 Backport Apr 21, 2025
@sundb sundb added this to Redis 8.0 Aug 15, 2025
@sundb sundb removed this from Redis 8.2 Aug 15, 2025
@sundb sundb moved this from Todo to Done in Redis 8.0 Aug 15, 2025
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
…onfig (redis#13443)

First, we need to ensure that `curmaster` in
`clusterUpdateSlotsConfigWith()` is not NULL in the line
https://github.com/redis/redis/blob/cf6dd279f04878b44acbb3b12a0973975e5b6c8a/src/cluster_legacy.c#L2320
otherwise, it will crash in the
https://github.com/redis/redis/blob/cf6dd279f04878b44acbb3b12a0973975e5b6c8a/src/cluster_legacy.c#L2395

So when loading cluster node config, we need to ensure that the
following conditions are met:
1. A node must be at least one of the master or replica.
2. If a node is a replica, its master can't be NULL.
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
### New Features in binary distributions

- 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter,
Count-min sketch, Top-k, t-digest
- Redis scalable query engine (including vector search)

### Potentially breaking changes

- redis#12272 `GETRANGE` returns an empty bulk when the negative end index is
out of range
- redis#12395 Optimize `SCAN` command when matching data type

### Bug fixes

- redis#13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed
- redis#13489 `ACL CAT` - return module commands
- redis#13476 Fix a race condition in the `cache_memory` of `functionsLibCtx`
- redis#13473 Fix incorrect lag due to trimming stream via `XTRIM` command
- redis#13338 Fix incorrect lag field in `XINFO` when tombstone is after the
`last_id` of the consume group
- redis#13470 On `HDEL` of last field - update the global hash field
expiration data structure
- redis#13465 Cluster: Pass extensions to node if extension processing is
handled by it
- redis#13443 Cluster: Ensure validity of myself when loading cluster config
- redis#13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array

### Modules API

- redis#13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and
`RM_RegisterDefragCallbacks` - defrag API to allocate and free raw
memory

### Performance and resource utilization improvements

- redis#13503 Avoid overhead of comparison function pointer calls in listpack
`lpFind`
- redis#13505 Optimize `STRING` datatype write commands
- redis#13499 Optimize `SMEMBERS` command
- redis#13494 Optimize `GEO*` commands reply
- redis#13490 Optimize `HELLO` command
- redis#13488 Optimize client query buffer
- redis#12395 Optimize `SCAN` command when matching data type
- redis#13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands
- redis#13516 Optimize `LRANGE` and other commands that perform several
writes to client buffers per call
- redis#13431 Avoid `used_memory` contention when updating from multiple
threads

### Other general improvements

- redis#13495 Reply `-LOADING` on replica while flushing the db

### CLI tools

- redis#13411 redis-cli: Fix wrong `dbnum` showed after the client
reconnected

### Notes

- No backward compatibility for replication or persistence.
- Additional distributions, upgrade paths, features, and improvements
will be introduced in upcoming pre-releases.
- With the GA release of 8.0 we will deprecate Redis Stack.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants