Replace slots_to_channels radix tree with slot specific dictionaries for shard channels. #12804
Conversation
Yeah, this was the discussion we had at the time related to sharded pubsub. We also expect the number of channels to remain relatively low compared to the number of keys (although maybe this is a bad assumption). At the very least I agree with Viktor's suggestion that we should lazily create the dictionaries so we aren't always paying the cost.
Sounds reasonable. Please follow up, @CharlesChen888.
zuiderkwast left a comment:
OK, I'm happy now. 😀
We discussed this in the core team meeting and think it is ready to be merged. @CharlesChen888, please update the top comment to describe the change. Thanks.
Seems we should add more test cases...
@soloestoy I see this doesn't come with any interface changes, but maybe there's another reason to mention it in the release notes?
Maybe we can mention it like this: the `slots_to_channels` radix tree is dropped, so shard channel names no longer require additional storage, which saves memory. And in cluster mode, `server.pubsubshard_channels` is split into 16384 dicts, so counting the shard channels in a specific slot goes from O(n) to O(1).
Correct me if I'm wrong: this isn't exposed in any specific command; it's (a small) part of slot migration (resharding), right?
You are right.
…ing the client (#12896) The code did not delete the corresponding node when traversing the clients, resulting in an infinite loop and causing the `dictDelete() == DICT_OK` assertion to fail. In addition, did a cleanup: in the `dictCreate` scenario, we can avoid a `dictFind` call since the dict is empty. The issue was introduced in #12804.
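As a self-contained illustration of the bug class fixed here (a minimal sketch, not the actual Redis code): when a loop is supposed to consume entries as it traverses, forgetting to unlink the current node makes the loop revisit the same entry, so a second `dictDelete()` on the same key fails and an assertion like `dictDelete(...) == DICT_OK` fires.

```c
#include <stdlib.h>

typedef struct node { int id; struct node *next; } node;

/* Buggy shape: the head is never unlinked, so the loop spins on the
 * first element forever, and any per-entry dictDelete() would run a
 * second time on the same key and fail its DICT_OK assertion. */
void drain_buggy(node **head) {
    while (*head) {
        /* ... process (*head)->id, delete it from a side dict ... */
        /* BUG: missing unlink; *head never advances. */
    }
}

/* Fixed shape: unlink and free the node we just handled before
 * moving on, so each entry is visited (and deleted) exactly once. */
void drain_fixed(node **head) {
    while (*head) {
        node *cur = *head;
        /* ... process cur->id, delete it from a side dict ... */
        *head = cur->next;   /* unlink before freeing */
        free(cur);
    }
}
```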
…re (#12822)

# Description
Gather most of the scattered `redisDb`-related code from the per-slot dict PR (#11695) and turn it into a new data structure, `kvstore`, i.e. a class that represents an array of dictionaries.

# Motivation
The main motivation is code cleanliness: the idea of using an array of dictionaries is very well-suited to becoming a self-contained data structure. This allowed cleaning up some ugly code, among others: loops that run twice, on the main dict and the expires dict, and duplicate code for allocating and releasing this data structure.

# Notes
1. This PR reverts the part of #12848 where the `rehashing` list is global (handling rehashing `dict`s is the responsibility of `kvstore`, and should not be managed by the server).
2. This PR also replaces the type of `server.pubsubshard_channels` from `dict**` to `kvstore` (original PR: #12804). After that was done, `server.pubsub_channels` was also made a `kvstore` (with only one `dict`, which seems odd) just to make the code cleaner by giving it the same type as `server.pubsubshard_channels`; see `pubsubtype.serverPubSubChannels`.
3. The keys and expires kvstores are currently configured to allocate the individual dicts only when the first key is added (unlike before, when they were allocated in advance), but they won't release them when the last key is deleted.

Worth mentioning that due to this change, the reply of DEBUG HTSTATS changed in case no keys were ever added to the db.

before:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
```
after:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
[Expires HT]
```
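For orientation, a rough C sketch of the kvstore shape described above; the field names here are assumptions for illustration, not the actual `kvstore.h` layout, and `dict`/`list` are Redis's internal types:

```c
/* Illustrative sketch of the kvstore idea: one self-contained struct
 * owning an array of dicts plus aggregate bookkeeping. Field names
 * are assumptions, not the real kvstore.h API. */
typedef struct kvstore {
    dict **dicts;          /* the underlying hash tables, created lazily */
    int num_dicts;         /* 1 in standalone mode, 16384 in cluster mode */
    unsigned long long key_count;  /* total entries across all dicts */
    list *rehashing;       /* dicts currently rehashing, owned by kvstore,
                            * not by the server (reverting part of #12848) */
} kvstore;
```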
…for shard channels. (redis#12804)

We have already replaced the `slots_to_keys` radix tree with a key->slot linked list (redis#9356), and then replaced the list with slot-specific dictionaries for keys (redis#11695).

Shard channels behave just like keys in many ways, and we also need a slots->channels mapping. Currently this is still done with a radix tree. So we should split `server.pubsubshard_channels` into 16384 dicts and drop the radix tree, just like what we did to the DBs.

Some benefits (basically the benefits of what we've done to the DBs):
1. Optimize counting the channels in a slot. This is currently used only when removing the channels in a slot, but it is potentially more useful: sometimes we need to know how many channels there are in a specific slot when doing slot migration. Counting is currently implemented by traversing the radix tree; with this PR it is as simple as calling `dictSize`, going from O(n) to O(1).
2. The radix tree in the cluster is removed. The shard channel names no longer require additional storage, which saves memory.
3. Potentially useful in slot migration, as shard channels are logically split by slots, making it easier to migrate, remove or add them as a whole.
4. Avoid rehashing one big dict when there is a large number of channels.

Drawbacks:
1. Takes more memory than the radix tree when there are relatively few shard channels.

What this PR does:
1. In cluster mode, split `server.pubsubshard_channels` into 16384 dicts; in standalone mode, still use only one dict.
2. Drop the `slots_to_channels` radix tree.
3. To save memory (addressing the drawback above), all 16384 dicts are created lazily: a dict is initialized only when a channel is about to be inserted into it, and when all of its channels are deleted, the dict deletes itself.
4. Use `server.shard_channel_count` to keep track of the total number of shard channels.

---------

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
We have already replaced the `slots_to_keys` radix tree with a key->slot linked list (#9356), and then replaced the list with slot-specific dictionaries for keys (#11695).

Shard channels behave just like keys in many ways, and we also need a slots->channels mapping. Currently this is still done with a radix tree. So we should split `server.pubsubshard_channels` into 16384 dicts and drop the radix tree, just like what we did to the DBs.

Some benefits (basically the benefits of what we've done to the DBs):
1. Optimize counting the channels in a slot. This is currently used only when removing the channels in a slot, but it is potentially more useful: sometimes we need to know how many channels there are in a specific slot when doing slot migration. Counting is currently implemented by traversing the radix tree; with this PR it is as simple as calling `dictSize`, going from O(n) to O(1).
2. The radix tree in the cluster is removed. The shard channel names no longer require additional storage, which saves memory.
3. Potentially useful in slot migration, as shard channels are logically split by slots, making it easier to migrate, remove or add them as a whole.
4. Avoid rehashing one big dict when there is a large number of channels.

Drawbacks:
1. Takes more memory than the radix tree when there are relatively few shard channels.

What this PR does:
1. In cluster mode, split `server.pubsubshard_channels` into 16384 dicts; in standalone mode, still use only one dict.
2. Drop the `slots_to_channels` radix tree.
3. To save memory (addressing the drawback above), all 16384 dicts are created lazily: a dict is initialized only when a channel is about to be inserted into it, and when all of its channels are deleted, the dict deletes itself.
4. Use `server.shard_channel_count` to keep track of the total number of shard channels.
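To make the lazy-creation scheme concrete, here is a minimal sketch (not the actual patch) using Redis's internal dict API from dict.h; the helper names and the plain global array are illustrative assumptions, since the real code hangs these fields off `server` and uses its own dict type.

```c
#include "server.h" /* Redis internals: dict API, keylistDictType, ... */

/* Hypothetical stand-ins for the real server fields. */
static dict *pubsubshard_channels[16384]; /* all NULL until first use */
static unsigned long long shard_channel_count = 0;

/* Return the dict for a slot, creating it lazily on first insert. */
dict *shardChannelsForSlot(int slot, int create) {
    if (pubsubshard_channels[slot] == NULL && create)
        pubsubshard_channels[slot] = dictCreate(&keylistDictType);
    return pubsubshard_channels[slot];
}

void shardChannelAdded(int slot) {
    (void)slot;
    shard_channel_count++;
}

/* When the last channel in a slot goes away, release the dict itself
 * so empty slots cost no memory. */
void shardChannelRemoved(int slot) {
    shard_channel_count--;
    dict *d = pubsubshard_channels[slot];
    if (d && dictSize(d) == 0) {
        dictRelease(d);
        pubsubshard_channels[slot] = NULL;
    }
}

/* Counting the channels in a slot is now O(1) instead of a radix-tree
 * traversal. */
unsigned long countChannelsInSlot(int slot) {
    dict *d = pubsubshard_channels[slot];
    return d ? dictSize(d) : 0;
}
```

In standalone mode the same idea degenerates to a single dict (index 0), which is why the array size is the only cluster-specific part of the scheme.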