Skip to content

Conversation

@hpatro
Copy link
Contributor

@hpatro hpatro commented Oct 28, 2023

Currently based on activerehashing config it's determined if incremental rehash will be performed or not. With #11695, a list of dict* are maintained to incrementally rehash all the dictionaries (slots) for a db.

If the config activerehashing is disabled, the list might contain dict* and would not be processed leading unnecessary memory usage.
If the config activerehashing is enabled on runtime, the incremental rehashing logic won't pick up the dictionaries which were already under rehashing.

This PR addresses the above two gaps and handles the change in the config to cleanup/readd the dictionaries to the rehashing list.

@hpatro hpatro requested a review from oranagra October 28, 2023 20:57
@hpatro
Copy link
Contributor Author

hpatro commented Oct 28, 2023

Note: This list can contain duplicate items (so turning the config on/off repetitively) would lead to adding the same dict* multiple times.

Comment on lines +2487 to +2489
while ((d = dbIteratorNextDict(dbit))) {
if (dictIsRehashing(d)) {
listAddNodeTail(server.db->sub_dict[subdict].rehashing, d);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have a concern about this loop causing a short freeze when the config is changed.
although i suppose it can't be that bad, considering that until #12697 we have a similar loop in every INFO call.

still, considering that this config is mainly meant to avoid latency issues from long cron, maybe we can let redis keep track of the rehashed dicts even when disabled?
i.e. the config will reduce latency issues, but not the memory overhead.

i suppose it depends if users that disable this config do it only temporarily (when the time latency sensitive period, and re-enable it later), or do they keep it permanently disabled.

does anyone have any suggestion / opinion / info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not adding the missed out dictionaries to the list could prolong the rehash for quite a long period of time. I believe it would be good to update the list (note: it skips the dictionaries which doesn't have any data).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judeng Do you have any insights on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we always disable the config to avoid latency and cpu overload. IMO, temporary operation is too late because rehashing doesn't last long.

Maybe we can separate rehashing lists from incrementallyRehash function, we could rotate the list and check if dicts' rehashing is completed, no matter activerehashing is disabled or enabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can separate rehashing lists from incrementallyRehash function, we could rotate the list and check if dicts' rehashing is completed, no matter activerehashing is disabled or enabled.

i don't think that's needed, the changes in #12697 would still clean this list when the dict reshard ends (regardless of this config).

Now we always disable the config to avoid latency and cpu overload. IMO, temporary operation is too late because rehashing doesn't last long.

i meant that if you have periods of low traffic, or low latency demands, it can be enabled for these periods.

Not adding the missed out dictionaries to the list could prolong the rehash for quite a long period of time. I believe it would be good to update the list (note: it skips the dictionaries which doesn't have any data).

do you mean that it's a good idea to keep maintaining it even when reshard is disabled (small memory overhead), or you think the latency spike of re-computing it (current code) is ok?
maybe you can measure it?

@oranagra
Copy link
Member

closing in favor of #12848

@oranagra oranagra closed this Dec 10, 2023
soloestoy added a commit that referenced this pull request Dec 15, 2023
After #11695, we added two functions `rehashingStarted` and
`rehashingCompleted` to the dict structure. We also registered two
handlers for the main database's dict and expire structures. This allows
the main database to record the dict in `rehashing` list when rehashing
starts. Later, in `serverCron`, the `incrementallyRehash` function is
continuously called to perform the rehashing operation. However,
currently, when rehashing is completed, `rehashingCompleted` does not
remove the dict from the `rehashing` list. This results in the
`rehashing` list containing many invalid dicts. Although subsequent cron
checks and removes dicts that don't require rehashing, it is still
inefficient.

This PR implements the functionality to remove the dict from the
`rehashing` list in `rehashingCompleted`. This is achieved by adding
`metadata` to the dict structure, which keeps track of its position in
the `rehashing` list, allowing for quick removal. This approach avoids
storing duplicate dicts in the `rehashing` list.

Additionally, there are other modifications:

1. Whether in standalone or cluster mode, the dict in database is
inserted into the rehashing linked list when rehashing starts. This
eliminates the need to distinguish between standalone and cluster mode
in `incrementallyRehash`. The function only needs to focus on the dicts
in the `rehashing` list that require rehashing.
2. `rehashing` list is moved from per-database to Redis server level.
This decouples `incrementallyRehash` from the database ID, and in
standalone mode, there is no need to iterate over all databases,
avoiding unnecessary access to databases that do not require rehashing.
In the future, even if unsharded-cluster mode supports multiple
databases, there will be no risk involved.
3. The insertion and removal operations of dict structures in the
`rehashing` list are decoupled from `activerehashing` config.
`activerehashing` only controls whether `incrementallyRehash` is
executed in serverCron. There is no need for additional steps when
modifying the `activerehashing` switch, as in #12705.
roggervalf pushed a commit to roggervalf/redis that referenced this pull request Feb 11, 2024
After redis#11695, we added two functions `rehashingStarted` and
`rehashingCompleted` to the dict structure. We also registered two
handlers for the main database's dict and expire structures. This allows
the main database to record the dict in `rehashing` list when rehashing
starts. Later, in `serverCron`, the `incrementallyRehash` function is
continuously called to perform the rehashing operation. However,
currently, when rehashing is completed, `rehashingCompleted` does not
remove the dict from the `rehashing` list. This results in the
`rehashing` list containing many invalid dicts. Although subsequent cron
checks and removes dicts that don't require rehashing, it is still
inefficient.

This PR implements the functionality to remove the dict from the
`rehashing` list in `rehashingCompleted`. This is achieved by adding
`metadata` to the dict structure, which keeps track of its position in
the `rehashing` list, allowing for quick removal. This approach avoids
storing duplicate dicts in the `rehashing` list.

Additionally, there are other modifications:

1. Whether in standalone or cluster mode, the dict in database is
inserted into the rehashing linked list when rehashing starts. This
eliminates the need to distinguish between standalone and cluster mode
in `incrementallyRehash`. The function only needs to focus on the dicts
in the `rehashing` list that require rehashing.
2. `rehashing` list is moved from per-database to Redis server level.
This decouples `incrementallyRehash` from the database ID, and in
standalone mode, there is no need to iterate over all databases,
avoiding unnecessary access to databases that do not require rehashing.
In the future, even if unsharded-cluster mode supports multiple
databases, there will be no risk involved.
3. The insertion and removal operations of dict structures in the
`rehashing` list are decoupled from `activerehashing` config.
`activerehashing` only controls whether `incrementallyRehash` is
executed in serverCron. There is no need for additional steps when
modifying the `activerehashing` switch, as in redis#12705.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants