Skip to content

Conversation

@hpatro
Copy link
Contributor

@hpatro hpatro commented Oct 26, 2023

As part of #11695 independent dictionaries were introduced per slot. Time complexity to discover total no. of buckets across all dictionaries increased to O(N) with straightforward implementation of iterating over all dictionaries and adding dictBuckets of each.

To optimize the time complexity, we could maintain a global counter at db level to keep track of the count of buckets and update it on the start and end of rehashing.

Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
@hpatro
Copy link
Contributor Author

hpatro commented Oct 27, 2023

For slot not owned by the node, the above solution wouldn't account any buckets for it. However, for slots active on a node and later migrated I believe we don't reduce the size to 0 bucket. So, additional 4 bucket for such slot would be accounted which I believe is the true state of the node.

@oranagra let me know what you think about it.

@oranagra
Copy link
Member

If the memory for the dict is still allocated, then that ok. If we released it, we should update the counter

@hpatro
Copy link
Contributor Author

hpatro commented Oct 27, 2023

@oranagra I think we need to improve the logic introduced in #11695 to not add to the list when activerehashing is disabled, which is reasonable.

However, when activerehashing is re-enabled we should scan through all the dictionaries and add them to rehashing list to allow incrementally rehash logic to work.

@oranagra
Copy link
Member

Ok. Let's merge this one and handle that issue separately.

@oranagra oranagra merged commit 4145d62 into redis:unstable Oct 27, 2023
oranagra pushed a commit that referenced this pull request Nov 10, 2023
Introduced in #12697 , should reset bucket_count when empty db, or the overhead memory usage of db can be miscalculated.
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Nov 14, 2023
The change in dbSwapDatabases seems harmless. Because in non-clustered
mode, dbBuckets calculations are strictly accurate and in cluster mode,
we only have one DB. Modify it for uniformity (just like resize_cursor).

The change in swapMainDbWithTempDb is needed in case we swap with the
temp db, otherwise the overhead memory usage of db can be miscalculated.

Introduced in redis#12697.
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Dec 8, 2023
…tion

In the old dictRehashingInfo implementation, for the initialization scenario,
it mistakenly directly set to_size to DICTHT_SIZE(DICT_HT_INITIAL_EXP), which
is 4 in our code by default.

In scenarios where dictExpand directly passes the target size as initialization,
the code will calculate bucket_count incorrectly. For example, in DEBUG POPULATE
or RDB load scenarios, it will cause the final bucket_count to be initialized to
65536 (16384 * 4), see:
```
before:
DB 0: 10000000 keys (0 volatile) in 65536 slots HT.

it should be:
DB 0: 10000000 keys (0 volatile) in 16777216 slots HT.
```

In PR, new ht will also be initialized before calling rehashingStarted in
_dictExpand, so that the calls in dictRehashingInfo can be unified.

This PR also cleans up dictRehashingStarted* and dictRehashingCompleted*,
eliminating some duplicate code.

Bug was introduced in redis#12697.
oranagra pushed a commit that referenced this pull request Dec 10, 2023
…on (#12846)

In the old dictRehashingInfo implementation, for the initialization
scenario,
it mistakenly directly set to_size to DICTHT_SIZE(DICT_HT_INITIAL_EXP),
which
is 4 in our code by default.

In scenarios where dictExpand directly passes the target size as
initialization,
the code will calculate bucket_count incorrectly. For example, in DEBUG
POPULATE
or RDB load scenarios, it will cause the final bucket_count to be
initialized to
65536 (16384 * 4), see:
```
before:
DB 0: 10000000 keys (0 volatile) in 65536 slots HT.

it should be:
DB 0: 10000000 keys (0 volatile) in 16777216 slots HT.
```

In PR, new ht will also be initialized before calling rehashingStarted
in
_dictExpand, so that the calls in dictRehashingInfo can be unified.

Bug was introduced in #12697.
oranagra pushed a commit that referenced this pull request Dec 10, 2023
…2763)

The change in dbSwapDatabases seems harmless. Because in non-clustered
mode, dbBuckets calculations are strictly accurate and in cluster mode,
we only have one DB. Modify it for uniformity (just like resize_cursor).

The change in swapMainDbWithTempDb is needed in case we swap with the
temp db, otherwise the overhead memory usage of db can be miscalculated.

In addition we will swap all fields (including rehashing list), just for
completeness (and reduce the chance of surprises in the future).

Introduced in #12697.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants