Reduce dbBuckets operation time complexity from O(N) to O(1) #12697
Conversation
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
For slots not owned by the node, the above solution wouldn't account for any buckets. However, for slots active on a node and later migrated away, I believe we don't reduce the size to 0 buckets. So an additional 4 buckets would be accounted for each such slot, which I believe is the true state of the node. @oranagra let me know what you think about it.
If the memory for the dict is still allocated, then that's OK. If we released it, we should update the counter.
@oranagra I think we need to improve the logic introduced in #11695 to not add to the list when … However, when …
Ok. Let's merge this one and handle that issue separately. |
Introduced in #12697: we should reset bucket_count when emptying the db, or the overhead memory usage of the db can be miscalculated.
…on (#12846)

In the old dictRehashingInfo implementation, for the initialization scenario, it mistakenly set to_size directly to DICTHT_SIZE(DICT_HT_INITIAL_EXP), which is 4 by default in our code. In scenarios where dictExpand passes the target size directly at initialization, the code calculates bucket_count incorrectly. For example, in DEBUG POPULATE or RDB load scenarios, the final bucket_count is initialized to 65536 (16384 * 4):

```
before:
DB 0: 10000000 keys (0 volatile) in 65536 slots HT.
it should be:
DB 0: 10000000 keys (0 volatile) in 16777216 slots HT.
```

In this PR, the new ht is also initialized before calling rehashingStarted in _dictExpand, so that the calls in dictRehashingInfo can be unified. This PR also cleans up dictRehashingStarted* and dictRehashingCompleted*, eliminating some duplicate code. Bug was introduced in #12697.
…2763)

The change in dbSwapDatabases seems harmless, because in non-clustered mode dbBuckets calculations are strictly accurate, and in cluster mode we only have one DB. We modify it for uniformity (just like resize_cursor). The change in swapMainDbWithTempDb is needed in case we swap with the temp db; otherwise the overhead memory usage of the db can be miscalculated. In addition, we swap all fields (including the rehashing list), just for completeness (and to reduce the chance of surprises in the future). Introduced in #12697.
As part of #11695, independent dictionaries were introduced per slot. The time complexity to discover the total number of buckets across all dictionaries increased to O(N) with the straightforward implementation of iterating over all dictionaries and adding the `dictBuckets` of each. To optimize the time complexity, we could maintain a global counter at the db level to keep track of the bucket count and update it at the start and end of rehashing.