
Conversation

@ShooterIT (Member) commented Oct 24, 2020

As we know, redis may reject users' requests or evict some keys if used memory exceeds maxmemory. Dictionary expansion can make things worse: a big dictionary, such as the main db dict or the expires dict, may suddenly eat huge memory to allocate its new hash table and end up far beyond maxmemory after expanding. There are related issues: #4213 #4583

In more detail: when redis expands a dict, it allocates a new ht[1] that is generally double the size of ht[0], so ht[1] will be very big if ht[0] is already big. For the db dict, with more than 64 million keys, ht[1] costs 1GB when the dict expands (128M buckets * 8 bytes per pointer).

If the sum of used memory and the new hash table the dict needs exceeds maxmemory, we shouldn't allow the dict to expand. If keys eviction is enabled, we still couldn't add many more keys after eviction and rehashing; worse, redis ends up keeping fewer keys, because part of its memory holds the new hash table instead of users' data. And if keys eviction is disabled, users can't write data to redis at all.

What this commit changed?

  • Add a new member function expandAllowed to the dict type; it gives the caller a way to allow the expand or not. We expose two parameters to this function: the extra memory needed for the expand and the dict's current load factor, so users can implement a function that decides based on them.
  • For the main db dict and expires dict types, which may be very big and cost huge memory to expand, we implement a judgement function: we provisionally stop the dict from expanding if used memory would exceed maxmemory after the expand, but to guarantee redis' performance we still allow the expand once the dict's load factor exceeds the safe load factor. A sketch of this function follows this list.
  • Add test cases to verify that we don't allow the main db to expand when the remaining memory is not enough, so that keys eviction is avoided.
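
A minimal C sketch of such a judgement function, close in spirit to what this PR describes (the helper `overMaxmemoryAfterAlloc`, the globals, and the 1.618 constant are illustrative assumptions drawn from the discussion below, not necessarily the exact merged code):

```c
#include <stddef.h>

/* Safe load factor above which we expand anyway, even if the allocation
 * would push us over maxmemory (the threshold value discussed below). */
#define HASHTABLE_MAX_LOAD_FACTOR 1.618

/* Stand-ins for redis' real memory accounting (assumed for the sketch). */
static size_t used_memory, server_maxmemory;

/* Would allocating `moremem` extra bytes push us over the limit? */
static int overMaxmemoryAfterAlloc(size_t moremem) {
    return server_maxmemory && used_memory + moremem > server_maxmemory;
}

/* expandAllowed callback for the main db / expires dict types:
 * `moremem` is the size of the new hash table about to be allocated,
 * `usedRatio` is the dict's current load factor (used / size). */
static int dictExpandAllowed(size_t moremem, double usedRatio) {
    if (usedRatio <= HASHTABLE_MAX_LOAD_FACTOR)
        return !overMaxmemoryAfterAlloc(moremem); /* defer while memory is tight */
    else
        return 1; /* too saturated: expand even at the cost of eviction */
}
```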

Other changes:

  • For the new hash table size on expand: before this commit, the dict's used count was doubled and then passed to _dictNextPower. Since we actually aim to keep the dict load factor between 0.5 and 1.0, we replace the *2 with +1: because the expand check is used >= size, the old result is usually the same as _dictNextPower(used+1). The only case where it differs is when dict_can_resize is false during fork, where _dictNextPower(used*2) makes the dict jump to *4 (e.g. _dictNextPower(1025*2) returns 4096). A runnable illustration follows this list.
  • Fix the rehash test cases, since the algorithm for the new hash table size on expand changed.
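
For reference, a runnable sketch of why the +1 matters (the loop version of `_dictNextPower` here is an assumption of the classic implementation, simplified from dict.c):

```c
#include <stdio.h>

#define DICT_HT_INITIAL_SIZE 4

/* Round `size` up to the next power of two, starting from the
 * initial table size (classic loop implementation). */
static unsigned long _dictNextPower(unsigned long size) {
    unsigned long i = DICT_HT_INITIAL_SIZE;
    while (i < size) i *= 2;
    return i;
}

int main(void) {
    /* Expansion triggers once used >= size, so used*2 and used+1 usually
     * round to the same power of two. They differ only when expansion was
     * deferred (dict_can_resize is false during fork) and `used` grew
     * past `size` before the dict could rehash. */
    unsigned long used = 1025;
    printf("_dictNextPower(used*2) = %lu\n", _dictNextPower(used * 2)); /* 4096 */
    printf("_dictNextPower(used+1) = %lu\n", _dictNextPower(used + 1)); /* 2048 */
    return 0;
}
```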

@oranagra (Member) left a comment

@ShooterIT thank you for this PR and the idea it brings.
I think we need to work on it a bit more to make sure it doesn't have any undesired side effects.
We can easily think of cases where the rehash causes damage; I would rather limit the implications of this change to such cases, to avoid surprises.

@oranagra oranagra added this to the Next minor backlog milestone Oct 28, 2020
@madolson (Contributor) left a comment

Going to repeat one of my specific points: I think we need a way to make sure we don't end up with a poorly balanced DB. We can prevent rehashing but keep adding items; at some point we need to force a rehash or give visibility into it. A real use case I can see: someone keeps bumping up maxmemory on their redis database as they need it, using a policy without eviction, and the dictionary never rehashes.

@oranagra (Member) commented Nov 11, 2020

I discussed this with @yossigo @guybe7 @MeirShpilraien @inbaryuval @yaacovhazan-Redislabs; here are the conclusions:

  • It's hard to predict the impact of such a mechanism when enabled on all dicts; we feel we should focus on just solving the major known problem of the main dict, where it is easier to realize all the implications of this change. For that, a new field can be added to the dictType struct, so it doesn't add a cost per instance of the dictionary.
  • This is essentially a balance between memory and execution performance. Currently redis aims to keep a dict saturation ratio between 0.5 and 1.0 (as soon as it reaches 1.0, rehashing brings it back to 0.5), but these two constants can in theory be set to different ones (0.8 and 1.2 or anything else).
  • The code that does both used*2 and later _dictNextPower seems like a bug: since the check is used >= size, the outcome is usually the same as _dictNextPower(used+1) (no need for the *2). The only case where it differs is when dict_can_resize is false during fork, so that the later _dictNextPower(used*2) causes the dict to jump to *4 (i.e. _dictNextPower(1025*2) gives you 4096).

proposals:

  1. replace the *2 with +1 (will only affect wakeup after fork)
  2. when there is enough memory, keep the logic as it is (saturation between 0.5 and 1.0), but when memory is in shortage (the rehash is going to cause eviction), relax the saturation threshold, but only up to a limit; say up to a saturation of 1.61803 (after that we will prefer to rehash, despite the eviction it will cause).
  3. we can probably revise the rehashing code to be a little less memory hungry during rehashing (I'll open another issue to describe the design).

@ShooterIT (Member, Author)

Thanks @oranagra @madolson
After more thought, I agree with you: we can only implement this strategy on the main DB dict, and not even on the expires dict, because we don't know the user's eviction policy, while we definitely add and evict keys on the main DB dict. Just as Madelyn said, we may delete a key from the main db but add members to a hash object.

We already know the new hash table size may be *4 or *8 of the old size when we add many keys while there is a forked child. I'm happy to replace the *2 with +1, and we needn't try to check again if memory is not enough.

For the hard saturation limit of 1.61803 (golden ratio?): saturation will be about 0.8 after the rehash once we reach 1.61803, so we can add some new keys without performance loss after eviction, right?

@ShooterIT (Member, Author)

But if we set a hard hash saturation limit, can we apply this strategy to all dicts when memory is low?

If we add a new field to the dictType struct, we may need to modify all dictType variables. Currently my implementation is more like setting dict_can_resize; does it break encapsulation? @madolson

@oranagra (Member)

I'd rather modify 20 structs in many places in the redis code base than add a new member per dict instance (considering its memory consumption).

But maybe, considering that the current plan (if we take what I posted) is just a matter of changing the threshold (and not a yes/no boolean decision), we can apply that to all dictionaries.

P.S. if we do that, I suppose the selection between the thresholds should not be `used_memory + new_malloc > maxmemory`, but rather a check that we're close to the limit (`used_memory + new_malloc` > 95% of maxmemory or alike); otherwise, considering that the last eviction could have evicted a large object and left a big chunk of unused memory, small dicts will not use the right threshold.
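
A tiny sketch of that proximity check (the 95% figure is from the comment above; the function name and exact form are illustrative assumptions):

```c
#include <stddef.h>

/* Use the relaxed saturation threshold only when the allocation would
 * bring us genuinely close to the configured limit, not merely above a
 * level that a recent large eviction has already freed up. */
static int nearMaxmemoryAfterAlloc(size_t used_memory, size_t new_malloc,
                                   size_t maxmemory) {
    if (maxmemory == 0) return 0; /* no limit configured */
    return used_memory + new_malloc > maxmemory / 100 * 95;
}
```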

@madolson (Contributor)

I would also still put it on the 20 structs; I don't think it's that bad, and it gives us more flexibility.

"but maybe considering that the current plan (if we take what i posted) is just a matter of changing the threshold (and not a yes/no boolean decision), we can apply that on all dictionaries." I agree we could.

@ShooterIT (Member, Author)

Yes, adding a member function to dictType gives us more flexibility. I'm fine with that; it just requires modifying many structs.

I hope we can apply this mechanism to more dicts. For correctness, whatever the eviction policy, it is fine to apply this mechanism to the main db, but we need to pay attention to the expires dict. I experienced one case where redis immediately ate an extra 1GB of memory when the user's key count passed 32M. The main db rehash only accounts for 512MB (64M * 8 bytes); since all keys had an expire time set, I concluded the other 512MB was eaten by the expires dict rehash. The arithmetic is sketched below.
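
The arithmetic behind that case, as a runnable check (the 8-byte bucket pointer per slot is the assumption here):

```c
#include <stdio.h>

int main(void) {
    /* Just past 32M user keys, each dict expands to the next power of two. */
    unsigned long buckets  = 1UL << 26;   /* 64M slots */
    unsigned long ht1Bytes = buckets * 8; /* 8-byte pointer per slot */
    printf("main db ht[1]:  %lu MB\n", ht1Bytes >> 20);       /* 512 MB */
    printf("expires ht[1]:  %lu MB\n", ht1Bytes >> 20);       /* 512 MB (all keys had TTLs) */
    printf("total at once:  %lu MB\n", (2 * ht1Bytes) >> 20); /* 1024 MB ~= 1 GB */
    return 0;
}
```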

@ShooterIT (Member, Author)

hi @oranagra @madolson

I did several things in the current commit.

  • Add a member function to the dict type to check whether the dict can expand, exposing two parameters: the extra memory the rehash needs and the dict's current load factor, so users can implement a function that decides based on them.
  • For the main db dict and expires dict types, I implement an expand-check function.
  • For the new hash table size after expanding, I use the next power of two above ht[0].used.

There are some things I am not sure about and need your help with:

  • the name of the new member function (so for now I haven't added comments for every dictType; I will add them once we confirm it).
  • I haven't added expand-check implementations for the other dicts.

Please review

@oranagra (Member)

Thanks @ShooterIT.
This approach looks good to me (good to be merged). Posted a few minor comments.

@ShooterIT ShooterIT changed the title Can't rehash if used memory exceeds maxmemory after rehashing Limit the main db and expires dictionaries to expand Nov 25, 2020
oranagra previously approved these changes Nov 25, 2020
@oranagra (Member)

@redis/core-team I think this one is ready to be merged.
@ShooterIT please update the top comment of the PR for an easy review, and also to be used as a commit comment when squash-merging.

@oranagra oranagra added the state:to-be-merged and release-notes labels Nov 25, 2020
@itamarhaber (Member) left a comment

LGTM but didn't review

@ShooterIT (Member, Author)

Hi @oranagra, done. Please have a look.

yossigo previously approved these changes Dec 1, 2020
@itamarhaber (Member)

FYI @filipecosta90 because performance, duh.

itamarhaber previously approved these changes Dec 3, 2020
@JimB123 (Contributor) commented Dec 3, 2020

Overall, I think this is a good change, and it complements the recent change to support incremental eviction.

I like the rename of the dictType for the expires table. The current name is very general and the new usage is specific.

I concur with @ShooterIT regarding his analysis of performance. freeMemoryGetNotCountedMemory is already called once on all writes when memory is high, so it can't be a significant performance problem, or it would already be impacting current performance. In this case, it would be called a second time for the subset of writes that are "adds". I've looked at the code path; it doesn't seem significant now, and it can't become significant in the future without impacting general write performance when memory is high. I don't think it would be worth the complexity to cache this value.

@oranagra (Member) commented Dec 3, 2020

@JimB123 which dictType for eviction table?

@JimB123 (Contributor) commented Dec 3, 2020

@JimB123 which dictType for eviction table?

Sorry - expires, not evict. keyptrDictType -> dbExpiresDictType

madolson previously approved these changes Dec 4, 2020
@madolson (Contributor) left a comment

I'm good with the change

@oranagra (Member) commented Dec 4, 2020

@JimB123 we did rename it (IIRC I asked for it for the exact same reason as you).

@oranagra oranagra dismissed stale reviews from madolson, itamarhaber, yossigo, and themself via 504932e December 6, 2020 09:28
@oranagra oranagra merged commit 75f9dec into redis:unstable Dec 6, 2020
@oranagra oranagra mentioned this pull request Jan 13, 2021
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Mar 2, 2021
@ShooterIT ShooterIT deleted the rehash branch March 11, 2021 07:33
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Nov 28, 2023
In the past, we did not call _dictNextExp frequently. It was only
called when the dictionary was expanded.

Later, dictTypeExpandAllowed was introduced in redis#7954, which is 6.2.
For the data dict and the expire dict, we can check maxmemory before
actually expanding the dict. This is a good optimization to avoid
maxmemory being exceeded due to the dict expansion.

And in redis#11692, we moved the dictTypeExpandAllowed check before the
threshold check, which caused a bit of performance degradation: every
time a key is added to the dict, dictTypeExpandAllowed is called.

The main reason for degradation is that in a large dict, we need to
call _dictNextExp frequently, that is, every time we add a key, we
need to call _dictNextExp once. Then the threshold is checked to see
if the dict needs to be expanded. We can see that the order of checks
here can be optimized.

So we moved the dictTypeExpandAllowed check back to after the threshold
check in redis#12789. In this way, before the dict is actually expanded (that
is, before the threshold is reached), we will not do anything extra
compared to before, that is, we will not call _dictNextExp frequently.

But note we'll still hit the degradation when we exceed the thresholds.
When the threshold is reached, because of redis#7954, we may delay the dict
expansion due to maxmemory limitations. In this case, we will call
_dictNextExp every time we add a key during this period.

This PR uses CLZ in _dictNextExp to get the next power of two. CLZ (count
leading zeros) can easily give you the next power of two. It should be
noted that we already introduced the use of __builtin_clzl in redis#8687,
which is 7.0, so I suppose all the platforms we support have it (even
when the CPU doesn't have a dedicated instruction).

We build 67108864 (2**26) keys through DEBUG POPULATE, which uses
approximately 5.49G of memory (used_memory:5898522936). If expansion is
triggered, the additional hash table will consume approximately 1G of
memory (2**27 * 8). So we set maxmemory to 6871947673 (that is, 6.4G),
which is less than 5.49G + 1G, so the dict rehash is delayed while
adding the keys.

After that, each time an element is added to the dict, the allow check
is performed, which means we call _dictNextExp frequently, letting us
compare behavior before and after the optimization. We use DEBUG HTSTATS 0
to check and make sure that our dict expansion is indeed delayed.

Using `./src/redis-server redis.conf --save "" --maxmemory 6871947673`.
Using `./src/redis-benchmark -P 100 -r 1000000000 -t set -n 5000000`.
After ten rounds of testing:
```
unstable:           this PR:
769585.94           816860.00
771724.00           818196.69
775674.81           822368.44
781983.12           822503.69
783576.25           828088.75
784190.75           828637.75
791389.69           829875.50
794659.94           835660.69
798212.00           830013.25
801153.62           833934.56
```

We can see there is about 4-5% performance improvement in this case.
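
A sketch of a CLZ-based next-power-of-two helper in the spirit of that change (the constants and exact signature are assumptions; like the merged code, it returns an exponent rather than a size):

```c
#include <limits.h>

#define DICT_HT_INITIAL_EXP 2 /* initial table size 4 == 1 << 2 */

/* Return the exponent e such that (1UL << e) is the smallest power of
 * two >= size, using count-leading-zeros instead of a doubling loop. */
static signed char _dictNextExp(unsigned long size) {
    if (size <= (1UL << DICT_HT_INITIAL_EXP)) return DICT_HT_INITIAL_EXP;
    if (size >= LONG_MAX) return 8 * sizeof(long) - 1;
    /* For size > 1: word width minus __builtin_clzl(size - 1) equals
     * ceil(log2(size)), i.e. the exponent of the next power of two. */
    return 8 * sizeof(long) - __builtin_clzl(size - 1);
}
```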
oranagra pushed a commit that referenced this pull request Nov 29, 2023
oranagra pushed a commit that referenced this pull request Jan 9, 2024
oranagra pushed a commit that referenced this pull request Jan 9, 2024

Labels

release-notes: indication that this issue needs to be mentioned in the release notes
state:to-be-merged: the PR should be merged soon, even if not yet ready; this is used so that it won't be forgotten
