Optimize resizing hash table to resize not only non-empty dicts. #12819
Conversation
|
it's better to add a test case, such as using memory stats to check |
force-pushed from 54cd573 to 0ad566d
|
@CharlesChen888 please avoid force-pushes. |
|
Another tip: Currently, the function |
|
good point. let's do it. |
|
I think #12850 should focus on the dictionary itself. The changes to |
|
@CharlesChen888 to test our last change, please remove the excessive line in the please also list this change and the reason behind it in the top comment, and make sure other details from this PR are described there. thank you! |
force-pushed from 997ddd1 to cc6a77e
|
Sorry about the force push again... I was trying out my git plugin and it didn't work as I expected. |
|
we discussed this PR in the core-team meeting, and we had the following conclusions:
|
When we insert entries into a dict, it may autonomously expand if needed. However, when we delete entries from a dict, it doesn't shrink to the proper size. If there are few entries in a very large dict, it may cause a huge waste of memory and inefficiency when iterating.

The main keyspace dicts (keys and expires) are shrunk by cron (`tryResizeHashTables` calls `htNeedsResize` and `dictResize`), and some data structures such as zset and hash also do that (call `htNeedsResize`) right after a loop of calls to `dictDelete`. But many other dicts are completely missing that call (they can only expand).

In this PR, we provide the ability to automatically shrink the dict when deleting. The conditions triggering the shrinking are the same as `htNeedsResize` used to have, i.e. we expand when we're over 100% utilization, and shrink when we're below 10% utilization.

Additionally:
* Add `dictPauseAutoResize` so that flows that do mass deletions will only trigger shrinkage at the end.
* Rename `dictResize` to `dictShrinkToFit` (same logic as it used to have, but a better name describing it).
* Rename `_dictExpand` to `_dictResize` (same logic as it used to have, but a better name describing it).

related to discussion #12819 (comment)

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
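As a rough illustration of the thresholds and pause mechanism described above, here is a minimal C sketch. It is not the actual Redis dict code; the struct, constants, and helper names are simplified stand-ins for the real API (`dictExpandIfNeeded`, `dictShrinkIfNeeded`, `dictPauseAutoResize`):

```c
#include <stdio.h>

/* Minimal sketch (not the actual Redis dict code) of the thresholds described
 * above: expand at 100% utilization, shrink below 10% utilization, and skip
 * both while auto-resize is paused. Field/constant names are illustrative. */
#define MIN_FILL_PERCENT  10   /* shrink when used < 10% of buckets */
#define INITIAL_SIZE      4

typedef struct {
    unsigned long size;        /* number of buckets */
    unsigned long used;        /* number of stored entries */
    int pause_auto_resize;     /* bumped by something like dictPauseAutoResize() */
} mini_dict;

/* After an insert: grow once we reach one entry per bucket (>= 100%). */
static int mini_expand_if_needed(mini_dict *d) {
    if (d->pause_auto_resize || d->used < d->size) return 0;
    d->size *= 2;              /* the real code rehashes incrementally */
    return 1;
}

/* After a delete: shrink once fewer than 10% of the buckets are used. */
static int mini_shrink_if_needed(mini_dict *d) {
    if (d->pause_auto_resize || d->size <= INITIAL_SIZE) return 0;
    if (d->used * 100 >= d->size * MIN_FILL_PERCENT) return 0;
    while (d->size / 2 >= INITIAL_SIZE && d->size / 2 >= d->used)
        d->size /= 2;          /* the real code picks a power of two >= used */
    return 1;
}

int main(void) {
    mini_dict d = { .size = 1024, .used = 1024 };
    printf("expand? %d -> size=%lu\n", mini_expand_if_needed(&d), d.size); /* grows to 2048 */
    d.used = 50;                                                           /* mass deletion */
    printf("shrink? %d -> size=%lu\n", mini_shrink_if_needed(&d), d.size); /* shrinks to 64 */
    return 0;
}
```

The wide gap between the two thresholds (100% to grow, 10% to shrink) keeps a dict from bouncing between expand and shrink around a single boundary.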
|
@CharlesChen888 now that #12850 was merged, can you look into the approach from the previous comment? |
oranagra left a comment
@soloestoy please take another look
|
@CharlesChen888 i've edited the top comment, please take a look. |
|
@oranagra Thanks for editing the top comment. I am OK with it. I also mentioned the specific APIs that I changed. Some comments were added according to @soloestoy's advice. |
tests consistently fail on timeout (sleep that's too short). it now takes more time because in #12819 we iterate on all dicts, not just non-empty ones. it passed the PR's CI because it skips the `slow` tag, which might have been misplaced, but now it is probably required.

with the fix, the tests take quite a lot of time:
```
[ok]: Redis can trigger resizing (1860 ms)
[ok]: Redis can rewind and trigger smaller slot resizing (744 ms)
```
before #12819:
```
[ok]: Redis can trigger resizing (309 ms)
[ok]: Redis can rewind and trigger smaller slot resizing (295 ms)
```
failure: https://github.com/redis/redis/actions/runs/7704158180/job/20995931735
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
*** [err]: Redis can trigger resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 128
number of elements: 5
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 8*' (context: type eval line 29 cmd {assert_match "*table size: 8*" [r debug HTSTATS 0]} proc ::test)
*** [err]: Redis can rewind and trigger smaller slot resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 256
number of elements: 10
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 16*' (context: type eval line 27 cmd {assert_match "*table size: 16*" [r debug HTSTATS 0]} proc ::test)
```
CI reports this failure:
```
*** [err]: Don't rehash if used memory exceeds maxmemory after rehash in tests/unit/maxmemory.tcl
Expected '4098' to equal or match '4002'
WARNING: the new maxmemory value set via CONFIG SET (1176088) is smaller than the current memory usage (1231083)
```
It can be seen from the log that used_memory changed before we set maxmemory. The reason is that in redis#12819, in cron, in addition to trying to shrink, we also try to expand. The dict was expanded by cron before we set maxmemory, causing the test to fail.

The fix is to disable resize before reaching a 1:1 ratio, and wait until maxmemory is set before enabling resize.
CI reports this failure:
```
*** [err]: Don't rehash if used memory exceeds maxmemory after rehash in tests/unit/maxmemory.tcl
Expected '4098' to equal or match '4002'
WARNING: the new maxmemory value set via CONFIG SET (1176088) is smaller than the current memory usage (1231083)
```
It can be seen from the log that used_memory changed before we set maxmemory. The reason is that in #12819, in cron, in addition to trying to shrink, we also try to expand. The dict was expanded by cron before we set maxmemory, causing the test to fail.

Before setting maxmemory, we only add 4095 keys to avoid triggering a resize.
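To make the 4095 vs. 4096 distinction concrete, here is a toy illustration; the 4096-key count and the 1:1 expand threshold come from the discussion above, while the bucket count and everything else in the snippet are assumed for the example:

```c
#include <stdio.h>

/* Toy illustration only: with a 4096-bucket table, the expand condition
 * (entries >= buckets, i.e. a 1:1 ratio) is first reached at 4096 keys,
 * which is why the test stops at 4095 before setting maxmemory. */
int main(void) {
    unsigned long buckets = 4096;
    for (unsigned long keys = 4094; keys <= 4096; keys++)
        printf("keys=%lu buckets=%lu -> cron may expand: %s\n",
               keys, buckets, keys >= buckets ? "yes" : "no");
    return 0;
}
```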
|
hi @CharlesChen888 a few comments:
|
note that, besides deleting all keys instead of most keys, and measuring memory instead of bucket count, these two tests also differ in that one of them runs in cluster mode and the other doesn't. |
|
@CharlesChen888 can you try doing what i suggested above? |
|
@oranagra just to make sure, you are suggesting adding tests, so we can verify that the new code works for both empty and non-empty dicts? |
|
@CharlesChen888 IIRC i'm suggesting to delete the new test you added and instead run the other test twice (once when it removes most of the keys in the dict and once when it removes all of them). you can also take this opportunity to mirror the |
The reason is the same as redis#13016: in redis#12819, in cron, in addition to trying to shrink, we also try to expand. The dict was expanded by cron before we triggered the bgsave, since we had enough keys (4096) to hit the ratio. Before the bgsave, we only add 4095 keys to avoid this issue. Also removing local-process; I don't remember us using it.
The reason is the same as #13016: in #12819, in cron, in addition to trying to shrink, we also try to expand. The dict was expanded by cron before we triggered the bgsave, since we had enough keys (4096) to hit the ratio. Before the bgsave, we only add 4095 keys to avoid this issue.
The function `tryResizeHashTables` only attempts to shrink the dicts that have keys (a change from #11695); this was a serious problem until the change in #12850, since it meant that if all keys are deleted, we won't shrink the dict. But still, both dictShrink and dictExpand may be blocked by a fork child process, therefore the cron job needs to perform both dictShrink and dictExpand, not just for non-empty dicts, but for all dicts in DBs.

What this PR does:
1. Try to resize all dicts in DBs (not just non-empty ones, as it was since #12850).
2. Handle both shrink and expand (not just shrink, as it was since forever).
3. Refactor some APIs around dict resizing (get rid of `htNeedsShrink` and `dictShrinkToFit`, and expose `dictShrinkIfNeeded` and `dictExpandIfNeeded`, which already contain all the code of the removed functions, to make the APIs neater).
4. In the `Don't rehash if redis has child process` test, now that cron would do resizing, we no longer need to write to the DB after the child process got killed, and can wait for the cron to expand the hash table.
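To tie this summary together, here is a rough, self-contained sketch of the cron-side idea only; the struct, helpers, and thresholds are simplified stand-ins (using the same toy logic as the earlier sketch), not the real dict/kvstore code:

```c
#include <stdio.h>

/* Simplified stand-in for a dict: just bucket and entry counts. */
typedef struct { unsigned long size, used; } dict_like;

static void shrink_if_needed(dict_like *d) {
    if (d->size > 4 && d->used * 100 < d->size * 10)      /* below 10% full */
        while (d->size / 2 >= 4 && d->size / 2 >= d->used) d->size /= 2;
}
static void expand_if_needed(dict_like *d) {
    if (d->used >= d->size) d->size *= 2;                  /* at/over 100% full */
}

/* Cron-style pass over all per-slot dicts of a DB (keys or expires): unlike
 * the old tryResizeHashTables, it does not skip empty dicts and it considers
 * expanding as well as shrinking, since either may have been deferred earlier
 * while a fork child existed (to limit copy-on-write). */
static void try_resize_all_dicts(dict_like *dicts, int n) {
    for (int i = 0; i < n; i++) {
        shrink_if_needed(&dicts[i]);
        expand_if_needed(&dicts[i]);
    }
}

int main(void) {
    dict_like db[3] = { {1024, 2}, {8, 8}, {4, 0} };       /* sparse, full, empty */
    try_resize_all_dicts(db, 3);
    for (int i = 0; i < 3; i++)
        printf("dict %d: size=%lu used=%lu\n", i, db[i].size, db[i].used);
    return 0;
}
```

The point is only the loop shape: every dict is visited, empty or not, and both directions are considered on each pass.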