-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Use Reservoir Sampling for random sampling of dict, and fix hang during fork #12276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
oranagra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat! LGTM.
calling @inbaryuval to comment about the algorithm fairness.
@sundb don't we also wanna fix the rehash condition that we discussed here #12200 (comment)
i'd imagine that needs to be backported to anywhere were the issue was taken to, but considering the fix here is rather safe, maybe we can backport that as well.
|
btw, maybe it would be nice to create a test (one that takes the dict to the extreme using SCAN+DEL) and checks the random fairness. |
Because adding this test requires a fix for #12200 (comment) first, but I think this fix for #issuecomment-1563003032 should require a separate PR for backport, anyway I'll add the test in this PR first. |
|
@oranagra Added the test, please wait for me to run it for several hours. |
oranagra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't review the test yet
oranagra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've added more details to the top comment, feel free to fix / extend.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
|
@sundb this is ready right? i can remove the |
|
the top comment is ok and it's ready. |
|
For the reference, there are more efficient sampling algorithms than reservoir sampling - see here. I'm not sure if it worth the trouble here. |
|
@LiorKogan Thank you for figue out. |
This was introduced by the recent change in #11692 which prevented a down-sizing rehashing while there is a fork. ## Solution 1. Fix the rehashing code, so that the same as it allows rehashing for up-sizing during fork when the ratio is extreme, it will allow it for down-sizing as well. Co-authored-by: Oran Agra <oran@redislabs.com> This is a partial cherry pick of: (cherry picked from commit b00a235) (cherry picked from commit d4c37320382edb342292a3e30250d46896a12016)
This was introduced by the recent change in #11692 which prevented a down-sizing rehashing while there is a fork. ## Solution 1. Fix the rehashing code, so that the same as it allows rehashing for up-sizing during fork when the ratio is extreme, it will allow it for down-sizing as well. Co-authored-by: Oran Agra <oran@redislabs.com> This is a partial cherry pick of: (cherry picked from commit b00a235) (cherry picked from commit d4c37320382edb342292a3e30250d46896a12016)
…ng fork (#12276) ## Issue: When a dict has a long chain or the length of the chain is longer than the number of samples, we will never be able to sample the elements at the end of the chain using dictGetSomeKeys(). This could mean that SRANDMEMBER can be hang in and endless loop. The most severe case, is the pathological case of when someone uses SCAN+DEL or SSCAN+SREM creating an unevenly distributed dict. This was amplified by the recent change in #11692 which prevented a down-sizing rehashing while there is a fork. ## Solution 1. Before, we will stop sampling when we reach the maximum number of samples, even if there is more data after the current chain. Now when we reach the maximum we use the Reservoir Sampling algorithm to fairly sample the end of the chain that cannot be sampled 2. Fix the rehashing code, so that the same as it allows rehashing for up-sizing during fork when the ratio is extreme, it will allow it for down-sizing as well. Issue was introduced (or became more severe) by #11692 Co-authored-by: Oran Agra <oran@redislabs.com> (cherry picked from commit b00a235)
Issue
When a dict has a long chain or the length of the chain is longer than the number of samples, we will never be able to sample the elements at the end of the chain using dictGetSomeKeys().
This means that there's an unfair random sampling affecting: RANDOMKEY, SRANDMEMBER, SPOP, ZRANDMEMBER, HRANDFIELD, and the eviction mechanism.
Severe issue
This could mean that SRANDMEMBER, ZRANDMEMBER and HRANDFIELD (when used with
<count>) can be hang in and endless loop.The most severe case, is the pathological case of when someone uses SCAN+DEL or SSCAN+SREM creating an unevenly distributed dict.
Recent regression
The above was amplified by the recent change in #11692 which prevented a down-sizing rehashing while there is a fork.
So in a pathological case of SSCAN+SREM causing an uneven hash table, followed by down-sizing of the hash table, if the rehashing would be suspended during a fork, and then calling SRANDMEMBER with
<count>can lead to a hang.Solution
Now when we reach the maximum we use the Reservoir Sampling algorithm to fairly sample the end of the chain that cannot be sampled
Fix #12200