Fix resize hash table dictionary iterator #12660

hpatro · 2023-10-16T22:18:23Z

Dictionary iterator logic in the tryResizeHashTables method is picking the next (incorrect) dictionary while the cursor is at a given slot. This could lead to some dictionary/slot getting skipped from resizing.
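
To illustrate the kind of skip described above, here is a minimal C sketch. It is not the code in `tryResizeHashTables`: the `slot_dict` type and both function names are hypothetical stand-ins, and the sketch assumes the pass walks an array of per-slot dictionaries starting from a cursor and only considers non-empty ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-slot dictionary (not the Redis dict type). */
typedef struct {
    size_t used;    /* keys currently stored in this slot's dictionary */
    int resized;    /* set to 1 once a resize pass has visited the slot */
} slot_dict;

/* Skipping pattern (assumed shape of the bug): step past the cursor before
 * visiting, so the dictionary the cursor points at is never considered. */
size_t resize_pass_skipping(slot_dict *slots, size_t n, size_t cursor) {
    assert(cursor < n);
    size_t visited = 0;
    for (size_t i = cursor + 1; i < n; i++) {
        if (slots[i].used == 0) continue;   /* only non-empty dictionaries */
        slots[i].resized = 1;
        visited++;
    }
    return visited;
}

/* Intended pattern: visit the dictionary at the cursor first, then move on. */
size_t resize_pass_from_cursor(slot_dict *slots, size_t n, size_t cursor) {
    assert(cursor < n);
    size_t visited = 0;
    for (size_t i = cursor; i < n; i++) {
        if (slots[i].used == 0) continue;   /* only non-empty dictionaries */
        slots[i].resized = 1;
        visited++;
    }
    return visited;
}
```

With a non-empty dictionary at the cursor, the first function leaves its `resized` flag at 0 while the second sets it. That is the difference between a slot being skipped and being resized.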

Saw failure in recent run: d27c741

Instead of asserting on the resize message, validate the final dictionary size.

On freebsd:
https://github.com/redis/redis/actions/runs/6540680776/job/17761013478
https://pipelinesghubeus22.actions.githubusercontent.com/2oDd4EuUudJqGKlOAB2KXZpKHTtseqbUa63unZUQGqUSgXNthI/_apis/pipelines/1/runs/40953/signedlogcontent/14?urlExpires=2023-10-17T02%3A07%3A01.6196538Z&urlSigningMethod=HMACV1&urlSignature=yOYvJogzR%2FbpT02XI2vr2Zc0DGBprgft3zKjtIIpfAk%3D

2023-10-17T00:11:35.6970430Z [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
2023-10-17T00:11:35.7072700Z Expected '102' to be equal to '2' (context: type eval line 17 cmd {assert_equal 102 [r dbsize]} proc ::test)

Removed the number-of-keys (dbsize) assertion.

problem introduced recently in #11695

tests/unit/expire.tcl

hpatro · 2023-10-18T01:29:41Z

@oranagra I've removed the assertions around rehashing (they were causing the flakiness). The remaining validation still covers the expiration skip logic when there are too many empty buckets, and the cleanup that follows once the dictionary is resized.

oranagra

it feels a bit odd that this test talks so much about rehashing, but doesn't validate any of that.
however, i suppose it still covers its purpose, and any rehashing checks can add coupling with dict internals.
it looks like the test doesn't depend on rehashing, but rather the luck of rehashing, so maybe that's ok.
i'm gonna merge it now since the CI failures are painful. we can iterate over it in a follow up PR if we want.

oranagra · 2023-10-18T11:07:57Z

looks like this test still has issues:
https://github.com/redis/redis/actions/runs/6559891739/job/17816423222#step:6:98

oranagra · 2023-10-18T11:16:48Z

still fails with valgrind, even with my increased timeout.
@hpatro can you look into it?

oranagra · 2023-10-18T12:11:02Z

@hpatro did you look into the new failures of the defrag tests?

hpatro · 2023-10-18T15:49:11Z

> @hpatro did you look into the new failures of the defrag tests?

Yes, I'm looking at it.

hpatro · 2023-10-18T18:04:30Z

@madolson Could you please trigger a daily run on this branch?

madolson · 2023-10-18T18:45:31Z

Test run: https://github.com/redis/redis/actions/runs/6565221591

hpatro · 2023-10-18T20:40:53Z

@oranagra @madolson Could we disable some of the defrag tests to avoid repeated failures in the daily run?
#12672

hpatro · 2023-10-19T02:52:09Z

Thanks @madolson for the re-run. Most recent run with increased timeout:

Valgrind test also succeeded.

https://github.com/redis/redis/actions/runs/6567848801

@oranagra Please have a look.

oranagra

so it wasn't just a test issue..
please have a look to make sure other places with a similar pattern don't have this issue.
thanks,

oranagra · 2023-10-23T07:23:41Z

i see this test still fails in freebsd CI (which is notoriously slow)

*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
  scan didn't handle slot skipping logic.

https://github.com/redis/redis/actions/runs/6607001072/job/17943822780#step:4:8901
@hpatro can you take a look?

hpatro · 2023-10-23T17:20:22Z

@oranagra Taking a look.

hpatro · 2023-10-23T19:51:11Z

@oranagra My thinking from reading the code is:

https://github.com/redis/redis/blob/unstable/tests/unit/expire.tcl#L858-L861

        # delete data to have lot's (99%) of empty buckets (slot 12182 should be skipped)
        for {set j 1} {$j <= 99} {incr j} {
            r del "{foo}$j"
        }

If the delete operations above take longer than 500ms (the TTL set on the keys), the empty buckets won't have been formed in time. The skip logic for a large number of empty buckets then doesn't apply, and all the keys get expired as part of the expiry scan.

Does FreeBSD run on a smaller/slower instance for some reason?

hpatro · 2023-10-23T20:22:36Z

@oranagra I've put a fix here #12689. PTAL. (hopefully it's the last one 😶‍🌫️)

oranagra · 2023-11-05T13:20:20Z

i see the test sometimes fails on valgrind
https://github.com/redis/redis/actions/runs/6757912207/job/18368950839#step:6:1525
maybe 20 seconds isn't enough?

hpatro · 2023-11-07T22:16:48Z

@oranagra Let me check whether there are any other potential issues; otherwise I'll submit a PR with an increased wait.

oranagra · 2023-11-09T09:50:35Z

FYI: it keeps failing on valgrind
https://github.com/redis/redis/actions/runs/6805492700/job/18505125252

hpatro · 2023-11-10T21:51:17Z

@oranagra #12752

Dictionary iterator logic in the `tryResizeHashTables` method is picking the next (incorrect) dictionary while the cursor is at a given slot. This could lead to some dictionary/slot getting skipped from resizing. Also stabilize the test. problem introduced recently in redis#11695

Make expiry.tcl test check more accurate

0f958a1

hpatro requested a review from madolson October 16, 2023 22:18

Remove dbsize assertion on volatile keys

0df35b8

oranagra reviewed Oct 17, 2023

Remove rehashing validation to avoid flakiness

cd9e428

oranagra approved these changes Oct 18, 2023

increase timeout

69dda7e

Fix resize table non empty dict iteration

44cf163

hpatro changed the title ~~Make expiry.tcl test check more accurate~~ Fix resize hash table dictionary iterator Oct 18, 2023

Increase timeout

08929fb

oranagra approved these changes Oct 19, 2023

oranagra merged commit f3bf848 into redis:unstable Oct 19, 2023
