Skip to content

Conversation

@hpatro
Copy link
Contributor

@hpatro hpatro commented Oct 16, 2023

Dictionary iterator logic in the tryResizeHashTables method is picking the next (incorrect) dictionary while the cursor is at a given slot. This could lead to some dictionary/slot getting skipped from resizing.

Saw failure in recent run: d27c741

Instead of affirming for resize message, rather validate for the final dictionary size.

On freebsd:
https://github.com/redis/redis/actions/runs/6540680776/job/17761013478
https://pipelinesghubeus22.actions.githubusercontent.com/2oDd4EuUudJqGKlOAB2KXZpKHTtseqbUa63unZUQGqUSgXNthI/_apis/pipelines/1/runs/40953/signedlogcontent/14?urlExpires=2023-10-17T02%3A07%3A01.6196538Z&urlSigningMethod=HMACV1&urlSignature=yOYvJogzR%2FbpT02XI2vr2Zc0DGBprgft3zKjtIIpfAk%3D

2023-10-17T00:11:35.6970430Z [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
2023-10-17T00:11:35.7072700Z Expected '102' to be equal to '2' (context: type eval line 17 cmd {assert_equal 102 [r dbsize]} proc ::test)

Removed the no. of keys/dbsize assertion

problem introduced recently in #11695

@hpatro hpatro requested a review from madolson October 16, 2023 22:18
@hpatro
Copy link
Contributor Author

hpatro commented Oct 18, 2023

@oranagra I've removed the assertions around rehashing (it's causing the flakiness). The remaining validation still covers the expiration skip logic if there are too many empty buckets and then upon resize it gets cleaned up.

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels a bit odd that this test talks so much about rehashing, but doesn't validate any of that.
however, i suppose it still covers it's purpose, and any rehashing checks can add coupling with dict internals.
it looks like the test doesn't depend on rehashing, but rather the luck of rehashing, so maybe that's ok.
i'm gonna merge it now since the CI failures are painful. we can iterate over it in a follow up PR if we want.

@oranagra
Copy link
Member

looks like this test still has issues:
https://github.com/redis/redis/actions/runs/6559891739/job/17816423222#step:6:98

@oranagra
Copy link
Member

still fails with valgrind, even with my increased timeout.
@hpatro can you look into it?

@oranagra
Copy link
Member

@hpatro did you look into the new failures of the defrag tests?

@hpatro
Copy link
Contributor Author

hpatro commented Oct 18, 2023

@hpatro did you look into the new failures of the defrag tests?

Yes, I'm looking at it.

@hpatro hpatro changed the title Make expiry.tcl test check more accurate Fix resize hash table dictionary iterator Oct 18, 2023
@hpatro
Copy link
Contributor Author

hpatro commented Oct 18, 2023

@madolson Could you please trigger a daily run on this branch ?

@madolson
Copy link
Contributor

@hpatro
Copy link
Contributor Author

hpatro commented Oct 18, 2023

@oranagra @madolson Could we disable some of the defrag tests to avoid repetitive failures on daily run ?
#12672

@hpatro
Copy link
Contributor Author

hpatro commented Oct 19, 2023

Thanks @madolson for the re-run. Recent most run with increased timeout:

Valgrind test also succeeded.

https://github.com/redis/redis/actions/runs/6567848801

@oranagra Please have a look.

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it wasn't just a test issue..
please have a look to make sure other places with similar pattern don't have this issue.
thanks,

@oranagra oranagra merged commit f3bf848 into redis:unstable Oct 19, 2023
@oranagra
Copy link
Member

i see this test still fails in freebsd CI (which is notoriously slow)

*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
  scan didn't handle slot skipping logic.

https://github.com/redis/redis/actions/runs/6607001072/job/17943822780#step:4:8901
@hpatro can you take a look?

@hpatro
Copy link
Contributor Author

hpatro commented Oct 23, 2023

@oranagra Taking a look.

@hpatro
Copy link
Contributor Author

hpatro commented Oct 23, 2023

@oranagra The thought which I've from reading the code is

https://github.com/redis/redis/blob/unstable/tests/unit/expire.tcl#L858-L861

        # delete data to have lot's (99%) of empty buckets (slot 12182 should be skipped)
        for {set j 1} {$j <= 99} {incr j} {
            r del "{foo}$j"
        }

If the above delete operation(s) takes longer than 500ms (TTL set on the keys), empty buckets wouldn't have been formed and the large no. of empty bucket skip logic won't be valid and all the keys would get expired as part of the expiry scan.

Does freebsd runs on smaller/slower instance due to some reason?

@hpatro
Copy link
Contributor Author

hpatro commented Oct 23, 2023

@oranagra I've put a fix here #12689. PTAL. (hopefully it's the last one 😶‍🌫️)

@oranagra
Copy link
Member

oranagra commented Nov 5, 2023

i see the test sometimes fails on valgrind
https://github.com/redis/redis/actions/runs/6757912207/job/18368950839#step:6:1525
maybe 20 seconds isn't enough?

@hpatro
Copy link
Contributor Author

hpatro commented Nov 7, 2023

@oranagra Let me take a look if there are any other potential issue or else will submit a PR with an increased wait.

@oranagra
Copy link
Member

oranagra commented Nov 9, 2023

FYI: it keeps failing on valgrind
https://github.com/redis/redis/actions/runs/6805492700/job/18505125252

@hpatro
Copy link
Contributor Author

hpatro commented Nov 10, 2023

@oranagra #12752

enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jul 25, 2025
Dictionary iterator logic in the `tryResizeHashTables` method is picking the next
(incorrect) dictionary while the cursor is at a given slot. This could lead to some
dictionary/slot getting skipped from resizing.

Also stabilize the test.

problem introduced recently in redis#11695
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants