Skip to content

Conversation

@sundb
Copy link
Collaborator

@sundb sundb commented Jun 3, 2024

this PR fixes two crashes:

  1. Fix missing slotToKeyInit() when using flushdb async under cluster mode.
    [CRASH] Redis 7.2.3 crashed in slotToKeyReplaceEntry #13205

  2. Fix missing expires_cursor reset when stopping active defrag in the middle of defragment.
    [CRASH] Redis 7.2.x crashes in activeDefragCycle when activedefrag disabled while running and re-enabled #13307
    If we stop active defrag in the middle of defragging db->expires, if expires_cursor is not reset to 0, the next time we enable active defrag again, defragLaterStep(db, ...) will be entered. However, at this time, db has been reset to NULL, which results in crash.

The affected code were removed by #11695 and #13058 in usntable, so we just need backport this to 7.2.

@sundb sundb requested a review from oranagra June 3, 2024 04:19
@sundb sundb added the release-notes indication that this issue needs to be mentioned in the release notes label Jun 3, 2024

run_solo {defrag} {
start_server {tags {"defrag external:skip"} overrides {appendonly yes auto-aof-rewrite-percentage 0 save ""}} {
proc test_active_defrag {type} {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so now all the defrag tests are running twice?
isn't that a little excessive?
if we keep it, or find a better one, do we want to reflect that change in unstable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but these two are not the same, one is for cluster, and another is for standalone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know.
i'm saying that in the past we had a bunch of tests: [plain, aof, large keys, edge case, eval]
and now we run nearly all of them twice.
i think it's excessive, it's probably enough to just run one or two of them twice, or just add a dedicated test for cluster defrag (which i think we already have in unstable)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just add a dedicated test for cluster defrag (which i think we already have in unstable)

i'll do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sundb so to be sure: with this PR getting merged, the tests in both 7.2 and 7.4 are now able to detect a bug similar to the one being fixed here (which doesn't exist in 7.4)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, 7.4 doesn't covert it unless we add the same test for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh, now i actually looked at the test code:

                # It repeatedly enables and disables active defragmentation,
                # and checks if it crashes, see issue #13307.

well, i suppose it's pointless to test the expires_cursor reset thing, since that code is now simplified.

and the flushdb async bug isn't really related to the defrag, so if we wanted, we could have added another simpler explicit check for it.
truth be told, that emptyDbAsync does have some duplicated logic in it that can someday get out of sync.
but i suppose that's completely unrelated to this PR.

@stevelipinski
Copy link
Contributor

Is there a plan to get a 7.2.x release out with this fix (for issues #13205 and #13307) included?

@sundb
Copy link
Collaborator Author

sundb commented Jul 20, 2024

@stevelipinski yes, this fix will appear in 7.2.6.

@sundb sundb deleted the init_slot_emptydb_async branch July 20, 2024 05:14
@stevelipinski
Copy link
Contributor

Thanks - any ETA on timeframe for 7.2.6 release?

@stevelipinski
Copy link
Contributor

can we request a 7.2.6 be released?

@sundb
Copy link
Collaborator Author

sundb commented Sep 19, 2024

@stevelipinski sorry for late reply, we are not sure about the release date yet, but i will call you if any news, thanks.

@oranagra oranagra mentioned this pull request Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

3 participants